This is a short explanation of the concept of "grounding" large language models. Grounding is important for reducing AI hallucinations.
I need a place to point to, so let's talk about grounding in generative AI. Plain old generative AI makes predictions about which concepts are likely to follow each other given a prompt, and then gives the prompter the most likely string of concepts for that prompt. A is followed by B, which is followed by C. Based on the training data, it's unlikely that the next in line will be E; rather, it's likely to be D. Thus you get "A, B, C, D" for a prompt like "what are the first 4 letters of the Latin alphabet." The model doesn't "know" the alphabet; it just knows that there's, say, a 98.7% chance that D is the letter following C, and only a 0.7% chance that it's E.

If the likelihoods of D and E following C are close to each other, the LLM might hallucinate and tell you the first four letters of the Latin alphabet are A, B, C, E. This can be caused by a bunch of things, but it's usually due to ambiguity in, or staleness of, the training data. You can ramp up the quality of your training data to solve this, but usually that's not feasible for one reason or another.

However, there's a quick hack (?) to decrease the chance of hallucinations: grounding. We could add another data source to our system, say an elementary school textbook that actually contains the whole Latin alphabet, and essentially verify the output of the LLM against that source. Now the output of the LLM is grounded.
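The alphabet example can be sketched in a few lines of Python. This is a toy model, not a real LLM: the probabilities and the "textbook" source are made up for illustration. The ungrounded model just picks the highest-probability next token; the grounded version checks that pick against an external source and corrects it when they disagree.

```python
# Toy sketch of grounding (all names and numbers are hypothetical).

# Imagine ambiguous training data left D and E nearly tied after "A, B, C",
# with E slightly ahead -- the setup for a hallucination.
next_token_probs = {"D": 0.49, "E": 0.51}

def ungrounded_next(probs):
    """Plain generative AI: return the most likely next token."""
    return max(probs, key=probs.get)

# The extra data source: an "elementary school textbook" that actually
# contains the whole Latin alphabet.
TEXTBOOK = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def grounded_next(context, probs):
    """Verify the model's pick against the textbook; prefer the source."""
    candidate = ungrounded_next(probs)
    truth = TEXTBOOK[len(context)]  # the letter the textbook says comes next
    return candidate if candidate == truth else truth

context = "ABC"
print(ungrounded_next(next_token_probs))          # hallucinates: E
print(grounded_next(context, next_token_probs))   # grounded answer: D
```

Real grounding pipelines are fancier (retrieval, citation, confidence thresholds), but the shape is the same: generate first, then check the generation against a trusted source before handing it to the user.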