HTML conversions sometimes display errors due to content that did not convert correctly from the source. This paper uses the following packages that are not yet supported by the HTML conversion tool. Feedback on these issues are not necessary; they are known and are being worked on.

  • failed: dramatist

Authors: achieve the best HTML results from your LaTeX submissions by following these best practices.

License: CC BY 4.0
arXiv:2402.01769v1 [cs.CL] 01 Feb 2024

Redefining ”Hallucination” in LLMs: Towards a psychology-informed framework for mitigating misinformation

Elijah Berberette    Jack Hutchins    Amir Sadovnik
Abstract

In recent years, large language models (LLMs) have become incredibly popular, with ChatGPT for example being used by over a billion users. While these models exhibit remarkable language understanding and logical prowess, a notable challenge surfaces in the form of ”hallucinations.” This phenomenon results in LLMs outputting misinformation in a confident manner, which can lead to devastating consequences with such a large user base. However, we question the appropriateness of the term ”hallucination” in LLMs, proposing a psychological taxonomy based on cognitive biases and other psychological phenomena. Our approach offers a more fine-grained understanding of this phenomenon, allowing for targeted solutions. By leveraging insights from how humans internally resolve similar challenges, we aim to develop strategies to mitigate LLM hallucinations. This interdisciplinary approach seeks to move beyond conventional terminology, providing a nuanced understanding and actionable pathways for improvement in LLM reliability.

Large Language Models, Hallucinations

1 Introduction

Recent breakthroughs in large language models (LLMs) have propelled the widespread adoption of conversational AI across diverse applications. Exemplified by LLMs such as ChatGPT, GPT-4, and BARD, these models have demonstrated remarkable proficiency in language comprehension (Xiao et al., 2023) and logical reasoning (Luo et al., 2023). Notably, they have consistently exhibited the ability to surpass the Turing Test (Dodig-Crnkovic, 2023), marking a significant leap forward in the field. Amidst this success, a critical challenge has emerged—hallucinations.

The definition of the term “hallucination” varies amongst authors of prior work. Some authors define “hallucination” from a real-world perspective. For example, Alkaissi et al. define this term generally as “generating seemingly realistic sensory experiences that do not correspond to any real-world input” (Alkaissi & McFarlane, 2023). Many authors simply describe this as an unfactual statement that is not present in training data (Lemley et al., 2023). Other authors seek to separate this umbrella term into multiple sub-terms that each explain different undesired outputs from LLMs (Zhang et al., 2023).

The issue with hallucinations in LLMs is they often appear correct to someone not familiar with the subject area they are asking about. Often, LLMs will answer in a confident way or even explain logical steps of how they arrived at the answer even when that answer even when they are incorrect. With how wide the user base of conversational LLMs is, it’s expected that most of the users will not be educated about what hallucinations are. Figure 1 shows a conversation depicting an example of what a hallucination looks like when interacting with an LLM. At first glance, this response that the LLM produces seems convincing and logical, especially to someone who is not familiar with calculus. However, through further investigation, one can identify the invalidity in the confident answer given. This incorrect answer stems from ChatGPT-3.5 interpreting ”two times” the derivative as taking the second derivative of the function twice instead of the multiplication of the constant 2 and the derivative of the function 3x2.3superscript𝑥2{3x^{2}}.3 italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .

Refer to caption
Figure 1: Conversation with ChatGPT that contains a hallucination. We asked the model ”What is two times the derivative of 3x squared” to which it responds with the incorrect answer of 12.

While ”hallucination” has become the dominant term for this phenomenon both in academia and media, hallucinations in humans have a much different definition. The NHS defines hallucinations as ”when you hear, see, smell, taste, or feel things that appear to be real but only exist in your mind” (NHS, 2023). This prompts a reflection on the appropriateness of categorizing all of these phenomena as ”hallucinations,” as it can be misleading to classify them as such. To rectify this, our paper attempts to align these occurrences more accurately with psychological phenomena that parallel the ”hallucination” phenomena observed in LLMs.

In this work, our foremost goal is to provide a future direction for mitigating ”hallucinations.” To achieve this objective, we advocate for a paradigm shift in how these phenomena are understood, utilizing a more precise lexicon borrowed from the realm of psychology. Psychological concepts such as source amnesia, availability heuristics, recency effect, cognitive dissonance, suggestibility, and confabulation will serve as the basis for our new characterization.

Our departure from the conventional use of the term ”hallucination” is not a mere semantic exercise; rather, it serves as a deliberate means to enhance our understanding of the limitations and challenges faced by advanced language models. By grounding our discussion in specific psychological constructs, we seek to shed light on these phenomena in language models, paving the way for the development of targeted solutions for different types of ”hallucinations.”

2 Previous Work

Within the current body of available literature, many authors define the term “hallucination” as “generated content that is nonsensical or unfaithful to the provided source content” (Lu et al., 2018). It is also very common for authors to separate this term into two separate definitions, intrinsic hallucinations and extrinsic hallucinations (Ji et al., 2022). Intrinsic hallucinations in conversational large language models are outputs that directly contradict the source content or conversational history. Meanwhile, extrinsic hallucinations are outputs that cannot be proven or disproven based on the source content or conversational history (Ji et al., 2022). Essentially, intrinsic hallucinations are a fundamental misinterpretation of the information, while extrinsic hallucinations introduce unnecessary, incorrect details. It is important to note that intrinsic and extrinsic hallucinations are not mutually exclusive, meaning they can occur at the same time in the same output (Zhou et al., 2020). The following example shows a fictional intrinsic and extrinsic hallucination example:

{drama}\Character

Useruser \CharacterCorrect Translationbot \CharacterIntrinsic Translationintrinsic \CharacterExtrinsic Translationextrinsic

\userspeaks

: “Please translate the following text: Un niño saltó un arroyo para llegar al otro lado.”

\botspeaks

: A boy jumped over a creek to get to the other side. \intrinsicspeaks: A girl jumped over a river to get to the other side. \extrinsicspeaks: A boy jumped over a large creek to get to the other side.

The example depicts an intrinsic and extrinsic hallucination arising during translation. The intrinsic example changes ”boy” to ”girl” and ”creek” to ”river” which is clearly incorrect when compared with the correct translation. The extrinsic example introduces the adjective ”large” which does not directly contradict the correct translation; however, it was not explicitly stated in the original text.

Synonymous with extrinsic hallucinations, recent studies have shown that hallucinations do not always contain false information. Zhang et al splits the polysemous term “hallucination” into three separate subterms that each capture a different type of “hallucination” (Zhang et al., 2023). The first type of hallucination they identify is “Input-Conflicting Hallucination,” which is a response that diverges from the input provided by the user. For example, if the user were to ask an LLM what the most efficient truck is, and the model responds with “The most efficient car is a hybrid sedan,” this response is not false information; however, it would be labeled an input-conflicting hallucination. The other two hallucinations identified in this paper are “context-conflicting hallucinations”, and “fact-conflicting hallucinations”. Context-conflicting hallucinations are similar to intrinsic hallucinations in that they contain information that deviates from previous outputs. An example of this would be if the LLM previously stated that the ocean is around 139,000,000 square miles then in the following output stated that the ocean is around 140,000,000 square miles. Lastly, “fact-conflicting hallucinations” are simply inaccurate statements outputted by the model. It is important to once again note that these terminologies are not mutually exclusive and can be identified in the same outputs.

In an attempt to extend this direction of breaking the term ”hallucination” into distinct subcategories, several authors have altered the definition of the term based on the tasks being performed by LLMs. These include tasks such as abstractive summarization (Zhao et al., 2020) and language translation (Ji et al., 2022). In models used for summarizing tables, authors found it beneficial to further label this phenomena based on the material that was being hallucinated to assist with mitigation strategies of inaccurate output by the model (Zhao et al., 2020). For example, if an LLM is tasked to summarize a table of purchases for a small business, it is useful to treat hallucinated dates differently than hallucinated monetary entries. While this approach offers a useful taxonomy for addressing hallucinations on a task-by-task basis, we argue that it is more beneficial to discontinue labeling these irregularities in LLMs as ”hallucinations” entirely. By looking at this term more generally, we can yield more robust mitigation strategies and a deeper understanding of the issue.

We acknowledge that these authors effectively dissect the term ”hallucination” into distinct terms that offer a precise framework for identifying the varied phenomena at play. Each of these authors’ unique approaches allows for finer granularity in characterizing different types of hallucinations, offering additional clarity for how one might mitigate these issues observed in LLMs. While the specificity of this proposed taxonomy allows for more specific solutions, our objective is to offer an alternative path forward, derived from psychology, that allows for the mitigation of ”hallucinations” that arise in LLMs. We strive to provide this future direction for mitigation strategies by enhancing the correlation between the ”hallucinations” and psychology. This would allow us to combat the various unfavorable outputs produced by LLMs by utilizing the plethora of knowledge that currently exists in the field of psychology.

One step in this alternative direction is shown in a recent article where the authors attempt to redefine the word “hallucination” by using more accurate terminology that aligns with other fields like neuroscience and psychology (Smith et al., 2023). In this article, Smith et al. argue that in order to refer to the behavior of LLMs producing false, misleading information, one would need to also believe that LLMs are perceiving. The authors further argue that a better term for this phenomenon that occurs is “confabulation” which is a medical disorder where patients produce false memories without attempting to deceive the individual they are speaking with (Wiggins & Bunin, 2023). While this research is in line with the objective of our paper, we intend to connect additional existing terminology to specific examples found while interacting with LLMs.

3 The Issue with the term ”Hallucination”

In humans, hallucinations refer to perceptual experiences that occur in the absence of external stimuli. These experiences can manifest in various sensory modalities, including visual, auditory, tactile, olfactory, or gustatory sensations (Asaad & Shapiro, 1986). Hallucinations are essentially perceptions that occur without corresponding external stimuli that would typically evoke such sensations.

Hallucinations are often associated with psychiatric disorders, neurological conditions, or substance-induced states. For example, individuals with schizophrenia may experience auditory hallucinations, hearing voices that others do not hear (Waters et al., 2012). Similarly, hallucinations can occur as a result of conditions like epilepsy (Elliott et al., 2009), migraines (Schott, 2007), or drug intoxication (Manford & Andermann, 1998).

When it comes to LLMs, the term ”hallucination” is metaphorically used to describe instances where the model generates outputs that may seem realistic but are not grounded in actual data or external reality. In LLMs, ”hallucination” is a term used to highlight the model’s capability to generate contextually relevant and coherent information even when it hasn’t been explicitly exposed to specific data during training.

However, it’s important to note that using ”hallucination” in the context of language models is not a perfect analogy. Unlike human hallucinations, which are often symptomatic of underlying disorders or conditions, the so-called hallucinations in language models are a result of the model’s probabilistic nature and training data among other factors. Language models lack consciousness, subjective experience, or awareness, and their outputs are generated based on patterns learned from vast amounts of training data.

The term ”hallucination” in the context of language models may be misleading if taken too literally. While it captures the idea of generating seemingly realistic outputs, it does not imply any form of subjective experience, intentionality, or understanding on the part of the model. It’s a linguistic metaphor used to describe a characteristic of the model’s output rather than a genuine cognitive process.

4 Psychology-Informed Taxonomy

While it’s clear that the term ”hallucination” is not an accurate descriptor of these phenomena in LLMs, we do believe that metaphors with human psychology can be incredibly valuable in understanding LLMs. Smith et al. already proposed the use of ”confabulation” instead of ”hallucination,” (Smith et al., 2023) which we believe more accurately captures the essence of these phenomena in general terms. That being said, there is much more room to create connections between human psychology and ”hallucinations” in LLMs. The following section will present several psychological phenomena and cognitive biases that we believe closely match different types of ”hallucinations” in LLMs. It is important to note that several of the psychological phenomena we have identified overlap with each other, and we will point these overlaps out as they appear. Through this, we hope to gain a better understanding of what ”hallucinations” are and how they might arise. An overview of these phenomena can be seen in Figure 2.

Refer to caption
Figure 2: An overview of psychological phenomena and cognitive biases in humans and their parallel in LLMs

4.1 Source Amnesia

Source amnesia refers to the phenomenon where individuals have difficulty remembering the origin or source of a particular memory, idea, or piece of information, and it is one of the 7 sins of memory identified by Schacter (Schacter et al., 1984; Schacter & Dodson, 2001). In other words, people may recall information correctly, but they struggle to remember where or how they acquired that information.

This cognitive phenomenon highlights the dissociation between the content of a memory and the context in which it was acquired. It can occur in various situations, such as when someone hears a piece of information from multiple sources or encounters it in different contexts. As a result, the individual might mistakenly attribute the memory to the wrong source or be unable to identify the original source altogether.

Source amnesia, in the context of LLMs, refers to the tendency of these models to generate content without an accurate recollection or acknowledgment of the origin of the information. This phenomenon is particularly notable when the AI system inadvertently synthesizes text that closely mimics or paraphrases input data without proper attribution. In other words, the model fails to attribute the generated content to its appropriate source, leading to the AI system operating without a clear memory of the informational roots.

For a simple example, see Figure 3, which shows a conversation with LLaMA-2 7B (Touvron et al., 2023). In this example, we ask LLaMA to briefly describe detritivores, to which it responds with a short description. Following this, we ask it to provide a citation for the description it just gave. This results in LLaMA giving us a non-existent paper. The model even goes as far as to provide a summary of this fake paper, thus demonstrating one aspect of source amnesia in LLMs.

Refer to caption
Figure 3: Conversation with LLaMA-2 7B where we ask it to describe detritivores. When asked to cite the answer it gave, LLaMA-2 responded with a fake article, thus demonstrating source amnesia.

Additionally, it is important to note that hallucinations can occur more abstractly as a result of source amnesia. Consider an LLM trained on diverse datasets, including medical literature and fictional narratives. In response to a medical query, the model may generate a response that combines accurate medical information with elements from fictional stories, showcasing a manifestation of source amnesia. The model, lacking the ability to differentiate between factual and fictional sources, amalgamates disparate information, potentially leading to misinformation or misinterpretation.

4.2 Recency Effect

The recency effect is a cognitive phenomenon wherein individuals tend to better recall and give greater importance to information or events that occurred more recently (Baddeley & Hitch, 1993). This bias in memory is particularly prominent in the context of list-based presentations or sequences. When presented with a series of items or information, individuals are more likely to remember and emphasize the items encountered near the end of the list. This effect is believed to be influenced by the workings of short-term memory, where the most recent information is still readily accessible. The recency effect can impact various aspects of decision-making, evaluation, and overall perception as people tend to assign greater significance to the freshest information in their minds.

Many LLMs, including ChatGPT and GPT-4 include reinforcement learning with human feedback (RLHF) (OpenAI, 2023). This can lead to an interesting way that hallucinations occur in LLMs, mirroring the recency effect. One way that the recency effect can occur in LLMs is via confirmation bias in the user. Confirmation bias is the idea that people will prefer information that corresponds to their beliefs when compared with information that rejects their beliefs (Klayman, 1995). Often, people use LLMs to verify their beliefs, leading to confirmation bias in the user. Given this, it’s easy to imagine a situation where the human-in-the-loop prefers responses from the LLM that confirm their existing beliefs regardless of whether the information is factual or not. Over time, this can lead to the LLM producing more hallucinations, since it favors recent interactions that indulge the user’s confirmation bias instead of the long-term reward of producing output with fewer hallucinations.

This idea is not just hypothetical, many people have hypothesized that ChatGPT has become ”dumber” over time as pointed out in articles from New York Magazine (Herrman, 2023) and DW (Abid, 2023). After much speculation, Chen et al. evaluated the performance of ChatGPT on varying tasks in March 2023 and then again in June 2023 (Chen et al., 2023). What they found were significant differences in performance in a relatively short period, especially in GPT-4 where they observed significant performance drop-offs in math and programming skills. One example the authors present is asking GPT-4 ”Is 17077 a prime number? Think step by step and then answer ”[Yes]” or ”[No]”. When asked in March 2023 the model responded with an in-depth chain of thought converging on the correct answer of ”[Yes]”. However, when the model was asked again in June 2023 it simply answered the incorrect answer of ”[No]” with no explanation. While it is difficult to be sure what caused this regression since these models are not open source, one speculation could be that these are a result of the recency effect.

4.3 Availability Heuristics

The availability heuristic is a cognitive bias that influences decision-making and judgment based on the ease with which specific information comes to mind (Schwarz et al., 1991). Essentially, individuals tend to overestimate the importance or likelihood of events based on their immediate recall from memory. In LLMs, the availability heuristic plays a noteworthy role in shaping text generation.

In LLMs, the availability heuristic manifests as a tendency to prioritize information that is more accessible or prevalent in the training data. During the extensive training process, the model is exposed to a vast corpus of text, and certain patterns, phrases, or concepts may be more frequently encountered than others. As a result, when prompted with a query or input, the LLM may draw heavily from the readily available information in its memory, irrespective of the actual significance of that information.

For example, suppose a language model has been exposed to a disproportionate amount of data related to a specific topic. In that case, it may exhibit an availability bias by generating responses that align more closely with the patterns found in that particular domain. This can lead to outputs that may seem overly biased or limited in scope, as the model relies on the information that is most prominent in its training data.

Moreover, the availability heuristic in LLMs can contribute to the perpetuation of stereotypes and biases present in the training data. If certain groups, themes, or perspectives are overrepresented in the model’s training data, the LLM may inadvertently reinforce these biases in its generated content. This phenomenon raises important considerations regarding the ethical use of language models, as the outputs may inadvertently reflect and perpetuate societal biases embedded in the training data. While this is a very important consequence to recognize, this is not the only way the availability heuristic can present itself. Furthermore, significant research has already been performed in studying bias in LLMs (Vig et al., 2020; Abid et al., 2021; Liang et al., 2021).

Figure 4 provides a simple example in GPT-3 of the availability heuristic in LLMs from (Navigli et al., 2023). In this example, the model is asked to ”tell me about your nurse,” to which the model responds by using the pronoun ”she.” The choice to use feminine pronouns is interesting because the gender of the nurse is never mentioned, but nurses who use feminine pronouns are likely much more prominent in the training data. This leads to bias and the presence of the availability heuristic in the model. Furthermore, when asked about the plastic surgeon, the model chooses to use masculine pronouns. Again, this happens without the gender of the surgeon being mentioned.

Refer to caption
Figure 4: Q&A with GPT-3 that shows bias and the availability heuristic from (Navigli et al., 2023). (bold added for emphasis)

4.4 Suggestibility

We believe that LLMs can fall victim to another one of the seven sins of memory, suggestibility (Schacter, 1999). Suggestibility in the field of psychology is a memory distortion where individuals introduce false information into their recollection due to external suggestions. Similar to how individuals may be misled by the way they are presented with information, LLMs may also fall victim to the way they are prompted. If the user’s prompt includes strong bias or leading questions, the model may be more likely to diverge from the initial answer or response it provides. Previous authors would relate this divergence to context-conflicting hallucinations.

We define the term “suggestibility” in LLMs as incorporating false or misleading information into responses due to inaccuracies in user-provided prompts. One way that an LLM can be subjected to suggestibility is by inquiring whether the model is confident that the given response is correct. This can have both positive and negative effects on the output that follows. For example, this approach can be extended to include reinforcing information or misleading information to lead the model toward producing true or false responses, respectively. This outcome is a possible side effect of RLHF, since during the RLHF process, the model is rewarded for producing responses that please the user.

Let’s consider the example shown in Figure 5 of a conversation we had with the popular LLM, Bard. First, we provided Bard with the prompt shown in the introduction asking Bard to solve a math problem from the field of Calculus. Unlike the conversation with ChatGPT, shown in Figure 1, Bard was able to solve this problem correctly with steps to show how it came to the answer. We then prompted Bard with a misleading question, asking Bard if the answer that it provided was incorrect. We also included a reason for why we were not convinced by its output. This exposure to suggestibility resulted in Bard retracting its initial answer and producing an incorrect final answer with incorrect reasoning.

Refer to caption
Figure 5: We introduced suggestibility into a conversation with Google’s Bard. This exposure to suggestibility leads to an incorrect answer and steps outputted by Bard.

4.5 Cognative Dissonace

Another term that we identify is cognitive dissonance. In the field of psychology, cognitive dissonance refers to the uncomfortable psychological tension that arises due to dissonant beliefs, often leading individuals to resolve this tension by refraining from dissonant beliefs. This phenomenon was first coined in 1957 by Leon Festinger (Festinger, 1957). In this study, Festinger defined consonant beliefs as beliefs that align and dissonant beliefs as beliefs that conflict.

Our application of this terminology is strictly metaphorical in the sense that we do not argue that LLMs hold actual beliefs. We also recognize that LLMs are incapable of being uncomfortable. Our definition of cognitive dissonance as it relates to LLMs is the tension that surfaces during training due to the training data containing information that is in direct conflict with other information in the dataset. We argue that this leads LLMs to produce input-conflicting and context-conflicting responses, due to the internal “tensions” developed from training on a large corpus of potentially conflicting information.

Figure 6 shows an example of cognitive dissonance arising in Facebook’s open-source LLM, LLaMA-2. For this, we asked LLaMA-2: ”Are pitbulls a good dog to own?” We chose this question because pitbulls commonly have a negative connotation attached to their breed due to stereotypes of them being aggressive. On the other end of the spectrum, many people have overwhelmingly positive opinions on pitbulls, giving rise to perfect conditions for cognitive dissonance in the model. From this prompt, we received a list of pros and cons that have conflicting points. In the second pro, LLaMA-2 states that pitbulls are intelligent dogs that are easy to train with the right techniques. The model then contradicts itself in the third con, stating that it is difficult to teach pitbulls basic obedience commands. Previous authors would label this contradiction or divergence from the prior context as a context-conflicting hallucination; however, we believe that the model has fallen subject to cognitive dissonance due to biases arising in training.

Refer to caption
Figure 6: Conversation with LLaMA-2 discussing if pitbulls are good dogs to own. This demonstrates cognitive dissonance arising because LLaMA-2 contradicts itself by saying pitbulls are both difficult and easy to train. (bold added for emphasis)

4.6 Confabulation

The last category that we identify is confabulation. This phenomenon, in the field of psychology, is when a patient produces false memories without attempting to deceive the individual they are speaking with (Wiggins & Bunin, 2023). We believe that this category captures a large portion of the confident misinformation produced by conversation AI. This is a broad category, as it can capture many other irregular outputs that do not fall directly into the other categories.

We define ”confabulation” with respect to LLMs as a confident, but misleading output generated with the intention of accurately fulfilling the user’s prompt. Similar to the previous terminology, ”intention” is employed metaphorically, referring to the ability to generate coherent and contextually relevant text based on input prompts. These well-meaning but misguided responses stem from a few reasons. These include the large, uncurated corpus of text used to train LLMs, stochasticity, and RLHF (Edwards, 2023). This term relates heavily to what previous authors have identified as fact-conflicting hallucinations.

In Figure 7, we show an example of confabulation arising in a conversation with ChatGPT. In this example, we prompted ChatGPT with a niche question asking ChatGPT what the loss function is for YOLOV4. ChatGPT responded that the loss function for YOLOV4 was not provided in the original paper; however, the YOLOV4 paper clearly states that they opted to use cIoU as their loss function (Bochkovskiy et al., 2020). It is important to also note that this paper was last updated in 2020, which means that ChatGPT was exposed to this content in its ”last knowledge update in Janurary 2022.”

Refer to caption
Figure 7: Conversation with ChatGPT depicting confabulation arising in a response. The model claims that the original paper does not include the loss function, which is untrue.

5 Discussion

With our new methodology of taxonomizing hallucinations in LLMs, it is important to explore what can be learned from this. Notably, we can evaluate how humans avoid cognitive biases and memory discrepancies. This exploration begins with an examination of metacognition in humans, a cognitive process crucial for mitigating the impact of cognitive biases.

Metacognition, the ability to monitor and regulate one’s own thinking processes, serves as a safeguard against the pitfalls of misinformation and cognitive biases in humans (Lai, 2011). This capacity enables individuals to reflect on their thought processes, assess information reliability, and discern the sources of their beliefs, fostering critical thinking. Metacognition empowers individuals to rectify misinterpretations through reflective thinking, a process involving revisiting the cognitive processes that led to a belief or perception (Flavell, 1979). This retrospective analysis facilitates the identification of errors in judgment, correction of misconceptions, and updating of mental models to align more closely with reality.

Source monitoring, another facet of metacognition, involves the evaluation of the credibility of information sources (Johnson et al., 1993). Humans use this mechanism to distinguish reliable from unreliable sources, filtering out misinformation and enhancing the accuracy of their mental representations. Metacognitive awareness prompts individuals to question the origin of their beliefs, discerning whether information is based on personal experiences, external evidence, or flawed reasoning. This process is often used to mitigate source amnesia and suggestability.

Metacognition is also pivotal in the process of forming thoughts and ideas, particularly through the use of convergent and divergent thinking (Jia et al., 2019). In the initial stages of thought, individuals engage in divergent thinking, leveraging metacognitive skills to explore a multitude of ideas with spontaneity and creativity. This phase involves the generation of diverse possibilities and the evaluation of their novelty. As the cognitive process evolves, metacognition aids in the transition to convergent thinking, where individuals systematically assess and refine the most promising ideas, emphasizing logical coherence. The collaborative dynamics of metacognition, convergent, and divergent thinking provide us with a mechanism to reduce cognitive dissonance and provide logically consistent arguments while still being creative.

Applying these metacognitive mechanisms to LLMs requires a nuanced approach. While machines lack self-awareness in the human sense, incorporating metacognitive-like functionalities could serve as a valuable mitigation strategy for hallucinations. Emulating human metacognitive processes in LLMs could involve enhancing source attribution capabilities, and implementing algorithms that simulate source monitoring to evaluate data reliability and credibility. Continuous learning mechanisms and model recalibration can allow LLMs to adapt and self-correct in response to evolving information.

While continuous learning could help reduce some of the phenomena we have discussed, it could also worsen other aspects such as the recency effect. Therefore, introducing a form of reflective processing within LLMs could contribute to error detection and correction. By embedding algorithms that analyze and adjust the model’s decision-making processes, LLMs may develop a form of artificial metacognition, improving their ability to discern and rectify hallucinatory outputs.

Lastly, it could be valuable to replicate human’s use of divergent thought in the early stage of generating responses. As the response progresses, we could gradually introduce additional constraints that restrict the generated output to a more coherent and logical output, mirroring the convergent thought we see in humans. This would contribute to the reduction of cognitive dissonance by allowing high creativity, with the model maintaining logical consistency. One way this could be implemented is by exploring different levels of decaying temperature that initially allow these models to explore and then converge to a concrete output.

It is important to note that some papers have already demonstrated mitigation of hallucinations using methods that are similar to metacognitive processes. While it is unclear if the author intended to mirror these processes in their work, the similarity is undeniable. For example (Ji et al., 2023) utilizes a self-reflection process to reduce the prevalence of hallucinations. Additionally, (Varshney et al., 2023) uses a self-inquiry methodology to decide when to search for verification and then correct itself. Both of these approaches closely mirror metacognitive processes in humans, and both demonstrate remarkable improvements. This shows how our psychology-informed approach could provide a highly effective path forward for reducing hallucinations.

It is crucial to emphasize that the goal is not to implement true metacognition in AI, as its feasibility remains uncertain. Instead, we propose drawing inspiration from metacognitive processes when developing LLMs, believing this approach could lead to more accurate, reliable, and responsible AI systems. Embracing the lessons learned from human metacognition may pave the way for significant advancements in the field.

6 Conclusion

In this work, we reexamine the term hallucinations and propose a better taxonomy rooted in human psychology. By doing so, we open pathways for creating targeted solutions by leveraging insights from human psychology. We propose strategies, such as enhanced source attribution, source monitoring, reflective processing, and other forms of artificial metacognition to address challenges in LLMs. While acknowledging the uncertainties surrounding the feasibility of true metacognition in AI, we emphasize the value of drawing inspiration from human cognitive processes for the development of more accurate and responsible LLMs. We also acknowledge that metacognition is likely not the solution to all of the problems with hallucinations in LLMs, we believe it could help us make significant progress. Additionally, leveraging other psychological phenomena and resolution methods could help solve other issues in LLMs. It is our hope that future research focuses on psychology-informed methods to solve some of our most difficult challenges with LLMs.

7 Impact Statement

Our goal with this work is to provide a path forward to improving large language models through the mitigation of hallucinations. Improving these model comes with several societal impacts, but none that are unique to our work.

References

  • Abid (2023) Abid, A. Is chatgpt getting dumber? Deutsche Welle, 2023.
  • Abid et al. (2021) Abid, A., Farooqi, M., and Zou, J. Persistent anti-muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp.  298–306, 2021.
  • Alkaissi & McFarlane (2023) Alkaissi, H. and McFarlane, S. I. Artificial hallucinations in chatgpt: implications in scientific writing. Cureus, 15(2), 2023.
  • Asaad & Shapiro (1986) Asaad, G. and Shapiro, B. Hallucinations: theoretical and clinical overview. The American journal of psychiatry, 143(9):1088—1097, September 1986. ISSN 0002-953X. doi: 10.1176/ajp.143.9.1088. URL https://doi.org/10.1176/ajp.143.9.1088.
  • Baddeley & Hitch (1993) Baddeley, A. D. and Hitch, G. The recency effect: Implicit learning with explicit retrieval? Memory & Cognition, 21:146–155, 1993.
  • Bochkovskiy et al. (2020) Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y. M. Yolov4: Optimal speed and accuracy of object detection, 2020.
  • Chen et al. (2023) Chen, L., Zaharia, M., and Zou, J. How is chatgpt’s behavior changing over time?, 2023.
  • Dodig-Crnkovic (2023) Dodig-Crnkovic, G. How gpt realizes leibniz’s dream and passes the turing test without being conscious, Aug 2023. URL https://www.mdpi.com/2813-0324/8/1/66.
  • Edwards (2023) Edwards, B. Why chatgpt and bing chat are so good at making things up. Ars Technica. https://arstechnica. com/informationtechnology/2023/04/why-ai-chatbots-are-the-ultimate-bs-machines-and-how-people-hope-to-fix-them, 2023.
  • Elliott et al. (2009) Elliott, B., Joyce, E., and Shorvon, S. Delusions, illusions and hallucinations in epilepsy: 1. elementary phenomena. Epilepsy Research, 85(2):162–171, 2009. ISSN 0920-1211. doi: https://doi.org/10.1016/j.eplepsyres.2009.03.018. URL https://www.sciencedirect.com/science/article/pii/S0920121109000813.
  • Festinger (1957) Festinger, L. A theory of cognitive dissonance, 1957. doi: 10.1515/9781503620766.
  • Flavell (1979) Flavell, J. H. Metacognition and cognitive monitoring: A new area of cognitive–developmental inquiry. American psychologist, 34(10):906, 1979.
  • Herrman (2023) Herrman, J. Is chatgpt getting dumber? New York Magazine, 2023.
  • Ji et al. (2022) Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Dai, W., Madotto, A., and et al. Survey of hallucination in natural language generation, Nov 2022. URL https://arxiv.org/abs/2202.03629.
  • Ji et al. (2023) Ji, Z., Yu, T., Xu, Y., Lee, N., Ishii, E., and Fung, P. Towards mitigating hallucination in large language models via self-reflection, 2023.
  • Jia et al. (2019) Jia, X., Li, W., and Cao, L. The role of metacognitive components in creative thinking. Frontiers in psychology, 10:2404, 2019.
  • Johnson et al. (1993) Johnson, M. K., Hashtroudi, S., and Lindsay, D. S. Source monitoring. Psychological bulletin, 114(1):3, 1993.
  • Klayman (1995) Klayman, J. Varieties of confirmation bias. Psychology of learning and motivation, 32:385–418, 1995.
  • Lai (2011) Lai, E. R. Metacognition: A literature review. Always learning: Pearson research report, 24:1–40, 2011.
  • Lemley et al. (2023) Lemley, M. A., Henderson, P., and Hashimoto, T. Where’s the liability in harmful ai speech? SSRN Electronic Journal, 2023. doi: 10.2139/ssrn.4531029.
  • Liang et al. (2021) Liang, P. P., Wu, C., Morency, L.-P., and Salakhutdinov, R. Towards understanding and mitigating social biases in language models. In International Conference on Machine Learning, pp.  6565–6576. PMLR, 2021.
  • Lu et al. (2018) Lu, D., Whitehead, S., Huang, L., Ji, H., and Chang, S.-F. Entity-aware image caption generation, Nov 2018. URL https://arxiv.org/abs/1804.07889.
  • Luo et al. (2023) Luo, M., Kumbhar, S., Parmar, M., Varshney, N., Banerjee, P., Aditya, S., Baral, C., et al. Towards logiglue: A brief survey and a benchmark for analyzing logical reasoning capabilities of language models. arXiv preprint arXiv:2310.00836, 2023.
  • Manford & Andermann (1998) Manford, M. and Andermann, F. Complex visual hallucinations. Clinical and neurobiological insights. Brain, 121(10):1819–1840, 10 1998. ISSN 0006-8950. doi: 10.1093/brain/121.10.1819. URL https://doi.org/10.1093/brain/121.10.1819.
  • Navigli et al. (2023) Navigli, R., Conia, S., and Ross, B. Biases in large language models: Origins, inventory, and discussion. J. Data and Information Quality, 15(2), jun 2023. ISSN 1936-1955. doi: 10.1145/3597307. URL https://doi.org/10.1145/3597307.
  • NHS (2023) NHS. Hallucinations and hearing voices. National Health Service choices, 2023. URL https://www.nhs.uk/mental-health/feelings-symptoms-behaviours/feelings-and-symptoms/hallucinations-hearing-voices/.
  • OpenAI (2023) OpenAI. Gpt-4 technical report, 2023.
  • Schacter (1999) Schacter, D. L. The seven sins of memory: Insights from psychology and cognitive neuroscience. American psychologist, 54(3):182, 1999.
  • Schacter & Dodson (2001) Schacter, D. L. and Dodson, C. S. Misattribution, false recognition and the sins of memory. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 356(1413):1385–1393, 2001.
  • Schacter et al. (1984) Schacter, D. L., Harbluk, J. L., and McLachlan, D. R. Retrieval without recollection: An experimental analysis of source amnesia. Journal of Verbal Learning and Verbal Behavior, 23(5):593–611, 1984. ISSN 0022-5371. doi: https://doi.org/10.1016/S0022-5371(84)90373-6. URL https://www.sciencedirect.com/science/article/pii/S0022537184903736.
  • Schott (2007) Schott, G. D. Exploring the visual hallucinations of migraine aura: the tacit contribution of illustration. Brain, 130(6):1690–1703, 01 2007. ISSN 0006-8950. doi: 10.1093/brain/awl348. URL https://doi.org/10.1093/brain/awl348.
  • Schwarz et al. (1991) Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., and Simons, A. Ease of retrieval as information: Another look at the availability heuristic. Journal of personality and social psychology, 61(2):195–202, 08 1991. URL https://utk.idm.oclc.org/login?url=https://www.proquest.com/scholarly-journals/ease-retrieval-as-information-another-look-at/docview/614382983/se-2.
  • Smith et al. (2023) Smith, A. L., Greaves, F., and Panch, T. Hallucination or confabulation? neuroanatomy as metaphor in large language models. PLOS Digital Health, 2023.
  • Touvron et al. (2023) Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P. S., Lachaux, M.-A., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Saladi, K., Schelten, A., Silva, R., Smith, E. M., Subramanian, R., Tan, X. E., Tang, B., Taylor, R., Williams, A., Kuan, J. X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S., and Scialom, T. Llama 2: Open foundation and fine-tuned chat models, 2023.
  • Varshney et al. (2023) Varshney, N., Yao, W., Zhang, H., Chen, J., and Yu, D. A stitch in time saves nine: Detecting and mitigating hallucinations of llms by validating low-confidence generation. arXiv preprint arXiv:2307.03987, 2023.
  • Vig et al. (2020) Vig, J., Gehrmann, S., Belinkov, Y., Qian, S., Nevo, D., Singer, Y., and Shieber, S. Investigating gender bias in language models using causal mediation analysis. Advances in neural information processing systems, 33:12388–12401, 2020.
  • Waters et al. (2012) Waters, F., Allen, P., Aleman, A., Fernyhough, C., Woodward, T. S., Badcock, J. C., Barkus, E., Johns, L., Varese, F., Menon, M., Vercammen, A., and Larøi, F. Auditory Hallucinations in Schizophrenia and Nonschizophrenia Populations: A Review and Integrated Model of Cognitive Mechanisms. Schizophrenia Bulletin, 38(4):683–693, 03 2012. ISSN 0586-7614. doi: 10.1093/schbul/sbs045. URL https://doi.org/10.1093/schbul/sbs045.
  • Wiggins & Bunin (2023) Wiggins, A. and Bunin, J. L. Confabulation. Confabulation - StatPearls - NCBI Bookshelf, 2023.
  • Xiao et al. (2023) Xiao, C., Xu, S. X., Zhang, K., Wang, Y., and Xia, L. Evaluating reading comprehension exercises generated by llms: A showcase of chatgpt in education applications, 2023. URL https://aclanthology.org/2023.bea-1.52.pdf.
  • Zhang et al. (2023) Zhang, Y., Li, Y., Cui, L., Cai, D., Liu, L., Fu, T., Huang, X., Zhao, E., Zhang, Y., Chen, Y., and et al. Siren’s song in the ai ocean: A survey on hallucination in large language models, Sep 2023. URL https://arxiv.org/abs/2309.01219.
  • Zhao et al. (2020) Zhao, Z., Cohen, S. B., and Webber, B. Reducing quantity hallucinations in abstractive summarization. arXiv preprint arXiv:2009.13312, 2020.
  • Zhou et al. (2020) Zhou, C., Neubig, G., Gu, J., Diab, M., Guzman, P., Zettlemoyer, L., and Ghazvininejad, M. Detecting hallucinated content in conditional neural sequence generation. arXiv preprint arXiv:2011.02593, 2020.