-
Banishing LLM Hallucinations Requires Rethinking Generalization
Authors:
Johnny Li,
Saksham Consul,
Eda Zhou,
James Wong,
Naila Farooqui,
Yuxin Ye,
Nithyashree Manohar,
Zhuxiaona Wei,
Tian Wu,
Ben Echols,
Sharon Zhou,
Gregory Diamos
Abstract:
Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional a…
▽ More
Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold as it usually does in practice when training on internet scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
XVir: A Transformer-Based Architecture for Identifying Viral Reads from Cancer Samples
Authors:
Shorya Consul,
John Robertson,
Haris Vikalo
Abstract:
It is estimated that approximately 15% of cancers worldwide can be linked to viral infections. The viruses that can cause or increase the risk of cancer include human papillomavirus, hepatitis B and C viruses, Epstein-Barr virus, and human immunodeficiency virus, to name a few. The computational analysis of the massive amounts of tumor DNA data, whose collection is enabled by the recent advancemen…
▽ More
It is estimated that approximately 15% of cancers worldwide can be linked to viral infections. The viruses that can cause or increase the risk of cancer include human papillomavirus, hepatitis B and C viruses, Epstein-Barr virus, and human immunodeficiency virus, to name a few. The computational analysis of the massive amounts of tumor DNA data, whose collection is enabled by the recent advancements in sequencing technologies, have allowed studies of the potential association between cancers and viral pathogens. However, the high diversity of oncoviral families makes reliable detection of viral DNA difficult and thus, renders such analysis challenging. In this paper, we introduce XVir, a data pipeline that relies on a transformer-based deep learning architecture to reliably identify viral DNA present in human tumors. In particular, XVir is trained on genomic sequencing reads from viral and human genomes and may be used with tumor sequence information to find evidence of viral DNA in human cancers. Results on semi-experimental data demonstrate that XVir is capable of achieving high detection accuracy, generally outperforming state-of-the-art competing methods while being more compact and less computationally demanding.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
Authors:
Rylan Schaeffer,
Kateryna Pistunova,
Samar Khanna,
Sarthak Consul,
Sanmi Koyejo
Abstract:
Language models can be prompted to reason through problems in a manner that significantly improves performance. However, \textit{why} such prompting improves performance is unclear. Recent work showed that using logically \textit{invalid} Chain-of-Thought (CoT) prompting improves performance almost as much as logically \textit{valid} CoT prompting, and that editing CoT prompts to replace problem-s…
▽ More
Language models can be prompted to reason through problems in a manner that significantly improves performance. However, \textit{why} such prompting improves performance is unclear. Recent work showed that using logically \textit{invalid} Chain-of-Thought (CoT) prompting improves performance almost as much as logically \textit{valid} CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically \textit{invalid} reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.
△ Less
Submitted 22 July, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
An intelligent tutor for planning in large partially observable environments
Authors:
Lovis Heindrich,
Saksham Consul,
Falk Lieder
Abstract:
AI can not only outperform people in many planning tasks, but it can also teach them how to plan better. A recent and promising approach to improving human decision-making is to create intelligent tutors that utilize AI to discover and teach optimal planning strategies automatically. Prior work has shown that this approach can improve planning in artificial, fully observable planning tasks. Unlike…
▽ More
AI can not only outperform people in many planning tasks, but it can also teach them how to plan better. A recent and promising approach to improving human decision-making is to create intelligent tutors that utilize AI to discover and teach optimal planning strategies automatically. Prior work has shown that this approach can improve planning in artificial, fully observable planning tasks. Unlike these artificial tasks, the world is only partially observable. To bridge this gap, we developed and evaluated the first intelligent tutor for planning in partially observable environments. Compared to previous intelligent tutors for teaching planning strategies, this novel intelligent tutor combines two innovations: 1) a new metareasoning algorithm for discovering optimal planning strategies for large, partially observable environments, and 2) scaffolding the learning processing by having the learner choose from an increasing larger set of planning operations in increasingly larger planning problems. We found that our new strategy discovery algorithm is superior to the state-of-the-art. A preregistered experiment with 330 participants demonstrated that the new intelligent tutor is highly effective at improving people's ability to make good decisions in partially observable environments. This suggests our human-centered tutoring approach can successfully boost human planning in complex, partially observable sequential decision problems, a promising step towards using AI-powered intelligent tutors to improve human planning in the real world.
△ Less
Submitted 6 June, 2024; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Optimal To-Do List Gamification for Long Term Planning
Authors:
Saksham Consul,
Jugoslav Stojcheski,
Valkyrie Felso,
Falk Lieder
Abstract:
Most people struggle with prioritizing work. While inexact heuristics have been developed over time, there is still no tractable principled algorithm for deciding which of the many possible tasks one should tackle in any given day, month, week, or year. Additionally, some people suffer from cognitive biases such as the present bias, leading to prioritization of their immediate experience over long…
▽ More
Most people struggle with prioritizing work. While inexact heuristics have been developed over time, there is still no tractable principled algorithm for deciding which of the many possible tasks one should tackle in any given day, month, week, or year. Additionally, some people suffer from cognitive biases such as the present bias, leading to prioritization of their immediate experience over long-term consequences which manifests itself as procrastination and inefficient task prioritization. Our method utilizes optimal gamification to help people overcome these problems by incentivizing each task by a number of points that convey how valuable it is in the long-run. We extend the previous version of our optimal gamification method with added services for helping people decide which tasks should and should not be done when there is not enough time to do everything. To improve the efficiency and scalability of the to-do list solver, we designed a hierarchical procedure that tackles the problem from the top-level goals to fine-grained tasks. We test the accuracy of the incentivised to-do list by comparing the performance of the strategy with the points computed exactly using Value Iteration for a variety of case studies. These case studies were specifically designed to cover the corner cases to get an accurate judge of performance. Our method yielded the same performance as the exact method for all case studies. To demonstrate its functionality, we released an API that makes it easy to deploy our method in Web and app services. We assessed the scalability of our method by applying it to to-do lists with increasingly larger numbers of goals, sub-goals per goal, hierarchically nested levels of subgoals. We found that the method provided through our API is able to tackle fairly large to-do lists having a 576 tasks. This indicates that our method is suitable for real-world applications.
△ Less
Submitted 15 September, 2021; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Improving Human Decision-Making by Discovering Efficient Strategies for Hierarchical Planning
Authors:
Saksham Consul,
Lovis Heindrich,
Jugoslav Stojcheski,
Falk Lieder
Abstract:
To make good decisions in the real world people need efficient planning strategies because their computational resources are limited. Knowing which planning strategies would work best for people in different situations would be very useful for understanding and improving human decision-making. But our ability to compute those strategies used to be limited to very small and very simple planning tas…
▽ More
To make good decisions in the real world people need efficient planning strategies because their computational resources are limited. Knowing which planning strategies would work best for people in different situations would be very useful for understanding and improving human decision-making. But our ability to compute those strategies used to be limited to very small and very simple planning tasks. To overcome this computational bottleneck, we introduce a cognitively-inspired reinforcement learning method that can overcome this limitation by exploiting the hierarchical structure of human behavior. The basic idea is to decompose sequential decision problems into two sub-problems: setting a goal and planning how to achieve it. This hierarchical decomposition enables us to discover optimal strategies for human planning in larger and more complex tasks than was previously possible. The discovered strategies outperform existing planning algorithms and achieve a super-human level of computational efficiency. We demonstrate that teaching people to use those strategies significantly improves their performance in sequential decision-making tasks that require planning up to eight steps ahead. By contrast, none of the previous approaches was able to improve human performance on these problems. These findings suggest that our cognitively-informed approach makes it possible to leverage reinforcement learning to improve human decision-making in complex sequential decision-problems. Future work can leverage our method to develop decision support systems that improve human decision making in the real world.
△ Less
Submitted 31 January, 2021;
originally announced February 2021.
-
Lower Bounds for Policy Iteration on Multi-action MDPs
Authors:
Kumar Ashutosh,
Sarthak Consul,
Bhishma Dedhia,
Parthasarathi Khirwadkar,
Sahil Shah,
Shivaram Kalyanakrishnan
Abstract:
Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an improving set, until an optimal policy is reached. Different variants of PI result from the (switching) rule used for improvement. An important theoretical que…
▽ More
Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an improving set, until an optimal policy is reached. Different variants of PI result from the (switching) rule used for improvement. An important theoretical question is how many iterations a specified PI variant will take to terminate as a function of the number of states $n$ and the number of actions $k$ in the input MDP. While there has been considerable progress towards upper-bounding this number, there are fewer results on lower bounds. In particular, existing lower bounds primarily focus on the special case of $k = 2$ actions. We devise lower bounds for $k \geq 3$. Our main result is that a particular variant of PI can take $Ω(k^{n/2})$ iterations to terminate. We also generalise existing constructions on $2$-action MDPs to scale lower bounds by a factor of $k$ for some common deterministic variants of PI, and by $\log(k)$ for corresponding randomised variants.
△ Less
Submitted 16 September, 2020;
originally announced September 2020.
-
Balance is key: Private median splits yield high-utility random trees
Authors:
Shorya Consul,
Sinead A. Williamson
Abstract:
Random forests are a popular method for classification and regression due to their versatility. However, this flexibility can come at the cost of user privacy, since training random forests requires multiple data queries, often on small, identifiable subsets of the training data. Privatizing these queries typically comes at a high utility cost, in large part because we are privatizing queries on s…
▽ More
Random forests are a popular method for classification and regression due to their versatility. However, this flexibility can come at the cost of user privacy, since training random forests requires multiple data queries, often on small, identifiable subsets of the training data. Privatizing these queries typically comes at a high utility cost, in large part because we are privatizing queries on small subsets of the data, which are easily corrupted by added noise. In this paper, we propose DiPriMe forests, a novel tree-based ensemble method for differentially private regression and classification, which is appropriate for real or categorical covariates. We generate splits using a differentially private version of the median, which encourages balanced leaf nodes. By avoiding low occupancy leaf nodes, we avoid high signal-to-noise ratios when privatizing the leaf node sufficient statistics. We show theoretically and empirically that the resulting algorithm exhibits high utility, while ensuring differential privacy.
△ Less
Submitted 19 February, 2021; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Analysis of Lower Bounds for Simple Policy Iteration
Authors:
Sarthak Consul,
Bhishma Dedhia,
Kumar Ashutosh,
Parthasarathi Khirwadkar
Abstract:
Policy iteration is a family of algorithms that are used to find an optimal policy for a given Markov Decision Problem (MDP). Simple Policy iteration (SPI) is a type of policy iteration where the strategy is to change the policy at exactly one improvable state at every step. Melekopoglou and Condon [1990] showed an exponential lower bound on the number of iterations taken by SPI for a 2 action MDP…
▽ More
Policy iteration is a family of algorithms that are used to find an optimal policy for a given Markov Decision Problem (MDP). Simple Policy iteration (SPI) is a type of policy iteration where the strategy is to change the policy at exactly one improvable state at every step. Melekopoglou and Condon [1990] showed an exponential lower bound on the number of iterations taken by SPI for a 2 action MDP. The results have not been generalized to $k-$action MDP since. In this paper, we revisit the algorithm and the analysis done by Melekopoglou and Condon. We generalize the previous result and prove a novel exponential lower bound on the number of iterations taken by policy iteration for $N-$state, $k-$action MDPs. We construct a family of MDPs and give an index-based switching rule that yields a strong lower bound of $\mathcal{O}\big((3+k)2^{N/2-3}\big)$.
△ Less
Submitted 28 November, 2019;
originally announced November 2019.