-
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
Authors:
Johannes Treutlein,
Dami Choi,
Jan Betley,
Cem Anil,
Samuel Marks,
Roger Baker Grosse,
Owain Evans
Abstract:
One way to address safety risks from large language models (LLMs) is to censor dangerous knowledge from their training data. While this removes the explicit information, implicit information can remain scattered across various training documents. Could an LLM infer the censored knowledge by piecing together these implicit hints? As a step towards answering this question, we study inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning. Using a suite of five tasks, we demonstrate that frontier LLMs can perform inductive OOCR. In one experiment we finetune an LLM on a corpus consisting only of distances between an unknown city and other known cities. Remarkably, without in-context examples or Chain of Thought, the LLM can verbalize that the unknown city is Paris and use this fact to answer downstream questions. Further experiments show that LLMs trained only on individual coin flip outcomes can verbalize whether the coin is biased, and those trained only on pairs $(x,f(x))$ can articulate a definition of $f$ and compute inverses. While OOCR succeeds in a range of cases, we also show that it is unreliable, particularly for smaller LLMs learning complex structures. Overall, the ability of LLMs to "connect the dots" without explicit in-context learning poses a potential obstacle to monitoring and controlling the knowledge acquired by LLMs.
Submitted 20 June, 2024;
originally announced June 2024.
-
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Authors:
Evan Hubinger,
Carson Denison,
Jesse Mu,
Mike Lambert,
Meg Tong,
Monte MacDiarmid,
Tamera Lanham,
Daniel M. Ziegler,
Tim Maxwell,
Newton Cheng,
Adam Jermyn,
Amanda Askell,
Ansh Radhakrishnan,
Cem Anil,
David Duvenaud,
Deep Ganguli,
Fazl Barez,
Jack Clark,
Kamal Ndousse,
Kshitij Sachan,
Michael Sellitto,
Mrinank Sharma,
Nova DasSarma,
Roger Grosse,
Shauna Kravec
, et al. (14 additional authors not shown)
Abstract:
Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.
Submitted 17 January, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Studying Large Language Model Generalization with Influence Functions
Authors:
Roger Grosse,
Juhan Bae,
Cem Anil,
Nelson Elhage,
Alex Tamkin,
Amirhossein Tajdini,
Benoit Steiner,
Dustin Li,
Esin Durmus,
Ethan Perez,
Evan Hubinger,
Kamilė Lukošiūtė,
Karina Nguyen,
Nicholas Joseph,
Sam McCandlish,
Jared Kaplan,
Samuel R. Bowman
Abstract:
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
Submitted 7 August, 2023;
originally announced August 2023.
-
Path Independent Equilibrium Models Can Better Exploit Test-Time Computation
Authors:
Cem Anil,
Ashwini Pokle,
Kaiqu Liang,
Johannes Treutlein,
Yuhuai Wu,
Shaojie Bai,
Zico Kolter,
Roger Grosse
Abstract:
Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances. Recent efforts have shown promising results in this direction by making use of depth-wise recurrent networks. We show that a broad class of architectures named equilibrium models display strong upwards generalization, and find that stronger performance on harder examples (which require more iterations of inference to get correct) strongly correlates with the path independence of the system -- its tendency to converge to the same steady-state behaviour regardless of initialization, given enough computation. Experimental interventions made to promote path independence result in improved generalization on harder problem instances, while those that penalize it degrade this ability. Path independence analyses are also useful on a per-example basis: for equilibrium models that have good in-distribution performance, path independence on out-of-distribution samples strongly correlates with accuracy. Our results help explain why equilibrium models are capable of strong upwards generalization and motivate future work that harnesses path independence as a general modelling principle to facilitate scalable test-time usage.
Submitted 17 November, 2022;
originally announced November 2022.
-
Exploring Length Generalization in Large Language Models
Authors:
Cem Anil,
Yuhuai Wu,
Anders Andreassen,
Aitor Lewkowycz,
Vedant Misra,
Vinay Ramasesh,
Ambrose Slone,
Guy Gur-Ari,
Ethan Dyer,
Behnam Neyshabur
Abstract:
The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring the length generalization capabilities of transformer-based language models. We first establish that naively finetuning transformers on length generalization tasks shows significant generalization deficiencies independent of model scale. We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting (asking the model to output solution steps before producing an answer) results in a dramatic improvement in length generalization. We run careful failure analyses on each of the learning modalities and identify common sources of mistakes that highlight opportunities in equipping language models with the ability to generalize to longer problems.
Submitted 14 November, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Solving Quantitative Reasoning Problems with Language Models
Authors:
Aitor Lewkowycz,
Anders Andreassen,
David Dohan,
Ethan Dyer,
Henryk Michalewski,
Vinay Ramasesh,
Ambrose Slone,
Cem Anil,
Imanol Schlag,
Theo Gutman-Solo,
Yuhuai Wu,
Behnam Neyshabur,
Guy Gur-Ari,
Vedant Misra
Abstract:
Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.
Submitted 30 June, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Learning to Give Checkable Answers with Prover-Verifier Games
Authors:
Cem Anil,
Guodong Zhang,
Yuhuai Wu,
Roger Grosse
Abstract:
Our ability to know when to trust the decisions made by machine learning systems has not kept up with the staggering improvements in their performance, limiting their applicability in high-stakes domains. We introduce Prover-Verifier Games (PVGs), a game-theoretic framework to encourage learning agents to solve decision problems in a verifiable manner. The PVG consists of two learners with competing objectives: a trusted verifier network tries to choose the correct answer, and a more powerful but untrusted prover network attempts to persuade the verifier of a particular answer, regardless of its correctness. The goal is for a reliable justification protocol to emerge from this game. We analyze variants of the framework, including simultaneous and sequential games, and narrow the space down to a subset of games which provably have the desired equilibria. We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover. Importantly, the protocol still works even when the verifier is frozen and the prover's messages are directly optimized to convince the verifier.
Submitted 26 August, 2021;
originally announced August 2021.
-
Learning to Elect
Authors:
Cem Anil,
Xuchan Bao
Abstract:
Voting systems have a wide range of applications including recommender systems, web search, product design and elections. Limited by the lack of general-purpose analytical tools, it is difficult to hand-engineer desirable voting rules for each use case. For this reason, it is appealing to automatically discover voting rules geared towards each scenario. In this paper, we show that set-input neural network architectures such as Set Transformers, fully-connected graph networks and DeepSets are both theoretically and empirically well-suited for learning voting rules. In particular, we show that these network models can not only mimic a number of existing voting rules to compelling accuracy -- both position-based (such as Plurality and Borda) and comparison-based (such as Kemeny, Copeland and Maximin) -- but also discover near-optimal voting rules that maximize different social welfare functions. Furthermore, the learned voting rules generalize well to different voter utility distributions and election sizes unseen during training.
Submitted 1 October, 2021; v1 submitted 5 August, 2021;
originally announced August 2021.
-
Applications of Game Theory in Vehicular Networks: A Survey
Authors:
Zemin Sun,
Yanheng Liu,
Jian Wang,
Guofa Li,
Carie Anil,
Keqiang Li,
Xinyu Guo,
Geng Sun,
Daxin Tian,
Dongpu Cao
Abstract:
In the Internet of Things (IoT) era, vehicles and other intelligent components in an intelligent transportation system (ITS) are connected, forming Vehicular Networks (VNs) that provide efficient and secure traffic and ubiquitous access to various applications. However, as the number of nodes in ITS increases, it is challenging to satisfy a varied and large number of service requests with different Quality of Service and security requirements in highly dynamic VNs. Intelligent nodes in VNs can compete or cooperate for limited network resources to achieve either an individual or a group's objectives. Game Theory (GT), a theoretical framework designed for strategic interactions among rational decision-makers sharing scarce resources, can be used to model and analyze individual or group behaviors of communicating entities in VNs. This paper primarily surveys the recent developments of GT in solving various challenges of VNs. This survey starts with an introduction to the background of VNs. A review of GT models studied in the VNs is then introduced, including its basic concepts, classifications, and applicable vehicular issues. After discussing the requirements of VNs and the motivation of using GT, a comprehensive literature review on GT applications in dealing with the challenges of current VNs is provided. Furthermore, recent contributions of GT to VNs integrating with diverse emerging 5G technologies are surveyed. Finally, the lessons learned are given, and several key research challenges and possible solutions for applying GT in VNs are outlined.
Submitted 5 January, 2022; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Game Theoretic Approaches in Vehicular Networks: A Survey
Authors:
Zemin Sun,
Yanheng Liu,
Jian Wang,
Carie Anil,
Dongpu Cao
Abstract:
In the era of the Internet of Things (IoT), vehicles and other intelligent components in an Intelligent Transportation System (ITS) are connected, forming the Vehicular Networks (VNs) that provide efficient and secure traffic, ubiquitous access to information, and various applications. However, as the number of connected nodes keeps increasing, it is challenging to satisfy various and large amounts of service requests with different Quality of Service (QoS) and security requirements in the highly dynamic VNs. Intelligent nodes in VNs can compete or cooperate for limited network resources so that either individual or group objectives can be achieved. Game theory (GT), a theoretical framework designed for strategic interactions among rational decision-makers who are faced with scarce resources, can be used to model and analyze individual or group behaviors of communication entities in VNs. This paper primarily surveys the recent advances of GT used in solving various challenges in VNs. As VNs and GT have been extensively investigated, this survey starts with a brief introduction of the basic concept and classification of GT used in VNs. Then, a comprehensive review of applications of GT in VNs is presented, which primarily covers the aspects of QoS and security. Moreover, with the development of fifth-generation (5G) wireless communication, recent contributions of GT to diverse emerging technologies of 5G integrated into VNs are surveyed in this paper. Finally, several key research challenges and possible solutions for applying GT in VNs are outlined.
Submitted 1 June, 2020;
originally announced June 2020.
-
Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks
Authors:
Qiyang Li,
Saminul Haque,
Cem Anil,
James Lucas,
Roger Grosse,
Jörn-Henrik Jacobsen
Abstract:
Lipschitz constraints under L2 norm on deep neural networks are useful for provable adversarial robustness bounds, stable training, and Wasserstein distance estimation. While heuristic approaches such as the gradient penalty have seen much practical success, it is challenging to achieve similar practical performance while provably enforcing a Lipschitz constraint. In principle, one can design Lipschitz constrained architectures using the composition property of Lipschitz functions, but Anil et al. recently identified a key obstacle to this approach: gradient norm attenuation. They showed how to circumvent this problem in the case of fully connected networks by designing each layer to be gradient norm preserving. We extend their approach to train scalable, expressive, provably Lipschitz convolutional networks. In particular, we present the Block Convolution Orthogonal Parameterization (BCOP), an expressive parameterization of orthogonal convolution operations. We show that even though the space of orthogonal convolutions is disconnected, the largest connected component of BCOP with 2n channels can represent arbitrary BCOP convolutions over n channels. Our BCOP parameterization allows us to train large convolutional networks with provable Lipschitz bounds. Empirically, we find that it is competitive with existing approaches to provable adversarial robustness and Wasserstein distance estimation.
Submitted 9 November, 2019; v1 submitted 3 November, 2019;
originally announced November 2019.
-
TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer
Authors:
Sicong Huang,
Qiyang Li,
Cem Anil,
Xuchan Bao,
Sageev Oore,
Roger B. Grosse
Abstract:
In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having a representation that allows independent manipulation of timbre as well as high-quality waveform generation. We introduce TimbreTron, a method for musical timbre transfer which applies "image" domain style transfer to a time-frequency representation of the audio signal, and then produces a high-quality waveform using a conditional WaveNet synthesizer. We show that the Constant Q Transform (CQT) representation is particularly well-suited to convolutional architectures due to its approximate pitch equivariance. Based on human perceptual evaluations, we confirmed that TimbreTron recognizably transferred the timbre while otherwise preserving the musical content, for both monophonic and polyphonic samples.
Submitted 22 October, 2023; v1 submitted 22 November, 2018;
originally announced November 2018.
-
Sorting out Lipschitz function approximation
Authors:
Cem Anil,
James Lucas,
Roger Grosse
Abstract:
Training neural networks under a strict Lipschitz constraint is useful for provable adversarial robustness, generalization bounds, interpretable gradients, and Wasserstein distance estimation. By the composition property of Lipschitz functions, it suffices to ensure that each individual affine transformation or nonlinear activation is 1-Lipschitz. The challenge is to do this while maintaining the expressive power. We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation. Based on this, we propose to combine a gradient norm preserving activation function, GroupSort, with norm-constrained weight matrices. We show that norm-constrained GroupSort architectures are universal Lipschitz function approximators. Empirically, we show that norm-constrained GroupSort networks achieve tighter estimates of Wasserstein distance than their ReLU counterparts and can achieve provable adversarial robustness guarantees with little cost to accuracy.
Submitted 11 June, 2019; v1 submitted 13 November, 2018;
originally announced November 2018.
-
Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization
Authors:
Jonathan Tremblay,
Aayush Prakash,
David Acuna,
Mark Brophy,
Varun Jampani,
Cem Anil,
Thang To,
Eric Cameracci,
Shaad Boochoon,
Stan Birchfield
Abstract:
We present a system for training deep neural networks for object detection using synthetic images. To handle the variability in real-world data, the system relies upon the technique of domain randomization, in which the parameters of the simulator$-$such as lighting, pose, object textures, etc.$-$are randomized in non-realistic ways to force the neural network to learn the essential features of the object of interest. We explore the importance of these parameters, showing that it is possible to produce a network with compelling performance using only non-artistically-generated synthetic data. With additional fine-tuning on real data, the network yields better performance than using real data alone. This result opens up the possibility of using inexpensive synthetic data for training neural networks while avoiding the need to collect large amounts of hand-annotated real-world data or to generate high-fidelity synthetic worlds$-$both of which remain bottlenecks for many applications. The approach is evaluated on bounding box detection of cars on the KITTI dataset.
Submitted 23 April, 2018; v1 submitted 17 April, 2018;
originally announced April 2018.