Showing 1–8 of 8 results for author: Treutlein, J

  1. arXiv:2406.14546  [pdf, other]

    cs.CL cs.AI cs.LG

    Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

    Authors: Johannes Treutlein, Dami Choi, Jan Betley, Cem Anil, Samuel Marks, Roger Baker Grosse, Owain Evans

    Abstract: One way to address safety risks from large language models (LLMs) is to censor dangerous knowledge from their training data. While this removes the explicit information, implicit information can remain scattered across various training documents. Could an LLM infer the censored knowledge by piecing together these implicit hints? As a step towards answering this question, we study inductive out-of-…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2305.17601  [pdf, other]

    cs.AI

    Incentivizing honest performative predictions with proper scoring rules

    Authors: Caspar Oesterheld, Johannes Treutlein, Emery Cooper, Rubi Hudson

    Abstract: Proper scoring rules incentivize experts to accurately report beliefs, assuming predictions cannot influence outcomes. We relax this assumption and investigate incentives when predictions are performative, i.e., when they can influence the outcome of the prediction, such as when making public predictions about the stock market. We say a prediction is a fixed point if it accurately reflects the exp…

    Submitted 30 May, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted for the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)

  3. arXiv:2302.00805  [pdf, other]

    cs.AI

    Conditioning Predictive Models: Risks and Strategies

    Authors: Evan Hubinger, Adam Jermyn, Johannes Treutlein, Rubi Hudson, Kate Woolverton

    Abstract: Our intention is to provide a definitive reference on what it would take to safely make use of generative/predictive models in the absence of a solution to the Eliciting Latent Knowledge problem. Furthermore, we believe that large language models can be understood as such predictive models of the world, and that such a conceptualization raises significant opportunities for their safe yet powerful…

    Submitted 6 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

  4. arXiv:2211.14468  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Similarity-based cooperative equilibrium

    Authors: Caspar Oesterheld, Johannes Treutlein, Roger Grosse, Vincent Conitzer, Jakob Foerster

    Abstract: As machine learning agents act more autonomously in the world, they will increasingly interact with each other. Unfortunately, in many social dilemmas like the one-shot Prisoner's Dilemma, standard game theory predicts that ML agents will fail to cooperate with each other. Prior work has shown that one way to enable cooperative outcomes in the one-shot Prisoner's Dilemma is to make the agents mutu…

    Submitted 12 November, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Published at NeurIPS 2023. 32 pages, 9 figures

    MSC Class: 91A10 (Primary) 91A05 91A26 91A35 (Secondary) ACM Class: I.2.11

  5. arXiv:2211.09961  [pdf, other]

    cs.LG stat.ML

    Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

    Authors: Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse

    Abstract: Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances. Recent efforts have shown promising results in this direction by making use of depth-wise recurrent networks. We show that a broad class of architectures named equilibrium models display strong upwards generalization, and find that str…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022

  6. arXiv:2203.04098  [pdf, other]

    cs.LG cs.AI cs.GT

    COLA: Consistent Learning with Opponent-Learning Awareness

    Authors: Timon Willi, Alistair Letcher, Johannes Treutlein, Jakob Foerster

    Abstract: Learning in general-sum games is unstable and frequently leads to socially undesirable (Pareto-dominated) outcomes. To mitigate this, Learning with Opponent-Learning Awareness (LOLA) introduced opponent shaping to this setting, by accounting for each agent's influence on their opponents' anticipated learning steps. However, the original LOLA formulation (and follow-up work) is inconsistent because…

    Submitted 27 June, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted @ ICML 2022

  7. arXiv:2111.13872  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Normative Disagreement as a Challenge for Cooperative AI

    Authors: Julian Stastny, Maxime Riché, Alexander Lyzhov, Johannes Treutlein, Allan Dafoe, Jesse Clifton

    Abstract: Cooperation in settings where agents have both common and conflicting interests (mixed-motive environments) has recently received considerable attention in multi-agent learning. However, the mixed-motive environments typically studied have a single cooperative outcome on which all agents can agree. Many real-world multi-agent environments are instead bargaining problems (BPs): they have several Pa…

    Submitted 27 November, 2021; originally announced November 2021.

    Comments: Accepted at the Cooperative AI workshop and the Strategic ML workshop at NeurIPS 2021

  8. arXiv:2106.06613  [pdf, other]

    cs.AI cs.LG

    A New Formalism, Method and Open Issues for Zero-Shot Coordination

    Authors: Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster

    Abstract: In many coordination problems, independently reasoning humans are able to discover mutually compatible policies. In contrast, independently trained self-play policies are often mutually incompatible. Zero-shot coordination (ZSC) has recently been proposed as a new frontier in multi-agent reinforcement learning to address this fundamental issue. Prior work approaches the ZSC problem by assuming pla…

    Submitted 12 July, 2023; v1 submitted 11 June, 2021; originally announced June 2021.