Showing 1–33 of 33 results for author: Raileanu, R

  1. arXiv:2407.04467

    cs.AI cs.CL cs.GT

    Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

    Authors: Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, Zhibin Li

    Abstract: Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic abilities remain largely unexplored. Game theory provides a good framework for assessing the decision-making abilities of LLMs in interactions with other agents. Although prior studies have shown that LLMs can solve these tasks with carefully curated prompts, they fail when the problem setting or p…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 8 pages (19 with appendix), 6 figures in the main body (4 in the appendix), 4 tables in the main body

  2. arXiv:2404.15538

    cs.GR cs.AI cs.CL cs.LG

    DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft

    Authors: Sam Earle, Filippos Kokkinos, Yuhe Nie, Julian Togelius, Roberta Raileanu

    Abstract: Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast, text-to-3D methods allow users to specify desired characteristics in natural language, offering a high amount of flexibility and expressivity. But unlike PCG, s…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 16 pages, 9 figures, accepted to Foundations of Digital Games 2024

  3. arXiv:2403.04642

    cs.LG

    Teaching Large Language Models to Reason with Reinforcement Learning

    Authors: Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from feedback (Expert Iteration, Proximal Policy Optimization (PPO), Return-Conditioned RL) on improving LLM reasoning capabilities. We investigate both spa…

    Submitted 7 March, 2024; originally announced March 2024.

  4. arXiv:2402.16822

    cs.CL cs.AI cs.LG

    Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

    Authors: Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

    Abstract: As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel app…

    Submitted 26 February, 2024; originally announced February 2024.

  5. arXiv:2402.14158

    cs.CL

    TOOLVERIFIER: Generalization to New Tools via Self-Verification

    Authors: Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu

    Abstract: Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem. While there has been significant progress on learning to use specific tools via fine-tuning, language models still struggle with learning how to robustly use new tools from only a few demonstrations. In this work we introduce a self-verification method which distinguish…

    Submitted 13 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  6. arXiv:2402.10963

    cs.CL cs.LG

    GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

    Authors: Alex Havrilla, Sharath Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu

    Abstract: State-of-the-art language models can exhibit impressive reasoning refinement capabilities on math, science or coding tasks. However, recent work demonstrates that even the best models struggle to identify when and where to refine without access to external feedback. Outcome-based Reward Models (ORMs), trained to predict correctness of the final answer indicating when to refine, o…

    Submitted 24 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  7. arXiv:2312.05742

    cs.LG cs.AI

    The Generalization Gap in Offline Reinforcement Learning

    Authors: Ishita Mediratta, Qingfei You, Minqi Jiang, Roberta Raileanu

    Abstract: Despite recent progress in offline learning, these methods are still trained and tested on the same environment. In this paper, we compare the generalization abilities of widely used online and offline learning methods such as online reinforcement learning (RL), offline RL, sequence modeling, and behavioral cloning. Our experiments show that offline learning algorithms perform worse on new environ…

    Submitted 14 March, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: Published as a conference paper at ICLR 2024; First two authors contributed equally

  8. arXiv:2312.03801

    cs.LG cs.AI

    Generalization to New Sequential Decision Making Tasks with In-Context Learning

    Authors: Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

    Abstract: Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new language or vision tasks without any weight updates from only a few examples, also referred to as in-context learning. However, the sequential decision making setting poses additional challenges having a lower…

    Submitted 6 December, 2023; originally announced December 2023.

  9. arXiv:2310.06452

    cs.LG cs.AI cs.CL

    Understanding the Effects of RLHF on LLM Generalisation and Diversity

    Authors: Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

    Abstract: Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date, such as OpenAI's ChatGPT or Anthropic's Claude. While there has been significant work developing these methods, our understanding of the benefits and downsides of each stage in RLHF is still limited. To fill this gap, we present an ext…

    Submitted 19 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Code available here: https://github.com/facebookresearch/rlfh-gen-div

  10. arXiv:2310.00166

    cs.AI cs.LG

    Motif: Intrinsic Motivation from Artificial Intelligence Feedback

    Authors: Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff

    Abstract: Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent. Motif is based on the idea of grounding LLMs for decision-making without requiring them to interact with the environment: it elicits preferences from an LLM ove…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: The first two authors contributed equally; order decided by coin flip

  11. arXiv:2309.11495

    cs.CL cs.AI

    Chain-of-Verification Reduces Hallucination in Large Language Models

    Authors: Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston

    Abstract: Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-c…

    Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  12. arXiv:2307.10169

    cs.CL cs.AI cs.LG

    Challenges and Applications of Large Language Models

    Authors: Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy

    Abstract: Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 72 pages. v01. Work in progress. Feedback and comments are highly appreciated!

  13. arXiv:2307.01163

    cs.CL cs.LG cs.NE

    Improving Language Plasticity via Pretraining with Active Forgetting

    Authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe

    Abstract: Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data an…

    Submitted 12 January, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 Final Version

  14. arXiv:2306.05483

    cs.LG

    On the Importance of Exploration for Generalization in Reinforcement Learning

    Authors: Yiding Jiang, J. Zico Kolter, Roberta Raileanu

    Abstract: Existing approaches for improving generalization in deep reinforcement learning (RL) have mostly focused on representation learning, neglecting RL-specific aspects such as exploration. We hypothesize that the agent's exploration strategy plays a key role in its ability to generalize to new environments. Through a series of experiments in a tabular contextual MDP, we show that exploration is helpfu…

    Submitted 8 June, 2023; originally announced June 2023.

  15. arXiv:2306.03236

    cs.AI

    A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

    Authors: Mikael Henaff, Minqi Jiang, Roberta Raileanu

    Abstract: Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent's entire training experience, and episodic novelty bonuses, computed using only experience from the current episode. However, the use of these two types of bonuses has been ad-hoc and poorly und…

    Submitted 5 June, 2023; originally announced June 2023.

  16. arXiv:2306.01324

    cs.LG

    Hyperparameters in Reinforcement Learning and How To Tune Them

    Authors: Theresa Eimer, Marius Lindauer, Roberta Raileanu

    Abstract: In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting. However, the process of hyperparameter optimization still varies widely across papers, which makes it challenging to compare RL algorithms fairly. In this paper, we show that hyperparameter choices in RL can significantly affect…

    Submitted 2 June, 2023; originally announced June 2023.

  17. arXiv:2303.03376

    cs.LG cs.MA

    MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

    Authors: Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel

    Abstract: Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents. Existing methods adapt curricula independently over either environment parameters (in single-agent settings) or co-player policies (in multi-agent settings). However, the strengths and weaknesses of co-players can…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: International Conference on Learning Representations (ICLR) 2023

  18. arXiv:2302.07842

    cs.CL

    Augmented Language Models: a Survey

    Authors: Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom

    Abstract: This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demo…

    Submitted 15 February, 2023; originally announced February 2023.

  19. arXiv:2302.04761

    cs.CL

    Toolformer: Language Models Can Teach Themselves to Use Tools

    Authors: Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

    Abstract: Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of…

    Submitted 9 February, 2023; originally announced February 2023.

  20. arXiv:2211.10445

    cs.LG cs.AI

    Building a Subspace of Policies for Scalable Continual Learning

    Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu

    Abstract: The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size models that scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method tha…

    Submitted 2 March, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted at ICLR 2023 (notable-top-25%). Website: https://continual-subspace-policies-streamlit-app-gofujp.streamlit.app/ Code: https://github.com/facebookresearch/salina/tree/main/salina_cl

  21. arXiv:2211.00539

    cs.LG cs.AI

    Dungeons and Data: A Large-Scale NetHack Dataset

    Authors: Eric Hambro, Roberta Raileanu, Danielle Rothermel, Vegard Mella, Tim Rocktäschel, Heinrich Küttler, Naila Murray

    Abstract: Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost to work with them. Here we present the NetHack Learning Dat…

    Submitted 24 November, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: 9 pages, published in the Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks. New links to hosting location. Revised results, same conclusions

  22. arXiv:2210.05805

    cs.LG cs.AI

    Exploration via Elliptical Episodic Bonuses

    Authors: Mikael Henaff, Roberta Raileanu, Minqi Jiang, Tim Rocktäschel

    Abstract: In recent years, a number of reinforcement learning (RL) methods have been proposed to explore complex environments which differ across episodes. In this work, we show that the effectiveness of these methods critically relies on a count-based episodic term in their exploration bonus. As a result, despite their success in relatively simple, noise-free settings, these methods fall short in more real…

    Submitted 4 January, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

  23. arXiv:2203.11889

    cs.LG cs.AI cs.NE cs.SC stat.ML

    Insights From the NeurIPS 2021 NetHack Challenge

    Authors: Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, Daejin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, et al. (4 additional authors not shown)

    Abstract: In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challeng…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: Under review at PMLR for the NeurIPS 2021 Competition Workshop Track, 10 pages + 10 in appendices

  24. arXiv:2202.08938

    cs.LG cs.AI cs.CL

    Improving Intrinsic Exploration with Language Abstractions

    Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette

    Abstract: Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural la…

    Submitted 21 November, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  25. arXiv:2107.12808

    cs.LG cs.AI cs.MA

    Open-Ended Learning Leads to Generally Capable Agents

    Authors: Open Ended Learning Team, Adam Stooke, Anuj Mahajan, Catarina Barros, Charlie Deck, Jakob Bauer, Jakub Sygnowski, Maja Trebacz, Max Jaderberg, Michael Mathieu, Nat McAleese, Nathalie Bradley-Schmieg, Nathaniel Wong, Nicolas Porcel, Roberta Raileanu, Steph Hughes-Fitt, Valentin Dalibard, Wojciech Marian Czarnecki

    Abstract: In this work we create agents that can perform well beyond a single, individual task, that exhibit much wider generalisation of behaviour to a massive, rich space of challenges. We define a universe of tasks within an environment domain and demonstrate the ability to train agents that are generally capable across this vast space and beyond. The environment is natively multi-agent, spanning the con…

    Submitted 31 July, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

  26. arXiv:2102.10330

    cs.LG cs.AI

    Decoupling Value and Policy for Generalization in Reinforcement Learning

    Authors: Roberta Raileanu, Rob Fergus

    Abstract: Standard deep reinforcement learning algorithms use a shared representation for the policy and value function, especially when training directly from images. However, we argue that more information is needed to accurately estimate the value function than to learn the optimal policy. Consequently, the use of a shared representation for the policy and value function can lead to overfitting. To allev…

    Submitted 15 June, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

  27. arXiv:2007.02879

    cs.LG cs.AI

    Fast Adaptation via Policy-Dynamics Value Functions

    Authors: Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

    Abstract: Standard RL algorithms assume fixed environment dynamics and require a significant amount of interaction to adapt to new environments. We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics different from those previously seen in training. PD-VF explicitly estimates the cumulative reward in a space of policies and environments. An ensemble of conven…

    Submitted 6 July, 2020; originally announced July 2020.

  28. arXiv:2006.13760

    cs.LG cs.AI cs.CL cs.NE stat.ML

    The NetHack Learning Environment

    Authors: Heinrich Küttler, Nantas Nardelli, Alexander H. Miller, Roberta Raileanu, Marco Selvatici, Edward Grefenstette, Tim Rocktäschel

    Abstract: Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging…

    Submitted 1 December, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: 28 pages. Accepted at NeurIPS 2020

  29. arXiv:2006.12862

    cs.LG cs.AI

    Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

    Authors: Roberta Raileanu, Max Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus

    Abstract: Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios, even when they are trained on many instances of semantically similar environments. Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents. However, different tasks tend to benefit from different kinds of data augmentation. In this paper, we compare three approac…

    Submitted 20 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

  30. arXiv:2006.12122

    cs.LG cs.AI stat.ML

    Learning with AMIGo: Adversarially Motivated Intrinsic Goals

    Authors: Andres Campero, Roberta Raileanu, Heinrich Küttler, Joshua B. Tenenbaum, Tim Rocktäschel, Edward Grefenstette

    Abstract: A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating, as a form of meta-learning, a goal-generating teacher that proposes Adversarially Motivated…

    Submitted 23 February, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 18 pages, 6 figures, published at The Ninth International Conference on Learning Representations (2021)

  31. arXiv:2002.12292

    cs.LG cs.AI

    RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

    Authors: Roberta Raileanu, Tim Rocktäschel

    Abstract: Exploration in sparse reward environments remains one of the key challenges of model-free reinforcement learning. Instead of solely relying on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage exploration. However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to visit a state…

    Submitted 29 February, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

  32. arXiv:1807.06919

    cs.LG cs.AI stat.ML

    Backplay: "Man muss immer umkehren"

    Authors: Cinjon Resnick, Roberta Raileanu, Sanyam Kapoor, Alexander Peysakhovich, Kyunghyun Cho, Joan Bruna

    Abstract: Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fix…

    Submitted 21 April, 2022; v1 submitted 18 July, 2018; originally announced July 2018.

    Comments: AAAI-19 Workshop on Reinforcement Learning in Games; 0xd1a80a702b8170f6abeaabcf32a0c4c4401e9177

  33. arXiv:1802.09640

    cs.AI cs.LG

    Modeling Others using Oneself in Multi-Agent Reinforcement Learning

    Authors: Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

    Abstract: We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Othe…

    Submitted 23 March, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

    Comments: 10 pages, 16 figures, submitted to ICML 2018
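
Entry 1 above (arXiv:2407.04467) evaluates LLMs in two-player non-zero-sum games. As an illustration of that setting only, here is a small sketch that finds the pure-strategy Nash equilibria of a 2x2 game by checking that neither player can gain by deviating on their own. The payoff numbers (a textbook Prisoner's Dilemma) and the function name `pure_nash_equilibria` are assumptions chosen for illustration; they are not taken from the paper.

```python
from itertools import product

# Illustrative Prisoner's Dilemma (assumed payoffs, not from the paper).
# Maps (row action, column action) -> (row payoff, column payoff).
# The payoff pairs do not sum to a constant, so the game is non-zero-sum.
ACTIONS = ("cooperate", "defect")
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def pure_nash_equilibria(actions, payoffs):
    """Return joint actions where no player gains by deviating unilaterally."""
    equilibria = []
    for row, col in product(actions, repeat=2):
        row_u, col_u = payoffs[(row, col)]
        # Row player's alternatives, holding the column action fixed.
        row_ok = all(payoffs[(alt, col)][0] <= row_u for alt in actions)
        # Column player's alternatives, holding the row action fixed.
        col_ok = all(payoffs[(row, alt)][1] <= col_u for alt in actions)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


print(pure_nash_equilibria(ACTIONS, PAYOFFS))  # [('defect', 'defect')]
```

Swapping in a Stag Hunt table (mutual "stag" is the best joint outcome, "hare" is the safe option) yields two pure equilibria, so a player must also reason about which one the opponent is likely to choose.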
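
Entry 15 above (arXiv:2306.03236) distinguishes global novelty bonuses, computed from the agent's entire training experience, from episodic novelty bonuses, computed only from the current episode. The following is a minimal tabular sketch of both, assuming hashable states and inverse-square-root count bonuses. The class name `NoveltyBonus` and the two combination rules offered (product and sum) are illustrative choices, not the paper's code.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Tabular count-based novelty bonuses (illustrative sketch).

    Global bonus:   1 / sqrt(N_global(s)), counts kept across all of training.
    Episodic bonus: 1 / sqrt(N_episode(s)), counts cleared at each episode start.
    """

    def __init__(self, combine="product"):
        if combine not in ("product", "sum"):
            raise ValueError(f"unknown combine mode: {combine!r}")
        self.combine = combine
        self.global_counts = Counter()
        self.episode_counts = Counter()

    def reset_episode(self):
        # Only the episodic memory is cleared; global counts persist.
        self.episode_counts.clear()

    def __call__(self, state):
        # Record the visit, then compute both bonuses from the updated counts.
        self.global_counts[state] += 1
        self.episode_counts[state] += 1
        global_bonus = 1.0 / math.sqrt(self.global_counts[state])
        episodic_bonus = 1.0 / math.sqrt(self.episode_counts[state])
        if self.combine == "product":
            return global_bonus * episodic_bonus
        return global_bonus + episodic_bonus


bonus = NoveltyBonus()
print(bonus("A"))            # 1.0: first visit ever
print(round(bonus("A"), 4))  # 0.5: 1/sqrt(2) * 1/sqrt(2)
bonus.reset_episode()
print(round(bonus("A"), 4))  # 0.5774: 1/sqrt(3) * 1/sqrt(1)
```

The last call shows the interaction the entry is concerned with: after a reset the episodic term is maximal again, while the global term keeps decaying with total experience.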
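
Entry 22 above (arXiv:2210.05805) argues that existing methods rely on a count-based episodic term. Its abstract is truncated before the method itself is described. The sketch below is written from our understanding of an elliptical episodic bonus, b(s) = phi(s)^T C^{-1} phi(s) with C = lam*I + sum_i phi(s_i) phi(s_i)^T over the states already visited in the current episode. It assumes the feature vectors phi(s) are supplied by the caller (how they are learned is not covered here), and the class name and `lam` default are hypothetical. With one-hot features the bonus reduces to 1/(lam + n), where n counts earlier visits in the episode, i.e. exactly a count-based episodic term.

```python
import numpy as np


class EllipticalEpisodicBonus:
    """Elliptical episodic bonus b(s) = phi(s)^T C^{-1} phi(s) (sketch).

    C = lam * I + sum of phi(s_i) phi(s_i)^T over states already seen in the
    current episode. C^{-1} is maintained directly with rank-one updates.
    """

    def __init__(self, dim, lam=0.1):
        self.dim = dim
        self.lam = lam
        self.reset_episode()

    def reset_episode(self):
        # At the start of an episode C = lam * I, so C^{-1} = I / lam.
        self.c_inv = np.eye(self.dim) / self.lam

    def __call__(self, phi):
        phi = np.asarray(phi, dtype=float)
        u = self.c_inv @ phi
        bonus = float(phi @ u)  # computed before phi is added to C
        # Sherman-Morrison: (C + phi phi^T)^{-1} = C^{-1} - u u^T / (1 + phi^T u)
        self.c_inv -= np.outer(u, u) / (1.0 + bonus)
        return bonus


# With one-hot features the bonus equals 1 / (lam + n), n = earlier visits.
b = EllipticalEpisodicBonus(dim=3, lam=1.0)
state_0 = np.array([1.0, 0.0, 0.0])
print([round(b(state_0), 4) for _ in range(3)])  # [1.0, 0.5, 0.3333]
```

Keeping C^{-1} up to date with the Sherman-Morrison identity avoids a full matrix inversion at every step; with non-one-hot features the same code measures novelty as distance from the span of features seen so far in the episode.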
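
Entry 31 above (arXiv:2002.12292, RIDE) is truncated before the intrinsic reward is defined. As we understand the method, it rewards the change an action causes in a learned state embedding, discounted by how often the next state has been visited in the current episode: r_t = ||phi(s_{t+1}) - phi(s_t)||_2 / sqrt(N_ep(s_{t+1})). The sketch assumes embeddings are given and that states can be keyed by a hashable value; the class name `ImpactDrivenReward` and the `next_state_key` parameter are hypothetical, and training phi is not shown.

```python
import math
from collections import Counter

import numpy as np


class ImpactDrivenReward:
    """Impact-driven intrinsic reward (sketch):
    r_t = ||phi(s_{t+1}) - phi(s_t)||_2 / sqrt(N_ep(s_{t+1})),
    where N_ep counts visits to s_{t+1} in the current episode.
    Embeddings phi are passed in; learning them is out of scope here.
    """

    def __init__(self):
        self.episode_counts = Counter()

    def reset_episode(self):
        self.episode_counts.clear()

    def __call__(self, phi_s, phi_next, next_state_key):
        # Count the visit to s_{t+1} first, so the first visit divides by 1.
        self.episode_counts[next_state_key] += 1
        impact = np.linalg.norm(np.asarray(phi_next, float) - np.asarray(phi_s, float))
        return float(impact) / math.sqrt(self.episode_counts[next_state_key])


ride = ImpactDrivenReward()
print(ride([0.0, 0.0], [3.0, 4.0], "room_2"))            # 5.0
print(round(ride([0.0, 0.0], [3.0, 4.0], "room_2"), 4))  # 3.5355 (5 / sqrt(2))
print(ride([1.0, 1.0], [1.0, 1.0], "room_1"))            # 0.0: no impact
```

The episodic count in the denominator discourages the agent from farming reward by toggling back and forth between the same two states within an episode.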
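
Entry 32 above (arXiv:1807.06919) builds a curriculum from a single demonstration, but its abstract is cut off before the schedule is described. As we understand Backplay, early episodes start near the end of the demonstration and the start point moves back toward the environment's initial state as training proceeds. The schedule below (fixed-size windows that slide back every `shift_every` steps) and its parameters `window` and `shift_every` are hypothetical choices for illustration, not the paper's settings.

```python
import random


def backplay_start_state(demo, training_step, rng, window=4, shift_every=1000):
    """Sample an episode's start state from one demonstration (sketch).

    The start is drawn uniformly from a window of positions measured back from
    the demonstration's final state. Every `shift_every` training steps the
    window slides `window` positions further back, until it pins the start to
    demo[0], the environment's usual initial state.
    `window` and `shift_every` are hypothetical schedule parameters.
    """
    last = len(demo) - 1
    shifts = training_step // shift_every
    # Distances from the end of the demo covered by the current window.
    hi = min(last, (shifts + 1) * window - 1)
    lo = min(hi, shifts * window)
    distance_from_end = rng.randint(lo, hi)  # inclusive on both ends
    return demo[last - distance_from_end]


demo = [f"s{i}" for i in range(20)]  # states along one demonstration
rng = random.Random(0)
print(backplay_start_state(demo, 0, rng))          # one of s16..s19
print(backplay_start_state(demo, 1000, rng))       # one of s12..s15
print(backplay_start_state(demo, 1_000_000, rng))  # s0
```

Starting close to the goal gives the agent a short path to the sparse reward early on; each later window only has to bridge the gap to states the agent has already learned to solve from.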
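
Entry 33 above (arXiv:1802.09640) notes that agents must infer the other players' hidden goals from their observed behavior, and its title suggests doing so by modeling others with oneself. A minimal sketch of that idea: reuse the agent's own goal-conditioned policy as a model of the other player and pick the goal under which it best explains the other's actions. Enumerating a discrete set of candidate goals is a simplification we chose for illustration (the original method may instead optimize a continuous goal estimate), and `toy_policy` and the goal names are hypothetical.

```python
import math


def infer_other_goal(own_policy, observations, other_actions, candidate_goals):
    """Infer another agent's goal by reusing one's own goal-conditioned policy.

    own_policy(obs, goal) returns a list of action probabilities. The inferred
    goal is the candidate under which those probabilities give the other
    agent's observed actions the highest total log-likelihood.
    """
    def log_likelihood(goal):
        return sum(
            math.log(max(own_policy(obs, goal)[act], 1e-12))  # guard log(0)
            for obs, act in zip(observations, other_actions)
        )

    return max(candidate_goals, key=log_likelihood)


# Hypothetical toy policy: action 0 = move left, action 1 = move right.
def toy_policy(obs, goal):
    return [0.8, 0.2] if goal == "left_door" else [0.2, 0.8]


observed = [1, 1, 0, 1]  # the other agent mostly moved right
print(infer_other_goal(toy_policy, [None] * 4, observed, ["left_door", "right_door"]))
# right_door
```

Because the same policy is used both to act and to interpret the other player, no separate model of the opponent has to be trained; the trade-off is that inference is only as good as the assumption that the other agent behaves like oneself.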