Showing 1–46 of 46 results for author: McAleer, S

  1. arXiv:2407.01476  [pdf, other]

    cs.AI cs.CL cs.LG

    Tree Search for Language Model Agents

    Authors: Jing Yu Koh, Stephen McAleer, Daniel Fried, Ruslan Salakhutdinov

    Abstract: Autonomous agents powered by language models (LMs) have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily optimized for natural language understanding and generation, struggle with multi-step reasoning, planning, and using environmental feedback when attempting to solve realistic computer tasks. Towards…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 11 pages. Models and code available at https://jykoh.com/search-agents

  2. arXiv:2404.12626  [pdf, other]

    cs.AI cs.GT cs.MA

    Grasper: A Generalist Pursuer for Pursuit-Evasion Problems

    Authors: Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny, Youzhi Zhang, Stephen McAleer, Hau Chan, Bo An

    Abstract: Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks. Recent advancements have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in PSRO to improve scalability in solving large-scale PEGs. However, these methods primarily focus on specific PEGs with fixed initial conditions that…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: To appear in the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024)

  3. arXiv:2404.11483  [pdf, other]

    cs.AI cs.LG

    AgentKit: Flow Engineering with Graphs, not Coding

    Authors: Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

    Abstract: We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. Th…

    Submitted 17 April, 2024; originally announced April 2024.

  4. arXiv:2404.09097  [pdf, other]

    cs.GT

    Faster Game Solving via Hyperparameter Schedules

    Authors: Naifeng Zhang, Stephen McAleer, Tuomas Sandholm

    Abstract: The counterfactual regret minimization (CFR) family of algorithms consists of iterative algorithms for imperfect-information games. In two-player zero-sum games, the time average of the iterates converges to a Nash equilibrium. The state-of-the-art prior variants, Discounted CFR (DCFR) and Predictive CFR$^+$ (PCFR$^+$), are the fastest known algorithms for solving two-player zero-sum games in pract…

    Submitted 13 April, 2024; originally announced April 2024.

  5. arXiv:2403.02227  [pdf, other]

    cs.GT cs.AI cs.MA

    Policy Space Response Oracles: A Survey

    Authors: Ariyan Bighashdel, Yongzhao Wang, Stephen McAleer, Rahul Savani, Frans A. Oliehoek

    Abstract: Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds…

    Submitted 27 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Ariyan Bighashdel and Yongzhao Wang contributed equally

    Journal ref: The 33rd International Joint Conference on Artificial Intelligence, 2024

  6. arXiv:2402.08129  [pdf, ps, other]

    cs.GT

    Automated Design of Affine Maximizer Mechanisms in Dynamic Settings

    Authors: Michael Curry, Vinzenz Thoma, Darshan Chakrabarti, Stephen McAleer, Christian Kroer, Tuomas Sandholm, Niao He, Sven Seuken

    Abstract: Dynamic mechanism design is a challenging extension to ordinary mechanism design in which the mechanism designer must make a sequence of decisions over time in the face of possibly untruthful reports of participating agents. Optimizing dynamic mechanisms for welfare is relatively well understood. However, there has been less work on optimizing for other goals (e.g. revenue), and without restrictiv…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: To be published in the Thirty-Eighth Proceedings of the AAAI Conference on Artificial Intelligence 2024

  7. arXiv:2401.17044  [pdf, other]

    cs.AI cs.GT cs.MA

    Scalable Mechanism Design for Multi-Agent Path Finding

    Authors: Paul Friedrich, Yulun Zhang, Michael Curry, Ludwig Dierks, Stephen McAleer, Jiaoyang Li, Tuomas Sandholm, Sven Seuken

    Abstract: Multi-Agent Path Finding (MAPF) involves determining paths for multiple agents to travel simultaneously and collision-free through a shared area toward given goal locations. This problem is computationally complex, especially when dealing with large numbers of agents, as is common in realistic applications like autonomous vehicle coordination. Finding an optimal solution is often computationally i…

    Submitted 8 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 12 pages, 5 figures. IJCAI'24 camera-ready version

  8. arXiv:2310.19852  [pdf, other]

    cs.AI

    AI Alignment: A Comprehensive Survey

    Authors: Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

    Abstract: AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey, we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness,…

    Submitted 1 May, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Continually updated, including weak-to-strong generalization and socio-technical thinking. 58 pages (excluding bibliography), 801 references

  9. arXiv:2310.10631  [pdf, other]

    cs.CL cs.AI cs.LO

    Llemma: An Open Language Model For Mathematics

    Authors: Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

    Abstract: We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool u…

    Submitted 15 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Updated references; corrected description of COPRA search budget

  10. arXiv:2310.04373  [pdf, other]

    cs.LG cs.AI

    Confronting Reward Model Overoptimization with Constrained RLHF

    Authors: Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer

    Abstract: Large language models are typically aligned with human preferences by optimizing reward models (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriat…

    Submitted 10 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  11. arXiv:2309.17179  [pdf, other]

    cs.LG cs.AI cs.CL

    Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training

    Authors: Xidong Feng, Ziyu Wan, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, Jun Wang

    Abstract: Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment the reasoning capabilities of LLMs by using tree-search algorithms to guide multi-step reasoning. These methods rely on prompting a pre-trained model to serve as a value function and focus on problems with low search depth. As a result, these methods will not work in domains where the pre-trained LLM does not h…

    Submitted 8 February, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

  12. arXiv:2308.04719  [pdf, other]

    cs.AI

    JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

    Authors: Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang

    Abstract: This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game's strategic structure. To address non…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 28 pages, accepted by Transactions on Machine Learning Research (TMLR)

  13. arXiv:2307.12062  [pdf, other]

    cs.LG cs.AI

    Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

    Authors: Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Benjamin Eysenbach, Tuomas Sandholm, Furong Huang, Stephen McAleer

    Abstract: Deploying reinforcement learning (RL) systems requires robustness to uncertainty and model misspecification, yet prior robust RL methods typically only study noise introduced independently across time. However, practical sources of uncertainty are usually coupled across time. We formally introduce temporally-coupled perturbations, presenting a novel challenge for existing robust RL methods. To tac…

    Submitted 25 April, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted at The Twelfth International Conference on Learning Representations (ICLR 2024)

  14. arXiv:2306.16884  [pdf, other]

    cs.GT cs.LG cs.MA

    Policy Space Diversity for Non-Transitive Games

    Authors: Jian Yao, Weiming Liu, Haobo Fu, Yaodong Yang, Stephen McAleer, Qiang Fu, Wei Yang

    Abstract: Policy-Space Response Oracles (PSRO) is an influential algorithm framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have been trying to promote policy diversity in PSRO. A major weakness in existing diversity metrics is that a more diverse (according to their diversity metrics) population does not necessarily mean (as we proved in the pap…

    Submitted 8 November, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

  15. arXiv:2306.05221  [pdf, other]

    cs.GT

    Steering No-Regret Learners to a Desired Equilibrium

    Authors: Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

    Abstract: A mediator observes no-regret learners playing an extensive-form game repeatedly across $T$ rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures several problems of interest, among them equilibrium selection and information design (persuas…

    Submitted 17 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  16. arXiv:2306.05216  [pdf, ps, other]

    cs.GT

    Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

    Authors: Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

    Abstract: We introduce a new approach for computing optimal equilibria via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum gam…

    Submitted 23 May, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  17. arXiv:2304.10498  [pdf, other]

    cs.GT

    Regret-Minimizing Double Oracle for Extensive-Form Games

    Authors: Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, Yaodong Yang

    Abstract: By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash Equilibrium (NE) in normal-form games and extensive-form games, through algorithms such as online double oracle (ODO) and extensive-form double oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret minimization-based d…

    Submitted 13 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Accepted at ICML, 2023

  18. arXiv:2303.17491  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Language Models can Solve Computer Tasks

    Authors: Geunwoo Kim, Pierre Baldi, Stephen McAleer

    Abstract: Agents capable of carrying out general tasks on a computer can improve efficiency and productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally, such agents should be able to solve new computer tasks presented to them through natural language commands. However, previous approaches to this problem require large amounts of expert demonstrations and task-specific r…

    Submitted 16 November, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

  19. arXiv:2303.00466  [pdf, other]

    cs.LG cs.AI math.OC

    ASP: Learn a Universal Neural Solver!

    Authors: Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang

    Abstract: Applying machine learning to combinatorial optimization problems has the potential to improve both efficiency and accuracy. However, existing learning-based solvers often struggle with generalization when faced with changes in problem distributions and scales. In this paper, we propose a new approach called ASP: Adaptive Staircase Policy Space Response Oracle to address these generalization issues…

    Submitted 1 March, 2023; originally announced March 2023.

  20. arXiv:2302.05910  [pdf, other]

    cs.MA

    MANSA: Learning Fast and Slow in Multi-Agent Systems

    Authors: David Mguni, Haojun Chen, Taher Jafferjee, Jianhong Wang, Long Fei, Xidong Feng, Stephen McAleer, Feifei Tong, Jun Wang, Yaodong Yang

    Abstract: In multi-agent reinforcement learning (MARL), independent learning (IL) often shows remarkable performance and easily scales with the number of agents. Yet, using IL can be inefficient and runs the risk of failing to successfully train, particularly in scenarios that require agents to coordinate their actions. Using centralised learning (CL) enables MARL agents to quickly learn how to coordinate t…

    Submitted 4 June, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

  21. arXiv:2302.03439  [pdf, other]

    cs.MA cs.LG

    Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning

    Authors: Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni

    Abstract: Existing value-based algorithms for cooperative multi-agent reinforcement learning (MARL) commonly rely on random exploration, such as $ε$-greedy, to explore the environment. However, such exploration is inefficient at finding effective joint actions in states that require cooperation of multiple agents. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a genera…

    Submitted 16 April, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Preprint. Previously presented at the Adaptive and Learning Agents Workshop (ALA) at the AAMAS conference 2023

  22. arXiv:2301.02129  [pdf, ps, other]

    cs.GT cs.CC cs.DS

    Algorithms and Complexity for Computing Nash Equilibria in Adversarial Team Games

    Authors: Ioannis Anagnostides, Fivos Kalogiannis, Ioannis Panageas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Stephen McAleer

    Abstract: Adversarial team games model multiplayer strategic interactions in which a team of identically-interested players is competing against an adversarial player in a zero-sum game. Such games capture many well-studied settings in game theory, such as congestion games, but go well beyond to environments wherein the cooperation of one team -- in the absence of explicit communication -- is obstructed by…

    Submitted 30 May, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: To appear at the conference on Economics and Computation (EC) 2023

  23. arXiv:2210.02205  [pdf, other]

    cs.GT cs.LG cs.MA

    Game Theoretic Rating in N-player general-sum games with Equilibria

    Authors: Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen McAleer, Jerome Connor, Karl Tuyls, Thore Graepel

    Abstract: Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting. Traditionally, only transitive dependencies between strategies have been used to rate strategies (e.g. Elo); however, recent work has expanded ratings to utilize game theoretic solutions to better rate strategies in non-tra…

    Submitted 5 October, 2022; originally announced October 2022.

  24. arXiv:2209.07670  [pdf, other]

    cs.LG

    Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

    Authors: Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

    Abstract: In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to reduce overestimation, including several recent ensemble methods; however, none have shown success in sample-efficient learning through addressing estimation variance as the root cause of overestimation. In…

    Submitted 15 September, 2022; originally announced September 2022.

    Journal ref: ICML 2022

  25. arXiv:2207.10170  [pdf, other]

    cs.AI

    Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks

    Authors: Tim Franzmeyer, Stephen McAleer, João F. Henriques, Jakob N. Foerster, Philip H. S. Torr, Adel Bibi, Christian Schroeder de Witt

    Abstract: Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detect…

    Submitted 6 May, 2024; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: ICLR 2024 Spotlight (top 5%)

  26. arXiv:2207.09597  [pdf, other]

    cs.LG cs.AI cs.GT

    Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments

    Authors: JB Lanier, Stephen McAleer, Pierre Baldi, Roy Fox

    Abstract: Robust reinforcement learning (RL) considers the problem of learning policies that perform well in the worst case among a set of possible environment parameter values. In real-world environments, choosing the set of possible values for robust RL can be a difficult task. When that set is specified too narrowly, the agent will be left vulnerable to reasonable parameter values unaccounted for. When s…

    Submitted 3 October, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Added new theory sections. Added comparison to self-play. Added adversary mixed-strategy analysis

  27. arXiv:2207.06541  [pdf, other]

    cs.GT cs.LG cs.MA

    Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

    Authors: Stephen McAleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm

    Abstract: In competitive two-agent environments, deep reinforcement learning (RL) methods based on the Double Oracle (DO) algorithm, such as Policy Space Response Oracles (PSRO) and Anytime PSRO (APSRO), iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might…

    Submitted 13 July, 2022; originally announced July 2022.

  28. arXiv:2206.15378  [pdf, other]

    cs.AI cs.GT cs.MA

    Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

    Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, et al. (9 additional authors not shown)

    Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additiona…

    Submitted 30 June, 2022; originally announced June 2022.

  29. arXiv:2206.08686  [pdf, other]

    cs.RO cs.AI cs.LG cs.MA

    Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

    Authors: Yuanpei Chen, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen Marcus McAleer, Yiran Geng, Hao Dong, Zongqing Lu, Song-Chun Zhu, Yaodong Yang

    Abstract: Achieving human-level dexterity is an important open problem in robotics. However, tasks of dexterous hand manipulation, even at the baby level, are challenging to solve through reinforcement learning (RL). The difficulty lies in the high degrees of freedom and the required cooperation among heterogeneous agents (e.g., joints of fingers). In this study, we propose the Bimanual Dexterous Hands Benc…

    Submitted 11 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: 38 pages, 8 figures

    Report number: V-02

  30. arXiv:2206.04122  [pdf, other]

    cs.GT cs.AI cs.LG stat.ML

    ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

    Authors: Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

    Abstract: Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies). One promising line of research uses neural networks to approximate counterfactual regret minimization (CFR) or its modern variants. DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains…

    Submitted 11 October, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

  31. arXiv:2205.15434  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA

    A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems

    Authors: Oliver Slumbers, David Henry Mguni, Stephen Marcus McAleer, Stefano B. Blumberg, Jun Wang, Yaodong Yang

    Abstract: In order for agents in multi-agent systems (MAS) to be safe, they need to take into account the risks posed by the actions of other agents. However, the dominant paradigm in game theory (GT) assumes that agents are not affected by risk from other agents and only strive to maximise their expected utility. For example, in hybrid human-AI driving systems, it is necessary to limit large deviations in…

    Submitted 2 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  32. arXiv:2201.07700  [pdf, other]

    cs.GT cs.LG cs.MA

    Anytime PSRO for Two-Player Zero-Sum Games

    Authors: Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

    Abstract: Policy space response oracles (PSRO) is a multi-agent reinforcement learning algorithm that has achieved state-of-the-art performance in very large two-player zero-sum games. PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next. We propose anytime double oracle (ADO)…

    Submitted 28 January, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: Published in AAAI Reinforcement Learning in Games Workshop

  33. arXiv:2112.02852  [pdf, other]

    cs.LG cs.AI

    Target Entropy Annealing for Discrete Soft Actor-Critic

    Authors: Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox

    Abstract: Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $α$, which determines how "soft" the policy should be. It is counter-intuitive that empirical evidence shows SAC does not perform well in discrete domai…

    Submitted 6 December, 2021; originally announced December 2021.

    Journal ref: NeurIPS 2021 Deep RL Workshop

  34. arXiv:2110.14818  [pdf, other]

    cs.LG cs.AI

    Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

    Authors: Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

    Abstract: Temporal-Difference (TD) learning methods, such as Q-Learning, have proven effective at learning a policy to perform control tasks. One issue with methods like Q-Learning is that the value update introduces bias when predicting the TD target of an unfamiliar state. Estimation noise becomes a bias after the max operator in the policy improvement step, and carries over to value estimations of other s…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted to Deep Reinforcement Learning Workshop @ NeurIPS 2021

  35. arXiv:2110.10614  [pdf, other]

    cs.LG cs.GT

    Independent Natural Policy Gradient Always Converges in Markov Potential Games

    Authors: Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas

    Abstract: Multi-agent reinforcement learning has been successfully applied to fully-cooperative and fully-competitive environments, but little is currently known about mixed cooperative/competitive environments. In this paper, we focus on a particular class of multi-agent mixed cooperative/competitive stochastic games called Markov Potential Games (MPGs), which include cooperative games as a special case. R…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: 24 pages

  36. arXiv:2106.03927  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Improving Social Welfare While Preserving Autonomy via a Pareto Mediator

    Authors: Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox

    Abstract: Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests. In domains where agents can choose to take their own action or delegate their action to a central mediator, an open question is how mediators should take actions on behalf of delegating agents. The main existing approach uses delegating agents to punish non-delegating agents in an…

    Submitted 7 June, 2021; originally announced June 2021.

  37. arXiv:2106.02745  [pdf, other]

    cs.AI cs.MA

    Neural Auto-Curricula

    Authors: Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

    Abstract: When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are unde…

    Submitted 1 November, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: corresponding to <yaodong.yang@outlook.com>

  38. arXiv:2103.07780  [pdf, other]

    cs.AI cs.GT

    Online Double Oracle

    Authors: Le Cong Dinh, Yaodong Yang, Stephen McAleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang

    Abstract: Solving strategic games with huge action space is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) methods…

    Submitted 15 February, 2023; v1 submitted 13 March, 2021; originally announced March 2021.

    Comments: Accepted at Transactions on Machine Learning Research (TMLR)

    Journal ref: Transactions on Machine Learning Research 2022

  39. arXiv:2103.06426  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    XDO: A Double Oracle Algorithm for Extensive-Form Games

    Authors: Stephen McAleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox

    Abstract: Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates)…

    Submitted 28 January, 2022; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  40. arXiv:2102.04518  [pdf, ps, other]

    cs.AI cs.LG

    A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

    Authors: Forest Agostinelli, Alexander Shmakov, Stephen McAleer, Roy Fox, Pierre Baldi

    Abstract: Efficiently solving problems with large action spaces using A* search has been of importance to the artificial intelligence community for decades. This is because the computation and memory requirements of A* search grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by computationally expensive function approxima…

    Submitted 23 March, 2023; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: Added theoretical results to show that Q* search is an admissible search algorithm. Added comparisons to deferred heuristic evaluation. Added experiments with Lights Out and the 35-Pancake puzzle

  41. arXiv:2011.06408  [pdf]

    eess.IV cs.LG

    Deep machine learning-assisted multiphoton microscopy to reduce light exposure and expedite imaging

    Authors: Stephen McAleer, Alex Fast, Yuntian Xue, Magdalene Seiler, William Tang, Mihaela Balu, Pierre Baldi, Andrew W. Browne

    Abstract: Two-photon excitation fluorescence (2PEF) allows imaging of tissue up to about one millimeter in thickness. Typically, reducing fluorescence excitation exposure reduces the quality of the image. However, using deep learning super-resolution techniques, these low-resolution images can be converted to high-resolution images. This work explores improving human tissue imaging by applying deep learning…

    Submitted 10 November, 2020; originally announced November 2020.

  42. arXiv:2006.08555  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

    Authors: Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi

    Abstract: Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making…

    Submitted 18 February, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: SM and JL contributed equally

  43. arXiv:1912.04451  [pdf, other]

    cs.MA

    ColosseumRL: A Framework for Multiagent Reinforcement Learning in $N$-Player Games

    Authors: Alexander Shmakov, John Lanier, Stephen McAleer, Rohan Achar, Cristina Lopes, Pierre Baldi

    Abstract: Much of the recent success in multiagent reinforcement learning has been in two-player zero-sum games. In these games, algorithms such as fictitious self-play and minimax tree search can converge to an approximate Nash equilibrium. While playing a Nash equilibrium strategy in a two-player zero-sum game is optimal, in an $n$-player general-sum game, it becomes a much less informative solution concept.…

    Submitted 9 December, 2019; originally announced December 2019.

    Comments: Accepted for the 2020 AAAI Spring Symposium, Challenges and Opportunities for Multi-Agent Reinforcement Learning. Source code available at https://github.com/colosseumrl/colosseumrl/

  44. arXiv:1906.07315  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination

    Authors: Shauharda Khadka, Somdeb Majumdar, Santiago Miret, Stephen McAleer, Kagan Tumer

    Abstract: Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Furthermore, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team…

    Submitted 11 June, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

    Comments: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

  45. arXiv:1906.03710  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Curiosity-Driven Multi-Criteria Hindsight Experience Replay

    Authors: John B. Lanier, Stephen McAleer, Pierre Baldi

    Abstract: Dealing with sparse rewards is a longstanding challenge in reinforcement learning. The recent use of hindsight methods has achieved success on a variety of sparse-reward tasks, but they fail on complex tasks such as stacking multiple blocks with a robot arm in simulation. Curiosity-driven exploration using the prediction error of a learned dynamics model as an intrinsic reward has been shown to b…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: 14 pages

  46. arXiv:1805.07470  [pdf, other]

    cs.AI

    Solving the Rubik's Cube Without Human Knowledge

    Authors: Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi

    Abstract: A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In these environments, a reward is always received at the end of the game; however, for many c…

    Submitted 18 May, 2018; originally announced May 2018.

    Comments: First three authors contributed equally. Submitted to NIPS 2018