Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes

Authors: Larkin Liu, Shiqi Liu, Matej Jusup

Abstract: In the world of stochastic control, especially in economics and engineering, Markov Decision Processes (MDPs) can effectively model various stochastic decision processes, from asset management to transportation optimization. These underlying MDPs, upon closer examination, often reveal a specifically constrained causal structure concerning the transition and reward dynamics. By exploiting this structure, we can obtain a reduction in the causal representation of the problem setting, allowing us to solve for the optimal value function more efficiently. This work defines an MDP framework, the \texttt{SD-MDP}, where we disentangle the causal structure of MDPs' transition and reward dynamics, providing distinct partitions on the temporal causal graph. With this stochastic reduction, the \texttt{SD-MDP} reflects a general class of resource allocation problems. This disentanglement further enables us to derive theoretical guarantees on the estimation error of the value function under an optimal policy by allowing independent value estimation from Monte Carlo sampling. Subsequently, by integrating this estimator into well-known Monte Carlo planning algorithms, such as Monte Carlo Tree Search (MCTS), we derive bounds on the simple regret of the algorithm. Finally, we quantify the policy improvement of MCTS under the \texttt{SD-MDP} framework by demonstrating that the MCTS planning algorithm achieves higher expected reward (lower costs) under a constant simulation budget, on a tangible economic example based on maritime refuelling.

Submitted 23 June, 2024; originally announced June 2024.

Comments: Working manuscript

ACM Class: C.4

arXiv:2306.17052

Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning

Authors: Matej Jusup, Barna Pásztor, Tadeusz Janik, Kenan Zhang, Francesco Corman, Andreas Krause, Ilija Bogunovic

Abstract: Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent interacting with the infinite population of identical agents instead of considering individual pairwise interactions. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-M$^3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraints satisfaction with high probability. Beyond the synthetic swarm motion benchmark, we showcase Safe-M$^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on vehicle trajectory data from a service provider in Shenzhen. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.

Submitted 27 December, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 23 pages, 26 figures, 6 tables

arXiv:2302.04376

Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning

Authors: Volodymyr Tkachuk, Seyed Alireza Bakhtiari, Johannes Kirschner, Matej Jusup, Ilija Bogunovic, Csaba Szepesvári

Abstract: A practical challenge in reinforcement learning are combinatorial action spaces that make planning computationally demanding. For example, in cooperative multi-agent reinforcement learning, a potentially large number of agents jointly optimize a global reward function, which leads to a combinatorial blow-up in the action space by the number of agents. As a minimal requirement, we assume access to an argmax oracle that allows to efficiently compute the greedy policy for any Q-function in the model class. Building on recent work in planning with local access to a simulator and linear function approximation, we propose efficient algorithms for this setting that lead to polynomial compute and query complexity in all relevant problem parameters. For the special case where the feature decomposition is additive, we further improve the bounds and extend the results to the kernelized setting with an efficient algorithm.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2110.01866

doi 10.1016/j.physrep.2021.10.005

Social physics

Authors: Marko Jusup, Petter Holme, Kiyoshi Kanazawa, Misako Takayasu, Ivan Romic, Zhen Wang, Suncana Gecek, Tomislav Lipic, Boris Podobnik, Lin Wang, Wei Luo, Tin Klanjscek, Jingfang Fan, Stefano Boccaletti, Matjaz Perc

Abstract: Recent decades have seen a rise in the use of physics methods to study different societal phenomena. This development has been due to physicists venturing outside of their traditional domains of interest, but also due to scientists from other disciplines taking from physics the methods that have proven so successful throughout the 19th and the 20th century. Here we dub this field 'social physics' and pay our respect to intellectual mavericks who nurtured it to maturity. We do so by reviewing the current state of the art. Starting with a set of topics that are at the heart of modern human societies, we review research dedicated to urban development and traffic, the functioning of financial markets, cooperation as the basis for our evolutionary success, the structure of social networks, and the integration of intelligent machines into these networks. We then shift our attention to a set of topics that explore potential threats to society. These include criminal behaviour, large-scale migrations, epidemics, environmental challenges, and climate change. We end the coverage of each topic with promising directions for future research. Based on this, we conclude that the future for social physics is bright. Physicists studying societal phenomena are no longer a curiosity, but rather a force to be reckoned with. Notwithstanding, it remains of the utmost importance that we continue to foster constructive dialogue and mutual respect at the interfaces of different scientific disciplines.

Submitted 11 January, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: 359 pages, 78 figures; published in Physics Reports

Journal ref: Phys. Rep. 948, 1-148 (2022)

arXiv:2002.05106

doi 10.1098/rsif.2019.0789

A novel route to cyclic dominance in voluntary social dilemmas

Authors: Hao Guo, Zhao Song, Sunčana Geček, Xuelong Li, Marko Jusup, Matjaz Perc, Yamir Moreno, Stefano Boccaletti, Zhen Wang

Abstract: Cooperation is the backbone of modern human societies, making it a priority to understand how successful cooperation-sustaining mechanisms operate. Cyclic dominance, a non-transitive setup comprising at least three strategies wherein the first strategy overrules the second which overrules the third which, in turn, overrules the first strategy, is known to maintain bio-diversity, drive competition between bacterial strains, and preserve cooperation in social dilemmas. Here, we present a novel route to cyclic dominance in voluntary social dilemmas by adding to the traditional mix of cooperators, defectors, and loners, a fourth player type, risk-averse hedgers, who enact tit-for-tat upon paying a hedging cost to avoid being exploited. When this cost is sufficiently small, cooperators, defectors, and hedgers enter a loop of cyclic dominance that preserves cooperation even under the most adverse conditions. In contrast, when the hedging cost is large, hedgers disappear, consequently reverting to the traditional interplay of cooperators, defectors, and loners. In the interim region of hedging costs, complex evolutionary dynamics ensues, prompting transitions between states with two, three, or four competing strategies. Our results thus reveal that voluntary participation is but one pathway to sustained cooperation via cyclic dominance.

Submitted 12 February, 2020; originally announced February 2020.

Comments: 9 pages, 6 figures, supplementary information

Journal ref: J. R. Soc. Interface 17, 20190789 (2020)

Showing 1–5 of 5 results for author: Jusup, M