skip to main content
research-article
Open access

Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games

Published: 10 June 2022 Publication History
  • Get Citation Alerts
  • Abstract

    Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS ‘21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is O(polylog(T)) after T repetitions of the game. In this paper we extend their result from external regret to internal and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of O( T−1 ). This substantially improves over the prior best rate of convergence of O(T−3/4) due to Chen and Peng (NeurIPS ‘20), and it is optimal up to polylogarithmic factors.
    To obtain these results, we develop new techniques for establishing higher-order smoothness for learning dynamics involving fixed point operations. Specifically, we first establish that the no-internal-regret learning dynamics of Stoltz and Lugosi (Mach Learn ‘05) are equivalently simulated by no-external-regret dynamics on a combinatorial space. This allows us to trade the computation of the stationary distribution on a polynomial-sized Markov chain for a (much more well-behaved) linear transformation on an exponential-sized set, enabling us to leverage similar techniques as DGF to near-optimally bound the internal regret.
    Moreover, we establish an O(polylog(T)) no-swap-regret bound for the classic algorithm of Blum and Mansour (BM) (JMLR ‘07). We do so by introducing a technique based on the Cauchy Integral Formula that circumvents the more limited combinatorial arguments of DFG. In addition to shedding clarity on the near-optimal regret guarantees of BM, our arguments provide insights into the various ways in which the techniques by DFG can be extended and leveraged in the analysis of more involved learning algorithms.

    References

    [1]
    Jacob Abernethy, Peter L Bartlett, and Elad Hazan. 2011. Blackwell Approachability and No-Regret Learning are Equivalent. In COLT. 27–46.
    [2]
    V. Anantharam and P. Tsoucas. 1989. A proof of the Markov chain tree theorem. Statistics & Probability Letters, 8, 2 (1989), 189–192. https://doi.org/10.1016/0167-7152(89)90016-3
    [3]
    Robert Aumann. 1974. Subjectivity and Correlation in Randomized Strategies. Journal of Mathematical Economics, 1 (1974), 67–96. https://doi.org/10.1016/0304-4068(74)90037-8
    [4]
    Yakov Babichenko and Aviad Rubinstein. 2017. Communication complexity of approximate Nash equilibria. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017. ACM, 878–889. https://doi.org/10.1145/3055399.3055407
    [5]
    David Blackwell. 1956. An analog of the minmax theorem for vector payoffs. Pacific J. Math., 6 (1956), 1–8. https://doi.org/pjm/1103044235
    [6]
    Avrim Blum and Yishay Mansour. 2007. From External to Internal Regret. J. Mach. Learn. Res., 8 (2007), 1307–1324.
    [7]
    Noam Brown and Tuomas Sandholm. 2017. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, Dec., https://doi.org/10.1126/science.aao1733
    [8]
    Arthur Cayley. 1889. A theorem on trees. Quart. J. Math., 23 (1889), 376–378.
    [9]
    Nicolo Cesa-Bianchi and Gabor Lugosi. 2006. Prediction, learning, and games. Cambridge University Press. https://doi.org/10.1017/CBO9780511546921
    [10]
    Xi Chen, Xiaotie Deng, and Shang-Hua Teng. 2009. Settling the Complexity of Computing Two-Player Nash Equilibria. J. ACM, https://doi.org/10.1145/1516512.1516516
    [11]
    Xi Chen and Binghui Peng. 2020. Hedging in games: Faster convergence of external and swap regrets. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS).
    [12]
    Constantinos Daskalakis, Alan Deckelbaum, and Anthony Kim. 2011. Near-optimal no-regret algorithms for zero-sum games. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). https://doi.org/10.1137/1.9781611973082.21
    [13]
    Constantinos Daskalakis, Maxwell Fishelson, and Noah Golowich. 2021. Near-Optimal No-Regret Learning in General Games. In Advances in Neural Information Processing Systems. 34, Curran Associates, Inc., 27604–27616.
    [14]
    Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. 2009. The complexity of computing a Nash equilibrium. SIAM J. Comput., 39, 1 (2009), https://doi.org/10.1137/070699652
    [15]
    Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. 2018. Training GANs with Optimism. In 6th International Conference on Learning Representations, ICLR 2018. OpenReview.net.
    [16]
    Constantinos Daskalakis and Ioannis Panageas. 2018. The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization. In NeurIPS 2018. 9256–9266.
    [17]
    Constantinos Daskalakis and Ioannis Panageas. 2019. Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization. In 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, Avrim Blum (Ed.) (LIPIcs, Vol. 124). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 27:1–27:18. https://doi.org/10.4230/LIPIcs.ITCS.2019.27
    [18]
    Kousha Etessami and Mihalis Yannakakis. 2007. On the Complexity of Nash Equilibria and Other Fixed Points (Extended Abstract). In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS). 113–123. https://doi.org/10.1109/FOCS.2007.48
    [19]
    Gabriele Farina, Andrea Celli, Alberto Marchesi, and Nicola Gatti. 2021. Simple Uncoupled No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium. CoRR, abs/2104.01520 (2021).
    [20]
    Dean Foster and Rakesh Vohra. 1997. Calibrated Learning and Correlated Equilibrium. Games and Economic Behavior, 21 (1997), 40–55. https://doi.org/10.1006/game.1997.0595
    [21]
    Geoffrey J Gordon, Amy Greenwald, and Casey Marks. 2008. No-regret learning in convex games. In Proceedings of the 25 th international conference on Machine learning. 360–367. https://doi.org/10.1145/1390156.1390202
    [22]
    Amy Greenwald and Amir Jafari. 2003. A General Class of No-Regret Learning Algorithms and Game-Theoretic Equilibria. In Conference on Learning Theory (COLT). Washington, D.C. https://doi.org/10.1007/978-3-540-45167-9_2
    [23]
    Sergiu Hart and Andreu Mas-Colell. 2000. A Simple Adaptive Procedure Leading to Correlated Equilibrium. Econometrica, 68 (2000), 1127–1150. https://doi.org/10.1111/1468-0262.00153
    [24]
    Shinji Ito. 2020. A Tight Lower Bound and Efficient Reduction for Swap Regret. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020.
    [25]
    Alex Kruckman, Amy Greenwald, and John R. Wicks. 2010. An elementary proof of the Markov chain tree theorem. Brown University.
    [26]
    Nick Littlestone and M. K. Warmuth. 1994. The Weighted Majority Algorithm. Information and Computation, 108, 2 (1994), 212–261. https://doi.org/10.1006/inco.1994.1009
    [27]
    Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, May, https://doi.org/10.1126/science.aam6960
    [28]
    John Nash. 1950. Equilibrium points in N-person games. Proceedings of the National Academy of Sciences, 36 (1950), 48–49. https://doi.org/10.1073/pnas.36.1.48
    [29]
    Yurii Nesterov. 2005. Smooth Minimization of Non-Smooth Functions. Mathematical Programming, 103 (2005), https://doi.org/10.1007/s10107-004-0552-5
    [30]
    Christos H. Papadimitriou and Tim Roughgarden. 2008. Computing correlated equilibria in multi-player games. J. ACM, 55, 3 (2008), 14:1–14:29. https://doi.org/10.1145/1379759.1379762
    [31]
    Alexander Rakhlin and Karthik Sridharan. 2013. Online Learning with Predictable Sequences. In Conference on Learning Theory. 993–1019.
    [32]
    Alexander Rakhlin and Karthik Sridharan. 2013. Optimization, learning, and games with predictable sequences. In Advances in Neural Information Processing Systems. 3066–3074.
    [33]
    Julia Robinson. 1951. An iterative method of solving a game. Annals of Mathematics, 54 (1951), 296–301. https://doi.org/10.2307/1969530
    [34]
    Tim Roughgarden. 2015. Intrinsic Robustness of the Price of Anarchy. J. ACM, 62, 5 (2015), 32:1–32:42. https://doi.org/10.1145/2806883
    [35]
    Aviad Rubinstein. 2016. Settling the Complexity of Computing Approximate Two-Player Nash Equilibria. In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS, Irit Dinur (Ed.). IEEE Computer Society, 258–265. https://doi.org/10.1109/FOCS.2016.35
    [36]
    Shai Shalev-Shwartz. 2012. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4, 2 (2012), issn:1935-8237 https://doi.org/10.1561/2200000018
    [37]
    Gilles Stoltz and Gábor Lugosi. 2005. Internal regret in on-line portfolio selection. Machine Learning, 59, 1-2 (2005), 125–159. https://doi.org/10.1007/s10994-005-0465-4
    [38]
    Gilles Stoltz and Gábor Lugosi. 2007. Learning correlated equilibria in games with compact sets of strategies. Games Econ. Behav., 59, 1 (2007), 187–208. https://doi.org/10.1016/j.geb.2006.04.007
    [39]
    Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E Schapire. 2015. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems. 2989–2997.
    [40]
    Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, and Haipeng Luo. 2021. Linear Last-iterate Convergence in Constrained Saddle-point Optimization. In ICLR. OpenReview.net.

    Cited By

    View all
    • (2024)Distributed Learning of Unknown Games for HetNet SelectionIEEE Transactions on Network Science and Engineering10.1109/TNSE.2024.335479211:3(2914-2926)Online publication date: May-2024
    • (2024)Multi-Agent Task Assignment in Vehicular Edge Computing: A Regret-Matching Learning-Based ApproachIEEE Transactions on Emerging Topics in Computational Intelligence10.1109/TETCI.2023.33395408:2(1527-1539)Online publication date: Apr-2024
    • (2024)Learning in games: a systematic reviewScience China Information Sciences10.1007/s11432-023-3955-x67:7Online publication date: 28-Jun-2024
    • Show More Cited By

    Index Terms

    1. Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      STOC 2022: Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing
      June 2022
      1698 pages
      ISBN:9781450392648
      DOI:10.1145/3519935
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 June 2022

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Convergence of learning dynamics
      2. correlated equilibria
      3. internal regret
      4. swap regret

      Qualifiers

      • Research-article

      Conference

      STOC '22
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)324
      • Downloads (Last 6 weeks)26

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Distributed Learning of Unknown Games for HetNet SelectionIEEE Transactions on Network Science and Engineering10.1109/TNSE.2024.335479211:3(2914-2926)Online publication date: May-2024
      • (2024)Multi-Agent Task Assignment in Vehicular Edge Computing: A Regret-Matching Learning-Based ApproachIEEE Transactions on Emerging Topics in Computational Intelligence10.1109/TETCI.2023.33395408:2(1527-1539)Online publication date: Apr-2024
      • (2024)Learning in games: a systematic reviewScience China Information Sciences10.1007/s11432-023-3955-x67:7Online publication date: 28-Jun-2024
      • (2024)A survey of decision making in adversarial gamesScience China Information Sciences10.1007/s11432-022-3777-y67:4Online publication date: 4-Feb-2024
      • (2023)Doubly optimal no-regret learning in monotone gamesProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618551(3507-3524)Online publication date: 23-Jul-2023
      • (2023)On the Tradeoff Between Heterogeneity and Communication Complexity in Federated Learning2023 57th Asilomar Conference on Signals, Systems, and Computers10.1109/IEEECONF59524.2023.10476887(115-121)Online publication date: 29-Oct-2023
      • (2023)Near Optimal Memory-Regret Tradeoff for Online Learning2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS57990.2023.00069(1171-1194)Online publication date: 6-Nov-2023
      • (2022)Near-optimal no-regret learning dynamics for general convex gamesProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3603102(39076-39089)Online publication date: 28-Nov-2022
      • (2022)Optimistic mirror descent either converges to nash or to strong coarse correlated equilibria in bimatrix gamesProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3601466(16439-16454)Online publication date: 28-Nov-2022
      • (2022)Uncoupled learning dynamics with O(log T) swap regret in multiplayer gamesProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3600508(3292-3304)Online publication date: 28-Nov-2022
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Get Access

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media