subscribe to arXiv mailings

A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback

Authors: Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo

Abstract: Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or prefer… ▽ More Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or preference models, potentially leading to robustness issues. In response, this paper introduces a novel linear programming (LP) framework tailored for offline reward learning. Utilizing pre-collected trajectories without online exploration, this framework estimates a feasible reward set from the primal-dual optimality conditions of a suitably designed LP, and offers an optimality guarantee with provable sample efficiency. Our LP framework also enables aligning the reward functions with human feedback, such as pairwise trajectory comparison data, while maintaining computational tractability and sample efficiency. We demonstrate that our framework potentially achieves better performance compared to the conventional maximum likelihood estimation (MLE) approach through analytical examples and numerical experiments. △ Less

Submitted 3 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.01817 [pdf, other]

Uniformly Stable Algorithms for Adversarial Training and Beyond

Authors: Jiancong Xiao, Jiawei Zhang, Zhi-Quan Luo, Asuman Ozdaglar

Abstract: In adversarial machine learning, neural networks suffer from a significant issue known as robust overfitting, where the robust test accuracy decreases over epochs (Rice et al., 2020). Recent research conducted by Xing et al.,2021; Xiao et al., 2022 has focused on studying the uniform stability of adversarial training. Their investigations revealed that SGD-based adversarial training fails to exhib… ▽ More In adversarial machine learning, neural networks suffer from a significant issue known as robust overfitting, where the robust test accuracy decreases over epochs (Rice et al., 2020). Recent research conducted by Xing et al.,2021; Xiao et al., 2022 has focused on studying the uniform stability of adversarial training. Their investigations revealed that SGD-based adversarial training fails to exhibit uniform stability, and the derived stability bounds align with the observed phenomenon of robust overfitting in experiments. This motivates us to develop uniformly stable algorithms specifically tailored for adversarial training. To this aim, we introduce Moreau envelope-$\mathcal{A}$, a variant of the Moreau Envelope-type algorithm. We employ a Moreau envelope function to reframe the original problem as a min-min problem, separating the non-strong convexity and non-smoothness of the adversarial loss. Then, this approach alternates between solving the inner and outer minimization problems to achieve uniform stability without incurring additional computational overhead. In practical scenarios, we show the efficacy of ME-$\mathcal{A}$ in mitigating the issue of robust overfitting. Beyond its application in adversarial training, this represents a fundamental result in uniform stability analysis, as ME-$\mathcal{A}$ is the first algorithm to exhibit uniform stability for weakly-convex, non-smooth problems. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.00254 [pdf, other]

RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

Authors: Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar

Abstract: Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable successes in fine-tuning large-language models recently. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous, and can be encoded by a single reward model. In this paper, we focus on addressing the issu… ▽ More Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable successes in fine-tuning large-language models recently. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous, and can be encoded by a single reward model. In this paper, we focus on addressing the issues due to the inherent heterogeneity in human preferences, as well as their potential strategic behavior in providing feedback. Specifically, we propose two frameworks to address heterogeneous human feedback in principled ways: personalization-based one and aggregation-based one. For the former, we propose two approaches based on representation learning and clustering, respectively, for learning multiple reward models that trades off the bias (due to preference heterogeneity) and variance (due to the use of fewer data for learning each model by personalization). We then establish sample complexity guarantees for both approaches. For the latter, we aim to adhere to the single-model framework, as already deployed in the current RLHF paradigm, by carefully aggregating diverse and truthful preferences from humans. We propose two approaches based on reward and preference aggregation, respectively: the former utilizes both utilitarianism and Leximin approaches to aggregate individual reward models, with sample complexity guarantees; the latter directly aggregates the human feedback in the form of probabilistic opinions. Under the probabilistic-opinion-feedback model, we also develop an approach to handle strategic human labelers who may bias and manipulate the aggregated preferences with untruthful feedback. Based on the ideas in mechanism design, our approach ensures truthful preference reporting, with the induced aggregation rule maximizing social welfare functions. △ Less

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

Comments: Added experiments

arXiv:2403.16843 [pdf, other]

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Authors: Chanwoo Park, Xiangyu Liu, Asuman Ozdaglar, Kaiqing Zhang

Abstract: Large language models (LLMs) have been increasingly employed for (interactive) decision-making, via the development of LLM-based autonomous agents. Despite their emerging successes, the performance of LLM agents in decision-making has not been fully investigated through quantitative metrics, especially in the multi-agent setting when they interact with each other, a typical scenario in real-world… ▽ More Large language models (LLMs) have been increasingly employed for (interactive) decision-making, via the development of LLM-based autonomous agents. Despite their emerging successes, the performance of LLM agents in decision-making has not been fully investigated through quantitative metrics, especially in the multi-agent setting when they interact with each other, a typical scenario in real-world LLM-agent applications. To better understand the limits of LLM agents in these interactive environments, we propose to study their interactions in benchmark decision-making settings in online learning and game theory, through the performance metric of \emph{regret}. We first empirically study the {no-regret} behaviors of LLMs in canonical (non-stationary) online learning problems, as well as the emergence of equilibria when LLM agents interact through playing repeated games. We then provide some theoretical insights into the no-regret behaviors of LLM agents, under certain assumptions on the supervised pre-training and the rationality model of human decision-makers who generate the data. Notably, we also identify (simple) cases where advanced LLMs such as GPT-4 fail to be no-regret. To promote the no-regret behaviors, we propose a novel \emph{unsupervised} training loss of \emph{regret-loss}, which, in contrast to the supervised pre-training loss, does not require the labels of (optimal) actions. We then establish the statistical guarantee of generalization bound for regret-loss minimization, followed by the optimization guarantee that minimizing such a loss may automatically lead to known no-regret learning algorithms. Our further experiments demonstrate the effectiveness of our regret-loss, especially in addressing the above ``regrettable'' cases. △ Less

Submitted 26 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Added experimental results for open-source models

arXiv:2401.00313 [pdf, other]

Matching of Users and Creators in Two-Sided Markets with Departures

Authors: Daniel Huttenlocher, Hannah Li, Liang Lyu, Asuman Ozdaglar, James Siderius

Abstract: Many online platforms of today, including social media sites, are two-sided markets bridging content creators and users. Most of the existing literature on platform recommendation algorithms largely focuses on user preferences and decisions, and does not simultaneously address creator incentives. We propose a model of content recommendation that explicitly focuses on the dynamics of user-content m… ▽ More Many online platforms of today, including social media sites, are two-sided markets bridging content creators and users. Most of the existing literature on platform recommendation algorithms largely focuses on user preferences and decisions, and does not simultaneously address creator incentives. We propose a model of content recommendation that explicitly focuses on the dynamics of user-content matching, with the novel property that both users and creators may leave the platform permanently if they do not experience sufficient engagement. In our model, each player decides to participate at each time step based on utilities derived from the current match: users based on alignment of the recommended content with their preferences, and creators based on their audience size. We show that a user-centric greedy algorithm that does not consider creator departures can result in arbitrarily poor total engagement, relative to an algorithm that maximizes total engagement while accounting for two-sided departures. Moreover, in stark contrast to the case where only users or only creators leave the platform, we prove that with two-sided departures, approximating maximum total engagement within any constant factor is NP-hard. We present two practical algorithms, one with performance guarantees under mild assumptions on user preferences, and another that tends to outperform algorithms that ignore two-sided departures in practice. △ Less

Submitted 19 January, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.04905 [pdf, ps, other]

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

Abstract: We consider two-player zero-sum stochastic games and propose a two-timescale $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players. In two-timescale $Q$-learning, the fast-timescale iterates are updated in spirit to the stochastic gradient descent and the slow-timescale iterates (which we use to compute the policies) ar… ▽ More We consider two-player zero-sum stochastic games and propose a two-timescale $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players. In two-timescale $Q$-learning, the fast-timescale iterates are updated in spirit to the stochastic gradient descent and the slow-timescale iterates (which we use to compute the policies) are updated by taking a convex combination between its previous iterate and the latest fast-timescale iterate. Introducing the slow timescale as well as its update equation marks as our main algorithmic novelty. In the special case of linear function approximation, we establish, to the best of our knowledge, the first last-iterate finite-sample bound for payoff-based independent learning dynamics of these types. The result implies a polynomial sample complexity to find a Nash equilibrium in such stochastic games. To establish the results, we model our proposed algorithm as a two-timescale stochastic approximation and derive the finite-sample bound through a Lyapunov-based approach. The key novelty lies in constructing a valid Lyapunov function to capture the evolution of the slow-timescale iterates. Specifically, through a change of variable, we show that the update equation of the slow-timescale iterates resembles the classical smoothed best-response dynamics, where the regularized Nash gap serves as a valid Lyapunov function. This insight enables us to construct a valid Lyapunov function via a generalized variant of the Moreau envelope of the regularized Nash gap. The construction of our Lyapunov function might be of broad independent interest in studying the behavior of stochastic approximation algorithms. △ Less

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2308.11518 [pdf, ps, other]

EM for Mixture of Linear Regression with Clustered Data

Authors: Amirhossein Reisizadeh, Khashayar Gatmiry, Asuman Ozdaglar

Abstract: Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms. In many settings however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federa… ▽ More Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms. In many settings however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federated learning where a common latent variable governs the distribution of all the samples generated by a client. It is therefore natural to ask how the underlying clustered structures in distributed data can be exploited to improve learning schemes. In this paper, we tackle this question in the special case of estimating $d$-dimensional parameters of a two-component mixture of linear regressions problem where each of $m$ nodes generates $n$ samples with a shared latent variable. We employ the well-known Expectation-Maximization (EM) method to estimate the maximum likelihood parameters from $m$ batches of dependent samples each containing $n$ measurements. Discarding the clustered structure in the mixture model, EM is known to require $O(\log(mn/d))$ iterations to reach the statistical accuracy of $O(\sqrt{d/(mn)})$. In contrast, we show that if initialized properly, EM on the structured data requires only $O(1)$ iterations to reach the same statistical accuracy, as long as $m$ grows up as $e^{o(n)}$. Our analysis establishes and combines novel asymptotic optimization and generalization guarantees for population and empirical EM with dependent samples, which may be of independent interest. △ Less

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2307.09470 [pdf, other]

Multi-Player Zero-Sum Markov Games with Networked Separable Interactions

Authors: Chanwoo Park, Kaiqing Zhang, Asuman Ozdaglar

Abstract: We study a new class of Markov games, \emph(multi-player) zero-sum Markov Games} with \emph{Networked separable interactions} (zero-sum NMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define a zero-sum NMG as a model where {the payoffs of the auxiliary games associated with each state are zero-sum and} have some separable (i.e., polymat… ▽ More We study a new class of Markov games, \emph(multi-player) zero-sum Markov Games} with \emph{Networked separable interactions} (zero-sum NMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define a zero-sum NMG as a model where {the payoffs of the auxiliary games associated with each state are zero-sum and} have some separable (i.e., polymatrix) structure across the neighbors over some interaction network. We first identify the necessary and sufficient conditions under which an MG can be presented as a zero-sum NMG, and show that the set of Markov coarse correlated equilibrium (CCE) collapses to the set of Markov Nash equilibrium (NE) in these games, in that the product of per-state marginalization of the former for all players yields the latter. Furthermore, we show that finding approximate Markov \emph{stationary} CCE in infinite-horizon discounted zero-sum NMGs is \texttt{PPAD}-hard, unless the underlying network has a ``star topology''. Then, we propose fictitious-play-type dynamics, the classical learning dynamics in normal-form games, for zero-sum NMGs, and establish convergence guarantees to Markov stationary NE under a star-shaped network structure. Finally, in light of the hardness result, we focus on computing a Markov \emph{non-stationary} NE and provide finite-iteration guarantees for a series of value-iteration-based algorithms. We also provide numerical experiments to corroborate our theoretical results. △ Less

Submitted 21 March, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

arXiv:2305.00474 [pdf, other]

Learning, Diversity and Adaptation in Changing Environments: The Role of Weak Links

Authors: Daron Acemoglu, Asuman Ozdaglar, Sarath Pattathil

Abstract: Adaptation to dynamic conditions requires a certain degree of diversity. If all agents take the best current action, learning that the underlying state has changed and behavior should adapt will be slower. Diversity is harder to maintain when there is fast communication between agents, because they tend to find out and pursue the best action rapidly. We explore these issues using a model of (Bayes… ▽ More Adaptation to dynamic conditions requires a certain degree of diversity. If all agents take the best current action, learning that the underlying state has changed and behavior should adapt will be slower. Diversity is harder to maintain when there is fast communication between agents, because they tend to find out and pursue the best action rapidly. We explore these issues using a model of (Bayesian) learning over a social network. Agents learn rapidly from and may also have incentives to coordinate with others to whom they are connected via strong links. We show, however, that when the underlying environment changes sufficiently rapidly, any network consisting of just strong links will do only a little better than random choice in the long run. In contrast, networks combining strong and weak links, whereby the latter type of links transmit information only slowly, can achieve much higher long-run average payoffs. The best social networks are those that combine a large fraction of agents into a strongly-connected component, while still maintaining a sufficient number of smaller communities that make diverse choices and communicate with this component via weak links. △ Less

Submitted 30 April, 2023; originally announced May 2023.

arXiv:2303.03100 [pdf, ps, other]

A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

Abstract: We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main… ▽ More We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main results provide finite-sample guarantees. In particular, we prove the first-known $\tilde{\mathcal{O}}(1/ε^2)$ sample complexity bound for payoff-based independent learning dynamics, up to a smoothing bias. In the special case where the stochastic game has only one state (i.e., matrix games), we provide a sharper $\tilde{\mathcal{O}}(1/ε)$ sample complexity. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest. △ Less

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2212.13861 [pdf, ps, other]

Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation

Authors: Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang

Abstract: Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-making using a pre-collected dataset, without further interaction with the environment. Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators, especially to handle the case with excessively lar… ▽ More Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-making using a pre-collected dataset, without further interaction with the environment. Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators, especially to handle the case with excessively large state-action spaces. Among them, the framework based on the linear-programming (LP) reformulation of Markov decision processes has shown promise: it enables sample-efficient offline RL with function approximation, under only partial data coverage and realizability assumptions on the function classes, with favorable computational tractability. In this work, we revisit the LP framework for offline RL, and provide a new reformulation that advances the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size. Our key enabler is to introduce proper constraints in the reformulation, instead of using any regularization as in the literature, also with careful choices of the function classes and initial state distributions. We hope our insights bring into light the use of LP formulations and the induced primal-dual minimax optimization, in offline RL. △ Less

Submitted 8 February, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: 35 pages

arXiv:2210.12812 [pdf, ps, other]

Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence

Authors: Sarath Pattathil, Kaiqing Zhang, Asuman Ozdaglar

Abstract: Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning. We first show that vanilla NPG may not have parameter convergence, i.e., the convergence of the ve… ▽ More Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning. We first show that vanilla NPG may not have parameter convergence, i.e., the convergence of the vector that parameterizes the policy, even when the costs are regularized (which enabled strong convergence guarantees in the policy space in the literature). This non-convergence of parameters leads to stability issues in learning, which becomes especially relevant in the function approximation setting, where we can only operate on low-dimensional parameters, instead of the high-dimensional policy. We then propose variants of the NPG algorithm, for several standard multi-agent learning scenarios: two-player zero-sum matrix and Markov games, and multi-player monotone games, with global last-iterate parameter convergence guarantees. We also generalize the results to certain function approximation settings. Note that in our algorithms, the agents take symmetric roles. Our results might also be of independent interest for solving nonconvex-nonconcave minimax optimization problems with certain structures. Simulations are also provided to corroborate our theoretical findings. △ Less

Submitted 20 March, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: Initially submitted for publication in January 2022

arXiv:2206.09495 [pdf, other]

The Power of Regularization in Solving Extensive-Form Games

Authors: Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

Abstract: In this paper, we investigate the power of {\it regularization}, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger conver… ▽ More In this paper, we investigate the power of {\it regularization}, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast $\tilde O(1/T)$ last-iterate convergence in terms of duality gap and distance to the set of Nash equilibrium (NE) without uniqueness assumption of the NE. Second, we show that regularized counterfactual regret minimization (\texttt{Reg-CFR}), with a variant of optimistic mirror descent algorithm as regret-minimizer, can achieve $O(1/T^{1/4})$ best-iterate, and $O(1/T^{3/4})$ average-iterate convergence rate for finding NE in EFGs. Finally, we show that \texttt{Reg-CFR} can achieve asymptotic last-iterate convergence, and optimal $O(1/T)$ average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, they constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate in finding NE for non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms. △ Less

Submitted 9 March, 2023; v1 submitted 19 June, 2022; originally announced June 2022.

arXiv:2206.05637 [pdf, ps, other]

Convergence and Stability of Coupled Belief--Strategy Learning Dynamics in Continuous Games

Authors: Manxi Wu, Saurabh Amin, Asuman Ozdaglar

Abstract: We propose a learning dynamics to model how strategic agents repeatedly play a continuous game while relying on an information platform to learn an unknown payoff-relevant parameter. In each time step, the platform updates a belief estimate of the parameter based on players' strategies and realized payoffs using Bayes's rule. Then, players adopt a generic learning rule to adjust their strategies b… ▽ More We propose a learning dynamics to model how strategic agents repeatedly play a continuous game while relying on an information platform to learn an unknown payoff-relevant parameter. In each time step, the platform updates a belief estimate of the parameter based on players' strategies and realized payoffs using Bayes's rule. Then, players adopt a generic learning rule to adjust their strategies based on the updated belief. We present results on the convergence of beliefs and strategies and the properties of convergent fixed points of the dynamics. We obtain sufficient and necessary conditions for the existence of globally stable fixed points. We also provide sufficient conditions for the local stability of fixed points. These results provide an approach to analyzing the long-term outcomes that arise from the interplay between Bayesian belief learning and strategy learning in games, and enable us to characterize conditions under which learning leads to a complete information equilibrium. △ Less

Submitted 31 October, 2023; v1 submitted 11 June, 2022; originally announced June 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2109.00719

arXiv:2206.04502 [pdf, other]

What is a Good Metric to Study Generalization of Minimax Learners?

Authors: Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang

Abstract: Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in the minimax settings, their generalization guarantees in stochastic minimax optimization problems, i.e., how the solution trained on empirical data performs on unseen testing data, have been relatively underexplored. A… ▽ More Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in the minimax settings, their generalization guarantees in stochastic minimax optimization problems, i.e., how the solution trained on empirical data performs on unseen testing data, have been relatively underexplored. A fundamental question remains elusive: What is a good metric to study generalization of minimax learners? In this paper, we aim to answer this question by first showing that primal risk, a universal metric to study generalization in minimization problems, which has also been adopted recently to study generalization in minimax ones, fails in simple examples. We thus propose a new metric to study generalization of minimax learners: the primal gap, defined as the difference between the primal risk and its minimum over all models, to circumvent the issues. Next, we derive generalization error bounds for the primal gap in nonconvex-concave settings. As byproducts of our analysis, we also solve two open questions: establishing generalization error bounds for primal risk and primal-dual risk, another existing metric that is only well-defined when the global saddle-point exists, in the strong sense, i.e., without strong concavity or assuming that the maximization and expectation can be interchanged, while either of these assumptions was needed in the literature. Finally, we leverage this new metric to compare the generalization behavior of two popular algorithms -- gradient descent-ascent (GDA) and gradient descent-max (GDMax) in stochastic minimax optimization. △ Less

Submitted 20 June, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 34 pages, 2 figures

arXiv:2205.11389 [pdf, ps, other]

Fictitious Play in Markov Games with Single Controller

Authors: Muhammed O. Sayin, Kaiqing Zhang, Asuman Ozdaglar

Abstract: Certain but important classes of strategic-form games, including zero-sum and identical-interest games, have the fictitious-play-property (FPP), i.e., beliefs formed in fictitious play dynamics always converge to a Nash equilibrium (NE) in the repeated play of these games. Such convergence results are seen as a (behavioral) justification for the game-theoretical equilibrium analysis. Markov games… ▽ More Certain but important classes of strategic-form games, including zero-sum and identical-interest games, have the fictitious-play-property (FPP), i.e., beliefs formed in fictitious play dynamics always converge to a Nash equilibrium (NE) in the repeated play of these games. Such convergence results are seen as a (behavioral) justification for the game-theoretical equilibrium analysis. Markov games (MGs), also known as stochastic games, generalize the repeated play of strategic-form games to dynamic multi-state settings with Markovian state transitions. In particular, MGs are standard models for multi-agent reinforcement learning -- a reviving research area in learning and games, and their game-theoretical equilibrium analyses have also been conducted extensively. However, whether certain classes of MGs have the FPP or not (i.e., whether there is a behavioral justification for equilibrium analysis or not) remains largely elusive. In this paper, we study a new variant of fictitious play dynamics for MGs and show its convergence to an NE in n-player identical-interest MGs in which a single player controls the state transitions. Such games are of interest in communications, control, and economics applications. Our result together with the recent results in [Sayin et al. 2020] establishes the FPP of two-player zero-sum MGs and n-player identical-interest MGs with a single controller (standing at two different ends of the MG spectrum from fully competitive to fully cooperative). △ Less

Submitted 23 May, 2022; originally announced May 2022.

Comments: Accepted to ACM Conference on Economics and Computation (EC) 2022

arXiv:2201.03968 [pdf, other]

Optimal and Differentially Private Data Acquisition: Central and Local Mechanisms

Authors: Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian, Asuman Ozdaglar

Abstract: We consider a platform's problem of collecting data from privacy sensitive users to estimate an underlying parameter of interest. We formulate this question as a Bayesian-optimal mechanism design problem, in which an individual can share her (verifiable) data in exchange for a monetary reward or services, but at the same time has a (private) heterogeneous privacy cost which we quantify using diffe… ▽ More We consider a platform's problem of collecting data from privacy sensitive users to estimate an underlying parameter of interest. We formulate this question as a Bayesian-optimal mechanism design problem, in which an individual can share her (verifiable) data in exchange for a monetary reward or services, but at the same time has a (private) heterogeneous privacy cost which we quantify using differential privacy. We consider two popular differential privacy settings for providing privacy guarantees for the users: central and local. In both settings, we establish minimax lower bounds for the estimation error and derive (near) optimal estimators for given heterogeneous privacy loss levels for users. Building on this characterization, we pose the mechanism design problem as the optimal selection of an estimator and payments that will elicit truthful reporting of users' privacy sensitivities. Under a regularity condition on the distribution of privacy sensitivities we develop efficient algorithmic mechanisms to solve this problem in both privacy settings. Our mechanism in the central setting can be implemented in time $\mathcal{O}(n \log n)$ where $n$ is the number of users and our mechanism in the local setting admits a Polynomial Time Approximation Scheme (PTAS). △ Less

Submitted 5 September, 2023; v1 submitted 9 January, 2022; originally announced January 2022.

Comments: To appear in the Operations Research journal. The abstract appeared in the Proceedings of the 23rd ACM Conference on Economics and Computation (EC 2022)

arXiv:2111.11743 [pdf, ps, other]

Independent Learning in Stochastic Games

Authors: Asuman Ozdaglar, Muhammed O. Sayin, Kaiqing Zhang

Abstract: Reinforcement learning (RL) has recently achieved tremendous successes in many artificial intelligence applications. Many of the forefront applications of RL involve multiple agents, e.g., playing chess and Go games, autonomous driving, and robotics. Unfortunately, the framework upon which classical RL builds is inappropriate for multi-agent learning, as it assumes an agent's environment is statio… ▽ More Reinforcement learning (RL) has recently achieved tremendous successes in many artificial intelligence applications. Many of the forefront applications of RL involve multiple agents, e.g., playing chess and Go games, autonomous driving, and robotics. Unfortunately, the framework upon which classical RL builds is inappropriate for multi-agent learning, as it assumes an agent's environment is stationary and does not take into account the adaptivity of other agents. In this review paper, we present the model of stochastic games for multi-agent learning in dynamic environments. We focus on the development of simple and independent learning dynamics for stochastic games: each agent is myopic and chooses best-response type actions to other agents' strategy without any coordination with her opponent. There has been limited progress on developing convergent best-response type independent learning dynamics for stochastic games. We present our recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum stochastic games, together with a review of other contemporaneous algorithms for dynamic multi-agent learning in this setting. Along the way, we also reexamine some classical results from both the game theory and RL literature, to situate both the conceptual contributions of our independent learning dynamics, and the mathematical novelties of our analysis. We hope this review paper serves as an impetus for the resurgence of studying independent and natural learning dynamics in game theory, for the more challenging settings with a dynamic environment. △ Less

Submitted 23 November, 2021; originally announced November 2021.

Comments: An invited chapter for the International Congress of Mathematicians 2022 (ICM 2022)

arXiv:2109.00719 [pdf, ps, other]

Multi-agent Bayesian Learning with Best Response Dynamics: Convergence and Stability

Authors: Manxi Wu, Saurabh Amin, Asuman Ozdaglar

Abstract: We study learning dynamics induced by strategic agents who repeatedly play a game with an unknown payoff-relevant parameter. In this dynamics, a belief estimate of the parameter is repeatedly updated given players' strategies and realized payoffs using Bayes's rule. Players adjust their strategies by accounting for best response strategies given the belief. We show that, with probability 1, belief… ▽ More We study learning dynamics induced by strategic agents who repeatedly play a game with an unknown payoff-relevant parameter. In this dynamics, a belief estimate of the parameter is repeatedly updated given players' strategies and realized payoffs using Bayes's rule. Players adjust their strategies by accounting for best response strategies given the belief. We show that, with probability 1, beliefs and strategies converge to a fixed point, where the belief consistently estimates the payoff distribution for the strategy, and the strategy is an equilibrium corresponding to the belief. However, learning may not always identify the unknown parameter because the belief estimate relies on the game outcomes that are endogenously generated by players' strategies. We obtain sufficient and necessary conditions, under which learning leads to a globally stable fixed point that is a complete information Nash equilibrium. We also provide sufficient conditions that guarantee local stability of fixed point beliefs and strategies. △ Less

Submitted 2 September, 2021; originally announced September 2021.

Comments: arXiv admin note: text overlap with arXiv:2010.09128

arXiv:2106.07537 [pdf, other]

A Wasserstein Minimax Framework for Mixed Linear Regression

Authors: Theo Diamandis, Yonina C. Eldar, Alireza Fallah, Farzan Farnia, Asuman Ozdaglar

Abstract: Multi-modal distributions are commonly used to model clustered data in statistical learning tasks. In this paper, we consider the Mixed Linear Regression (MLR) problem. We propose an optimal transport-based framework for MLR problems, Wasserstein Mixed Linear Regression (WMLR), which minimizes the Wasserstein distance between the learned and target mixture regression models. Through a model-based… ▽ More Multi-modal distributions are commonly used to model clustered data in statistical learning tasks. In this paper, we consider the Mixed Linear Regression (MLR) problem. We propose an optimal transport-based framework for MLR problems, Wasserstein Mixed Linear Regression (WMLR), which minimizes the Wasserstein distance between the learned and target mixture regression models. Through a model-based duality analysis, WMLR reduces the underlying MLR task to a nonconvex-concave minimax optimization problem, which can be provably solved to find a minimax stationary point by the Gradient Descent Ascent (GDA) algorithm. In the special case of mixtures of two linear regression models, we show that WMLR enjoys global convergence and generalization guarantees. We prove that WMLR's sample complexity grows linearly with the dimension of data. Finally, we discuss the application of WMLR to the federated learning task where the training samples are collected by multiple agents in a network. Unlike the Expectation Maximization algorithm, WMLR directly extends to the distributed, federated learning setting. We support our theoretical results through several numerical experiments, which highlight our framework's ability to handle the federated learning setting with mixture models. △ Less

Submitted 16 June, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: To appear in 38th International Conference on Machine Learning (ICML 2021)

arXiv:2106.02748 [pdf, other]

Decentralized Q-Learning in Zero-sum Markov Games

Authors: Muhammed O. Sayin, Kaiqing Zhang, David S. Leslie, Tamer Basar, Asuman Ozdaglar

Abstract: We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe the opponent's actions or payoffs, possibly being ev… ▽ More We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe the opponent's actions or payoffs, possibly being even oblivious to the presence of the opponent, nor be aware of the zero-sum structure of the underlying game, a setting also referred to as radically uncoupled in the literature of learning in games. In this paper, we develop a radically uncoupled Q-learning dynamics that is both rational and convergent: the learning dynamics converges to the best response to the opponent's strategy when the opponent follows an asymptotically stationary strategy; when both agents adopt the learning dynamics, they converge to the Nash equilibrium of the game. The key challenge in this decentralized setting is the non-stationarity of the environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts her policies simultaneously and independently. To address this issue, we develop a two-timescale learning dynamics where each agent updates her local Q-function and value function estimates concurrently, with the latter happening at a slower timescale. △ Less

Submitted 12 December, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: To appear at NeurIPS 2021. Strengthened the results in Theorem 1 and Corollary 1

arXiv:2102.08441 [pdf, other]

Optimal intervention in transportation networks

Authors: Leonardo Cianfanelli, Giacomo Como, Asuman Ozdaglar, Francesca Parise

Abstract: We study a network design problem (NDP) where the planner aims at selecting the optimal single-link intervention on a transportation network to minimize the travel time under Wardrop equilibrium flows. Our first result is that, if the delay functions are affine and the support of the equilibrium is not modified with interventions, the NDP may be formulated in terms of electrical quantities compute… ▽ More We study a network design problem (NDP) where the planner aims at selecting the optimal single-link intervention on a transportation network to minimize the travel time under Wardrop equilibrium flows. Our first result is that, if the delay functions are affine and the support of the equilibrium is not modified with interventions, the NDP may be formulated in terms of electrical quantities computed on a related resistor network. In particular, we show that the travel time variation corresponding to an intervention on a given link depends on the effective resistance between the endpoints of the link. We suggest an approach to approximate such an effective resistance by performing only local computation, and exploit it to design an efficient algorithm to solve the NDP. We discuss the optimality of this procedure in the limit of infinitely large networks, and provide a sufficient condition for its optimality. We then provide numerical simulations, showing that our algorithm achieves good performance even if the equilibrium support varies and the delay functions are non-linear. △ Less

Submitted 14 November, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 40 pages, 12 figures

MSC Class: 91A14; 91A43; 91A16; 90B20 ACM Class: G.2.2

arXiv:2102.03832 [pdf, other]

Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

Abstract: In this paper, we study the generalization properties of Model-Agnostic Meta-Learning (MAML) algorithms for supervised learning problems. We focus on the setting in which we train the MAML model over $m$ tasks, each with $n$ data points, and characterize its generalization error from two points of view: First, we assume the new task at test time is one of the training tasks, and we show that, for… ▽ More In this paper, we study the generalization properties of Model-Agnostic Meta-Learning (MAML) algorithms for supervised learning problems. We focus on the setting in which we train the MAML model over $m$ tasks, each with $n$ data points, and characterize its generalization error from two points of view: First, we assume the new task at test time is one of the training tasks, and we show that, for strongly convex objective functions, the expected excess population loss is bounded by ${\mathcal{O}}(1/mn)$. Second, we consider the MAML algorithm's generalization to an unseen task and show that the resulting generalization error depends on the total variation distance between the underlying distributions of the new task and the tasks observed during the training process. Our proof techniques rely on the connections between algorithmic stability and generalization bounds of algorithms. In particular, we propose a new definition of stability for meta-learning algorithms, which allows us to capture the role of both the number of tasks $m$ and number of samples per task $n$ on the generalization error of MAML. △ Less

Submitted 16 November, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2010.12561 [pdf, other]

Train simultaneously, generalize better: Stability of gradient-based minimax learners

Authors: Farzan Farnia, Asuman Ozdaglar

Abstract: The success of minimax learning problems of generative adversarial networks (GANs) has been observed to depend on the minimax optimization algorithm used for their training. This dependence is commonly attributed to the convergence speed and robustness properties of the underlying optimization algorithm. In this paper, we show that the optimization algorithm also plays a key role in the generaliza… ▽ More The success of minimax learning problems of generative adversarial networks (GANs) has been observed to depend on the minimax optimization algorithm used for their training. This dependence is commonly attributed to the convergence speed and robustness properties of the underlying optimization algorithm. In this paper, we show that the optimization algorithm also plays a key role in the generalization performance of the trained minimax model. To this end, we analyze the generalization properties of standard gradient descent ascent (GDA) and proximal point method (PPM) algorithms through the lens of algorithmic stability under both convex concave and non-convex non-concave minimax settings. While the GDA algorithm is not guaranteed to have a vanishing excess risk in convex concave problems, we show the PPM algorithm enjoys a bounded excess risk in the same setup. For non-convex non-concave problems, we compare the generalization performance of stochastic GDA and GDmax algorithms where the latter fully solves the maximization subproblem at every iteration. Our generalization analysis suggests the superiority of GDA provided that the minimization and maximization subproblems are solved simultaneously with similar learning rates. We discuss several numerical results indicating the role of optimization algorithms in the generalization of the learned minimax models. △ Less

Submitted 23 October, 2020; originally announced October 2020.

arXiv:2010.09128 [pdf, ps, other]

Multi-agent Bayesian Learning with Adaptive Strategies: Convergence and Stability

Authors: Manxi Wu, Saurabh Amin, Asuman Ozdaglar

Abstract: We study learning dynamics induced by strategic agents who repeatedly play a game with an unknown payoff-relevant parameter. In each step, an information system estimates a belief distribution of the parameter based on the players' strategies and realized payoffs using Bayes' rule. Players adjust their strategies by accounting for an equilibrium strategy or a best response strategy based on the up… ▽ More We study learning dynamics induced by strategic agents who repeatedly play a game with an unknown payoff-relevant parameter. In each step, an information system estimates a belief distribution of the parameter based on the players' strategies and realized payoffs using Bayes' rule. Players adjust their strategies by accounting for an equilibrium strategy or a best response strategy based on the updated belief. We prove that beliefs and strategies converge to a fixed point with probability 1. We also provide conditions that guarantee local and global stability of fixed points. Any fixed point belief consistently estimates the payoff distribution given the fixed point strategy profile. However, convergence to a complete information Nash equilibrium is not always guaranteed. We provide a sufficient and necessary condition under which fixed point belief recovers the unknown parameter. We also provide a sufficient condition for convergence to complete information equilibrium even when parameter learning is incomplete. △ Less

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2010.04223 [pdf, other]

Fictitious play in zero-sum stochastic games

Authors: Muhammed O. Sayin, Francesca Parise, Asuman Ozdaglar

Abstract: We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games and analyze its convergence properties in two-player zero-sum stochastic games. Our dynamics involves players forming beliefs on the opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response by using the estimated continuation pa… ▽ More We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games and analyze its convergence properties in two-player zero-sum stochastic games. Our dynamics involves players forming beliefs on the opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response by using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q-functions occurs at a slower timescale than update of the beliefs on strategies. We show both in the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game. △ Less

Submitted 2 June, 2022; v1 submitted 8 October, 2020; originally announced October 2020.

Comments: The extended arXiv version of the original paper to appear in SIAM Journal on Control and Optimization

arXiv:2002.09124 [pdf, other]

GANs May Have No Nash Equilibria

Authors: Farzan Farnia, Asuman Ozdaglar

Abstract: Generative adversarial networks (GANs) represent a zero-sum game between two machine players, a generator and a discriminator, designed to learn the distribution of data. While GANs have achieved state-of-the-art performance in several benchmark learning tasks, GAN minimax optimization still poses great theoretical and empirical challenges. GANs trained using first-order optimization methods commo… ▽ More Generative adversarial networks (GANs) represent a zero-sum game between two machine players, a generator and a discriminator, designed to learn the distribution of data. While GANs have achieved state-of-the-art performance in several benchmark learning tasks, GAN minimax optimization still poses great theoretical and empirical challenges. GANs trained using first-order optimization methods commonly fail to converge to a stable solution where the players cannot improve their objective, i.e., the Nash equilibrium of the underlying game. Such issues raise the question of the existence of Nash equilibrium solutions in the GAN zero-sum game. In this work, we show through several theoretical and numerical results that indeed GAN zero-sum games may not have any local Nash equilibria. To characterize an equilibrium notion applicable to GANs, we consider the equilibrium of a new zero-sum game with an objective function given by a proximal operator applied to the original objective, a solution we call the proximal equilibrium. Unlike the Nash equilibrium, the proximal equilibrium captures the sequential nature of GANs, in which the generator moves first followed by the discriminator. We prove that the optimal generative model in Wasserstein GAN problems provides a proximal equilibrium. Inspired by these results, we propose a new approach, which we call proximal training, for solving GAN problems. We discuss several numerical experiments demonstrating the existence of proximal equilibrium solutions in GAN minimax problems. △ Less

Submitted 20 February, 2020; originally announced February 2020.

arXiv:2002.07948 [pdf, other]

Personalized Federated Learning: A Meta-Learning Approach

Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

Abstract: In Federated Learning, we aim to train models across multiple computing units (users), while users can only communicate with a common central server, without exchanging their data samples. This mechanism exploits the computational power of all users and allows users to obtain a richer model as their models are trained over a larger set of data points. However, this scheme only develops a common ou… ▽ More In Federated Learning, we aim to train models across multiple computing units (users), while users can only communicate with a common central server, without exchanging their data samples. This mechanism exploits the computational power of all users and allows users to obtain a richer model as their models are trained over a larger set of data points. However, this scheme only develops a common output for all the users, and, therefore, it does not adapt the model to each user. This is an important missing feature, especially given the heterogeneity of the underlying data distribution for various users. In this paper, we study a personalized variant of the federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data. This approach keeps all the benefits of the federated learning architecture, and, by structure, leads to a more personalized model for each user. We show this problem can be studied within the Model-Agnostic Meta-Learning (MAML) framework. Inspired by this connection, we study a personalized variant of the well-known Federated Averaging algorithm and evaluate its performance in terms of gradient norm for non-convex loss functions. Further, we characterize how this performance is affected by the closeness of underlying distributions of user data, measured in terms of distribution distances such as Total Variation and 1-Wasserstein metric. △ Less

Submitted 22 October, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: To appear in 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

arXiv:2002.05683 [pdf, ps, other]

An Optimal Multistage Stochastic Gradient Method for Minimax Problems

Authors: Alireza Fallah, Asuman Ozdaglar, Sarath Pattathil

Abstract: In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent (GDA) method with constant stepsize, and show that it converges to a neighborhood of the solution of the minimax problem. We further provide tight bounds on the… ▽ More In this paper, we study the minimax optimization problem in the smooth and strongly convex-strongly concave setting when we have access to noisy estimates of gradients. In particular, we first analyze the stochastic Gradient Descent Ascent (GDA) method with constant stepsize, and show that it converges to a neighborhood of the solution of the minimax problem. We further provide tight bounds on the convergence rate and the size of this neighborhood. Next, we propose a multistage variant of stochastic GDA (M-GDA) that runs in multiple stages with a particular learning rate decay schedule and converges to the exact solution of the minimax problem. We show M-GDA achieves the lower bounds in terms of noise dependence without any assumptions on the knowledge of noise characteristics. We also show that M-GDA obtains a linear decay rate with respect to the error's dependence on the initial error, although the dependence on condition number is suboptimal. In order to improve this dependence, we apply the multistage machinery to the stochastic Optimistic Gradient Descent Ascent (OGDA) algorithm and propose the M-OGDA algorithm which also achieves the optimal linear decay rate with respect to the initial error. To the best of our knowledge, this method is the first to simultaneously achieve the best dependence on noise characteristic as well as the initial error and condition number. △ Less

Submitted 13 February, 2020; originally announced February 2020.

arXiv:2002.05135 [pdf, other]

On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

Authors: Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar

Abstract: We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy using data from several tasks represented by Markov Decision Processes (MDPs) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computati… ▽ More We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy using data from several tasks represented by Markov Decision Processes (MDPs) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computation of exact gradients requires access to a large number of possible trajectories. For this formulation, we propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL), and study its convergence properties. We derive the iteration and sample complexity of SG-MRL to find an $ε$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms. We further show how our results extend to the case where more than one step of stochastic policy gradient method is used at test time. Finally, we empirically compare SG-MRL and MAML in several deep RL environments. △ Less

Submitted 16 November, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2002.00057 [pdf, ps, other]

Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems

Authors: Noah Golowich, Sarath Pattathil, Constantinos Daskalakis, Asuman Ozdaglar

Abstract: In this paper we study the smooth convex-concave saddle point problem. Specifically, we analyze the last iterate convergence properties of the Extragradient (EG) algorithm. It is well known that the ergodic (averaged) iterates of EG converge at a rate of $O(1/T)$ (Nemirovski, 2004). In this paper, we show that the last iterate of EG converges at a rate of $O(1/\sqrt{T})$. To the best of our knowle… ▽ More In this paper we study the smooth convex-concave saddle point problem. Specifically, we analyze the last iterate convergence properties of the Extragradient (EG) algorithm. It is well known that the ergodic (averaged) iterates of EG converge at a rate of $O(1/T)$ (Nemirovski, 2004). In this paper, we show that the last iterate of EG converges at a rate of $O(1/\sqrt{T})$. To the best of our knowledge, this is the first paper to provide a convergence rate guarantee for the last iterate of EG for the smooth convex-concave saddle point problem. Moreover, we show that this rate is tight by proving a lower bound of $Ω(1/\sqrt{T})$ for the last iterate. This lower bound therefore shows a quadratic separation of the convergence rates of ergodic and last iterates in smooth convex-concave saddle point problems. △ Less

Submitted 6 July, 2020; v1 submitted 31 January, 2020; originally announced February 2020.

Comments: 27 pages

arXiv:2001.03232 [pdf, other]

Optimal dynamic information provision in traffic routing

Authors: Emily Meigs, Francesca Parise, Asuman Ozdaglar, Daron Acemoglu

Abstract: We consider a two-road dynamic routing game where the state of one of the roads (the "risky road") is stochastic and may change over time. This generates room for experimentation. A central planner may wish to induce some of the (finite number of atomic) agents to use the risky road even when the expected cost of travel there is high in order to obtain accurate information about the state of the r… ▽ More We consider a two-road dynamic routing game where the state of one of the roads (the "risky road") is stochastic and may change over time. This generates room for experimentation. A central planner may wish to induce some of the (finite number of atomic) agents to use the risky road even when the expected cost of travel there is high in order to obtain accurate information about the state of the road. Since agents are strategic, we show that in order to generate incentives for experimentation the central planner however needs to limit the number of agents using the risky road when the expected cost of travel on the risky road is low. In particular, because of congestion, too much use of the risky road when the state is favorable would make experimentation no longer incentive compatible. We characterize the optimal incentive compatible recommendation system, first in a two-stage game and then in an infinite-horizon setting. In both cases, this system induces only partial, rather than full, information sharing among the agents (otherwise there would be too much exploitation of the risky road when costs there are low). △ Less

Submitted 9 January, 2020; originally announced January 2020.

arXiv:1910.14380 [pdf, other]

A Decentralized Proximal Point-type Method for Saddle Point Problems

Authors: Weijie Liu, Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil, Zebang Shen, Nenggan Zheng

Abstract: In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point metho… ▽ More In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point method for solving this problem. We show that when the objective function is $ρ$-weakly convex-weakly concave the iterates converge to approximate stationarity with a rate of $\mathcal{O}(1/\sqrt{T})$ where the approximation error depends linearly on $\sqrtρ$. We further show that when the objective function satisfies the Minty VI condition (which generalizes the convex-concave case) we obtain convergence to stationarity with a rate of $\mathcal{O}(1/\sqrt{T})$. To the best of our knowledge, our proposed method is the first decentralized algorithm with theoretical guarantees for solving a non-convex non-concave decentralized saddle point problem. Our numerical results for training a general adversarial network (GAN) in a decentralized manner match our theoretical guarantees. △ Less

Submitted 31 October, 2019; originally announced October 2019.

Comments: 18 pages

arXiv:1910.08701 [pdf, other]

Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

Authors: Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar, Umut Simsekli, Lingjiong Zhu

Abstract: We study distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems where the objective function is distributed over several computational units, lying on a fixed but arbitrary connected communication graph, subject to local communication constraints where noisy estimates of the gradients are availabl… ▽ More We study distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems where the objective function is distributed over several computational units, lying on a fixed but arbitrary connected communication graph, subject to local communication constraints where noisy estimates of the gradients are available. We develop a framework which allows to choose the stepsize and the momentum parameters of these algorithms in a way to optimize performance by systematically trading off the bias, variance, robustness to gradient noise and dependence to network effects. When gradients do not contain noise, we also prove that distributed accelerated methods can \emph{achieve acceleration}, requiring $\mathcal{O}(κ\log(1/\varepsilon))$ gradient evaluations and $\mathcal{O}(κ\log(1/\varepsilon))$ communications to converge to the same fixed point with the non-accelerated variant where $κ$ is the condition number and $\varepsilon$ is the target accuracy. To our knowledge, this is the first acceleration result where the iteration complexity scales with the square root of the condition number in the context of \emph{primal} distributed inexact first-order methods. For quadratic functions, we also provide finer performance bounds that are tight with respect to bias and variance terms. Finally, we study a multistage version of D-ASG with parameters carefully varied over stages to ensure exact $\mathcal{O}(-k/\sqrtκ)$ linear decay in the bias term as well as optimal $\mathcal{O}(σ^2/k)$ in the variance term. We illustrate through numerical experiments that our approach results in practical algorithms that are robust to gradient noise and that can outperform existing methods. △ Less

Submitted 4 October, 2021; v1 submitted 19 October, 2019; originally announced October 2019.

arXiv:1908.10400 [pdf, other]

On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms

Authors: Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

Abstract: We study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall complexity as well as their best achievable accuracy in terms of gradient norm for nonconvex loss functions. We start with the MAML method and its first-order approximation (FO-MAML) and highlight the challenges that emerge in their analysis. By overcoming these challeng… ▽ More We study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall complexity as well as their best achievable accuracy in terms of gradient norm for nonconvex loss functions. We start with the MAML method and its first-order approximation (FO-MAML) and highlight the challenges that emerge in their analysis. By overcoming these challenges not only we provide the first theoretical guarantees for MAML and FO-MAML in nonconvex settings, but also we answer some of the unanswered questions for the implementation of these algorithms including how to choose their learning rate and the batch size for both tasks and datasets corresponding to tasks. In particular, we show that MAML can find an $ε$-first-order stationary point ($ε$-FOSP) for any positive $ε$ after at most $\mathcal{O}(1/ε^2)$ iterations at the expense of requiring second-order information. We also show that FO-MAML which ignores the second-order information required in the update of MAML cannot achieve any small desired level of accuracy, i.e., FO-MAML cannot find an $ε$-FOSP for any $ε>0$. We further propose a new variant of the MAML algorithm called Hessian-free MAML which preserves all theoretical guarantees of MAML, without requiring access to second-order information. △ Less

Submitted 15 May, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: To appear in the proceedings of the $23^{rd}$ International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

arXiv:1906.01115 [pdf, ps, other]

Convergence Rate of $\mathcal{O}(1/k)$ for Optimistic Gradient and Extra-gradient Methods in Smooth Convex-Concave Saddle Point Problems

Authors: Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil

Abstract: We study the iteration complexity of the optimistic gradient descent-ascent (OGDA) method and the extra-gradient (EG) method for finding a saddle point of a convex-concave unconstrained min-max problem. To do so, we first show that both OGDA and EG can be interpreted as approximate variants of the proximal point method. This is similar to the approach taken in [Nemirovski, 2004] which analyzes EG… ▽ More We study the iteration complexity of the optimistic gradient descent-ascent (OGDA) method and the extra-gradient (EG) method for finding a saddle point of a convex-concave unconstrained min-max problem. To do so, we first show that both OGDA and EG can be interpreted as approximate variants of the proximal point method. This is similar to the approach taken in [Nemirovski, 2004] which analyzes EG as an approximation of the `conceptual mirror prox'. In this paper, we highlight how gradients used in OGDA and EG try to approximate the gradient of the Proximal Point method. We then exploit this interpretation to show that both algorithms produce iterates that remain within a bounded set. We further show that the primal dual gap of the averaged iterates generated by both of these algorithms converge with a rate of $\mathcal{O}(1/k)$. Our theoretical analysis is of interest as it provides a the first convergence rate estimate for OGDA in the general convex-concave setting. Moreover, it provides a simple convergence analysis for the EG algorithm in terms of function value without using compactness assumption. △ Less

Submitted 29 September, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: 19 pages

arXiv:1901.08511 [pdf, ps, other]

A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach

Authors: Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil

Abstract: In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods. We show that both of these algorithms admit a unified analysis as approximations of the classical proximal point method for solving saddle point problems. This viewpoint enables us to develop a new framework for… ▽ More In this paper we consider solving saddle point problems using two variants of Gradient Descent-Ascent algorithms, Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods. We show that both of these algorithms admit a unified analysis as approximations of the classical proximal point method for solving saddle point problems. This viewpoint enables us to develop a new framework for analyzing EG and OGDA for bilinear and strongly convex-strongly concave settings. Moreover, we use the proximal point approximation interpretation to generalize the results for OGDA for a wide range of parameters. △ Less

Submitted 5 September, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

Comments: 25 pages, 3 figures

arXiv:1901.08022 [pdf, other]

A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Authors: Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar

Abstract: We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic… ▽ More We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic version of Nesterov's method with a specific restart and parameters selected to achieve the fastest reduction in the bias-variance terms in the convergence rate bounds. △ Less

Submitted 27 October, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

arXiv:1809.02162 [pdf, ps, other]

Escaping Saddle Points in Constrained Optimization

Authors: Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie

Abstract: In this paper, we study the problem of escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$. We propose a generic framework that yields convergence to a second-order stationary point of the problem, if the convex set $\mathcal{C}$ is simple for a quadratic objective function. Specifically, our results hold if one can find a $ρ$-approximate sol… ▽ More In this paper, we study the problem of escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$. We propose a generic framework that yields convergence to a second-order stationary point of the problem, if the convex set $\mathcal{C}$ is simple for a quadratic objective function. Specifically, our results hold if one can find a $ρ$-approximate solution of a quadratic program subject to $\mathcal{C}$ in polynomial time, where $ρ<1$ is a positive constant that depends on the structure of the set $\mathcal{C}$. Under this condition, we show that the sequence of iterates generated by the proposed framework reaches an $(ε,γ)$-second order stationary point (SOSP) in at most $\mathcal{O}(\max\{ε^{-2},ρ^{-3}γ^{-3}\})$ iterations. We further characterize the overall complexity of reaching an SOSP when the convex set $\mathcal{C}$ can be written as a set of quadratic constraints and the objective function Hessian has a specific structure over the convex set $\mathcal{C}$. Finally, we extend our results to the stochastic setting and characterize the number of stochastic gradient and Hessian evaluations to reach an $(ε,γ)$-SOSP. △ Less

Submitted 9 October, 2018; v1 submitted 6 September, 2018; originally announced September 2018.

arXiv:1809.01485 [pdf, other]

Blind Community Detection from Low-rank Excitations of a Graph Filter

Authors: Hoi-To Wai, Santiago Segarra, Asuman E. Ozdaglar, Anna Scaglione, Ali Jadbabaie

Abstract: This paper considers a new framework to detect communities in a graph from the observation of signals at its nodes. We model the observed signals as noisy outputs of an unknown network process, represented as a graph filter that is excited by a set of unknown low-rank inputs/excitations. Application scenarios of this model include diffusion dynamics, pricing experiments, and opinion dynamics. Rath… ▽ More This paper considers a new framework to detect communities in a graph from the observation of signals at its nodes. We model the observed signals as noisy outputs of an unknown network process, represented as a graph filter that is excited by a set of unknown low-rank inputs/excitations. Application scenarios of this model include diffusion dynamics, pricing experiments, and opinion dynamics. Rather than learning the precise parameters of the graph itself, we aim at retrieving the community structure directly. The paper shows that communities can be detected by applying a spectral method to the covariance matrix of graph signals. Our analysis indicates that the community detection performance depends on a `low-pass' property of the graph filter. We also show that the performance can be improved via a low-rank matrix plus sparse decomposition method when the latent parameter vectors are known. Numerical experiments demonstrate that our approach is effective. △ Less

Submitted 12 April, 2019; v1 submitted 5 September, 2018; originally announced September 2018.

Comments: Single column format, 32 pages, 9 figures

arXiv:1808.10590 [pdf, ps, other]

Value of Information in Bayesian Routing Games

Authors: Manxi Wu, Saurabh Amin, Asuman E. Ozdaglar

Abstract: We study a routing game in an environment with multiple heterogeneous information systems and an uncertain state that affects edge costs of a congested network. Each information system sends a noisy signal about the state to its subscribed traveler population. Travelers make route choices based on their private beliefs about the state and other populations' signals. The question then arises, "How… ▽ More We study a routing game in an environment with multiple heterogeneous information systems and an uncertain state that affects edge costs of a congested network. Each information system sends a noisy signal about the state to its subscribed traveler population. Travelers make route choices based on their private beliefs about the state and other populations' signals. The question then arises, "How does the presence of asymmetric and incomplete information affect the travelers' equilibrium route choices and costs?'' We develop a systematic approach to characterize the equilibrium structure, and determine the effect of population sizes on the relative value of information (i.e. difference in expected traveler costs) between any two populations. This effect can be evaluated using a population-specific size threshold. One population enjoys a strictly positive value of information in comparison to the other if and only if its size is below the corresponding threshold. We also consider the situation when travelers may choose an information system based on its value, and characterize the set of equilibrium adoption rates delineating the sizes of subscribed traveler populations. The resulting routing strategies are such that all travelers face an identical expected cost and no traveler has the incentive to change her subscription. △ Less

Submitted 6 March, 2020; v1 submitted 31 August, 2018; originally announced August 2018.

arXiv:1807.04428 [pdf, ps, other]

Convergence Rate of Block-Coordinate Maximization Burer-Monteiro Method for Solving Large SDPs

Authors: Murat A. Erdogdu, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Denizcan Vanli

Abstract: Semidefinite programming (SDP) with diagonal constraints arise in many optimization problems, such as Max-Cut, community detection and group synchronization. Although SDPs can be solved to arbitrary precision in polynomial time, generic convex solvers do not scale well with the dimension of the problem. In order to address this issue, Burer and Monteiro proposed to reduce the dimension of the prob… ▽ More Semidefinite programming (SDP) with diagonal constraints arise in many optimization problems, such as Max-Cut, community detection and group synchronization. Although SDPs can be solved to arbitrary precision in polynomial time, generic convex solvers do not scale well with the dimension of the problem. In order to address this issue, Burer and Monteiro proposed to reduce the dimension of the problem by appealing to a low-rank factorization and solve the subsequent non-convex problem instead. In this paper, we present coordinate ascent based methods to solve this non-convex problem with provable convergence guarantees. More specifically, we prove that the block-coordinate maximization algorithm applied to the non-convex Burer-Monteiro method globally converges to a first-order stationary point with a sublinear rate without any assumptions on the problem. We further show that this algorithm converges linearly around a local maximum provided that the objective function exhibits quadratic decay. We establish that this condition generically holds when the rank of the factorization is sufficiently large. Furthermore, incorporating Lanczos method to the block-coordinate maximization, we propose an algorithm that is guaranteed to return a solution that provides $1-O(1/r)$ approximation to the original SDP without any assumptions, where $r$ is the rank of the factorization. This approximation ratio is known to be optimal (up to constants) under the unique games conjecture, and we can explicitly quantify the number of iterations to obtain such a solution. △ Less

Submitted 26 November, 2019; v1 submitted 12 July, 2018; originally announced July 2018.

arXiv:1805.10579 [pdf, other]

Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions

Authors: Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar

Abstract: We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. We focus on gradient descent (GD) and accelerated gradient (AG) methods for minimizing strongly convex functions when the gradient has random errors in the form of additive white noise. With gradient errors, the function values of the iterates need not converge to the optimal va… ▽ More We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. We focus on gradient descent (GD) and accelerated gradient (AG) methods for minimizing strongly convex functions when the gradient has random errors in the form of additive white noise. With gradient errors, the function values of the iterates need not converge to the optimal value; hence, we define the robustness of an algorithm to noise as the asymptotic expected suboptimality of the iterate sequence to input noise power. For this robustness measure, we provide exact expressions for the quadratic case using tools from robust control theory and tight upper bounds for the smooth strongly convex case using Lyapunov functions certified through matrix inequalities. We use these characterizations within an optimization problem which selects parameters of each algorithm to achieve a particular trade-off between rate and robustness. Our results show that AG can achieve acceleration while being more robust to random gradient errors. This behavior is quite different than previously reported in the deterministic gradient noise setting. We also establish some connections between the robustness of an algorithm and how quickly it can converge back to the optimal solution if it is perturbed from the optimal point with deterministic noise. Our framework also leads to practical algorithms that can perform better than other state-of-the-art methods in the presence of random gradient noise. △ Less

Submitted 5 November, 2019; v1 submitted 27 May, 2018; originally announced May 2018.

Comments: To appear in SIAM Journal on Optimization (SIOPT)

arXiv:1802.00080 [pdf, other]

Graphon games: A statistical framework for network games and interventions

Authors: Francesca Parise, Asuman Ozdaglar

Abstract: In this paper, we present a unifying framework for analyzing equilibria and designing interventions for large network games sampled from a stochastic network formation process represented by a graphon. We first introduce a new class of infinite population games, termed graphon games, where a continuum of heterogeneous agents interact according to a graphon. After studying properties of equilibria… ▽ More In this paper, we present a unifying framework for analyzing equilibria and designing interventions for large network games sampled from a stochastic network formation process represented by a graphon. We first introduce a new class of infinite population games, termed graphon games, where a continuum of heterogeneous agents interact according to a graphon. After studying properties of equilibria in graphon games, we show that graphon equilibria can approximate equilibria of large network games sampled from the graphon. We next show that, under some regularity assumptions, the graphon approach enables the design of asymptotically optimal interventions via the solution of an optimization problem with much lower dimension than the one based on the entire network structure. We illustrate our framework on a synthetic dataset of rural villages and show that the graphon intervention can be computed efficiently and based solely on aggregated relational data. △ Less

Submitted 30 June, 2020; v1 submitted 31 January, 2018; originally announced February 2018.

arXiv:1712.08277 [pdf, other]

A variational inequality framework for network games: Existence, uniqueness, convergence and sensitivity analysis

Authors: Francesca Parise, Asuman Ozdaglar

Abstract: We provide a unified variational inequality framework for the study of fundamental properties of the Nash equilibrium in network games. We identify several conditions on the underlying network (in terms of spectral norm, infinity norm and minimum eigenvalue of its adjacency matrix) that guarantee existence, uniqueness, convergence and continuity of equilibrium in general network games with multidi… ▽ More We provide a unified variational inequality framework for the study of fundamental properties of the Nash equilibrium in network games. We identify several conditions on the underlying network (in terms of spectral norm, infinity norm and minimum eigenvalue of its adjacency matrix) that guarantee existence, uniqueness, convergence and continuity of equilibrium in general network games with multidimensional and possibly constrained strategy sets. We delineate the relations between these conditions and characterize classes of networks that satisfy each of these conditions. △ Less

Submitted 9 August, 2018; v1 submitted 21 December, 2017; originally announced December 2017.

arXiv:1706.08693 [pdf, other]

Sensitivity analysis for network aggregative games

Authors: Francesca Parise, Asuman Ozdaglar

Abstract: We investigate the sensitivity of the Nash equilibrium of constrained network aggregative games to changes in exogenous parameters affecting the cost function of the players. This setting is motivated by two applications. The first is the analysis of interventions by a social planner with a networked objective function while the second is network routing games with atomic players and information c… ▽ More We investigate the sensitivity of the Nash equilibrium of constrained network aggregative games to changes in exogenous parameters affecting the cost function of the players. This setting is motivated by two applications. The first is the analysis of interventions by a social planner with a networked objective function while the second is network routing games with atomic players and information constraints. By exploiting a primal reformulation of a sensitivity analysis result for variational inequalities, we provide a characterization of the sensitivity of the Nash equilibrium that depends on primal variables only. To derive this result we assume strong monotonicity of the mapping associated with the game. As the second main result, we derive sufficient conditions that guarantee this strong monotonicity property in network aggregative games. These two characterizations allows us to systematically study changes in the Nash equilibrium due to perturbations or parameter variations in the two applications mentioned above. △ Less

Submitted 27 June, 2017; originally announced June 2017.

arXiv:1706.01131 [pdf, ps, other]

Strategic Dynamic Pricing with Network Effects

Authors: Ali Makhdoumi, Azarakhsh Malekian, Asuman Ozdaglar

Abstract: We study the optimal pricing strategy of a monopolist selling homogeneous goods to customers over multiple periods. The customers choose their time of purchase to maximize their payoff that depends on their valuation of the product, the purchase price, and the utility they derive from past purchases of others, termed the network effect. We first show that the optimal price sequence is non-decreasi… ▽ More We study the optimal pricing strategy of a monopolist selling homogeneous goods to customers over multiple periods. The customers choose their time of purchase to maximize their payoff that depends on their valuation of the product, the purchase price, and the utility they derive from past purchases of others, termed the network effect. We first show that the optimal price sequence is non-decreasing. Therefore, by postponing purchase to future rounds, customers trade-off a higher utility from the network effects with a higher price. We then show that a customer's equilibrium strategy can be characterized by a threshold rule in which at each round a customer purchases the product if and only if her valuation exceeds a certain threshold. This implies that customers face an inference problem regarding the valuations of others, i.e., observing that a customer has not yet purchased the product, signals that her valuation is below a threshold. We consider a block model of network interactions, where there are blocks of buyers subject to the same network effect. A natural benchmark, this model allows us to provide an explicit characterization of the optimal price sequence asymptotically as the number of agents goes to infinity, which notably is linearly increasing in time with a slope that depends on the network effect through a scalar given by the sum of entries of the inverse of the network weight matrix. Our characterization shows that increasing the "imbalance" in the network defined as the difference between the in and out degree of the nodes increases the revenue of the monopolist. We further study the effects of price discrimination and show that in earlier periods monopolist offers lower prices to blocks with higher Bonacich centrality to encourage them to purchase, which in turn further incentivizes other customers to buy in subsequent periods. △ Less

Submitted 26 June, 2018; v1 submitted 4 June, 2017; originally announced June 2017.

arXiv:1601.02039 [pdf, ps, other]

Informational Braess' Paradox: The Effect of Information on Traffic Congestion

Authors: Daron Acemoglu, Ali Makhdoumi, Azarakhsh Malekian, Asuman Ozdaglar

Abstract: To systematically study the implications of additional information about routes provided to certain users (e.g., via GPS-based route guidance systems), we introduce a new class of congestion games in which users have differing information sets about the available edges and can only use routes consisting of edges in their information set. After defining the notion of Information Constrained Wardrop… ▽ More To systematically study the implications of additional information about routes provided to certain users (e.g., via GPS-based route guidance systems), we introduce a new class of congestion games in which users have differing information sets about the available edges and can only use routes consisting of edges in their information set. After defining the notion of Information Constrained Wardrop Equilibrium (ICWE) for this class of congestion games and studying its basic properties, we turn to our main focus: whether additional information can be harmful (in the sense of generating greater equilibrium costs/delays). We formulate this question in the form of Informational Braes' Paradox (IBP), which extends the classic Braess' Paradox in traffic equilibria, and asks whether users receiving additional information can become worse off. We provide a comprehensive answer to this question showing that in any network in the series of linearly independent (SLI) class, which is a strict subset of series-parallel networks, IBP cannot occur, and in any network that is not in the SLI class, there exists a configuration of edge-specific cost functions for which IBP will occur. In the process, we establish several properties of the SLI class of networks, which include the characterization of the complement of the SLI class in terms of embedding a specific set of networks, and also an algorithm which determines whether a graph is SLI in linear time. We further prove that the worst-case inefficiency performance of ICWE is no worse than the standard Wardrop equilibrium. △ Less

Submitted 3 November, 2017; v1 submitted 8 January, 2016; originally announced January 2016.

arXiv:1510.06055 [pdf, other]

A lower bound on the performance of dynamic curing policies for epidemics on graphs

Authors: Kimon Drakopoulos, Asuman Ozdaglar, John N. Tsitsiklis

Abstract: We consider an SIS-type epidemic process that evolves on a known graph. We assume that a fixed curing budget can be allocated at each instant to the nodes of the graph, towards the objective of minimizing the expected extinction time of the epidemic. We provide a lower bound on the optimal expected extinction time as a function of the available budget, the epidemic parameters, the maximum degree,… ▽ More We consider an SIS-type epidemic process that evolves on a known graph. We assume that a fixed curing budget can be allocated at each instant to the nodes of the graph, towards the objective of minimizing the expected extinction time of the epidemic. We provide a lower bound on the optimal expected extinction time as a function of the available budget, the epidemic parameters, the maximum degree, and the CutWidth of the graph. For graphs with large CutWidth (close to the largest possible), and under a budget which is sublinear in the number of nodes, our lower bound scales exponentially with the size of the graph. △ Less

Submitted 20 October, 2015; originally announced October 2015.

arXiv:1510.06054 [pdf, other]

When is a network epidemic hard to eliminate?

Authors: Kimon Drakopoulos, Asuman Ozdaglar, John N. Tsitsiklis

Abstract: We consider the propagation of a contagion process (epidemic) on a network and study the problem of dynamically allocating a fixed curing budget to the nodes of the graph, at each time instant. For bounded degree graphs, we provide a lower bound on the expected time to extinction under any such dynamic allocation policy, in terms of a combinatorial quantity that we call the resistance of the set o… ▽ More We consider the propagation of a contagion process (epidemic) on a network and study the problem of dynamically allocating a fixed curing budget to the nodes of the graph, at each time instant. For bounded degree graphs, we provide a lower bound on the expected time to extinction under any such dynamic allocation policy, in terms of a combinatorial quantity that we call the resistance of the set of initially infected nodes, the available budget, and the number of nodes n. Specifically, we consider the case of bounded degree graphs, with the resistance growing linearly in n. We show that if the curing budget is less than a certain multiple of the resistance, then the expected time to extinction grows exponentially with n. As a corollary, if all nodes are initially infected and the CutWidth of the graph grows linearly, while the curing budget is less than a certain multiple of the CutWidth, then the expected time to extinction grows exponentially in n. The combination of the latter with our prior work establishes a fairly sharp phase transition on the expected time to extinction (sub-linear versus exponential) based on the relation between the CutWidth and the curing budget. △ Less

Submitted 20 October, 2015; originally announced October 2015.

Showing 1–50 of 81 results for author: Ozdaglar, A