Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms

Authors: Khashayar Khosravi, Renato Paes Leme, Chara Podimata, Apostolis Tsorvantzis

Abstract: We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states that we call Bandits with Deterministically Evolving States ($B$-$DES$). The workhorse applications of our model are learning for recommendation systems and learning for online ads. In both cases, the reward that the algorithm obtains at each round is a function of the short-term reward of the action chosen and how "healthy" the system is (i.e., as measured by its state). For example, in recommendation systems, the reward that the platform obtains from a user's engagement with a particular type of content depends not only on the inherent features of the specific content, but also on how the user's preferences have evolved as a result of interacting with other types of content on the platform. Our general model accounts for the different rate $λ\in [0,1]$ at which the state evolves (e.g., how fast a user's preferences shift as a result of previous content consumption) and encompasses standard multi-armed bandits as a special case. The goal of the algorithm is to minimize a notion of regret against the best fixed sequence of arms pulled, which is significantly harder to attain than the standard benchmark of the best fixed action in hindsight. We present online learning algorithms for any possible value of the evolution rate $λ$ and we show the robustness of our results to various model misspecifications.

Submitted 19 February, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

arXiv:2102.13028 [pdf, other]

Batched Neural Bandits

Authors: Quanquan Gu, Amin Karbasi, Khashayar Khosravi, Vahab Mirrokni, Dongruo Zhou

Abstract: In many sequential decision-making problems, the individuals are split into several batches and the decision-maker is only allowed to change her policy at the end of batches. These batch problems have a large number of applications, ranging from clinical trials to crowdsourcing. Motivated by this, we study the stochastic contextual bandit problem for general reward distributions under the batched setting. We propose the BatchNeuralUCB algorithm which combines neural networks with optimism to address the exploration-exploitation tradeoff while keeping the total number of batches limited. We study BatchNeuralUCB under both fixed and adaptive batch size settings and prove that it achieves the same regret as the fully sequential version while reducing the number of policy updates considerably. We confirm our theoretical results via simulations on both synthetic and real-world datasets.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 21 pages, 7 figures

arXiv:2002.10121 [pdf, other]

The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Authors: Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

Abstract: We investigate a Bayesian $k$-armed bandit problem in the \emph{many-armed} regime, where $k \geq \sqrt{T}$ and $T$ represents the time horizon. Initially, and aligned with recent literature on many-armed bandit problems, we observe that subsampling plays a key role in designing optimal algorithms; the conventional UCB algorithm is sub-optimal, whereas a subsampled UCB (SS-UCB), which selects $Θ(\sqrt{T})$ arms for execution under the UCB framework, achieves rate-optimality. However, despite SS-UCB's theoretical promise of optimal regret, it empirically underperforms compared to a greedy algorithm that consistently chooses the empirically best arm. This observation extends to contextual settings through simulations with real-world data. Our findings suggest a new form of \emph{free exploration} beneficial to greedy algorithms in the many-armed context, fundamentally linked to a tail event concerning the prior distribution of arm rewards. This finding diverges from the notion of free exploration, which relates to covariate variation, as recently discussed in contextual bandit literature. Expanding upon these insights, we establish that the subsampled greedy approach not only achieves rate-optimality for Bernoulli bandits within the many-armed regime but also attains sublinear regret across broader distributions. Collectively, our research indicates that in the many-armed regime, practitioners might find greater value in adopting greedy algorithms.

Submitted 20 March, 2024; v1 submitted 24 February, 2020; originally announced February 2020.

arXiv:2001.01558 [pdf]

Shear Stress Distribution Prediction in Symmetric Compound Channels Using Data Mining and Machine Learning Models

Authors: Zohreh Sheikh Khozani, Khabat Khosravi, Mohammadamin Torabi, Amir Mosavi, Bahram Rezaei, Timon Rabczuk

Abstract: Shear stress distribution prediction in open channels is of utmost importance in hydraulic structural engineering as it directly affects the design of stable channels. In this study, a series of experimental tests was first conducted to assess the shear stress distribution in prismatic compound channels. The shear stress values around the whole wetted perimeter were measured in the compound channel with different floodplain widths and at different flow depths in subcritical and supercritical conditions. A set of data mining and machine learning models, including Random Forest (RF), M5P, Random Committee (RC), KStar and Additive Regression Model (AR), was implemented on the attained data to predict the shear stress distribution in the compound channel. Among these five models, the RF method gave the most precise results, with the highest $R^2$ value of 0.9. Finally, the most powerful data mining method studied in this research (RF) was compared with two well-known analytical models, the Shiono and Knight Method (SKM) and the Shannon method, to assess how well the proposed model predicts the shear stress distribution. The results showed that the RF model has the best prediction performance compared to the SKM and Shannon models.

Submitted 20 December, 2019; originally announced January 2020.

Comments: 29 pages, 6 figures

MSC Class: 68T05

arXiv:1901.03719 [pdf, other]

Non-Parametric Inference Adaptive to Intrinsic Dimension

Authors: Khashayar Khosravi, Greg Lewis, Vasilis Syrgkanis

Abstract: We consider non-parametric estimation and inference of conditional moment models in high dimensions. We show that even when the dimension $D$ of the conditioning variable is larger than the sample size $n$, estimation and inference are feasible as long as the distribution of the conditioning variable has small intrinsic dimension $d$, as measured by locally low doubling measures. Our estimation is based on a sub-sampled ensemble of the $k$-nearest neighbors ($k$-NN) $Z$-estimator. We show that if the intrinsic dimension of the covariate distribution is equal to $d$, then the finite sample estimation error of our estimator is of order $n^{-1/(d+2)}$ and our estimate is $n^{1/(d+2)}$-asymptotically normal, irrespective of $D$. The sub-sampling size required for achieving these results depends on the unknown intrinsic dimension $d$. We propose an adaptive data-driven approach for choosing this parameter and prove that it achieves the desired rates. We discuss extensions and applications to heterogeneous treatment effect estimation.

Submitted 17 June, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

arXiv:1704.09011 [pdf, other]

Mostly Exploration-Free Algorithms for Contextual Bandits

Authors: Hamsa Bastani, Mohsen Bayati, Khashayar Khosravi

Abstract: The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate optimal (achieves asymptotically optimal regret) if there is sufficient randomness in the observed contexts (covariates). We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term covariate diversity. Furthermore, even absent this condition, we show that a greedy algorithm can be rate optimal with positive probability. Thus, standard bandit algorithms may unnecessarily explore. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces exploration and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling or upper confidence bound (UCB).

Submitted 18 April, 2020; v1 submitted 28 April, 2017; originally announced April 2017.

Comments: 62 Pages, 7 Figures

arXiv:1611.01462 [pdf, ps, other]

Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

Authors: Hakan Inan, Khashayar Khosravi, Richard Socher

Abstract: Recurrent neural networks have been very successful at predicting sequences of words in tasks such as language modeling. However, all such models are based on the conventional classification framework, where the model is trained against one-hot targets, and each word is represented both as an input and as an output in isolation. This causes inefficiencies in learning both in terms of utilizing all of the information and in terms of the number of parameters needed to train. We introduce a novel theoretical framework that facilitates better learning in language modeling, and show that our framework leads to tying together the input embedding and the output projection matrices, greatly reducing the number of trainable variables. Our framework leads to state-of-the-art performance on the Penn Treebank with a variety of network models.

Submitted 11 March, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

arXiv:1603.00126 [pdf, ps, other]

Multiclass Classification, Information, Divergence, and Surrogate Risk

Authors: John C. Duchi, Khashayar Khosravi, Feng Ruan

Abstract: We provide a unifying view of statistical information measures, multi-way Bayesian hypothesis testing, loss functions for multi-class classification problems, and multi-distribution $f$-divergences, elaborating equivalence results between all of these objects, and extending existing results for binary outcome spaces to more general ones. We consider a generalization of $f$-divergences to multiple distributions, and we provide a constructive equivalence between divergences, statistical information (in the sense of DeGroot), and losses for multiclass classification. A major application of our results is in multi-class classification problems in which we must both infer a discriminant function $γ$---for making predictions on a label $Y$ from datum $X$---and a data representation (or, in the setting of a hypothesis testing problem, an experimental design), represented as a quantizer $\mathsf{q}$ from a family of possible quantizers $\mathsf{Q}$. In this setting, we characterize the equivalence between loss functions, meaning that optimizing either of two losses yields an optimal discriminant and quantizer $\mathsf{q}$, complementing and extending earlier results of Nguyen et al. to the multiclass case. Our results provide a more substantial basis than standard classification calibration results for comparing different losses: we describe the convex losses that are consistent for jointly choosing a data representation and minimizing the (weighted) probability of error in multiclass classification problems.

Submitted 10 September, 2017; v1 submitted 29 February, 2016; originally announced March 2016.

Showing 1–8 of 8 results for author: Khosravi, K