
Showing 1–17 of 17 results for author: Tripuraneni, N

  1. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2311.00871

    cs.LG cs.CL stat.ML

    Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models

    Authors: Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni

    Abstract: Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this work, we study how effectively transformers can bridge between their pretraining data mixture, comprised of multiple distinct task families, to identify and lea…

    Submitted 1 November, 2023; originally announced November 2023.

  4. arXiv:2309.07893

    stat.ME cs.LG stat.ML

    Choosing a Proxy Metric from Past Experiments

    Authors: Nilesh Tripuraneni, Lee Richardson, Alexander D'Amour, Jacopo Soriano, Steve Yadlowsky

    Abstract: In many randomized experiments, the treatment effect of the long-term metric (i.e., the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy that they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they…

    Submitted 15 June, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: To appear in KDD 2024

  5. arXiv:2206.12441

    cs.LG

    Joint Representation Training in Sequential Tasks with Shared Structure

    Authors: Aldo Pacchiano, Ofir Nachum, Nilesh Tripuraneni, Peter Bartlett

    Abstract: Classical theory in reinforcement learning (RL) predominantly focuses on the single task setting, where an agent learns to solve a task through trial-and-error experience, given access to data only from that task. However, many recent empirical works have demonstrated the significant practical benefits of leveraging a joint representation trained across multiple, related tasks. In this work we the…

    Submitted 24 June, 2022; originally announced June 2022.

  6. arXiv:2111.08234

    stat.ML cs.LG

    Covariate Shift in High-Dimensional Random Feature Regression

    Authors: Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington

    Abstract: A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain the same. Despite the prevalence of covariate shift in real-world applications, a theoretical understanding in the context of modern machine learnin…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: 107 pages, 10 figures

  7. arXiv:2105.10590

    stat.ML cs.LG q-bio.BM q-bio.QM

    Parallelizing Contextual Bandits

    Authors: Jeffrey Chan, Aldo Pacchiano, Nilesh Tripuraneni, Yun S. Song, Peter Bartlett, Michael I. Jordan

    Abstract: Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions. However, simultaneously proposing a batch of decisions, which leverages available resources for parallel experimentation, has the potential to rapidly accelerate exploration. We present a family of (parallel) contextual bandit algorithms applicable to problems with bounded e…

    Submitted 5 February, 2023; v1 submitted 21 May, 2021; originally announced May 2021.

  8. arXiv:2011.12433

    math.ST cs.DS cs.LG stat.ML

    Optimal Mean Estimation without a Variance

    Authors: Yeshwanth Cherapanamjeri, Nilesh Tripuraneni, Peter L. Bartlett, Michael I. Jordan

    Abstract: We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist. Concretely, given a sample $\mathbf{X} = \{X_i\}_{i = 1}^n$ from a distribution $\mathcal{D}$ over $\mathbb{R}^d$ with mean $\mu$ which satisfies the following weak-moment assumption for some $\alpha \in [0, 1]$: \begin{equation*} \forall \|v\| = 1: \mathbb{E}_{X…

    Submitted 8 December, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: Fixed typographical errors in Theorem 1.2, Lemmas 4.3 and C.8

  9. arXiv:2007.08137

    stat.ML cs.DS cs.LG

    Optimal Robust Linear Regression in Nearly Linear Time

    Authors: Yeshwanth Cherapanamjeri, Efe Aras, Nilesh Tripuraneni, Michael I. Jordan, Nicolas Flammarion, Peter L. Bartlett

    Abstract: We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \varepsilon$ (with $X \in \mathbb{R}^d$ and $\varepsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted. We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive,…

    Submitted 16 July, 2020; originally announced July 2020.

  10. arXiv:2006.11650

    cs.LG stat.ML

    On the Theory of Transfer Learning: The Importance of Task Diversity

    Authors: Nilesh Tripuraneni, Michael I. Jordan, Chi Jin

    Abstract: We provide new statistical guarantees for transfer learning via representation learning--when transfer is achieved by learning a feature representation shared across different tasks. This enables learning on new tasks using far less data than is required to learn them in isolation. Formally, we consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class…

    Submitted 22 October, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  11. arXiv:2002.11684

    cs.LG cs.AI stat.ML

    Provable Meta-Learning of Linear Representations

    Authors: Nilesh Tripuraneni, Chi Jin, Michael I. Jordan

    Abstract: Meta-learning, or learning-to-learn, seeks to design algorithms that can utilize previous experience to rapidly learn new skills or adapt to new environments. Representation learning -- a key tool for performing meta-learning -- learns a data representation that can transfer knowledge across multiple tasks, which is essential in regimes where data is scarce. Despite a recent surge of interest in t…

    Submitted 31 December, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: Lower bound slightly improved to include task diversity parameter

  12. arXiv:1912.11071

    math.ST cs.DS

    Algorithms for Heavy-Tailed Statistics: Regression, Covariance Estimation, and Beyond

    Authors: Yeshwanth Cherapanamjeri, Samuel B. Hopkins, Tarun Kathuria, Prasad Raghavendra, Nilesh Tripuraneni

    Abstract: We study efficient algorithms for linear regression and covariance estimation in the absence of Gaussian assumptions on the underlying distributions of samples, making assumptions instead about only finitely-many moments. We focus on how many samples are needed to do estimation and regression with high accuracy and exponentially-good success probability. For covariance estimation, linear regress…

    Submitted 23 December, 2019; originally announced December 2019.

  13. arXiv:1908.02341

    stat.ML cs.LG

    Single Point Transductive Prediction

    Authors: Nilesh Tripuraneni, Lester Mackey

    Abstract: Standard methods in supervised learning separate training and prediction: the model is fit independently of any test points it may encounter. However, can knowledge of the next test point $\mathbf{x}_{\star}$ be exploited to improve prediction accuracy? We address this question in the context of linear prediction, showing how techniques from semi-parametric inference can be used transductively to…

    Submitted 29 June, 2020; v1 submitted 6 August, 2019; originally announced August 2019.

    Comments: 37th International Conference on Machine Learning (ICML 2020)

  14. arXiv:1810.04777

    stat.ML cs.LG

    Rao-Blackwellized Stochastic Gradients for Discrete Distributions

    Authors: Runjing Liu, Jeffrey Regier, Nilesh Tripuraneni, Michael I. Jordan, Jon McAuliffe

    Abstract: We wish to compute the gradient of an expectation over a finite or countably infinite sample space having $K \leq \infty$ categories. When $K$ is indeed infinite, or finite but very large, the relevant summation is intractable. Accordingly, various stochastic gradient estimators have been proposed. In this paper, we describe a technique that can be applied to reduce the variance of any such estima…

    Submitted 13 May, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

    Comments: Accepted to ICML 2019

  15. arXiv:1802.09128

    cs.LG math.OC stat.ML

    Averaging Stochastic Gradient Descent on Riemannian Manifolds

    Authors: Nilesh Tripuraneni, Nicolas Flammarion, Francis Bach, Michael I. Jordan

    Abstract: We consider the minimization of a function defined on a Riemannian manifold $\mathcal{M}$ accessible only through unbiased estimates of its gradients. We develop a geometric framework to transform a sequence of slowly converging iterates generated from stochastic gradient descent (SGD) on $\mathcal{M}$ to an averaged iterate sequence with a robust and fast $O(1/n)$ convergence rate. We then presen…

    Submitted 8 June, 2018; v1 submitted 25 February, 2018; originally announced February 2018.

    Comments: COLT 2018

  16. arXiv:1711.02838

    cs.LG math.OC stat.ML

    Stochastic Cubic Regularization for Fast Nonconvex Optimization

    Authors: Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, Michael I. Jordan

    Abstract: This paper proposes a stochastic variant of a classic algorithm, the cubic-regularized Newton method [Nesterov and Polyak 2006]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{\mathcal{O}}(\varepsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed…

    Submitted 5 December, 2017; v1 submitted 8 November, 2017; originally announced November 2017.

    Comments: The first two authors contributed equally

  17. arXiv:1706.04161

    stat.ML cs.LG

    Lost Relatives of the Gumbel Trick

    Authors: Matej Balog, Nilesh Tripuraneni, Zoubin Ghahramani, Adrian Weller

    Abstract: The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each time solving for the most likely configuration. We derive an entire family of related methods, of which the Gumbel trick is one member, and show that the new m…

    Submitted 13 June, 2017; originally announced June 2017.

    Comments: 34th International Conference on Machine Learning (ICML 2017)
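The classical Gumbel trick summarized in entry 17 (perturb, then take the most likely configuration) can be sketched in a few lines. This is a minimal illustration of the standard trick only, not of the new family of methods derived in that paper; the function names (`sample_gumbel`, `perturb_and_max`, `gumbel_max_sample`, `estimate_log_partition`) are hypothetical. It assumes the unnormalized distribution is given as a small list of log-weights $\theta_i$ and relies on two standard facts: with $G_i$ i.i.d. standard Gumbel, $\arg\max_i(\theta_i + G_i)$ is an exact sample from $p_i \propto e^{\theta_i}$, and $\mathbb{E}[\max_i(\theta_i + G_i)] = \log Z + \gamma$, where $\gamma$ is the Euler-Mascheroni constant.

```python
import math
import random

# Mean of a standard Gumbel(0, 1) variable (the Euler-Mascheroni constant).
EULER_GAMMA = 0.5772156649015329


def sample_gumbel(rng: random.Random) -> float:
    """Draw one standard Gumbel(0, 1) sample by inverse CDF: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # rng.random() lies in [0, 1); resample so U is in (0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def perturb_and_max(log_weights: list[float], rng: random.Random) -> tuple[int, float]:
    """Add i.i.d. Gumbel noise to every log-weight; return (argmax index, max value)."""
    perturbed = [w + sample_gumbel(rng) for w in log_weights]
    best = max(range(len(perturbed)), key=perturbed.__getitem__)
    return best, perturbed[best]


def gumbel_max_sample(log_weights: list[float], rng: random.Random) -> int:
    """Exact sample of index i with probability exp(log_weights[i]) / Z."""
    return perturb_and_max(log_weights, rng)[0]


def estimate_log_partition(log_weights: list[float], num_trials: int,
                           rng: random.Random) -> float:
    """Unbiased Monte Carlo estimate of log Z = log(sum_i exp(log_weights[i])),
    using E[max_i (w_i + G_i)] = log Z + EULER_GAMMA."""
    total = sum(perturb_and_max(log_weights, rng)[1] for _ in range(num_trials))
    return total / num_trials - EULER_GAMMA


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy distribution: p = [0.1, 0.2, 0.7], so Z = 10.
    log_w = [math.log(1.0), math.log(2.0), math.log(7.0)]
    n = 20_000
    counts = [0] * len(log_w)
    for _ in range(n):
        counts[gumbel_max_sample(log_w, rng)] += 1
    print("empirical p:", [c / n for c in counts])
    print("log Z estimate:", estimate_log_partition(log_w, n, rng),
          "exact:", math.log(10.0))
```

With 20,000 draws the empirical frequencies should sit close to 0.1, 0.2 and 0.7, and the estimate close to $\log 10 \approx 2.303$. Averaging over many perturbations is what makes the partition-function estimate usable: a single perturbed maximum is Gumbel-distributed with variance $\pi^2/6$, so the standard error shrinks only as the inverse square root of the number of trials.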