Showing 1–13 of 13 results for author: Vikram, S

  1. arXiv:2404.07839  [pdf, other]

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var…

    Submitted 11 April, 2024; originally announced April 2024.

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2312.02179  [pdf, other]

    cs.LG cs.AI cs.CL

    Training Chain-of-Thought via Latent-Variable Inference

    Authors: Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous

    Abstract: Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a "chain-of-thought" (CoT) prompt. One can also improve LLMs' performance on a specific task by supervised fine-tuning, i.e., by using gradient ascent on some tunable parameters to maximize the average log-likelihood of correct answers from a labeled training se…

    Submitted 28 November, 2023; originally announced December 2023.

    Comments: 23 pages, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  5. arXiv:2104.14421  [pdf, other]

    cs.LG stat.ML

    What Are Bayesian Neural Network Posteriors Really Like?

    Authors: Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson

    Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch H…

    Submitted 29 April, 2021; originally announced April 2021.

  6. arXiv:2003.01687  [pdf, other]

    cs.LG stat.ML

    Automatic Differentiation Variational Inference with Mixtures

    Authors: Warren R. Morningstar, Sharad M. Vikram, Cusuh Ham, Andrew Gallagher, Joshua V. Dillon

    Abstract: Automatic Differentiation Variational Inference (ADVI) is a useful tool for efficiently learning probabilistic models in machine learning. Generally approximate posteriors learned by ADVI are forced to be unimodal in order to facilitate use of the reparameterization trick. In this paper, we show how stratified sampling may be used to enable mixture distributions as the approximate posterior, and d…

    Submitted 24 June, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: Submitted to NeurIPS 2020, Corrected footnote from: "34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada" to "Preprint. Under review."

  7. arXiv:2002.00643  [pdf, other]

    stat.ML cs.LG

    Automatic structured variational inference

    Authors: Luca Ambrogioni, Kate Lin, Emily Fertig, Sharad Vikram, Max Hinne, Dave Moore, Marcel van Gerven

    Abstract: Stochastic variational inference offers an attractive option as a default method for differentiable probabilistic programming. However, the performance of the variational approach depends on the choice of an appropriate variational family. Here, we introduce automatic structured variational inference (ASVI), a fully automated method for constructing structured variational families, inspired by the…

    Submitted 10 February, 2021; v1 submitted 3 February, 2020; originally announced February 2020.

  8. arXiv:1903.11774  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

    Authors: Quan Vuong, Sharad Vikram, Hao Su, Sicun Gao, Henrik I. Christensen

    Abstract: Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly on real systems, their performance on more complex systems remains bottle-necked by the relative data inefficiency of RL…

    Submitted 27 March, 2019; originally announced March 2019.

    Comments: 2-page extended abstract

  9. arXiv:1811.04582  [pdf]

    cs.CR

    A Lightweight Signature-Based IDS for IoT Environment

    Authors: Nazim Uddin Sheikh, Hasina Rahman, Shashwat Vikram, Hamed AlQahtani

    Abstract: With the advent of large-scale heterogeneous networks comes the problem of unified network control resulting in security lapses that could have otherwise avoided. A mechanism is needed to detect and deflect intruders to safeguard resource constraint edge devices and networks as well. In this paper we demonstrate the use of an optimized pattern recognition algorithm to detect such attacks. Furtherm…

    Submitted 12 November, 2018; originally announced November 2018.

    Comments: 4 pages, 1 figure, 1 table

  10. arXiv:1810.06891  [pdf, other]

    cs.LG stat.ML

    The LORACs prior for VAEs: Letting the Trees Speak for the Data

    Authors: Sharad Vikram, Matthew D. Hoffman, Matthew J. Johnson

    Abstract: In variational autoencoders, the prior on the latent codes $z$ is often treated as an afterthought, but the prior shapes the kind of latent representation that the model learns. If the goal is to learn a representation that is interpretable and useful, then the prior should reflect the ways in which the high-level factors that describe the data vary. The "default" prior is an isotropic normal, but…

    Submitted 16 October, 2018; originally announced October 2018.

  11. arXiv:1808.09105  [pdf, other]

    cs.LG cs.RO stat.ML

    SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

    Authors: Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine

    Abstract: Model-based reinforcement learning (RL) has proven to be a data efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observat…

    Submitted 22 June, 2019; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: ICML 2019. Project website: https://sites.google.com/view/icml19solar

  12. arXiv:1602.03258  [pdf, other]

    cs.LG

    Interactive Bayesian Hierarchical Clustering

    Authors: Sharad Vikram, Sanjoy Dasgupta

    Abstract: Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user's needs. To address this, several methods incorporate constraints obtained from users into clustering algorithms, but unfortunately do not apply to hierarchical clustering. We design an interactive Bayesian algorithm that incorporates user interaction into hierarchical clustering wh…

    Submitted 26 April, 2016; v1 submitted 9 February, 2016; originally announced February 2016.

  13. arXiv:1511.03683  [pdf, other]

    cs.CL cs.LG

    Generative Concatenative Nets Jointly Learn to Write and Classify Reviews

    Authors: Zachary C. Lipton, Sharad Vikram, Julian McAuley

    Abstract: A recommender system's basic task is to estimate how users will respond to unseen items. This is typically modeled in terms of how a user might rate a product, but here we aim to extend such approaches to model how a user would write about the product. To do so, we design a character-level Recurrent Neural Network (RNN) that generates personalized product reviews. The network convincingly learns s…

    Submitted 7 April, 2016; v1 submitted 11 November, 2015; originally announced November 2015.