Showing 1–45 of 45 results for author: Tucker, G

  1. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2402.05821  [pdf, other]

    cs.LG cs.NE

    Guided Evolution with Binary Discriminators for ML Program Search

    Authors: John D. Co-Reyes, Yingjie Miao, George Tucker, Aleksandra Faust, Esteban Real

    Abstract: How to automatically design better machine learning programs is an open problem within AutoML. While evolution has been a popular tool to search for better ML programs, using learning itself to guide the search has been less successful and less understood on harder problems but has the promise to dramatically increase the speed and final performance of the optimization process. We propose guiding…

    Submitted 8 February, 2024; originally announced February 2024.

  4. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2310.08710  [pdf, other]

    cs.RO cs.LG

    Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

    Authors: Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, John D. Co-Reyes, Rishabh Agarwal, Rebecca Roelofs, Yao Lu, Nico Montali, Paul Mougin, Zoey Yang, Brandyn White, Aleksandra Faust, Rowan McAllister, Dragomir Anguelov, Benjamin Sapp

    Abstract: Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simul…

    Submitted 12 October, 2023; originally announced October 2023.

  6. arXiv:2212.11419  [pdf, other]

    cs.AI cs.RO

    Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios

    Authors: Yiren Lu, Justin Fu, George Tucker, Xinlei Pan, Eli Bronstein, Rebecca Roelofs, Benjamin Sapp, Brandyn White, Aleksandra Faust, Shimon Whiteson, Dragomir Anguelov, Sergey Levine

    Abstract: Imitation learning (IL) is a simple and powerful way to use high-quality human driving data, which can be collected at scale, to produce human-like behavior. However, policies based on imitation learning alone often fail to sufficiently account for safety and reliability concerns. In this paper, we show how imitation learning combined with reinforcement learning using simple rewards can substantia…

    Submitted 10 August, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    ACM Class: I.2.9; I.2.6

  7. arXiv:2211.15144  [pdf, other]

    cs.LG

    Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes

    Authors: Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine

    Abstract: The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the learnings from these works, we re-examine previous design…

    Submitted 17 April, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted at ICLR 2023. Project website: https://sites.google.com/view/scaling-offlinerl/home

  8. arXiv:2211.02016  [pdf, other]

    cs.LG cs.AI

    Oracle Inequalities for Model Selection in Offline Reinforcement Learning

    Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

    Abstract: In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximat…

    Submitted 3 November, 2022; originally announced November 2022.

  9. arXiv:2112.12320  [pdf, other]

    cs.LG stat.ML

    Model Selection in Batch Policy Optimization

    Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai

    Abstract: We study the problem of model selection in batch policy optimization: given a fixed, partial-feedback dataset and $M$ model classes, learn a policy with performance that is competitive with the policy derived from the best model class. We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should…

    Submitted 22 December, 2021; originally announced December 2021.

  10. arXiv:2112.04716  [pdf, other]

    cs.LG

    DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

    Authors: Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

    Abstract: Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that de…

    Submitted 9 December, 2021; originally announced December 2021.

  11. arXiv:2111.09859  [pdf, other]

    cs.CE physics.flu-dyn

    On the use of high order central difference schemes for differential equation based wall distance computations

    Authors: Hemanth Chandra Vamsi Kakumani, Nagabhushana Rao Vadlamani, Paul Gary Tucker

    Abstract: A computationally efficient high-order solver is developed to compute the wall distances by solving the relevant partial differential equations, namely: Eikonal, Hamilton-Jacobi (HJ) and Poisson equations. In contrast to the upwind schemes widely used in the literature, we explore the suitability of high-order central difference schemes (explicit/compact) for the wall-distance computation. While s…

    Submitted 9 September, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  12. arXiv:2106.08056  [pdf, other]

    cs.LG stat.ML

    Coupled Gradient Estimators for Discrete Latent Variables

    Authors: Zhe Dong, Andriy Mnih, George Tucker

    Abstract: Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators. While low-variance reparameterization gradients of a continuous relaxation can provide an effective solution, a continuous relaxation is not always available or tractable. Dong et al. (2020) and Yin et al. (2020) introduced a performant estimator that does not rely on continuous…

    Submitted 15 November, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Published in NeurIPS 2021

  13. arXiv:2104.13877  [pdf, other]

    cs.LG cs.AI stat.ML

    Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

    Authors: Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi

    Abstract: Standard dynamics models for continuous control make use of feedforward computation to predict the conditional distribution of next state and reward given current state and action using a multivariate Gaussian with a diagonal covariance structure. This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: ICLR 2021. 17 pages

  14. arXiv:2103.16596  [pdf, other]

    cs.LG stat.ML

    Benchmarks for Deep Off-Policy Evaluation

    Authors: Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

    Abstract: Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making. The ability to learn offline is particularly important in many real-world domains, such as in healthcare, recommender systems, or robotics, where online data collection is an expensive and potentially dangerous process. Being able t…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: ICLR 2021 paper. Policies and evaluation code are available at https://github.com/google-research/deep_ope

  15. arXiv:2012.06919  [pdf, other]

    cs.LG

    Offline Policy Selection under Uncertainty

    Authors: Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans

    Abstract: The presence of uncertainty in policy evaluation significantly complicates the process of policy ranking and selection in real-world settings. We formally consider offline policy selection as learning preferences over a set of policy prospects given a fixed experience dataset. While one can select or rank policies based on point estimates of their policy values or high-confidence intervals, access…

    Submitted 12 December, 2020; originally announced December 2020.

  16. arXiv:2006.13888  [pdf, other]

    cs.LG stat.ML

    RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

    Authors: Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

    Abstract: Offline methods for reinforcement learning have a potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real-world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged…

    Submitted 12 February, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: NeurIPS paper. 21 pages including supplementary material, the github link for the datasets: https://github.com/deepmind/deepmind-research/rl_unplugged

  17. arXiv:2006.10680  [pdf, other]

    cs.LG stat.ML

    DisARM: An Antithetic Gradient Estimator for Binary Latent Variables

    Authors: Zhe Dong, Andriy Mnih, George Tucker

    Abstract: Training models with discrete latent variables is challenging due to the difficulty of estimating the gradients accurately. Much of the recent progress has been achieved by taking advantage of continuous relaxations of the system, which are not always available or even possible. The Augment-REINFORCE-Merge (ARM) estimator provides an alternative that, instead of relaxation, uses continuous augment…

    Submitted 3 December, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Journal ref: Part of Advances in Neural Information Processing Systems 33 proceedings (NeurIPS 2020)

  18. arXiv:2006.04779  [pdf, other]

    cs.LG stat.ML

    Conservative Q-Learning for Offline Reinforcement Learning

    Authors: Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

    Abstract: Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected, static datasets without further interaction. However, in practice, offline RL presents a major challenge, and standard off-policy RL methods can fail due to overest…

    Submitted 19 August, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: Preprint. Website at: https://sites.google.com/view/cql-offline-rl

  19. arXiv:2005.01643  [pdf, other]

    cs.LG cs.AI stat.ML

    Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

    Authors: Sergey Levine, Aviral Kumar, George Tucker, Justin Fu

    Abstract: In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection. Offline reinforcement learning algorithms hold tremendous promise for making it possible to turn large datasets into power…

    Submitted 1 November, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

  20. arXiv:2004.07219  [pdf, other]

    cs.LG stat.ML

    D4RL: Datasets for Deep Data-Driven Reinforcement Learning

    Authors: Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine

    Abstract: The offline reinforcement learning (RL) setting (also known as full batch RL), where a policy is learned from a static dataset, is compelling as progress enables RL methods to take advantage of large, previously-collected datasets, much like how the rise of large datasets has fueled results in supervised learning. However, existing online RL benchmarks are not tailored towards the offline setting…

    Submitted 5 February, 2021; v1 submitted 15 April, 2020; originally announced April 2020.

    Comments: Website available at https://sites.google.com/view/d4rl/home

  21. arXiv:1912.03820  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-Learning without Memorization

    Authors: Mingzhang Yin, George Tucker, Mingyuan Zhou, Sergey Levine, Chelsea Finn

    Abstract: The ability to learn new concepts with small amounts of data is a critical aspect of intelligence that has proven challenging for deep learning methods. Meta-learning has emerged as a promising technique for leveraging data from previous tasks to enable efficient learning of new tasks. However, most meta-learning algorithms implicitly require that the meta-training tasks be mutually-exclusive, suc…

    Submitted 27 April, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

    Comments: ICLR 2020

  22. arXiv:1911.11361  [pdf, other]

    cs.LG cs.AI stat.ML

    Behavior Regularized Offline Reinforcement Learning

    Authors: Yifan Wu, George Tucker, Ofir Nachum

    Abstract: In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment. However in many real-world applications, access to the environment is limited to a fixed offline dataset of logged experience. In such settings, standard RL algorithms have been shown to diverge or otherwise yield poor performance. Accordingly, recent work has suggested a numb…

    Submitted 26 November, 2019; originally announced November 2019.

  23. arXiv:1911.02469  [pdf, other]

    cs.LG stat.ML

    Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse

    Authors: James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

    Abstract: Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables. This paper presents a simple and intuitive explanation for posterior collapse through the analysis of linear VAEs and their direct correspondence with Probabilistic PCA (pPCA). We explain how posterior collapse may occur in pPCA due to…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: 11 main pages, 10 appendix pages. 13 figures total. Accepted at 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  24. arXiv:1910.14265  [pdf, other]

    cs.LG stat.ML

    Energy-Inspired Models: Learning with Sampler-Induced Distributions

    Authors: Dieterich Lawson, George Tucker, Bo Dai, Rajesh Ranganath

    Abstract: Energy-based models (EBMs) are powerful probabilistic models, but suffer from intractable sampling and density evaluation due to the partition function. As a result, inference in EBMs relies on approximate sampling algorithms, leading to a mismatch between the model and inference. Motivated by this, we consider the sampler-induced distribution as the model of interest and maximize the likelihood o…

    Submitted 9 January, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: Presented at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  25. arXiv:1906.06639  [pdf, other]

    cs.LG stat.ML

    Reinforcement Learning Driven Heuristic Optimization

    Authors: Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei

    Abstract: Heuristic algorithms such as simulated annealing, Concorde, and METIS are effective and widely used approaches to find solutions to combinatorial optimization problems. However, they are limited by the high sample complexity required to reach a reasonable solution from a cold-start. In this paper, we introduce a novel framework to generate better initial solutions for heuristic algorithms using re…

    Submitted 15 June, 2019; originally announced June 2019.

    Comments: DRL4KDD'19

  26. arXiv:1906.00949  [pdf, other]

    cs.LG stat.ML

    Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

    Authors: Aviral Kumar, Justin Fu, George Tucker, Sergey Levine

    Abstract: Off-policy reinforcement learning aims to leverage experience collected from prior policies for sample-efficient learning. However, in practice, commonly used off-policy approximate dynamic programming methods based on Q-learning and actor-critic methods are highly sensitive to the data distribution, and can make only limited progress without collecting additional on-policy data. As a step towards…

    Submitted 25 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: Accepted at NeurIPS 2019; Project Website: https://sites.google.com/view/bear-off-policyrl

  27. arXiv:1905.06922  [pdf, other]

    cs.LG stat.ML

    On Variational Bounds of Mutual Information

    Authors: Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker

    Abstract: Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging. To establish tractable and scalable objectives, recent work has turned to variational bounds parameterized by neural networks, but the relationships and tradeoffs between these bounds remain unclear. In this work, we unify these recent development…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  28. arXiv:1903.00374  [pdf, other]

    cs.LG stat.ML

    Model-Based Reinforcement Learning for Atari

    Authors: Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

    Abstract: Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and…

    Submitted 3 April, 2024; v1 submitted 1 March, 2019; originally announced March 2019.

  29. arXiv:1812.11103  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Learning to Walk via Deep Reinforcement Learning

    Authors: Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, Sergey Levine

    Abstract: Deep reinforcement learning (deep RL) holds the promise of automating the acquisition of complex controllers that can map sensory inputs directly to low-level actions. In the domain of robotic locomotion, deep RL could enable learning locomotion skills with minimal engineering and without an explicit model of the robot dynamics. Unfortunately, applying deep RL to real-world robotic tasks is except…

    Submitted 19 June, 2019; v1 submitted 26 December, 2018; originally announced December 2018.

    Comments: RSS 2019, https://sites.google.com/view/minitaur-locomotion/

  30. arXiv:1812.05905  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Soft Actor-Critic Algorithms and Applications

    Authors: Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

    Abstract: Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample complexity and brittleness to hyperparameters. Both of these challenges limit the applicability of such methods to real-world domains. In this paper, we describe S…

    Submitted 29 January, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1801.01290

  31. arXiv:1810.04586  [pdf, other]

    cs.LG stat.ML

    The Laplacian in RL: Learning Representations with Efficient Approximations

    Authors: Yifan Wu, George Tucker, Ofir Nachum

    Abstract: The smallest eigenvectors of the graph Laplacian are well-known to provide a succinct representation of the geometry of a weighted graph. In reinforcement learning (RL), where the weighted graph may be interpreted as the state transition process induced by a behavior policy acting on the environment, approximating the eigenvectors of the Laplacian provides a promising approach to state representat…

    Submitted 10 October, 2018; originally announced October 2018.

  32. arXiv:1810.04152  [pdf, other]

    cs.LG stat.ML

    Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives

    Authors: George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison

    Abstract: Deep latent variable models have become a popular model choice due to the scalable learning algorithms introduced by (Kingma & Welling, 2013; Rezende et al., 2014). These approaches maximize a variational lower bound on the intractable log likelihood of the observed data. Burda et al. (2015) introduced a multi-sample variational bound, IWAE, that is at least as tight as the standard variational lo…

    Submitted 19 November, 2018; v1 submitted 9 October, 2018; originally announced October 2018.

  33. arXiv:1807.01675  [pdf, other]

    cs.LG cs.AI stat.ML

    Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

    Authors: Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

    Abstract: Integrating model-free and model-based approaches in reinforcement learning has the potential to achieve the high performance of model-free algorithms with low sample complexity. However, this is difficult because an imperfect dynamics model can degrade the performance of the learning algorithm, and in sufficiently complex environments, the dynamics model will almost always be imperfect. As a resu…

    Submitted 7 June, 2019; v1 submitted 4 July, 2018; originally announced July 2018.

    Journal ref: Advances in Neural Information Processing Systems, 2019 (pp. 8224-8234)

  34. arXiv:1806.10230  [pdf, other]

    cs.NE cs.LG stat.ML

    Guided evolutionary strategies: Augmenting random search with surrogate gradients

    Authors: Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

    Abstract: Many applications in machine learning require optimizing a function whose true gradient is unknown, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available instead. This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when…

    Submitted 10 June, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

    Comments: Published at ICML 2019

  35. arXiv:1803.02348  [pdf, other]

    cs.LG cs.AI

    Smoothed Action Value Functions for Learning Gaussian Policies

    Authors: Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

    Abstract: State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Mo…

    Submitted 25 July, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

    Comments: ICML 2018

  36. arXiv:1802.10031  [pdf, other]

    cs.LG stat.ML

    The Mirage of Action-Dependent Baselines in Reinforcement Learning

    Authors: George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

    Abstract: Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To be…

    Submitted 19 November, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: Updated to ICML final submission

  37. arXiv:1802.09127  [pdf, other]

    stat.ML cs.LG

    Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

    Authors: Carlos Riquelme, George Tucker, Jasper Snoek

    Abstract: Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to posteri…

    Submitted 25 February, 2018; originally announced February 2018.

    Comments: Sixth International Conference on Learning Representations, ICLR 2018

  38. arXiv:1706.06428  [pdf, other]

    cs.CL cs.LG stat.ML

    An online sequence-to-sequence model for noisy speech recognition

    Authors: Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

    Abstract: Generative models have long been the dominant approach for speech recognition. The success of these models however relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners. Recent innovations in Deep Learning have given rise to an alternative - discriminative models called Sequence-to-Sequence models, that can almost match the accuracy…

    Submitted 16 June, 2017; originally announced June 2017.

    Comments: arXiv admin note: substantial text overlap with arXiv:1608.01281

  39. arXiv:1705.09279  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Filtering Variational Objectives

    Authors: Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh

    Abstract: When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results. Inspired by this, we consider the extension of the ELBO to a family of lower bounds defined by a particle filter's estimator of the marginal likelihood, the filtering variational objectives (FIVOs). FIVOs take the same arguments as the E…

    Submitted 12 November, 2017; v1 submitted 25 May, 2017; originally announced May 2017.

  40. arXiv:1705.05524  [pdf, other]

    cs.AI cs.LG stat.ML

    Learning Hard Alignments with Variational Inference

    Authors: Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly

    Abstract: There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to ap…

    Submitted 1 November, 2017; v1 submitted 16 May, 2017; originally announced May 2017.

  41. arXiv:1705.02411  [pdf, other]

    cs.CL cs.LG stat.ML

    Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

    Authors: Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni

    Abstract: We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance.…

    Submitted 5 May, 2017; originally announced May 2017.

    Journal ref: Spoken Language Technology Workshop (SLT), 2016 IEEE (pp. 474-480). IEEE

  42. arXiv:1703.07370  [pdf, other]

    cs.LG stat.ML

    REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

    Authors: George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein

    Abstract: Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. 2016, Maddison et al. 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient…

    Submitted 6 November, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: NIPS 2017

  43. arXiv:1703.05820  [pdf, other]

    cs.LG cs.AI

    Particle Value Functions

    Authors: Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Arnaud Doucet, Andriy Mnih, Yee Whye Teh

    Abstract: The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on an example. This risk-sensitive value fu…

    Submitted 16 March, 2017; originally announced March 2017.

  44. arXiv:1701.06548  [pdf, other]

    cs.NE cs.LG

    Regularizing Neural Networks by Penalizing Confident Output Distributions

    Authors: Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton

    Abstract: We systematically explore regularizing neural networks by penalizing low entropy output distributions. We show that penalizing low entropy output distributions, which has been shown to improve exploration in reinforcement learning, acts as a strong regularizer in supervised learning. Furthermore, we connect a maximum entropy based confidence penalty to label smoothing through the direction of the…

    Submitted 23 January, 2017; originally announced January 2017.

    Comments: Submitted to ICLR 2017

  45. arXiv:1611.06148  [pdf, other]

    stat.ML cs.LG cs.NE

    Compacting Neural Network Classifiers via Dropout Training

    Authors: Yotaro Kubo, George Tucker, Simon Wiesler

    Abstract: We introduce dropout compaction, a novel method for training feed-forward neural networks which realizes the performance gains of training a large model with dropout regularization, yet extracts a compact neural network for run-time efficiency. In the proposed method, we introduce a sparsity-inducing prior on the per unit dropout retention probability so that the optimizer can effectively prune hi…

    Submitted 24 May, 2017; v1 submitted 18 November, 2016; originally announced November 2016.

    Comments: Submitted to AISTATS 2017 (Short-version is accepted to NIPS Workshop on Efficient Methods for Deep Neural Networks)