Skip to main content

Showing 1–22 of 22 results for author: Paine, T L

  1. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2308.08998  [pdf, other

    cs.CL cs.LG

    Reinforced Self-Training (ReST) for Language Modeling

    Authors: Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas

    Abstract: Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating sampl… ▽ More

    Submitted 21 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 23 pages, 16 figures

  4. arXiv:2308.03526  [pdf, other

    cs.LG cs.AI

    AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

    Authors: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals

    Abstract: StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of it… ▽ More

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 32 pages, 13 figures, previous version published as a NeurIPS 2021 workshop: https://openreview.net/forum?id=Np8Pumfoty

  5. arXiv:2306.09800  [pdf, other

    cs.LG cs.RO

    $\pi2\text{vec}$: Policy Representations with Successor Features

    Authors: Gianluca Scarpellini, Ksenia Konyushkova, Claudio Fantacci, Tom Le Paine, Yutian Chen, Misha Denil

    Abstract: This paper describes $\pi2\text{vec}$, a method for representing behaviors of black box policies as feature vectors. The policy representations capture how the statistics of foundation model features change in response to the policy behavior in a task agnostic way, and can be trained from offline data, allowing them to be used in offline policy selection. This work provides a key piece of a recipe… ▽ More

    Submitted 24 January, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted paper at ICLR2024

  6. arXiv:2106.10251  [pdf, other

    cs.LG cs.AI stat.ML

    Active Offline Policy Selection

    Authors: Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas

    Abstract: This paper addresses the problem of policy selection in domains with abundant logged data, but with a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of polici… ▽ More

    Submitted 6 May, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Presented at NeurIPS 2021

  7. arXiv:2105.10148  [pdf, other

    cs.LG stat.ML

    On Instrumental Variable Regression for Deep Offline Policy Evaluation

    Authors: Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

    Abstract: We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding, the inputs and output noise being correlated. Hence, direct minimization of the Bellman error can result in significantly biased Q-function estimates. We explain why fixing the target Q-network i… ▽ More

    Submitted 23 November, 2022; v1 submitted 21 May, 2021; originally announced May 2021.

    Comments: Accepted by Journal of Machine Learning Research in 11/2022

    Journal ref: Journal of Machine Learning Research 23 (2022) 1-41

  8. arXiv:2104.13877  [pdf, other

    cs.LG cs.AI stat.ML

    Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

    Authors: Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi

    Abstract: Standard dynamics models for continuous control make use of feedforward computation to predict the conditional distribution of next state and reward given current state and action using a multivariate Gaussian with a diagonal covariance structure. This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and… ▽ More

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: ICLR 2021. 17 pages

  9. arXiv:2103.16596  [pdf, other

    cs.LG stat.ML

    Benchmarks for Deep Off-Policy Evaluation

    Authors: Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

    Abstract: Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making. The ability to learn offline is particularly important in many real-world domains, such as in healthcare, recommender systems, or robotics, where online data collection is an expensive and potentially dangerous process. Being able t… ▽ More

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: ICLR 2021 paper. Policies and evaluation code are available at https://github.com/google-research/deep_ope

  10. arXiv:2007.09055  [pdf, other

    cs.LG cs.AI stat.ML

    Hyperparameter Selection for Offline Reinforcement Learning

    Authors: Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas

    Abstract: Offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios. However, existing hyperparameter selection methods for offline RL break the offline assumption by evaluating policies corresponding to each hyperparameter setting in the environment. This online execution is often infeasible and hence undermines the main aim of of… ▽ More

    Submitted 17 July, 2020; originally announced July 2020.

  11. arXiv:2006.13888  [pdf, other

    cs.LG stat.ML

    RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

    Authors: Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

    Abstract: Offline methods for reinforcement learning have a potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real-world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged… ▽ More

    Submitted 12 February, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: NeurIPS paper. 21 pages including supplementary material, the github link for the datasets: https://github.com/deepmind/deepmind-research/rl_unplugged

  12. arXiv:2006.00979  [pdf, other

    cs.LG cs.AI

    Acme: A Research Framework for Distributed Reinforcement Learning

    Authors: Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang , et al. (14 additional authors not shown)

    Abstract: Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce publishe… ▽ More

    Submitted 20 September, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme

  13. arXiv:1910.09890  [pdf, other

    cs.NE cs.LG

    Improving the Gating Mechanism of Recurrent Neural Networks

    Authors: Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, Razvan Pascanu

    Abstract: Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gra… ▽ More

    Submitted 18 June, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: International Conference on Machine Learning 2020

  14. arXiv:1909.01387  [pdf, other

    cs.LG cs.AI

    Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

    Authors: Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

    Abstract: This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state of the art methods (both with and without demonstrations) fai… ▽ More

    Submitted 3 September, 2019; originally announced September 2019.

  15. arXiv:1810.05017  [pdf, other

    cs.LG cs.AI cs.CV cs.RO

    One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

    Authors: Tom Le Paine, Sergio Gómez Colmenarejo, Ziyu Wang, Scott Reed, Yusuf Aytar, Tobias Pfaff, Matt W. Hoffman, Gabriel Barth-Maron, Serkan Cabi, David Budden, Nando de Freitas

    Abstract: Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for… ▽ More

    Submitted 11 October, 2018; originally announced October 2018.

  16. arXiv:1805.11592  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    Playing hard exploration games by watching YouTube

    Authors: Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas

    Abstract: Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstra… ▽ More

    Submitted 30 November, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

  17. arXiv:1704.06001  [pdf, other

    cs.LG cs.CV stat.ML

    Fast Generation for Convolutional Autoregressive Models

    Authors: Prajit Ramachandran, Tom Le Paine, Pooya Khorrami, Mohammad Babaeizadeh, Shiyu Chang, Yang Zhang, Mark A. Hasegawa-Johnson, Roy H. Campbell, Thomas S. Huang

    Abstract: Convolutional autoregressive models have recently demonstrated state-of-the-art performance on a number of generation tasks. While fast, parallel training methods have been crucial for their success, generation is typically implemented in a naïve fashion where redundant computations are unnecessarily repeated. This results in slow generation, making such models infeasible for production environmen… ▽ More

    Submitted 20 April, 2017; originally announced April 2017.

    Comments: Accepted at ICLR 2017 Workshop

  18. arXiv:1611.09482  [pdf, other

    cs.SD cs.DS cs.LG

    Fast Wavenet Generation Algorithm

    Authors: Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang

    Abstract: This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet. Compared to a naive implementation that has complexity O(2^L) (L denotes the number of layers in the network), our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the complexity to O(L) time. Timing experiments show significant advanta… ▽ More

    Submitted 28 November, 2016; originally announced November 2016.

    Comments: Technical Report

  19. arXiv:1602.08465  [pdf, other

    cs.CV

    Seq-NMS for Video Object Detection

    Authors: Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang

    Abstract: Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip. Recently, there have been major advances for doing object detection in a single image. These methods typically contain three phases: (i) object proposal generation (ii) object classification and (iii) post-processing. We propose a modificatio… ▽ More

    Submitted 22 August, 2016; v1 submitted 26 February, 2016; originally announced February 2016.

    Comments: Technical Report for Imagenet VID Competition 2015

  20. arXiv:1602.07377  [pdf, other

    cs.CV

    How Deep Neural Networks Can Improve Emotion Recognition on Video Data

    Authors: Pooya Khorrami, Tom Le Paine, Kevin Brady, Charlie Dagli, Thomas S. Huang

    Abstract: We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion reco… ▽ More

    Submitted 9 January, 2017; v1 submitted 23 February, 2016; originally announced February 2016.

    Comments: Accepted at ICIP 2016. Fixed typo in Experiments section

  21. arXiv:1510.02969  [pdf, other

    cs.CV cs.LG cs.NE

    Do Deep Neural Networks Learn Facial Action Units When Doing Expression Recognition?

    Authors: Pooya Khorrami, Tom Le Paine, Thomas S. Huang

    Abstract: Despite being the appearance-based classifier of choice in recent years, relatively few works have examined how much convolutional neural networks (CNNs) can improve performance on accepted expression recognition benchmarks and, more importantly, examine what it is they actually learn. In this work, not only do we show that CNNs can achieve strong performance, but we also introduce an approach to… ▽ More

    Submitted 15 March, 2017; v1 submitted 10 October, 2015; originally announced October 2015.

    Comments: Accepted at ICCV 2015 CV4AC Workshop. Corrected numbers in Tables 2 and 3

  22. arXiv:1412.6597  [pdf, other

    cs.CV cs.LG cs.NE

    An Analysis of Unsupervised Pre-training in Light of Recent Advances

    Authors: Tom Le Paine, Pooya Khorrami, Wei Han, Thomas S. Huang

    Abstract: Convolutional neural networks perform well on object recognition because of a number of recent advances: rectified linear units (ReLUs), data augmentation, dropout, and large labelled datasets. Unsupervised data has been proposed as another way to improve performance. Unfortunately, unsupervised pre-training is not used by state-of-the-art methods leading to the following question: Is unsupervised… ▽ More

    Submitted 10 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

    Comments: Accepted as a workshop contribution to ICLR 2015