Showing 1–13 of 13 results for author: Douglas, S

  1. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  2. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2403.02164  [pdf]

    cs.AI cs.MA

    Cognition is All You Need -- The Next Layer of AI Above Large Language Models

    Authors: Nova Spivack, Sam Douglas, Michelle Crames, Tim Connors

    Abstract: Recent studies of the applications of conversational AI tools, such as chatbots powered by large language models, to complex real-world knowledge work have shown limitations related to reasoning and multi-step problem solving. Specifically, while existing chatbots simulate shallow reasoning and understanding they are prone to errors as problem complexity increases. The failure of these systems to…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 63 pages, 18 figures

    ACM Class: I.2.0

  4. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2312.02179  [pdf, other]

    cs.LG cs.AI cs.CL

    Training Chain-of-Thought via Latent-Variable Inference

    Authors: Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous

    Abstract: Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a "chain-of-thought" (CoT) prompt. One can also improve LLMs' performance on a specific task by supervised fine-tuning, i.e., by using gradient ascent on some tunable parameters to maximize the average log-likelihood of correct answers from a labeled training se…

    Submitted 28 November, 2023; originally announced December 2023.

    Comments: 23 pages, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  6. arXiv:2307.08753  [pdf, other]

    astro-ph.SR astro-ph.EP astro-ph.IM cs.LG

    A Novel Application of Conditional Normalizing Flows: Stellar Age Inference with Gyrochronology

    Authors: Phil Van-Lane, Joshua S. Speagle, Stephanie Douglas

    Abstract: Stellar ages are critical building blocks of evolutionary models, but challenging to measure for low mass main sequence stars. An unexplored solution in this regime is the application of probabilistic machine learning methods to gyrochronology, a stellar dating technique that is uniquely well suited for these stars. While accurate analytical gyrochronological models have proven challenging to deve…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted at the ICML 2023 Workshop on Machine Learning for Astrophysics. 10 pages, 3 figures (+1 in appendices)

    ACM Class: J.2.0

  7. arXiv:2211.05102  [pdf, other]

    cs.LG cs.CL

    Efficiently Scaling Transformer Inference

    Authors: Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean

    Abstract: We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. Better understanding of the engineering tradeoffs for inference for large Transformer-based models is important as use cases of these models are growing rapidly throughout application areas. We develop a sim…

    Submitted 9 November, 2022; originally announced November 2022.

  8. Dynamic multi feature-class Gaussian process models

    Authors: Jean-Rassaire Fouefack, Bhushan Borotikar, Marcel Lüthi, Tania S. Douglas, Valérie Burdin, Tinashe E. M. Mutsvangwa

    Abstract: In model-based medical image analysis, three features of interest are the shape of structures of interest, their relative pose, and image intensity profiles representative of some physical property. Often, these are modelled separately through statistical models by decomposing the object's features into a set of basis functions through principal geodesic analysis or principal component analysis. T…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: 16

  9. arXiv:2103.15569  [pdf, other]

    cs.LG stat.ML

    Risk Bounds for Learning via Hilbert Coresets

    Authors: Spencer Douglas, Piyush Kumar, R. K. Prasanth

    Abstract: We develop a formalism for constructing stochastic upper bounds on the expected full sample risk for supervised classification tasks via the Hilbert coresets approach within a transductive framework. We explicitly compute tight and meaningful bounds for complex datasets and complex hypothesis classes such as state-of-the-art deep neural network architectures. The bounds we develop exhibit nice pro…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: 16 pages, 2 figures

    ACM Class: F.2.1; F.2.3

  10. arXiv:2101.11775  [pdf, ps, other]

    cs.CY cs.AI

    Moral and Social Ramifications of Autonomous Vehicles

    Authors: Veljko Dubljević, Sean Douglas, Jovan Milojevich, Nirav Ajmeri, William A. Bauer, George F. List, Munindar P. Singh

    Abstract: Autonomous Vehicles (AVs) raise important social and ethical concerns, especially about accountability, dignity, and justice. We focus on the specific concerns arising from how AV technology will affect the lives and livelihoods of professional and semi-professional drivers. Whereas previous studies of such concerns have focused on the opinions of experts, we seek to understand these ethical and s…

    Submitted 29 January, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

  11. Dynamic multi-object Gaussian process models: A framework for data-driven functional modelling of human joints

    Authors: Jean-Rassaire Fouefack, Bhushan Borotikar, Tania S. Douglas, Valérie Burdin, Tinashe E. M. Mutsvangwa

    Abstract: Statistical shape models (SSMs) are state-of-the-art medical image analysis tools for extracting and explaining features across a set of biological structures. However, a principled and robust way to combine shape and pose features has been elusive due to three main issues: 1) non-homogeneity of the data (data with linear and non-linear natural variation across features), 2) non-optimal represent…

    Submitted 22 January, 2020; originally announced January 2020.

    Comments: 15 pages, 14 figures

  12. arXiv:1812.05981  [pdf, other]

    cs.LG eess.SP stat.ML

    Why ReLU Units Sometimes Die: Analysis of Single-Unit Error Backpropagation in Neural Networks

    Authors: Scott C. Douglas, Jiutian Yu

    Abstract: Recently, neural networks in machine learning use rectified linear units (ReLUs) in early processing layers for better performance. Training these structures sometimes results in "dying ReLU units" with near-zero outputs. We first explore this condition via simulation using the CIFAR-10 dataset and variants of two popular convolutive neural network architectures. Our explorations show that the out…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 5 pages, 7 figures, Proc. 52nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, October 2018

  13. arXiv:cs/0410009  [pdf, ps, other]

    cs.DC

    Diffusive Load Balancing of Loosely-Synchronous Parallel Programs over Peer-to-Peer Networks

    Authors: Scott Douglas, Aaron Harwood

    Abstract: The use of under-utilized Internet resources is widely recognized as a viable form of high performance computing. Sustained processing power of roughly 40T FLOPS using 4 million volunteered Internet hosts has been reported for embarrassingly parallel problems. At the same time, peer-to-peer (P2P) file sharing networks, with more than 50 million participants, have demonstrated the capacity for sc…

    Submitted 5 October, 2004; originally announced October 2004.

    Comments: 14 pages with 10 figures