Showing 1–18 of 18 results for author: Goodman, S

  1. arXiv:2310.09199  [pdf, other]

    cs.CV

    PaLI-3 Vision Language Models: Smaller, Faster, Stronger

    Authors: Xi Chen, Xiao Wang, Lucas Beyer, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut

    Abstract: This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrained using classification objectives to contrastively (SigLIP) pretrained ones. We find that, while slightly underperforming on standard image classific…

    Submitted 17 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

  2. arXiv:2308.06912  [pdf, other]

    cs.LG cs.CL

    CausalLM is not optimal for in-context learning

    Authors: Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut

    Abstract: Recent empirical evidence indicates that transformer based in-context learning performs better when using a prefix language model (prefixLM), in which in-context samples can all attend to each other, compared to causal language models (causalLM), which use auto-regressive attention that prohibits in-context samples to attend to future samples. While this result is intuitive, it is not understood f…

    Submitted 20 February, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

    Comments: ICLR 2024 conference paper. Code available at: https://github.com/google-research/causallm_icl

  3. arXiv:2305.18565  [pdf, other]

    cs.CV cs.CL cs.LG

    PaLI-X: On Scaling up a Multilingual Vision and Language Model

    Authors: Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, et al. (18 additional authors not shown)

    Abstract: We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide-range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-sh…

    Submitted 29 May, 2023; originally announced May 2023.

  4. arXiv:2209.06794  [pdf, other]

    cs.CV cs.CL

    PaLI: A Jointly-Scaled Multilingual Language-Image Model

    Authors: Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, et al. (4 additional authors not shown)

    Abstract: Effective scaling and a flexible task interface enable large language models to excel at many tasks. We present PaLI (Pathways Language and Image model), a model that extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaL…

    Submitted 5 June, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: ICLR 2023 (Notable-top-5%)

  5. arXiv:2209.05534  [pdf, other]

    cs.CV cs.CL

    PreSTU: Pre-Training for Scene-Text Understanding

    Authors: Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut

    Abstract: The ability to recognize and reason about text embedded in visual inputs is often lacking in vision-and-language (V&L) models, perhaps because V&L pre-training methods have often failed to include such an ability in their training objective. In this paper, we propose PreSTU, a novel pre-training recipe dedicated to scene-text understanding (STU). PreSTU introduces OCR-aware pre-training objectives…

    Submitted 19 August, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: Accepted to ICCV 2023

  6. LaMPost: Design and Evaluation of an AI-assisted Email Writing Prototype for Adults with Dyslexia

    Authors: Steven M. Goodman, Erin Buehler, Patrick Clary, Andy Coenen, Aaron Donsbach, Tiffanie N. Horne, Michal Lahav, Robert Macdonald, Rain Breaw Michaels, Ajit Narayanan, Mahima Pushkarna, Joel Riley, Alex Santana, Lei Shi, Rachel Sweeney, Phil Weaver, Ann Yuan, Meredith Ringel Morris

    Abstract: Prior work has explored the writing challenges experienced by people with dyslexia, and the potential for new spelling, grammar, and word retrieval technologies to address these challenges. However, the capabilities for natural language generation demonstrated by the latest class of large language models (LLMs) highlight an opportunity to explore new forms of human-AI writing support tools. In thi…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: To appear at The 24th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '22), October 23-26, 2022, Athens, Greece. 26 pages

  7. arXiv:2203.17189  [pdf, other]

    cs.LG cs.CL

    Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

    Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, et al. (18 additional authors not shown)

    Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we presen…

    Submitted 31 March, 2022; originally announced March 2022.

  8. arXiv:2202.11134  [pdf]

    cs.HC cs.LG cs.SD eess.AS

    ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users

    Authors: Dhruv Jain, Khoa Huynh Anh Nguyen, Steven Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, Jon E. Froehlich

    Abstract: Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fi…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: Published at the ACM CHI Conference on Human Factors in Computing Systems (CHI) 2022

  9. Social, Environmental, and Technical: Factors at Play in the Current Use and Future Design of Small-Group Captioning

    Authors: Emma J. McDonnell, Ping Liu, Steven M. Goodman, Raja Kushalnagar, Jon E. Froehlich, Leah Findlater

    Abstract: Real-time captioning is a critical accessibility tool for many d/Deaf and hard of hearing (DHH) people. While the vast majority of captioning work has focused on formal settings and technical innovations, in contrast, we investigate captioning for informal, interactive small-group conversations, which have a high degree of spontaneity and foster dynamic social interactions. This paper reports on s…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: 25 pages, 3 figures, to be published in the PACMHCI-CSCW2 October 2021 edition, to be presented at CSCW 2021

  10. arXiv:2106.06899  [pdf, other]

    cs.CL cs.LG

    Memory-efficient Transformers via Top-$k$ Attention

    Authors: Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant

    Abstract: Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not possible to directly use them with popular pre-trained language models trained using vanilla attention, without an expensive corrective pre-training…

    Submitted 12 June, 2021; originally announced June 2021.

  11. arXiv:2105.14099  [pdf, other]

    cs.LG stat.ML

    Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning

    Authors: Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut

    Abstract: Despite recent advances in its theoretical understanding, there still remains a significant gap in the ability of existing PAC-Bayesian theories on meta-learning to explain performance improvements in the few-shot learning setting, where the number of training examples in the target tasks is severely limited. This gap originates from an assumption in the existing theories which supposes that the n…

    Submitted 25 October, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Neural Information Processing Systems 2021

  12. arXiv:2010.03494  [pdf, other]

    cs.CL

    TeaForN: Teacher-Forcing with N-grams

    Authors: Sebastian Goodman, Nan Ding, Radu Soricut

    Abstract: Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps.…

    Submitted 9 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: to be published in EMNLP 2020

  13. arXiv:2006.08686  [pdf, other]

    cs.CV cs.LG

    Multi-Image Summarization: Textual Summary from a Set of Cohesive Images

    Authors: Nicholas Trieu, Sebastian Goodman, Pradyumna Narayana, Kazoo Sone, Radu Soricut

    Abstract: Multi-sentence summarization is a well studied problem in NLP, while generating image descriptions for a single image is a well studied problem in Computer Vision. However, for applications such as image cluster labeling or web page summarization, summarizing a set of images is also a useful and challenging task. This paper proposes the new task of multi-image summarization, which aims to generate…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: 9 pages, 5 figures

  14. arXiv:1909.11942  [pdf, other]

    cs.CL cs.AI

    ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

    Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

    Abstract: Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Compr…

    Submitted 8 February, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

  15. arXiv:1909.10599  [pdf, ps, other]

    cs.CL

    Multi-stage Pretraining for Abstractive Summarization

    Authors: Sebastian Goodman, Zhenzhong Lan, Radu Soricut

    Abstract: Neural models for abstractive summarization tend to achieve the best performance in the presence of highly specialized, summarization-specific modeling add-ons such as pointer-generator, coverage-modeling, and inference-time heuristics. We show here that pretraining can complement such modeling advancements to yield improved results in both short-form and long-form abstractive summarization using t…

    Submitted 23 September, 2019; originally announced September 2019.

  16. arXiv:1908.07333  [pdf]

    cs.CY cs.HC

    Fairness Issues in AI Systems that Augment Sensory Abilities

    Authors: Leah Findlater, Steven Goodman, Yuhang Zhao, Shiri Azenkot, Margot Hanley

    Abstract: Systems that augment sensory abilities are increasingly employing AI and machine learning (ML) approaches, with applications ranging from object recognition and scene description tools for blind users to sound awareness tools for d/Deaf users. However, unlike many other AI-enabled technologies, these systems provide information that is already available to non-disabled people. In this paper, we di…

    Submitted 16 August, 2019; originally announced August 2019.

    Comments: 4 pages. Accepted to the ACM ASSETS 2019 Workshop on AI Fairness for People with Disabilities

  17. arXiv:1612.07833  [pdf, other]

    cs.CL cs.CV

    Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

    Authors: Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

    Abstract: We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options. Accomplishing the task entails demonstrating comprehension beyond just recognizing "keywords" (or key-phrases) and their corresponding visual concepts. Instead, it requires an alignment between t…

    Submitted 22 December, 2016; originally announced December 2016.

    Comments: 11 pages

  18. arXiv:1105.6020  [pdf]

    cs.HC

    Level of Presence in Team-Building Activities: Gaming Component in Virtual Environments

    Authors: Gianluca De Leo, Koren S. Goodman, Elena Radici, Scott R. Secrhist, Thomas W. Mastaglio

    Abstract: Historically the training of teams has been implemented using a face-to-face approach. In the past decade, on-line multiuser virtual environments have offered a solution for training teams whose members are geographically dispersed. In order to develop an effective team training activity, a high sense of presence among the participants needs to be reached. Previous research studies reported being a…

    Submitted 27 May, 2011; originally announced May 2011.

    Comments: 10 pages, 1 figure, 5 tables; The International Journal of Multimedia & Its Applications (IJMA) Vol.3, No.2, May 2011