Showing 1–50 of 100 results for author: Goyal, N

  1. arXiv:2405.17653  [pdf, other]

    cs.LG cs.AI cs.CL

    InversionView: A General-Purpose Method for Reading Information from Neural Activations

    Authors: Xinting Huang, Madhur Panwar, Navin Goyal, Michael Hahn

    Abstract: The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations. In this paper, we argue that this information is embodied by the subset of inputs that give rise to similar activations. Computing such subsets is nontrivial as the input space is exponentially large. We propose InversionView, which allows us to practically inspect…

    Submitted 15 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: ICML 2024 Mechanistic Interpretability Workshop oral

  2. arXiv:2404.16367  [pdf, other]

    cs.CL cs.LG

    Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically

    Authors: Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov

    Abstract: Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures without explicitly encoding any structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge. We extensively experiment with transfor…

    Submitted 31 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Code now available: https://github.com/kabirahuja2431/transformers-hg

  3. arXiv:2404.04289  [pdf, ps, other]

    cs.AI cs.HC cs.LG

    Designing for Human-Agent Alignment: Understanding what humans want from their agents

    Authors: Nitesh Goyal, Minsuk Chang, Michael Terry

    Abstract: Our ability to build autonomous agents that leverage Generative AI continues to increase by the day. As builders and users of such agents it is unclear what parameters we need to align on before the agents start performing tasks on our behalf. To discover these parameters, we ran a qualitative empirical research study about designing agents that can negotiate during a fictional yet relatable task…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Human-AI Alignment, Human-Agent Alignment, Agents, Generative AI, Large Language Models

    ACM Class: I.2.0

  4. A Scoping Study of Evaluation Practices for Responsible AI Tools: Steps Towards Effectiveness Evaluations

    Authors: Glen Berman, Nitesh Goyal, Michael Madaio

    Abstract: Responsible design of AI systems is a shared goal across HCI and AI communities. Responsible AI (RAI) tools have been developed to support practitioners to identify, assess, and mitigate ethical issues during AI development. These tools take many forms (e.g., design playbooks, software toolkits, documentation protocols). However, research suggests that use of RAI tools is shaped by organizational…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted for publication in Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11–16, 2024, Honolulu, HI, USA

  5. arXiv:2312.01582  [pdf, other]

    cs.CL

    Explaining with Contrastive Phrasal Highlighting: A Case Study in Assisting Humans to Detect Translation Differences

    Authors: Eleftheria Briakou, Navita Goyal, Marine Carpuat

    Abstract: Explainable NLP techniques primarily explain by answering "Which tokens in the input are responsible for this prediction?". We argue that for NLP models that make predictions by comparing two input texts, it is more useful to explain by answering "What differences between the two inputs explain this prediction?". We introduce a technique to generate contrastive highlights that explain the predic…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023

  6. arXiv:2310.15428  [pdf, other]

    cs.HC cs.AI

    ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles

    Authors: Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry

    Abstract: Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively…

    Submitted 23 October, 2023; originally announced October 2023.

  7. arXiv:2310.12558  [pdf, other]

    cs.CL cs.HC

    Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong

    Authors: Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber

    Abstract: Large Language Models (LLMs) are increasingly used for accessing information on the web. Their truthfulness and factuality are thus of great interest. To help users make the right decisions about the information they get, LLMs should not only provide information but also help users fact-check it. Our experiments with 80 crowdworkers compare language models with search engines (information retrieva…

    Submitted 1 April, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: NAACL 2024

  8. The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features

    Authors: Navita Goyal, Connor Baumler, Tin Nguyen, Hal Daumé III

    Abstract: AI systems have been known to amplify biases in real-world data. Explanations may help human-AI teams address these biases for fairer decision-making. Typically, explanations focus on salient input features. If a model is biased against some protected group, explanations may include features that demonstrate this bias, but when biases are realized through proxy features, the relationship between t…

    Submitted 14 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: IUI 2024

  9. arXiv:2310.02859  [pdf, other]

    cs.SI

    Tight Sampling in Unbounded Networks

    Authors: Kshitijaa Jaglan, Meher Chaitanya, Triansh Sharma, Abhijeeth Singam, Nidhi Goyal, Ponnurangam Kumaraguru, Ulrik Brandes

    Abstract: The default approach to deal with the enormous size and limited accessibility of many Web and social media networks is to sample one or more subnetworks from a conceptually unbounded unknown network. Clearly, the extracted subnetworks will crucially depend on the sampling scheme. Motivated by studies of homophily and opinion formation, we propose a variant of snowball sampling designed to prioriti…

    Submitted 5 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: The first two authors contributed equally

  10. arXiv:2308.16884  [pdf, other]

    cs.CL cs.AI cs.LG

    The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

    Authors: Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa

    Abstract: We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multip…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 27 pages, 13 figures

    ACM Class: I.2.7

  11. "It is currently hodgepodge": Examining AI/ML Practitioners' Challenges during Co-production of Responsible AI Values

    Authors: Rama Adithya Varanasi, Nitesh Goyal

    Abstract: Recently, the AI/ML research community has indicated an urgent need to establish Responsible AI (RAI) values and practices as part of the AI/ML lifecycle. Several organizations and communities are responding to this call by sharing RAI guidelines. However, there are gaps in awareness, deliberation, and execution of such practices for multi-disciplinary ML practitioners. This work contributes to th…

    Submitted 14 July, 2023; originally announced July 2023.

    ACM Class: I.2; K.4

  12. arXiv:2307.09288  [pdf, other]

    cs.CL cs.AI

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, et al. (43 additional authors not shown)

    Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  13. arXiv:2306.10763  [pdf, other]

    cs.CL cs.AI cs.LG cs.PL cs.SE

    Guiding Language Models of Code with Global Context using Monitors

    Authors: Lakshya A Agrawal, Aditya Kanade, Navin Goyal, Shuvendu K. Lahiri, Sriram K. Rajamani

    Abstract: Language models of code (LMs) work well when the surrounding code provides sufficient context. This is not true when it becomes necessary to use types, functionality or APIs defined elsewhere in the repository or a linked library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating. Integrated development environments (IDEs…

    Submitted 3 November, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023 and to appear as "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context" at https://neurips.cc/virtual/2023/poster/70362 . Contents: 11 pages, 15 additional pages of appendix, 13 figures, 3 tables

    ACM Class: I.2.2; I.2.7; I.2.5

  14. arXiv:2306.04891  [pdf, other]

    cs.LG cs.CL

    In-Context Learning through the Bayesian Prism

    Authors: Madhur Panwar, Kabir Ahuja, Navin Goyal

    Abstract: In-context learning (ICL) is one of the surprising and useful features of large language models and subject of intense research. Recently, stylized meta-learning-like ICL setups have been devised that train transformers on sequences of input-output pairs $(x, f(x))$. The function $f$ comes from a function class and generalization is checked by evaluating on sequences generated from unseen function…

    Submitted 14 April, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: ICLR 2024

  15. arXiv:2305.14331  [pdf, other]

    cs.CL cs.AI

    What Else Do I Need to Know? The Effect of Background Information on Users' Reliance on QA Systems

    Authors: Navita Goyal, Eleftheria Briakou, Amanda Liu, Connor Baumler, Claire Bonial, Jeffrey Micher, Clare R. Voss, Marine Carpuat, Hal Daumé III

    Abstract: NLP systems have shown impressive performance at answering questions by retrieving relevant context. However, with the increasingly large models, it is impossible and often undesirable to constrain models' knowledge or reasoning to only the retrieved context. This leads to a mismatch between the information that the models access to derive the answer and the information that is available to the us…

    Submitted 25 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  16. arXiv:2304.09871  [pdf, other]

    cs.LG cs.AI math.OC

    A Theory on Adam Instability in Large-Scale Machine Learning

    Authors: Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

    Abstract: We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent…

    Submitted 25 April, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

  17. arXiv:2304.02122  [pdf, other]

    cs.CV

    OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI

    Authors: Joe Yue-Hei Ng, Kevin McCloskey, Jian Cui, Vincent R. Meijer, Erica Brand, Aaron Sarna, Nita Goyal, Christopher Van Arsdale, Scott Geraedts

    Abstract: Contrails (condensation trails) are line-shaped ice clouds caused by aircraft and are likely the largest contributor of aviation-induced climate change. Contrail avoidance is potentially an inexpensive way to significantly reduce the climate impact of aviation. An automated contrail detection system is an essential tool to develop and evaluate contrail avoidance systems. In this paper, we present…

    Submitted 20 April, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  18. arXiv:2303.07971  [pdf, other]

    cs.CL cs.LG

    A Theory of Emergent In-Context Learning as Implicit Structure Induction

    Authors: Michael Hahn, Navin Goyal

    Abstract: Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities ari…

    Submitted 14 March, 2023; originally announced March 2023.

  19. arXiv:2302.13971  [pdf, other]

    cs.CL

    LLaMA: Open and Efficient Foundation Language Models

    Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

    Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is co…

    Submitted 27 February, 2023; originally announced February 2023.

  20. NewsComp: Facilitating Diverse News Reading through Comparative Annotation

    Authors: Md Momen Bhuiyan, Sang Won Lee, Nitesh Goyal, Tanushree Mitra

    Abstract: To support efficient, balanced news consumption, merging articles from diverse sources into one, potentially through crowdsourcing, could alleviate some hurdles. However, the merging process could also impact annotators' attitudes towards the content. To test this theory, we propose comparative news annotation, i.e., annotating similarities and differences between a pair of articles. By developing…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: 2023 ACM CHI Conference on Human Factors in Computing Systems, 17 pages

  21. Investigating How Practitioners Use Human-AI Guidelines: A Case Study on the People + AI Guidebook

    Authors: Nur Yildirim, Mahima Pushkarna, Nitesh Goyal, Martin Wattenberg, Fernanda Viegas

    Abstract: Artificial intelligence (AI) presents new challenges for the user experience (UX) of products and services. Recently, practitioner-facing resources and design guidelines have become available to ease some of these challenges. However, little research has investigated if and how these guidelines are used, and how they impact practice. In this paper, we investigated how industry practitioners use th…

    Submitted 20 April, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Journal ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

  22. arXiv:2301.11280  [pdf, other]

    cs.CV cs.AI cs.LG

    Text-To-4D Dynamic Scene Generation

    Authors: Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

    Abstract: We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera locat…

    Submitted 26 January, 2023; originally announced January 2023.

  23. arXiv:2301.10472  [pdf, other]

    cs.CL cs.LG

    XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models

    Authors: Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa

    Abstract: Large multilingual language models typically rely on a single vocabulary shared across 100+ languages. As these models have increased in parameter count and depth, vocabulary size has remained largely unchanged. This vocabulary bottleneck limits the representational capabilities of multilingual models like XLM-R. In this paper, we introduce a new approach for scaling to very large multili…

    Submitted 13 October, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: EMNLP 2023

  24. arXiv:2301.03728  [pdf, other]

    cs.CL cs.AI cs.LG

    Scaling Laws for Generative Mixed-Modal Language Models

    Authors: Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

    Abstract: Generative language models define distributions over sequences of tokens that can represent essentially any combination of data modalities (e.g., any permutation of image tokens from VQ-VAEs, speech tokens from HuBERT, BPE tokens for language or code, and so on). To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modaliti…

    Submitted 9 January, 2023; originally announced January 2023.

  25. arXiv:2301.00264  [pdf]

    cs.CV cs.AI

    Application Of ADNN For Background Subtraction In Smart Surveillance System

    Authors: Piyush Batra, Gagan Raj Singh, Neeraj Goyal

    Abstract: Object movement identification is one of the most researched problems in the field of computer vision. In this task, we try to classify a pixel as foreground or background. Even though numerous traditional machine learning and deep learning methods already exist for this problem, the two major issues with most of them are the need for large amounts of ground truth data and their inferior performan…

    Submitted 31 December, 2022; originally announced January 2023.

  26. arXiv:2211.07524  [pdf, other]

    cs.CL cs.AI

    Towards a Mathematics Formalisation Assistant using Large Language Models

    Authors: Ayush Agrawal, Siddhartha Gadgil, Navin Goyal, Ashvni Narayanan, Anand Tadipatri

    Abstract: Mathematics formalisation is the task of writing mathematics (i.e., definitions, theorem statements, proofs) in natural language, as found in books and papers, into a formal language that can then be checked for correctness by a program. It is a thriving activity today, however formalisation remains cumbersome. In this paper, we explore the abilities of a large language model (Codex) to help with…

    Submitted 14 November, 2022; originally announced November 2022.

  27. arXiv:2210.12786  [pdf, other]

    cs.CL

    When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks

    Authors: Ankur Sikarwar, Arkil Patel, Navin Goyal

    Abstract: Humans can reason compositionally whilst grounding language utterances to the real world. Recent benchmarks like ReaSCAN use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities. In this work, we present a simple transformer-based model that outperforms specialized architectures on ReaSCAN and a modified version of gSCAN. On analyzing the task, we…

    Submitted 30 October, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  28. arXiv:2210.11024  [pdf, other]

    cs.LG cs.AI cs.CV

    A survey on Self Supervised learning approaches for improving Multimodal representation learning

    Authors: Naman Goyal

    Abstract: Recently self supervised learning has seen explosive growth and use in variety of machine learning tasks because of its ability to avoid the cost of annotating large-scale datasets. This paper gives an overview for best self supervised learning approaches for multimodal learning. The presented approaches have been aggregated by extensive study of the literature and tackle the application of self…

    Submitted 20 October, 2022; originally announced October 2022.

  29. arXiv:2208.03188  [pdf, other]

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  30. arXiv:2206.15129  [pdf, other]

    cs.AI cs.HC cs.LG

    Personalized Detection of Cognitive Biases in Actions of Users from Their Logs: Anchoring and Recency Biases

    Authors: Atanu R Sinha, Navita Goyal, Sunny Dhamnani, Tanay Asija, Raja K Dubey, M V Kaarthik Raja, Georgios Theocharous

    Abstract: Cognitive biases are mental shortcuts humans use in dealing with information and the environment, and which result in biased actions and behaviors (or, actions), unbeknownst to themselves. Biases take many forms, with cognitive biases occupying a central role that inflicts fairness, accountability, transparency, ethics, law, medicine, and discrimination. Detection of biases is considered a necessa…

    Submitted 1 July, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

  31. arXiv:2205.11726  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Role of Bidirectionality in Language Model Pre-Training

    Authors: Mikel Artetxe, Jingfei Du, Naman Goyal, Luke Zettlemoyer, Ves Stoyanov

    Abstract: Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult. In this work, we focus on bidirectionality as a key factor that differentiates existing approaches, and present a comprehensive study of its role in next token prediction, text infilling, zero-shot pr…

    Submitted 26 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Findings of EMNLP 2022

  32. arXiv:2205.06266  [pdf, other]

    cs.CL

    Lifting the Curse of Multilinguality by Pre-training Modular Transformers

    Authors: Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe

    Abstract: Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learn…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: NAACL 2022

  33. arXiv:2205.01068  [pdf, other]

    cs.CL cs.LG

    OPT: Open Pre-trained Transformer Language Models

    Authors: Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

    Abstract: Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open…

    Submitted 21 June, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

  34. arXiv:2205.00501  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation

    Authors: Nitesh Goyal, Ian Kivlichan, Rachel Rosen, Lucy Vasserman

    Abstract: Machine learning models are commonly used to detect toxicity in online conversations. These models are trained on datasets annotated by human raters. We explore how raters' self-described identities impact how they annotate toxicity in online comments. We first define the concept of specialized rater pools: rater pools formed based on raters' self-described identities, rather than at random. We fo…

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: Proceedings of ACM in Human Computer Interaction in ACM Conference On Computer- Supported Cooperative Work And Social Computing CSCW 2022

  35. arXiv:2204.14268  [pdf, other]

    cs.CL cs.AI

    How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

    Authors: Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzman

    Abstract: A multilingual tokenizer is a fundamental component of multilingual neural machine translation. It is trained from a multilingual corpus. Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus. However, few works have systematically answered how language imbalance in tokenizer training affects downstream performance. In…

    Submitted 10 September, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: AMTA 2022

  36. arXiv:2203.07402  [pdf, other]

    cs.CL

    Revisiting the Compositional Generalization Abilities of Neural Sequence Models

    Authors: Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal

    Abstract: Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences. Recent works have claimed that standard seq-to-seq models severely lack the ability to compositionally generalize. In this paper, we focus on one-shot primitive generalization as introduced by the popular SCAN benchmark. We demonstrate that modifying the trainin…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  37. arXiv:2203.03457  [pdf, other]

    cs.LG cs.CV

    Graph Neural Networks for Image Classification and Reinforcement Learning using Graph representations

    Authors: Naman Goyal, David Steiner

    Abstract: In this paper, we will evaluate the performance of graph neural networks in two distinct domains: computer vision and reinforcement learning. In the computer vision section, we seek to learn whether a novel non-redundant representation for images as graphs can improve performance over trivial pixel to node mapping on a graph-level prediction graph, specifically image classification. For the reinfo…

    Submitted 8 March, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: The work was done as a project for Neural Networks and Deep Learning course, Fall 2021 offering by Prof. Richard Zemel at Columbia University

  38. "You have to prove the threat is real": Understanding the needs of Female Journalists and Activists to Document and Report Online Harassment

    Authors: Nitesh Goyal, Leslie Park, Lucy Vasserman

    Abstract: Online harassment is a major societal challenge that impacts multiple communities. Some members of community, like female journalists and activists, bear significantly higher impacts since their profession requires easy accessibility, transparency about their identity, and involves highlighting stories of injustice. Through a multi-phased qualitative research study involving a focus group and inte…

    Submitted 18 March, 2024; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: CHI Conference on Human Factors in Computing Systems (CHI '22), April 29-May 5, 2022, New Orleans, LA, USA

  39. arXiv:2201.07520  [pdf, other]

    cs.CL

    CM3: A Causal Masked Multimodal Model of the Internet

    Authors: Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer

    Abstract: We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens. Our new causally masked approach generates tokens left to right while also masking out a small number of long token spans that are generated at the end of the string, instead of their original positions. The causal masking obje…

    Submitted 19 January, 2022; originally announced January 2022.

  40. arXiv:2112.10684  [pdf, other]

    cs.CL cs.AI cs.LG

    Efficient Large Scale Language Modeling with Mixtures of Experts

    Authors: Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

    Abstract: Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we…

    Submitted 26 October, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: EMNLP 2022

  41. arXiv:2112.10668  [pdf, other]

    cs.CL cs.AI

    Few-shot Learning with Multilingual Language Models

    Authors: Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

    Abstract: Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study t…

    Submitted 10 November, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Accepted to EMNLP 2022; 34 pages

  42. Next Day Wildfire Spread: A Machine Learning Data Set to Predict Wildfire Spreading from Remote-Sensing Data

    Authors: Fantine Huot, R. Lily Hu, Nita Goyal, Tharun Sankar, Matthias Ihme, Yi-Fan Chen

    Abstract: Predicting wildfire spread is critical for land management and disaster preparedness. To this end, we present 'Next Day Wildfire Spread,' a curated, large-scale, multivariate data set of historical wildfires aggregating nearly a decade of remote-sensing data across the United States. In contrast to existing fire data sets based on Earth observation satellites, our data set combines 2D fire data wi…

    Submitted 2 March, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

    Comments: submitted to IEEE Transactions on Geoscience and Remote Sensing

  43. arXiv:2111.09296  [pdf, other]

    cs.CL cs.SD eess.AS

    XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

    Authors: Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

    Abstract: This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. Our evaluation covers a wide range of tasks, domains, data regimes and languages, b…

    Submitted 16 December, 2021; v1 submitted 17 November, 2021; originally announced November 2021.

  44. arXiv:2107.06959  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

    Authors: Yun Tang, Hongyu Gong, Xian Li, Changhan Wang, Juan Pino, Holger Schwenk, Naman Goyal

    Abstract: In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task. Our system is built by leveraging transfer learning across modalities, tasks and languages. First, we leverage general-purpose multilingual modules pretrained with large amounts of unlabelled and labelled data. We furth…

    Submitted 14 August, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted by IWSLT 2021 as a system paper

  45. arXiv:2106.10535  [pdf, other]

    cs.LG cs.AI

    Learning and Generalization in Overparameterized Normalizing Flows

    Authors: Kulin Shah, Amit Deshpande, Navin Goyal

    Abstract: In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an impo…

    Submitted 23 March, 2022; v1 submitted 19 June, 2021; originally announced June 2021.

    Comments: 75 pages, Accepted in AISTATS 2022

  46. arXiv:2106.03193  [pdf, other]

    cs.CL cs.AI

    The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

    Authors: Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc'Aurelio Ranzato, Francisco Guzman, Angela Fan

    Abstract: One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the FLORES-101 evaluation benc…

    Submitted 6 June, 2021; originally announced June 2021.

  47. arXiv:2106.00047  [pdf, other]

    cs.LG

    Learning and Generalization in RNNs

    Authors: Abhishek Panigrahi, Navin Goyal

    Abstract: Simple recurrent neural networks (RNNs) and their more advanced cousins LSTMs etc. have been very successful in sequence modeling. Their theoretical understanding, however, is lacking and has not kept pace with the progress for feedforward networks, where a reasonably complete understanding in the special case of highly overparametrized one-hidden-layer networks has emerged. In this paper, we make…

    Submitted 31 May, 2021; originally announced June 2021.

  48. arXiv:2105.15071  [pdf, other]

    cs.CL

    Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

    Authors: Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

    Abstract: The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a…

    Submitted 1 June, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: ACL 2021

  49. arXiv:2105.00572  [pdf, ps, other]

    cs.CL

    Larger-Scale Transformers for Multilingual Masked Language Modeling

    Authors: Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau

    Abstract: Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large m…

    Submitted 2 May, 2021; originally announced May 2021.

    Comments: 4 pages

  50. arXiv:2104.14095  [pdf, ps, other]

    cs.AI cs.LG

    Analyzing the Nuances of Transformers' Polynomial Simplification Abilities

    Authors: Vishesh Agarwal, Somak Aditya, Navin Goyal

    Abstract: Symbolic Mathematical tasks such as integration often require multiple well-defined steps and understanding of sub-tasks to reach a solution. To understand Transformers' abilities in such tasks in a fine-grained manner, we deviate from traditional end-to-end settings, and explore a step-wise polynomial simplification task. Polynomials can be written in a simple normal form as a sum of monomials wh…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: 16 pages, 18 Tables, Accepted ICLR 2021 MathAI Workshop