
Showing 1–50 of 55 results for author: Narang, S

  1. arXiv:2406.10229

    cs.LG cs.AI

    Quantifying Variance in Evaluation Benchmarks

    Authors: Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes

    Abstract: Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities. Originally designed to make claims about capabilities (or lack thereof) in fully pretrained models, evaluation benchmarks are now also extensively used to decide between various training choices. Despite this widespread usage, we rarely quantify the…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2405.18064

    cs.AI cs.CV

    Automated Real-World Sustainability Data Generation from Images of Buildings

    Authors: Peter J Bentley, Soo Ling Lim, Rajat Mathur, Sid Narang

    Abstract: When data on building features is unavailable, the task of determining how to improve that building in terms of carbon emissions becomes infeasible. We show that from only a set of images, a Large Language Model with appropriate prompt engineering and domain knowledge can successfully estimate a range of building features relevant for sustainability calculations. We compare our novel image-to-data…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 6 pages

    MSC Class: 68T07; 94A08

  3. arXiv:2405.16042

    cs.CL

    Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention

    Authors: Andrew Li, Xianle Feng, Siddhant Narang, Austin Peng, Tianle Cai, Raj Sanjay Shah, Sashank Varma

    Abstract: When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times and offline measures such as comprehension questions. Here, we investigate the processing of garden-path sentences and the fate of lingering misinter…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by CogSci-24

  4. arXiv:2405.12934

    cs.CY cs.CE cs.IR

    Address-Specific Sustainable Accommodation Choice Through Real-World Data Integration

    Authors: Peter J. Bentley, Rajat Mathur, Soo Ling Lim, Sid Narang

    Abstract: Consumers wish to choose sustainable accommodation for their travels, and in the case of corporations, may be required to do so. Yet accommodation marketplaces provide no meaningful capability for sustainable choice: typically CO2 estimates are provided that are identical for all accommodation of the same type across an entire country. We propose a decision support system that enables real choice…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 8 pages

    MSC Class: 68U35

    ACM Class: E.m; H.m

  5. arXiv:2403.19725

    cs.CL cs.AI cs.LG

    MUGC: Machine Generated versus User Generated Content Detection

    Authors: Yaqi Xie, Anjali Rawal, Yujing Cen, Dixuan Zhao, Sunil K Narang, Shanu Sushmita

    Abstract: As advanced modern systems like deep neural networks (DNNs) and generative AI continue to enhance their capabilities in producing convincing and realistic content, the need to distinguish between user-generated and machine-generated content is becoming increasingly evident. In this research, we undertake a comparative evaluation of eight traditional machine-learning algorithms to distinguish betwe…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 11 pages, 16 figures

  6. arXiv:2402.09589

    cs.NI cs.DC cs.LG

    MLTCP: Congestion Control for DNN Training

    Authors: Sudarsanan Rajasekaran, Sanjoli Narang, Anton A. Zabreyko, Manya Ghobadi

    Abstract: We present MLTCP, a technique to augment today's congestion control algorithms to accelerate DNN training jobs in shared GPU clusters. MLTCP enables the communication phases of jobs that compete for network bandwidth to interleave with each other, thereby utilizing the network efficiently. At the heart of MLTCP lies a very simple principle based on a key conceptual insight: DNN training flows shou…

    Submitted 14 February, 2024; originally announced February 2024.

  7. arXiv:2402.04645

    cs.GT

    Capacity Modification in the Stable Matching Problem

    Authors: Salil Gokhale, Shivika Narang, Samarth Singla, Rohit Vaish

    Abstract: We study the problem of capacity modification in the many-to-one stable matching of workers and firms. Our goal is to systematically study how the set of stable matchings changes when some seats are added to or removed from the firms. We make three main contributions: First, we examine whether firms and workers can improve or worsen upon changing the capacities under worker-proposing and firm-prop…

    Submitted 18 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Appears in the Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems 2024 (AAMAS 2024) (https://dl.acm.org/doi/10.5555/3635637.3662922)

  8. arXiv:2309.16039

    cs.CL

    Effective Long-Context Scaling of Foundation Models

    Authors: Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma

    Abstract: We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchm…

    Submitted 13 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

  9. arXiv:2307.09288

    cs.CL cs.AI

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, et al. (43 additional authors not shown)

    Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  10. arXiv:2305.00040

    cs.GT

    Fair Distribution of Delivery Orders

    Authors: Hadi Hosseini, Shivika Narang, Tomasz Wąs

    Abstract: We initiate the study of fair distribution of delivery tasks among a set of agents wherein delivery jobs are placed along the vertices of a graph. Our goal is to fairly distribute delivery costs (modeled as a submodular function) among a fixed set of agents while satisfying some desirable notions of economic efficiency. We adopt well-established fairness concepts -- such as envy-freeness up to one…

    Submitted 18 June, 2024; v1 submitted 28 April, 2023; originally announced May 2023.

    Comments: Appears in the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024

  11. arXiv:2304.09871

    cs.LG cs.AI math.OC

    A Theory on Adam Instability in Large-Scale Machine Learning

    Authors: Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

    Abstract: We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent…

    Submitted 25 April, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

  12. arXiv:2304.09151

    cs.CL

    UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

    Authors: Hyung Won Chung, Noah Constant, Xavier Garcia, Adam Roberts, Yi Tay, Sharan Narang, Orhan Firat

    Abstract: Pretrained multilingual large language models have typically used heuristic temperature-based sampling to balance between different languages. However, previous work has not systematically evaluated the efficacy of different pretraining language distributions across model scales. In this paper, we propose a new sampling method, UniMax, that delivers more uniform coverage of head languages while mit…

    Submitted 18 April, 2023; originally announced April 2023.

  13. arXiv:2212.10562

    cs.CL cs.CV

    Character-Aware Models Improve Visual Text Rendering

    Authors: Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

    Abstract: Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text e…

    Submitted 3 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  14. arXiv:2210.13432

    cs.CL

    Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models

    Authors: Hao Liu, Xinyang Geng, Lisa Lee, Igor Mordatch, Sergey Levine, Sharan Narang, Pieter Abbeel

    Abstract: Large language models (LLMs) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks. In this work, we propose a simple technique that significantly boosts the performance of LLMs without adding computational cost. Our key observati…

    Submitted 31 January, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Added T-FCM and better FCM results

  15. arXiv:2210.11416

    cs.LG cs.CL

    Scaling Instruction-Finetuned Language Models

    Authors: Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, et al. (10 additional authors not shown)

    Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects d…

    Submitted 6 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Public checkpoints: https://huggingface.co/docs/transformers/model_doc/flan-t5

  16. arXiv:2210.03945

    cs.LG cs.AI

    Understanding HTML with Large Language Models

    Authors: Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust

    Abstract: Large language models (LLMs) have shown exceptional performance on a variety of natural language tasks. Yet, their capabilities for HTML understanding -- i.e., parsing the raw HTML of a webpage, with applications to automation of web-based tasks, crawling, and browser-assisted retrieval -- have not been fully explored. We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analy…

    Submitted 19 May, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

  17. arXiv:2207.10551

    cs.LG cs.CL

    Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

    Authors: Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler

    Abstract: There has been a lot of interest in the scaling properties of Transformer models. However, not much has been done on the front of investigating the effect of scaling properties of different inductive biases and model architectures. Do model architectures scale differently? If so, how does inductive bias affect scaling behaviour? How does this influence upstream (pretraining) and downstream (trans…

    Submitted 21 July, 2022; originally announced July 2022.

  18. arXiv:2207.01589

    cs.GT cs.MA

    Repeatedly Matching Items to Agents Fairly and Efficiently

    Authors: Ioannis Caragiannis, Shivika Narang

    Abstract: We consider a novel setting where a set of items is matched to the same set of agents repeatedly over multiple rounds. Each agent gets exactly one item per round, which brings interesting challenges to finding efficient and/or fair repeated matchings. A particular feature of our model is that the value of an agent for an item in some round depends on the number of rounds in which the item h…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 20 pages

  19. arXiv:2204.02311

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  20. arXiv:2203.17189

    cs.LG cs.CL

    Scaling Up Models and Data with t5x and seqio

    Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, et al. (18 additional authors not shown)

    Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we presen…

    Submitted 31 March, 2022; originally announced March 2022.

  21. arXiv:2203.11171

    cs.CL cs.AI

    Self-Consistency Improves Chain of Thought Reasoning in Language Models

    Authors: Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

    Abstract: Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consist…

    Submitted 7 March, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Published at ICLR 2023. V2: added PaLM results; V3: added UL2 results; V4: camera ready version at ICLR 2023

  22. arXiv:2110.00767

    cs.GT

    Sublinear Approximation Algorithm for Nash Social Welfare with XOS Valuations

    Authors: Siddharth Barman, Anand Krishna, Pooja Kulkarni, Shivika Narang

    Abstract: We study the problem of allocating indivisible goods among n agents with the objective of maximizing Nash social welfare (NSW). This welfare function is defined as the geometric mean of the agents' valuations and, hence, it strikes a balance between the extremes of social welfare (arithmetic mean) and egalitarian welfare (max-min value). Nash social welfare has been extensively studied in recent…

    Submitted 15 July, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

    Comments: 41 pages

  23. arXiv:2109.10686

    cs.CL cs.AI cs.CV cs.LG

    Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

    Authors: Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

    Abstract: There remain many open questions pertaining to the scaling behaviour of Transformer architectures. These scaling decisions and findings can be critical, as training runs often come with an associated computational cost which has financial and/or environmental impact. The goal of this paper is to present scaling insights from pretraining and finetuning Transformers. While Kaplan et al. presen…

    Submitted 30 January, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: ICLR 2022 + Updated Checkpoint Release

  24. arXiv:2108.08990

    cs.CV cs.LG

    Few Shot Activity Recognition Using Variational Inference

    Authors: Neeraj Kumar, Siddhansh Narang

    Abstract: In the last few years, there has been remarkable progress in learning models that can recognise novel classes from only a few labeled examples. Few-shot learning (FSL) for action recognition is a challenging task of recognising novel action categories which are represented by few instances in the training data. We propose a novel variational inference based architectural framework (HF-AR) for…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: Accepted in IJCAI 2021 - 3rd International Workshop on Deep Learning for Human Activity Recognition. arXiv admin note: text overlap with arXiv:1611.09630, arXiv:1909.07945 by other authors

  25. arXiv:2105.13626

    cs.CL

    ByT5: Towards a token-free future with pre-trained byte-to-byte models

    Authors: Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel

    Abstract: Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units. By comparison, token-free models that operate directly on raw text (bytes or characters) have many benefits: they can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by removing complex and error-prone text preprocessing pi…

    Submitted 7 March, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: To be published in TACL 2022

  26. arXiv:2104.04631

    cs.CV

    DexYCB: A Benchmark for Capturing Hand Grasping of Objects

    Authors: Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox

    Abstract: We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task: generating saf…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR 2021

  27. arXiv:2103.16388

    q-fin.ST cs.LG

    Text Mining of Stocktwits Data for Predicting Stock Prices

    Authors: Mukul Jaggi, Priyanka Mandal, Shreya Narang, Usman Naseem, Matloob Khushi

    Abstract: Stock price prediction can be made more efficient by considering the price fluctuations and understanding the sentiments of people. A limited number of models understand financial jargon or have labelled datasets concerning stock price change. To overcome this challenge, we introduced FinALBERT, an ALBERT based model trained to handle financial domain text classification tasks by labelling Stocktw…

    Submitted 12 March, 2021; originally announced March 2021.

    Journal ref: Appl. Syst. Innov. 2021, 4, 13

  28. arXiv:2102.11972

    cs.LG cs.CL

    Do Transformer Modifications Transfer Across Implementations and Applications?

    Authors: Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

    Abstract: The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we f…

    Submitted 10 September, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: To appear at EMNLP 2021 as a conference paper

  29. arXiv:2101.05452

    cs.RO

    Interpreting and Predicting Tactile Signals for the SynTouch BioTac

    Authors: Yashraj S. Narang, Balakumar Sundaralingam, Karl Van Wyk, Arsalan Mousavian, Dieter Fox

    Abstract: In the human hand, high-density contact information provided by afferent neurons is essential for many human grasping and manipulation capabilities. In contrast, robotic tactile sensors, including the state-of-the-art SynTouch BioTac, are typically used to provide low-density contact information, such as contact location, center of pressure, and net force. Although useful, these data do not convey…

    Submitted 13 January, 2021; originally announced January 2021.

    Comments: Submitted to International Journal of Robotics Research (IJRR)

  30. SHAD3S: A model to Sketch, Shade and Shadow

    Authors: Raghav B. Venkataramaiyer, Abhishek Joshi, Saisha Narang, Vinay P. Namboodiri

    Abstract: Hatching is a common method used by artists to accentuate the third dimension of a sketch, and to illuminate the scene. Our system SHAD3S attempts to compete with a human at hatching generic three-dimensional (3D) shapes, and also tries to assist her in a form exploration exercise. The novelty of our approach lies in the fact that we make no assumptions about the input other than that it represent…

    Submitted 4 September, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

    Comments: 10 pages, 11 figures, 2 tables. Accepted to WACV 2021. Project Page: https://bvraghav.com/shad3s/

    Journal ref: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 3615-3624

  31. arXiv:2010.04826

    cs.CL cs.AI

    On Task-Level Dialogue Composition of Generative Transformer Model

    Authors: Prasanna Parthasarathi, Arvind Neelakantan, Sharan Narang

    Abstract: Task-oriented dialogue systems help users accomplish tasks such as booking a movie ticket and ordering food via conversation. Generative models parameterized by a deep neural network are widely used for next turn response generation in such systems. It is natural for users of the system to want to accomplish multiple tasks within the same conversation, but the ability of generative models to compo…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: 8 pages; Accepted at Workshop on Insights from Negative Results in NLP

  32. arXiv:2009.05823

    cs.GT cs.DS

    On Achieving Leximin Fairness and Stability in Many-to-One Matchings

    Authors: Shivika Narang, Arpita Biswas, Y Narahari

    Abstract: The past few years have seen a surge of work on fairness in allocation problems where items must be fairly divided among agents having individual preferences. In comparison, fairness in settings with preferences on both sides, that is, where agents have to be matched to other agents, has received much less attention. Moreover, two-sided matching literature has largely focused on ordinal preference…

    Submitted 25 May, 2022; v1 submitted 12 September, 2020; originally announced September 2020.

  33. arXiv:2006.03777

    cs.RO

    Interpreting and Predicting Tactile Signals via a Physics-Based and Data-Driven Framework

    Authors: Yashraj S. Narang, Karl Van Wyk, Arsalan Mousavian, Dieter Fox

    Abstract: High-density afferents in the human hand have long been regarded as essential for human grasping and manipulation abilities. In contrast, robotic tactile sensors are typically used to provide low-density contact data, such as center-of-pressure and resultant force. Although useful, this data does not exploit the rich information content that some tactile sensors (e.g., the SynTouch BioTac) natural…

    Submitted 6 June, 2020; originally announced June 2020.

    Comments: To be published in Proc. Robotics: Science and Systems (RSS)

  34. arXiv:2004.14546

    cs.CL cs.LG

    WT5?! Training Text-to-Text Models to Explain their Predictions

    Authors: Sharan Narang, Colin Raffel, Katherine Lee, Adam Roberts, Noah Fiedel, Karishma Malkan

    Abstract: Neural networks have recently achieved human-level performance on various challenging natural language processing (NLP) tasks, but it is notoriously difficult to understand why a neural network produced a particular prediction. In this paper, we leverage the text-to-text framework proposed by Raffel et al. (2019) to train language models to output a natural text explanation alongside their predicti…

    Submitted 29 April, 2020; originally announced April 2020.

  35. arXiv:2002.03246

    cs.MA cs.CL

    SPA: Verbal Interactions between Agents and Avatars in Shared Virtual Environments using Propositional Planning

    Authors: Andrew Best, Sahil Narang, Dinesh Manocha

    Abstract: We present a novel approach for generating plausible verbal interactions between virtual human-like agents and user avatars in shared virtual environments. Sense-Plan-Ask, or SPA, extends prior work in propositional planning and natural language processing to enable agents to plan with uncertain information, and leverage question and answer dialogue with other agents and avatars to obtain the need…

    Submitted 8 February, 2020; originally announced February 2020.

  36. arXiv:2001.05655

    cs.GT

    Design of Trusted Market Platforms using Permissioned Blockchains and Game Theory

    Authors: Shivika Narang

    Abstract: The blockchain concept forms the backbone of a new wave technology that promises to be deployed extensively in a wide variety of industrial and societal applications. Governments, financial institutions, banks, industrial supply chains, service companies, and even educational institutions and hospitals are investing in a substantial manner in the hope of improving business efficiency and operation…

    Submitted 16 January, 2020; originally announced January 2020.

    Comments: Thesis

  37. arXiv:2001.05652

    cs.GT

    On the Coexistence of Stability and Incentive Compatibility in Fractional Matchings

    Authors: Shivika Narang, Y Narahari

    Abstract: Stable matchings have been studied extensively in social choice literature. The focus has been mostly on integral matchings, in which the nodes on the two sides are wholly matched. A fractional matching, which is a convex combination of integral matchings, is a natural extension of integral matchings. The topic of stability of fractional matchings has started receiving attention only very recently…

    Submitted 19 April, 2022; v1 submitted 16 January, 2020; originally announced January 2020.

  38. arXiv:1910.14613

    cs.LG cs.CL stat.ML

    Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning

    Authors: Arvind Neelakantan, Semih Yavuz, Sharan Narang, Vishaal Prasad, Ben Goodrich, Daniel Duckworth, Chinnadhurai Sankar, Xifeng Yan

    Abstract: Task-oriented dialog presents a difficult challenge encompassing multiple problems including multi-turn language understanding and generation, knowledge retrieval and reasoning, and action prediction. Modern dialog systems typically begin by converting conversation history to a symbolic object referred to as belief state by using supervised learning. The belief state is then used to reason on an e…

    Submitted 31 October, 2019; originally announced October 2019.

  39. arXiv:1910.10683

    cs.LG cs.CL stat.ML

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

    Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing…

    Submitted 19 September, 2023; v1 submitted 23 October, 2019; originally announced October 2019.

  40. arXiv:1901.09427  [pdf, ps, other]

    cs.GT

    Fair Division of Indivisible Goods Among Strategic Agents

    Authors: Siddharth Barman, Ganesh Ghalme, Shweta Jain, Pooja Kulkarni, Shivika Narang

    Abstract: We study fair division of indivisible goods in a single-parameter environment. In particular, we develop truthful social welfare maximizing mechanisms for fairly allocating indivisible goods. Our fairness guarantees are in terms of solution concepts which are tailored to address allocation of indivisible goods and, hence, provide an appropriate framework for fair division of goods. This work speci…

    Submitted 27 January, 2019; originally announced January 2019.

  41. arXiv:1712.00409  [pdf, other]

    cs.LG stat.ML

    Deep Learning Scaling is Predictable, Empirically

    Authors: Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

    Abstract: Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, comput…

    Submitted 1 December, 2017; originally announced December 2017.

    Comments: 19 pages, 11 figures

  42. arXiv:1711.02782  [pdf, other]

    cs.LG cs.AI stat.ML

    Block-Sparse Recurrent Neural Networks

    Authors: Sharan Narang, Eric Undersander, Gregory Diamos

    Abstract: Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their den…

    Submitted 7 November, 2017; originally announced November 2017.

  43. arXiv:1710.07654  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

    Authors: Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller

    Abstract: We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. We scale Deep Voice 3 to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, we identify common erro…

    Submitted 22 February, 2018; v1 submitted 20 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018. (v3 changed paper title)

  44. arXiv:1710.03740  [pdf, other]

    cs.AI cs.LG stat.ML

    Mixed Precision Training

    Authors: Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

    Abstract: Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half precision floating point numbers. In our technique, weights, activations and g…

    Submitted 15 February, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

    Comments: Published as a conference paper at ICLR 2018

  45. arXiv:1704.05119  [pdf, other]

    cs.LG cs.CL

    Exploring Sparsity in Recurrent Neural Networks

    Authors: Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta

    Abstract: Recurrent Neural Networks (RNN) are widely used to solve a variety of problems, and as the quantity of data and the amount of available compute have increased, so have model sizes. The number of parameters in recent state-of-the-art networks makes them hard to deploy, especially on mobile phones and embedded devices. The challenge is due to both the size of the model and the time it takes to evalua…

    Submitted 6 November, 2017; v1 submitted 17 April, 2017; originally announced April 2017.

    Comments: Published as a conference paper at ICLR 2017

  46. arXiv:1703.08561  [pdf, other]

    cs.RO cs.MA

    AutonoVi: Autonomous Vehicle Planning with Dynamic Maneuvers and Traffic Constraints

    Authors: Andrew Best, Sahil Narang, Daniel Barber, Dinesh Manocha

    Abstract: We present AutonoVi, a novel algorithm for autonomous vehicle navigation that supports dynamic maneuvers and satisfies traffic constraints and norms. Our approach is based on optimization-based maneuver planning that supports dynamic lane-changes, swerving, and braking in all traffic scenarios and guides the vehicle to its goal position. We take into account various traffic constraints, including…

    Submitted 29 March, 2017; v1 submitted 24 March, 2017; originally announced March 2017.

    Comments: 9 pages, 6 figures

  47. arXiv:1607.04381  [pdf, other]

    cs.CV

    DSD: Dense-Sparse-Dense Training for Deep Neural Networks

    Authors: Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

    Abstract: Modern deep neural networks have a large number of parameters, making them very hard to train. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance. In the first D (Dense) step, we train a dense network to learn connection weights and importance. In the S (Sparse) step, we regularize the network by pruning the unimp…

    Submitted 21 February, 2017; v1 submitted 15 July, 2016; originally announced July 2016.

    Comments: Published as a conference paper at ICLR 2017

  48. arXiv:1602.03623  [pdf, other]

    cs.MA

    Dynamic Group Behaviors for Interactive Crowd Simulation

    Authors: Liang He, Jia Pan, Sahil Narang, Wenping Wang, Dinesh Manocha

    Abstract: We present a new algorithm to simulate dynamic group behaviors for interactive multi-agent crowd simulation. Our approach is general and makes no assumption about the environment, shape, or size of the groups. We use the least effort principle to perform coherent group navigation and present efficient inter-group and intra-group maintenance techniques. We extend the reciprocal collision avoidance…

    Submitted 11 February, 2016; originally announced February 2016.

  49. arXiv:1512.02595  [pdf, other]

    cs.CL

    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

    Authors: Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, et al. (9 additional authors not shown)

    Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our app…

    Submitted 8 December, 2015; originally announced December 2015.

  50. arXiv:1310.2646  [pdf, ps, other]

    cs.LG

    Localized Iterative Methods for Interpolation in Graph Structured Data

    Authors: Sunil K. Narang, Akshay Gadde, Eduard Sanou, Antonio Ortega

    Abstract: In this paper, we present two localized graph filtering based methods for interpolating graph signals defined on the vertices of arbitrary graphs from only a partial set of samples. The first method is an extension of previous work on reconstructing bandlimited graph signals from partially observed samples. The iterative graph filtering approach very closely approximates the solution proposed in t…

    Submitted 9 October, 2013; originally announced October 2013.