Skip to main content

Showing 1–30 of 30 results for author: Dasgupta, I

  1. arXiv:2406.09175  [pdf, other

    cs.CV cs.CL

    ReMI: A Dataset for Reasoning with Multiple Images

    Authors: Mehran Kazemi, Nishanth Dikkala, Ankit Anand, Petar Devic, Ishita Dasgupta, Fangyu Liu, Bahare Fatemi, Pranjal Awasthi, Dee Guo, Sreenivas Gollapudi, Ahmed Qureshi

    Abstract: With the continuous advancement of large language models (LLMs), it is essential to create new benchmarks to effectively evaluate their expanding capabilities and identify areas for improvement. This work focuses on multi-image reasoning, an emerging capability in state-of-the-art LLMs. We introduce ReMI, a dataset designed to assess LLMs' ability to Reason with Multiple Images. This dataset encom… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2404.10179  [pdf, other

    cs.RO cs.AI cs.HC cs.LG

    Scaling Instructable Agents Across Many Simulated Worlds

    Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi , et al. (68 additional authors not shown)

    Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructio… ▽ More

    Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

  3. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2403.01693  [pdf, other

    cs.CV cs.AI

    HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances

    Authors: Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen, Ishita Dasgupta, Saayan Mitra, Minh Hoai

    Abstract: Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands. Common artifacts include irregular hand poses, shapes, incorrect numbers of fingers, and physically implausible finger orientations. To generate images with realistic hands, we propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings… ▽ More

    Submitted 21 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Revisions: 1. Added a link to project page in the abstract, 2. Updated references and related work, 3. Fixed some grammatical errors

  5. arXiv:2402.07872  [pdf, other

    cs.RO cs.CL cs.CV cs.LG

    PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

    Authors: Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

    Abstract: Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce only textual outputs, while robotic control and other spatial tasks require outputting continuous coordinates, actions, or trajectories. How can we ena… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  6. arXiv:2402.07282  [pdf, other

    cs.CL cs.AI cs.LG

    How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

    Authors: Ryan Liu, Theodore R. Sumers, Ishita Dasgupta, Thomas L. Griffiths

    Abstract: In day-to-day communication, people often approximate the truth - for example, rounding the time or omitting details - in order to be maximally helpful to the listener. How do large language models (LLMs) handle such nuanced trade-offs? To address this question, we use psychological models and experiments designed to characterize human behavior to analyze LLMs. We test a range of LLMs and explore… ▽ More

    Submitted 13 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  7. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  8. arXiv:2311.00445  [pdf, other

    cs.CL cs.AI cs.LG

    A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models

    Authors: Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen

    Abstract: A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do language models, which are trained on text generated by humans, replicate such human biases, or are they able to overcome them? Focusing on the case of sy… ▽ More

    Submitted 11 April, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  9. arXiv:2310.19956  [pdf, other

    cs.CL

    The Impact of Depth on Compositional Generalization in Transformer Language Models

    Authors: Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

    Abstract: To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, motivated by theoretical and empirical work, that deeper transformers generalize more compositionally. Simply adding layers increases the total number o… ▽ More

    Submitted 10 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NAACL 2024

  10. arXiv:2309.11564  [pdf, other

    cs.LG cs.CL

    Hierarchical reinforcement learning with natural language subgoals

    Authors: Arun Ahuja, Kavya Kopparapu, Rob Fergus, Ishita Dasgupta

    Abstract: Hierarchical reinforcement learning has been a compelling approach for achieving goal directed behavior over long sequences of actions. However, it has been challenging to implement in realistic or open-ended environments. A main challenge has been to find the right space of sub-goals over which to instantiate a hierarchy. We present a novel approach where we use data from humans solving these tas… ▽ More

    Submitted 20 September, 2023; originally announced September 2023.

  11. arXiv:2309.00359  [pdf, other

    cs.CL cs.CV

    Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior

    Authors: Ashmit Khandelwal, Aditya Agrawal, Aanisha Bhattacharyya, Yaman K Singla, Somesh Singh, Uttaran Bhattacharya, Ishita Dasgupta, Stefano Petrangeli, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

    Abstract: Shannon and Weaver's seminal information theory divides communication into three levels: technical, semantic, and effectiveness. While the technical level deals with the accurate reconstruction of transmitted symbols, the semantic and effectiveness levels deal with the inferred meaning and its effect on the receiver. Large Language Models (LLMs), with their wide generalizability, make some progres… ▽ More

    Submitted 16 March, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

  12. arXiv:2305.16183  [pdf, other

    cs.LG cs.AI cs.CL

    Passive learning of active causal strategies in agents and language models

    Authors: Andrew Kyle Lampinen, Stephanie C Y Chan, Ishita Dasgupta, Andrew J Nam, Jane X Wang

    Abstract: What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long… ▽ More

    Submitted 2 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2023). 10 pages main text

  13. arXiv:2304.06729  [pdf, other

    cs.AI cs.LG

    Meta-Learned Models of Cognition

    Authors: Marcel Binz, Ishita Dasgupta, Akshay Jagadish, Matthew Botvinick, Jane X. Wang, Eric Schulz

    Abstract: Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. The purpose of this article is to synthesize… ▽ More

    Submitted 12 April, 2023; originally announced April 2023.

  14. arXiv:2302.00763  [pdf, other

    cs.LG cs.AI cs.CL

    Collaborating with language models for embodied reasoning

    Authors: Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, Rob Fergus

    Abstract: Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the… ▽ More

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: Presented at NeurIPS 2022 Language and Reinforcement Learning Workshop (best paper) and NeurIPS 2022 Foundation Models for Decision Making Workshop. 4 pages main; 14 pages total (including references and appendix); 3 figures

  15. arXiv:2301.12507  [pdf, other

    cs.AI

    Distilling Internet-Scale Vision-Language Models into Embodied Agents

    Authors: Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, Ishita Dasgupta

    Abstract: Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight… ▽ More

    Submitted 14 June, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: 9 pages, 7 figures. Presented at ICML 2023

  16. arXiv:2211.00177  [pdf, other

    cs.LG cs.IR cs.SI

    Learning to Navigate Wikipedia by Taking Random Walks

    Authors: Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus

    Abstract: A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information. Internet search engines reliably find the correct vicinity but the top results may be a few links away from the desired target. A complementary approach is navigation via hyperlinks, employing a policy that comprehends local content and selects a link that moves it closer to the target. In this pa… ▽ More

    Submitted 31 October, 2022; originally announced November 2022.

    Journal ref: NeurIPS 2022

  17. arXiv:2210.05675  [pdf, other

    cs.CL cs.AI cs.LG

    Transformers generalize differently from information stored in context vs in weights

    Authors: Stephanie C. Y. Chan, Ishita Dasgupta, Junkyung Kim, Dharshan Kumaran, Andrew K. Lampinen, Felix Hill

    Abstract: Transformer models can use two fundamentally different kinds of information: information stored in weights during training, and information provided ``in-context'' at inference time. In this work, we show that transformers exhibit different inductive biases in how they represent and generalize from the information in these two sources. In particular, we characterize whether they generalize via par… ▽ More

    Submitted 13 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

  18. arXiv:2207.07051  [pdf, other

    cs.CL cs.AI cs.LG

    Language models show human-like content effects on reasoning tasks

    Authors: Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Hannah R. Sheahan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill

    Abstract: Abstract reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect. For example, human reasoning is affected by our real-world knowledge and beliefs, and shows notable "content effects"; humans reason more reliably when the semant… ▽ More

    Submitted 30 October, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

  19. arXiv:2205.11558  [pdf, other

    cs.AI

    Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines

    Authors: Sreejan Kumar, Carlos G. Correa, Ishita Dasgupta, Raja Marjieh, Michael Y. Hu, Robert D. Hawkins, Nathaniel D. Daw, Jonathan D. Cohen, Karthik Narasimhan, Thomas L. Griffiths

    Abstract: Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs… ▽ More

    Submitted 5 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), winner of Outstanding Paper Award

  20. arXiv:2204.02329  [pdf, other

    cs.CL cs.AI cs.LG

    Can language models learn from explanations in context?

    Authors: Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, Kory Matthewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill

    Abstract: Language Models (LMs) can perform new tasks by adapting to a few in-context examples. For humans, explanations that connect examples to task principles can improve learning. We therefore investigate whether explanations of few-shot examples can help LMs. We annotate questions from 40 challenging tasks with answer explanations, and various matched control explanations. We evaluate how different typ… ▽ More

    Submitted 10 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Findings of EMNLP 2022

  21. arXiv:2204.01437  [pdf

    cs.AI cs.HC

    Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning

    Authors: Sreejan Kumar, Ishita Dasgupta, Nathaniel D. Daw, Jonathan D. Cohen, Thomas L. Griffiths

    Abstract: The ability to acquire abstract knowledge is a hallmark of human intelligence and is believed by many to be one of the core differences between humans and neural network models. Agents can be endowed with an inductive bias towards abstraction through meta-learning, where they are trained on a distribution of tasks that share some abstract structure that can be learned and applied. However, because… ▽ More

    Submitted 3 March, 2023; v1 submitted 4 April, 2022; originally announced April 2022.

  22. arXiv:2203.14426  [pdf, other

    cs.NI

    An Information Centric Framework for Weather Sensing Data

    Authors: Robert Thompson, Eric Lyons, Ishita Dasgupta, Spyridon Mastorakis, Michael Zink, Susmit Shannigrahi

    Abstract: Weather sensing and forecasting has become increasingly accurate in the last decade thanks to high-resolution radars, efficient computational algorithms, and high-performance computing facilities. Through a distributed and federated network of radars, scientists can make high-resolution observations of the weather conditions on a scale that benefits public safety, commerce, transportation, and oth… ▽ More

    Submitted 27 March, 2022; originally announced March 2022.

    Comments: This paper has been accepted for publication by a workshop held in conjunction with the IEEE International Conference on Communications (ICC) 2022

  23. arXiv:2112.03753  [pdf, other

    cs.LG cs.AI stat.ML

    Tell me why! Explanations support learning relational and causal structure

    Authors: Andrew K. Lampinen, Nicholas A. Roy, Ishita Dasgupta, Stephanie C. Y. Chan, Allison C. Tam, James L. McClelland, Chen Yan, Adam Santoro, Neil C. Rabinowitz, Jane X. Wang, Felix Hill

    Abstract: Inferring the abstract relational and causal structure of the world is a major challenge for reinforcement-learning (RL) agents. For humans, language--particularly in the form of explanations--plays a considerable role in overcoming this challenge. Here, we show that language can play a similar role for deep RL agents in complex environments. While agents typically struggle to acquire relational a… ▽ More

    Submitted 25 May, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

    Comments: ICML 2022; 23 pages

    ACM Class: I.2.6

  24. arXiv:2110.04328  [pdf, other

    cs.LG

    Distinguishing rule- and exemplar-based generalization in learning systems

    Authors: Ishita Dasgupta, Erin Grant, Thomas L. Griffiths

    Abstract: Machine learning systems often do not share the same inductive biases as humans and, as a result, extrapolate or generalize in ways that are inconsistent with our expectations. The trade-off between exemplar- and rule-based generalization has been studied extensively in cognitive psychology; in this work, we present a protocol inspired by these experimental approaches to probe the inductive biases… ▽ More

    Submitted 17 June, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: To appear at the 39th International Conference on Machine Learning (ICML 2022)

  25. arXiv:2107.07013  [pdf, other

    cs.CV

    Passive Attention in Artificial Neural Networks Predicts Human Visual Selectivity

    Authors: Thomas A. Langlois, H. Charles Zhao, Erin Grant, Ishita Dasgupta, Thomas L. Griffiths, Nori Jacoby

    Abstract: Developments in machine learning interpretability techniques over the past decade have provided new tools to observe the image regions that are most informative for classification and localization in artificial neural networks (ANNs). Are the same regions similarly informative to human observers? Using data from 79 new experiments and 7,810 participants, we show that passive attention techniques r… ▽ More

    Submitted 31 October, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

  26. arXiv:2105.07197  [pdf, other

    cs.CV

    Are Convolutional Neural Networks or Transformers more like human vision?

    Authors: Shikhar Tuli, Ishita Dasgupta, Erin Grant, Thomas L. Griffiths

    Abstract: Modern machine learning models for computer vision exceed humans in accuracy on specific visual recognition tasks, notably on datasets like ImageNet. However, high accuracy can be achieved in many ways. The particular decision function found by a machine learning system is determined not only by the data to which the system is exposed, but also the inductive biases of the model, which are typicall… ▽ More

    Submitted 1 July, 2021; v1 submitted 15 May, 2021; originally announced May 2021.

    Comments: Accepted at CogSci 2021. Source code and fine-tuned models are available at https://github.com/shikhartuli/cnn_txf_bias

  27. arXiv:2010.02317  [pdf, other

    cs.LG cs.AI

    Meta-Learning of Structured Task Distributions in Humans and Machines

    Authors: Sreejan Kumar, Ishita Dasgupta, Jonathan D. Cohen, Nathaniel D. Daw, Thomas L. Griffiths

    Abstract: In recent years, meta-learning, in which a model is trained on a family of tasks (i.e. a task distribution), has emerged as an approach to training neural networks to perform tasks that were previously assumed to require structured representations, making strides toward closing the gap between humans and machines. However, we argue that evaluating meta-learning remains a challenge, and can miss wh… ▽ More

    Submitted 18 March, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted at ICLR 2021

  28. arXiv:1909.05885  [pdf, other

    cs.CL cs.LG stat.ML

    Analyzing machine-learned representations: A natural language case study

    Authors: Ishita Dasgupta, Demi Guo, Samuel J. Gershman, Noah D. Goodman

    Abstract: As modern deep networks become more complex, and get closer to human-like capabilities in certain domains, the question arises of how the representations and decision rules they learn compare to the ones in humans. In this work, we study representations of sentences in one such artificial system for natural language processing. We first present a diagnostic test dataset to examine the degree of ab… ▽ More

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: This article supersedes a previous article arXiv:1802.04302

  29. arXiv:1901.08162  [pdf, other

    cs.LG cs.AI stat.ML

    Causal Reasoning from Meta-reinforcement Learning

    Authors: Ishita Dasgupta, Jane Wang, Silvia Chiappa, Jovana Mitrovic, Pedro Ortega, David Raposo, Edward Hughes, Peter Battaglia, Matthew Botvinick, Zeb Kurth-Nelson

    Abstract: Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel… ▽ More

    Submitted 23 January, 2019; originally announced January 2019.

  30. arXiv:1802.04302  [pdf, other

    cs.CL stat.ML

    Evaluating Compositionality in Sentence Embeddings

    Authors: Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, Noah D. Goodman

    Abstract: An important challenge for human-like AI is compositional semantics. Recent research has attempted to address this by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to other tasks. We present a new dataset for one such task, `natural language inference' (NLI), that cannot be solved using only word-level knowledge and requires some compositionali… ▽ More

    Submitted 17 May, 2018; v1 submitted 12 February, 2018; originally announced February 2018.