Showing 1–19 of 19 results for author: Kottur, S

  1. arXiv:2305.13721  [pdf, other]

    cs.CL cs.AI

    Continual Dialogue State Tracking via Example-Guided Question Answering

    Authors: Hyundong Cho, Andrea Madotto, Zhaojiang Lin, Khyathi Raghavi Chandu, Satwik Kottur, Jing Xu, Jonathan May, Chinnadhurai Sankar

    Abstract: Dialogue systems are frequently updated to accommodate new services, but naively updating them by continually training with data for new services results in diminishing performance on previously learnt services. Motivated by the insight that dialogue state tracking (DST), a crucial component of dialogue systems that estimates the user's goal as a conversation proceeds, is a simple natural language underst…

    Submitted 14 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 11 pages, EMNLP 2023

  2. arXiv:2303.16406  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Hierarchical Video-Moment Retrieval and Step-Captioning

    Authors: Abhay Zala, Jaemin Cho, Satwik Kottur, Xilun Chen, Barlas Oğuz, Yashar Mehdad, Mohit Bansal

    Abstract: There is growing interest in searching for information from large video corpora. Prior works have studied relevant tasks, such as text-based video retrieval, moment retrieval, video summarization, and video captioning in isolation, without an end-to-end setup that can jointly search from video corpora and generate summaries. Such an end-to-end setup would allow for many interesting applications, e…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 (15 pages; the first two authors contributed equally; Project website: https://hirest-cvpr2023.github.io)

  3. arXiv:2211.08462  [pdf, other]

    cs.CL

    Navigating Connected Memories with a Task-oriented Dialog System

    Authors: Seungwhan Moon, Satwik Kottur, Alborz Geramifard, Babak Damavandi

    Abstract: Recent years have seen an increasing trend in the volume of personal media captured by users, thanks to the advent of smartphones and smart glasses, resulting in large media collections. Despite conversation being an intuitive human-computer interface, current efforts focus mostly on single-shot natural language based media retrieval to aid users query their media and re-live their memories. This…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 13 pages, 3 tables, 9 figures

  4. arXiv:2211.03940  [pdf, other]

    cs.CL

    Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

    Authors: Satwik Kottur, Seungwhan Moon, Aram H. Markosyan, Hardik Shah, Babak Damavandi, Alborz Geramifard

    Abstract: People capture photos and videos to relive and share memories of personal significance. Recently, media montages (stories) have become a popular mode of sharing these memories due to their intuitive and powerful storytelling capabilities. However, creating such montages usually involves a lot of manual searches, clicks, and selections that are time-consuming and cumbersome, adversely affecting use…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 8 pages, 6 figures, 2 tables

  5. arXiv:2112.08351  [pdf, other]

    cs.CL

    Database Search Results Disambiguation for Task-Oriented Dialog Systems

    Authors: Kun Qian, Ahmad Beirami, Satwik Kottur, Shahin Shayandeh, Paul Crook, Alborz Geramifard, Zhou Yu, Chinnadhurai Sankar

    Abstract: As task-oriented dialog systems are becoming increasingly popular in our lives, more realistic tasks have been proposed and explored. However, new practical challenges arise. For instance, current dialog systems cannot effectively handle multiple search results when querying a database, due to the lack of such scenarios in existing public datasets. In this paper, we propose Database Search Result…

    Submitted 15 December, 2021; originally announced December 2021.

  6. arXiv:2110.11205  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Robustness through Data Augmentation Loss Consistency

    Authors: Tianjian Huang, Shaunak Halbe, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami

    Abstract: While deep learning through empirical risk minimization (ERM) has succeeded at achieving human-level performance at a variety of complex tasks, ERM is not robust to distribution shifts or adversarial attacks. Synthetic data augmentation followed by empirical risk minimization (DA-ERM) is a simple and widely used solution to improve robustness in ERM. In addition, consistency regularization can be…

    Submitted 24 January, 2023; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: 40 pages

  7. arXiv:2110.07058  [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  8. arXiv:2104.08667  [pdf, other]

    cs.CL cs.AI

    SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

    Authors: Satwik Kottur, Seungwhan Moon, Alborz Geramifard, Babak Damavandi

    Abstract: Next generation task-oriented dialog systems need to understand conversational contexts with their perceived surroundings, to effectively help users in the real-world multimodal environment. Existing task-oriented dialog datasets aimed towards virtual assistance fall short and do not situate the dialog in the user's multimodal context. To overcome this, we present a new dataset for Situated and Interac…

    Submitted 20 October, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: 10 pages, 7 figures, 5 tables

  9. arXiv:2101.00151  [pdf, other]

    cs.AI cs.CL cs.LG

    DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue

    Authors: Hung Le, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard, Satwik Kottur

    Abstract: A video-grounded dialogue system is required to understand both dialogue, which contains semantic dependencies from turn to turn, and video, which contains visual cues of spatial and temporal scene variations. Building such dialogue systems is a challenging problem, involving various reasoning types on both visual and language inputs. Existing benchmarks do not have enough annotations to thoroughl…

    Submitted 14 June, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: 20 pages, 14 figures, 8 tables

    Journal ref: Association for Computational Linguistics (2021)

  10. arXiv:2011.06486  [pdf, ps, other]

    cs.CL

    Overview of the Ninth Dialog System Technology Challenge: DSTC9

    Authors: Chulaka Gunasekara, Seokhwan Kim, Luis Fernando D'Haro, Abhinav Rastogi, Yun-Nung Chen, Mihail Eric, Behnam Hedayatnia, Karthik Gopalakrishnan, Yang Liu, Chao-Wei Huang, Dilek Hakkani-Tür, Jinchao Li, Qi Zhu, Lingxiao Luo, Lars Liden, Kaili Huang, Shahin Shayandeh, Runze Liang, Baolin Peng, Zheng Zhang, Swadheen Shukla, Minlie Huang, Jianfeng Gao, Shikib Mehri, Yulan Feng, et al. (14 additional authors not shown)

    Abstract: This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct tasks in dialog systems, namely, 1. Task-oriented dialog Modeling with unstructured knowledge access, 2. Multi-domain task-oriented dialog, 3. Interactive evaluation of dialog, and 4. Situated interactive multi-modal dialog. This…

    Submitted 12 November, 2020; originally announced November 2020.

  11. arXiv:2006.01460  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Situated and Interactive Multimodal Conversations

    Authors: Seungwhan Moon, Satwik Kottur, Paul A. Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard

    Abstract: Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take mult…

    Submitted 10 November, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: 20 pages, 5 figures, 11 tables, accepted to COLING 2020

  12. arXiv:2003.01848  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA

    On Emergent Communication in Competitive Multi-Agent Teams

    Authors: Paul Pu Liang, Jeffrey Chen, Ruslan Salakhutdinov, Louis-Philippe Morency, Satwik Kottur

    Abstract: Several recent works have found the emergence of grounded compositional language in the communication protocols developed by mostly cooperative multi-agent systems when learned end-to-end to maximize performance on a downstream task. However, human populations learn to solve complex tasks involving communicative behaviors not only in fully cooperative settings but also in scenarios where competiti…

    Submitted 16 July, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: AAMAS 2020, code: https://github.com/pliang279/Competitive-Emergent-Communication

  13. arXiv:1903.03166  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

    Authors: Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

    Abstract: Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible as it requires prohibitively-expensive complete annotation of the 'state' of all images and dialogs. We deve…

    Submitted 18 September, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: 13 pages, 11 figures, 3 tables, accepted as a short paper at NAACL 2019

  14. arXiv:1809.01816  [pdf, other]

    cs.CV cs.AI cs.CL

    Visual Coreference Resolution in Visual Dialog using Neural Module Networks

    Authors: Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

    Abstract: Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more. We focus on one such problem called visual coreference resolution that involves determining which words, typically noun phrases and pronouns…

    Submitted 6 September, 2018; originally announced September 2018.

    Comments: ECCV 2018 + results on VisDial v1.0 dataset

  15. arXiv:1706.08502  [pdf, other]

    cs.CL cs.AI cs.CV

    Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog

    Authors: Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra

    Abstract: A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, all learned without any human supervision! In this paper, using a Task and Tell reference game between two agents as a testbed,…

    Submitted 20 August, 2017; v1 submitted 26 June, 2017; originally announced June 2017.

    Comments: 9 pages, 7 figures, 2 tables, accepted at EMNLP 2017 as short paper

  16. arXiv:1703.06585  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

    Authors: Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra

    Abstract: We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end -- from pixel…

    Submitted 21 March, 2017; v1 submitted 19 March, 2017; originally announced March 2017.

    Comments: 11 pages, 4 figures, 2 tables, webpage: http://visualdialog.org/

  17. arXiv:1703.06114  [pdf, other]

    cs.LG stat.ML

    Deep Sets

    Authors: Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola

    Abstract: We study the problem of designing models for machine learning tasks defined on sets. In contrast to the traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from estimation of population statistics, to anomaly detection in piezometer data of…

    Submitted 14 April, 2018; v1 submitted 10 March, 2017; originally announced March 2017.

    Comments: NIPS 2017

  18. arXiv:1611.08669  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Visual Dialog

    Authors: Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra

    Abstract: We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in image, infer context from history, and answer the question accurately. Visual Dialog is disentangled enough from a…

    Submitted 1 August, 2017; v1 submitted 26 November, 2016; originally announced November 2016.

    Comments: 23 pages, 18 figures, CVPR 2017 camera-ready, results on VisDial v0.9 dataset, Webpage: http://visualdialog.org

  19. arXiv:1511.07067  [pdf, other]

    cs.CV cs.CL

    Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes

    Authors: Satwik Kottur, Ramakrishna Vedantam, José M. F. Moura, Devi Parikh

    Abstract: We propose a model to learn visually grounded word embeddings (vis-w2v) to capture visual notions of semantic relatedness. While word embeddings trained using text have been extremely successful, they cannot uncover notions of semantic relatedness implicit in our visual world. For instance, although "eats" and "stares at" seem unrelated in text, they share semantics visually. When people are eatin…

    Submitted 29 June, 2016; v1 submitted 22 November, 2015; originally announced November 2015.

    Comments: 15 pages, 11 figures