subscribe to arXiv mailings

Navigating to Success in Multi-Modal Human-Robot Collaboration: Analysis and Corpus Release

Authors: Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Taylor Hudson, Ron Arstein, Clare Voss, David Traum

Abstract: Human-guided robotic exploration is a useful approach to gathering information at remote locations, especially those that might be too risky, inhospitable, or inaccessible for humans. Maintaining common ground between the remotely-located partners is a challenge, one that can be facilitated by multi-modal communication. In this paper, we explore how participants utilized multiple modalities to inv… ▽ More Human-guided robotic exploration is a useful approach to gathering information at remote locations, especially those that might be too risky, inhospitable, or inaccessible for humans. Maintaining common ground between the remotely-located partners is a challenge, one that can be facilitated by multi-modal communication. In this paper, we explore how participants utilized multiple modalities to investigate a remote location with the help of a robotic partner. Participants issued spoken natural language instructions and received from the robot: text-based feedback, continuous 2D LIDAR mapping, and upon-request static photographs. We noticed that different strategies were adopted in terms of use of the modalities, and hypothesize that these differences may be correlated with success at several exploration sub-tasks. We found that requesting photos may have improved the identification and counting of some key entities (doorways in particular) and that this strategy did not hinder the amount of overall area exploration. Future work with larger samples may reveal the effects of more nuanced photo and dialogue strategies, which can inform the training of robotic agents. Additionally, we announce the release of our unique multi-modal corpus of human-robot communication in an exploration context: SCOUT, the Situated Corpus on Understanding Transactions. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: 7 pages, 3 figures

Journal ref: Proceedings of the 2023 IEEE Robot and Human Interactive Communication Conference

arXiv:2305.14331 [pdf, other]

What Else Do I Need to Know? The Effect of Background Information on Users' Reliance on QA Systems

Authors: Navita Goyal, Eleftheria Briakou, Amanda Liu, Connor Baumler, Claire Bonial, Jeffrey Micher, Clare R. Voss, Marine Carpuat, Hal Daumé III

Abstract: NLP systems have shown impressive performance at answering questions by retrieving relevant context. However, with the increasingly large models, it is impossible and often undesirable to constrain models' knowledge or reasoning to only the retrieved context. This leads to a mismatch between the information that the models access to derive the answer and the information that is available to the us… ▽ More NLP systems have shown impressive performance at answering questions by retrieving relevant context. However, with the increasingly large models, it is impossible and often undesirable to constrain models' knowledge or reasoning to only the retrieved context. This leads to a mismatch between the information that the models access to derive the answer and the information that is available to the user to assess the model predicted answer. In this work, we study how users interact with QA systems in the absence of sufficient information to assess their predictions. Further, we ask whether adding the requisite background helps mitigate users' over-reliance on predictions. Our study reveals that users rely on model predictions even in the absence of sufficient information needed to assess the model's correctness. Providing the relevant background, however, helps users better catch model errors, reducing over-reliance on incorrect predictions. On the flip side, background information also increases users' confidence in their accurate as well as inaccurate judgments. Our work highlights that supporting users' verification of QA predictions is an important, yet challenging, problem. △ Less

Submitted 25 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2303.14337 [pdf, other]

SmartBook: AI-Assisted Situation Report Generation for Intelligence Analysts

Authors: Revanth Gangi Reddy, Daniel Lee, Yi R. Fung, Khanh Duy Nguyen, Qi Zeng, Manling Li, Ziqi Wang, Clare Voss, Heng Ji

Abstract: Timely and comprehensive understanding of emerging events is crucial for effective decision-making; automating situation report generation can significantly reduce the time, effort, and cost for intelligence analysts. In this work, we identify intelligence analysts' practices and preferences for AI assistance in situation report generation to guide the design strategies for an effective, trust-bui… ▽ More Timely and comprehensive understanding of emerging events is crucial for effective decision-making; automating situation report generation can significantly reduce the time, effort, and cost for intelligence analysts. In this work, we identify intelligence analysts' practices and preferences for AI assistance in situation report generation to guide the design strategies for an effective, trust-building interface that aligns with their thought processes and needs. Next, we introduce SmartBook, an automated framework designed to generate situation reports from large volumes of news data, creating structured reports by automatically discovering event-related strategic questions. These reports include multiple hypotheses (claims), summarized and grounded to sources with factual evidence, to promote in-depth situation understanding. Our comprehensive evaluation of SmartBook, encompassing a user study alongside a content review with an editing study, reveals SmartBook's effectiveness in generating accurate and relevant situation reports. Qualitative evaluations indicate over 80% of questions probe for strategic information, and over 90% of summaries produce tactically useful content, being consistently favored over summaries from a large language model integrated with web search. The editing study reveals that minimal information is removed from the generated text (under 2.5%), suggesting that SmartBook provides analysts with a valuable foundation for situation reports △ Less

Submitted 27 May, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Preprint

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo… ▽ More We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4. △ Less

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2210.12582 [pdf, other]

Language Model Pre-Training with Sparse Latent Typing

Authors: Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, Chengxiang Zhai, Heng Ji

Abstract: Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most of the LM pre-training objectives only focus on text reconstruction, but have not sought to learn latent-level interpretable representations of sentences. In this paper, we manage to push the language models to obtain a deeper understanding of sentences by propo… ▽ More Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most of the LM pre-training objectives only focus on text reconstruction, but have not sought to learn latent-level interpretable representations of sentences. In this paper, we manage to push the language models to obtain a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge. Besides, the language model pre-trained with such an objective also significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT. △ Less

Submitted 26 October, 2022; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: EMNLP 2022 (Oral)

arXiv:2104.06344 [pdf, other]

The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction

Authors: Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han, Clare Voss

Abstract: Event schemas encode knowledge of stereotypical structures of events and their connections. As events unfold, schemas are crucial to act as a scaffolding. Previous work on event schema induction focuses either on atomic events or linear temporal event sequences, ignoring the interplay between events via arguments and argument relations. We introduce a new concept of Temporal Complex Event Schema:… ▽ More Event schemas encode knowledge of stereotypical structures of events and their connections. As events unfold, schemas are crucial to act as a scaffolding. Previous work on event schema induction focuses either on atomic events or linear temporal event sequences, ignoring the interplay between events via arguments and argument relations. We introduce a new concept of Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations. In addition, we propose a Temporal Event Graph Model that predicts event instances following the temporal complex event schema. To build and evaluate such schemas, we release a new schema learning corpus containing 6,399 documents accompanied with event graphs, and we have manually constructed gold-standard schemas. Intrinsic evaluations based on schema matching and instance graph perplexity, prove the superior quality of our probabilistic graph schema library compared to linear representations. Extrinsic evaluation on schema-guided future event prediction further demonstrates the predictive power of our event graph model, significantly outperforming human schemas and baselines by more than 23.8% on HITS@1. △ Less

Submitted 29 April, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

arXiv:2102.12092 [pdf, other]

Zero-Shot Text-to-Image Generation

Authors: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and… ▽ More Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion. △ Less

Submitted 26 February, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

arXiv:2101.03477 [pdf]

Training Affective Computer Vision Models by Crowdsourcing Soft-Target Labels

Authors: Peter Washington, Onur Cezmi Mutlu, Emilie Leblanc, Aaron Kline, Cathy Hou, Brianna Chrisman, Nate Stockham, Kelley Paskov, Catalin Voss, Nick Haber, Dennis Wall

Abstract: Emotion classifiers traditionally predict discrete emotions. However, emotion expressions are often subjective, thus requiring a method to handle subjective labels. We explore the use of crowdsourcing to acquire reliable soft-target labels and evaluate an emotion detection classifier trained with these labels. We center our study on the Child Affective Facial Expression (CAFE) dataset, a gold stan… ▽ More Emotion classifiers traditionally predict discrete emotions. However, emotion expressions are often subjective, thus requiring a method to handle subjective labels. We explore the use of crowdsourcing to acquire reliable soft-target labels and evaluate an emotion detection classifier trained with these labels. We center our study on the Child Affective Facial Expression (CAFE) dataset, a gold standard collection of images depicting pediatric facial expressions along with 100 human labels per image. To test the feasibility of crowdsourcing to generate these labels, we used Microworkers to acquire labels for 207 CAFE images. We evaluate both unfiltered workers as well as workers selected through a short crowd filtration process. We then train two versions of a classifiers on soft-target CAFE labels using the original 100 annotations provided with the dataset: (1) a classifier trained with traditional one-hot encoded labels, and (2) a classifier trained with vector labels representing the distribution of CAFE annotator responses. We compare the resulting softmax output distributions of the two classifiers with a 2-sample independent t-test of L1 distances between the classifier's output probability distribution and the distribution of human labels. While agreement with CAFE is weak for unfiltered crowd workers, the filtered crowd agree with the CAFE labels 100% of the time for many emotions. While the F1-score for a one-hot encoded classifier is much higher (94.33% vs. 78.68%) with respect to the ground truth CAFE labels, the output probability vector of the crowd-trained classifier more closely resembles the distribution of human labels (t=3.2827, p=0.0014). Reporting an emotion probability distribution that accounts for the subjectivity of human interpretation. Crowdsourcing, including a sufficient filtering mechanism, is a feasible solution for acquiring soft-target labels. △ Less

Submitted 22 September, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

arXiv:2101.01312 [pdf, other]

doi 10.1145/3437801.3441616

An Ownership Policy and Deadlock Detector for Promises

Authors: Caleb Voss, Vivek Sarkar

Abstract: Task-parallel programs often enjoy deadlock freedom under certain restrictions, such as the use of structured join operations, as in Cilk and X10, or the use of asynchronous task futures together with deadlock-avoiding policies such as Known Joins or Transitive Joins. However, the promise, a popular synchronization primitive for parallel tasks, does not enjoy deadlock-freedom guarantees. Promises… ▽ More Task-parallel programs often enjoy deadlock freedom under certain restrictions, such as the use of structured join operations, as in Cilk and X10, or the use of asynchronous task futures together with deadlock-avoiding policies such as Known Joins or Transitive Joins. However, the promise, a popular synchronization primitive for parallel tasks, does not enjoy deadlock-freedom guarantees. Promises can exhibit deadlock-like bugs; however, the concept of a deadlock is not currently well-defined for promises. To address these challenges, we propose an ownership semantics in which each promise is associated to the task which currently intends to fulfill it. Ownership immediately enables the identification of bugs in which a task fails to fulfill a promise for which it is responsible. Ownership further enables the discussion of deadlock cycles among tasks and promises and allows us to introduce a robust definition of deadlock-like bugs for promises. Cycle detection in this context is non-trivial because it is concurrent with changes in promise ownership. We provide a lock-free algorithm for precise runtime deadlock detection. We show how to obtain the memory consistency criteria required for the correctness of our algorithm under TSO and the Java and C++ memory models. An evaluation compares the execution time and memory usage overheads of our detection algorithm on benchmark programs relative to an unverified baseline. Our detector exhibits a 12% (1.12$\times$) geometric mean time overhead and a 6% (1.06$\times$) geometric mean memory overhead, which are smaller overheads than in past approaches to deadlock cycle detection. △ Less

Submitted 4 January, 2021; originally announced January 2021.

Journal ref: Principles and Practice of Parallel Programming, 2021, ACM, pp. 348-361

arXiv:2012.08678 [pdf]

Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study

Authors: Peter Washington, Haik Kalantarian, John Kent, Arman Husic, Aaron Kline, Emilie Leblanc, Cathy Hou, Onur Cezmi Mutlu, Kaitlyn Dunlap, Yordan Penev, Maya Varma, Nate Tyler Stockham, Brianna Chrisman, Kelley Paskov, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis Paul Wall

Abstract: Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emot… ▽ More Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emotion-enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. Methods: We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface to gamify the human labeling effort, called HollywoodSquares, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels on all images. With this drastically expanded pediatric emotion-centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgust, and neutral expressions evoked by children. Results: The classifier achieved a 66.9% balanced accuracy and 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) as well as a 79.1% balanced accuracy and 78% F1-score on CAFE Subset A, a subset containing at least 60% human agreement on emotions labels. This performance is at least 10% higher than all previously developed classifiers evaluated against CAFE, the best of which reached a 56% balanced accuracy even when combining "anger" and "disgust" into a single class. △ Less

Submitted 3 June, 2024; v1 submitted 15 December, 2020; originally announced December 2020.

Journal ref: JMIR pediatrics and parenting 5.2 (2022): e26760

arXiv:2009.01325 [pdf, other]

Learning to summarize from human feedback

Authors: Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

Abstract: As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about -- summary quality. In this work, we show that it is possible t… ▽ More As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about -- summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want. △ Less

Submitted 15 February, 2022; v1 submitted 2 September, 2020; originally announced September 2020.

Comments: NeurIPS 2020

arXiv:2007.00576 [pdf, other]

COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation

Authors: Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Yi R. Fung, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, Jasmine Rah, David Liem, Ahmed Elsayed, Martha Palmer, Clare Voss , et al. (2 additional authors not shown)

Abstract: To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG to extract fine-grained multimedia knowledge elements (entities and their visual chemical structures, relatio… ▽ More To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG to extract fine-grained multimedia knowledge elements (entities and their visual chemical structures, relations, and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence. △ Less

Submitted 11 May, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: 12 pages, Accepted by Proceedings of 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics System Demonstrations, for resources see http://blender.cs.illinois.edu/covid19/, for video see http://159.89.180.81/demo/covid/Covid-KG_DemoVideo.mp4, for slides see https://eaglew.github.io/files/Covid-KG_DemoVideo_with_ethics.pdf

arXiv:2004.14281 [pdf]

A Wearable Social Interaction Aid for Children with Autism

Authors: Nick Haber, Catalin Voss, Jena Daniels, Peter Washington, Azar Fazel, Aaron Kline, Titas De, Terry Winograd, Carl Feinstein, Dennis P. Wall

Abstract: With most recent estimates giving an incidence rate of 1 in 68 children in the United States, the autism spectrum disorder (ASD) is a growing public health crisis. Many of these children struggle to make eye contact, recognize facial expressions, and engage in social interactions. Today the standard for treatment of the core autism-related deficits focuses on a form of behavior training known as A… ▽ More With most recent estimates giving an incidence rate of 1 in 68 children in the United States, the autism spectrum disorder (ASD) is a growing public health crisis. Many of these children struggle to make eye contact, recognize facial expressions, and engage in social interactions. Today the standard for treatment of the core autism-related deficits focuses on a form of behavior training known as Applied Behavioral Analysis. To address perceived deficits in expression recognition, ABA approaches routinely involve the use of prompts such as flash cards for repetitive emotion recognition training via memorization. These techniques must be administered by trained practitioners and often at clinical centers that are far outnumbered by and out of reach from the many children and families in need of attention. Waitlists for access are up to 18 months long, and this wait may lead to children regressing down a path of isolation that worsens their long-term prognosis. There is an urgent need to innovate new methods of care delivery that can appropriately empower caregivers of children at risk or with a diagnosis of autism, and that capitalize on mobile tools and wearable devices for use outside of clinical settings. △ Less

Submitted 19 April, 2020; originally announced April 2020.

arXiv:2002.06581 [pdf]

Superpower Glass: Delivering Unobtrusive Real-time Social Cues in Wearable Systems

Authors: Catalin Voss, Peter Washington, Nick Haber, Aaron Kline, Jena Daniels, Azar Fazel, Titas De, Beth McCarthy, Carl Feinstein, Terry Winograd, Dennis Wall

Abstract: We have developed a system for automatic facial expression recognition, which runs on Google Glass and delivers real-time social cues to the wearer. We evaluate the system as a behavioral aid for children with Autism Spectrum Disorder (ASD), who can greatly benefit from real-time non-invasive emotional cues and are more sensitive to sensory input than neurotypically developing children. In additio… ▽ More We have developed a system for automatic facial expression recognition, which runs on Google Glass and delivers real-time social cues to the wearer. We evaluate the system as a behavioral aid for children with Autism Spectrum Disorder (ASD), who can greatly benefit from real-time non-invasive emotional cues and are more sensitive to sensory input than neurotypically developing children. In addition, we present a mobile application that enables users of the wearable aid to review their videos along with auto-curated emotional information on the video playback bar. This integrates our learning aid into the context of behavioral therapy. Expanding on our previous work describing in-lab trials, this paper presents our system and application-level design decisions in depth as well as the interface learnings gathered during the use of the system by multiple children with ASD in an at-home iterative trial. △ Less

Submitted 16 February, 2020; originally announced February 2020.

Comments: UbiComp ISWC 2016

arXiv:2002.04263 [pdf]

Designing a Holistic At-Home Learning Aid for Autism

Authors: Catalin Voss, Nick Haber, Peter Washington, Aaron Kline, Beth McCarthy, Jena Daniels, Azar Fazel, Titas De, Carl Feinstein, Terry Winograd, Dennis Wall

Abstract: In recent years, much focus has been put on employing technology to make novel behavioural aids for those with autism. Most of these are digital adaptations of tools used in standard behavioural therapy to enforce normative skills. These digital counterparts are often used outside of both the larger therapeutic context and the real world, in which the learned skills might apply. To address this, w… ▽ More In recent years, much focus has been put on employing technology to make novel behavioural aids for those with autism. Most of these are digital adaptations of tools used in standard behavioural therapy to enforce normative skills. These digital counterparts are often used outside of both the larger therapeutic context and the real world, in which the learned skills might apply. To address this, we are designing a system of automatic expression recognition on wearable devices that integrates directly into the families daily social interactions, to give children and their caregivers the tools and information they need to design their own holistic therapy. In order to develop a tool that will be truly useful to families, we proactively include children with autism and their families as co-designers in the development process. By providing an app and interface with interchangeable social feedback options, we aim to produce a framework for therapy that folds into their daily lives, tailored to their specific needs. △ Less

Submitted 11 February, 2020; originally announced February 2020.

Comments: Conference Workshop

Journal ref: CHI 2016 - Autism Technology Workshop

arXiv:1910.05624 [pdf, other]

A Research Platform for Multi-Robot Dialogue with Humans

Authors: Matthew Marge, Stephen Nogar, Cory J. Hayes, Stephanie M. Lukin, Jesse Bloecker, Eric Holder, Clare Voss

Abstract: This paper presents a research platform that supports spoken dialogue interaction with multiple robots. The demonstration showcases our crafted MultiBot testing scenario in which users can verbally issue search, navigate, and follow instructions to two robotic teammates: a simulated ground robot and an aerial robot. This flexible language and robotic platform takes advantage of existing tools for… ▽ More This paper presents a research platform that supports spoken dialogue interaction with multiple robots. The demonstration showcases our crafted MultiBot testing scenario in which users can verbally issue search, navigate, and follow instructions to two robotic teammates: a simulated ground robot and an aerial robot. This flexible language and robotic platform takes advantage of existing tools for speech recognition and dialogue management that are compatible with new domains, and implements an inter-agent communication protocol (tactical behavior specification), where verbal instructions are encoded for tasks assigned to the appropriate robot. △ Less

Submitted 12 October, 2019; originally announced October 2019.

Comments: Accepted for publication at NAACL 2019; also presented at AI-HRI 2019 (arXiv:1909.04812)

Report number: AI-HRI/2019/05

arXiv:1906.00038 [pdf, other]

Visual Understanding and Narration: A Deeper Understanding and Explanation of Visual Scenes

Authors: Stephanie M. Lukin, Claire Bonial, Clare R. Voss

Abstract: We describe the task of Visual Understanding and Narration, in which a robot (or agent) generates text for the images that it collects when navigating its environment, by answering open-ended questions, such as 'what happens, or might have happened, here?' We describe the task of Visual Understanding and Narration, in which a robot (or agent) generates text for the images that it collects when navigating its environment, by answering open-ended questions, such as 'what happens, or might have happened, here?' △ Less

Submitted 23 September, 2019; v1 submitted 31 May, 2019; originally announced June 2019.

Comments: 2-page extended abstract, presented at the Workshop on Shortcomings in Vision and Language (SiVL), 2019, at the North American Association for Computational Linguistics (NAACL)

arXiv:1810.02017 [pdf, other]

Balancing Efficiency and Coverage in Human-Robot Dialogue Collection

Authors: Matthew Marge, Claire Bonial, Stephanie Lukin, Cory Hayes, Ashley Foots, Ron Artstein, Cassidy Henry, Kimberly Pollard, Carla Gordon, Felix Gervits, Anton Leuski, Susan Hill, Clare Voss, David Traum

Abstract: We describe a multi-phased Wizard-of-Oz approach to collecting human-robot dialogue in a collaborative search and navigation task. The data is being used to train an initial automated robot dialogue system to support collaborative exploration tasks. In the first phase, a wizard freely typed robot utterances to human participants. For the second phase, this data was used to design a GUI that includ… ▽ More We describe a multi-phased Wizard-of-Oz approach to collecting human-robot dialogue in a collaborative search and navigation task. The data is being used to train an initial automated robot dialogue system to support collaborative exploration tasks. In the first phase, a wizard freely typed robot utterances to human participants. For the second phase, this data was used to design a GUI that includes buttons for the most common communications, and templates for communications with varying parameters. Comparison of the data gathered in these phases show that the GUI enabled a faster pace of dialogue while still maintaining high coverage of suitable responses, enabling more efficient targeted data collection, and improvements in natural language understanding using GUI-collected data. As a promising first step towards interactive learning, this work shows that our approach enables the collection of useful training data for navigation-based HRI tasks. △ Less

Submitted 7 October, 2018; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606)

Report number: AI-HRI/2018/01

arXiv:1807.08077 [pdf, other]

A Pipeline for Creative Visual Storytelling

Authors: Stephanie M. Lukin, Reginald Hobbs, Clare R. Voss

Abstract: Computational visual storytelling produces a textual description of events and interpretations depicted in a sequence of images. These texts are made possible by advances and cross-disciplinary approaches in natural language processing, generation, and computer vision. We define a computational creative visual storytelling as one with the ability to alter the telling of a story along three aspects… ▽ More Computational visual storytelling produces a textual description of events and interpretations depicted in a sequence of images. These texts are made possible by advances and cross-disciplinary approaches in natural language processing, generation, and computer vision. We define a computational creative visual storytelling as one with the ability to alter the telling of a story along three aspects: to speak about different environments, to produce variations based on narrative goals, and to adapt the narrative to the audience. These aspects of creative storytelling and their effect on the narrative have yet to be explored in visual storytelling. This paper presents a pipeline of task-modules, Object Identification, Single-Image Inferencing, and Multi-Image Narration, that serve as a preliminary design for building a creative visual storyteller. We have piloted this design for a sequence of images in an annotation task. We present and analyze the collected corpus and describe plans towards automation. △ Less

Submitted 20 July, 2018; originally announced July 2018.

Comments: Originally published in the Proceedings of the First Workshop on Storytelling (StoryNLP), 2018, at the North American Association for Computational Linguistics (NAACL)

arXiv:1807.08076 [pdf, ps, other]

Consequences and Factors of Stylistic Differences in Human-Robot Dialogue

Authors: Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Matthew Marge, Cassidy Henry, Ron Arstein, David Traum, Clare R. Voss

Abstract: This paper identifies stylistic differences in instruction-giving observed in a corpus of human-robot dialogue. Differences in verbosity and structure (i.e., single-intent vs. multi-intent instructions) arose naturally without restrictions or prior guidance on how users should speak with the robot. Different styles were found to produce different rates of miscommunication, and correlations were fo… ▽ More This paper identifies stylistic differences in instruction-giving observed in a corpus of human-robot dialogue. Differences in verbosity and structure (i.e., single-intent vs. multi-intent instructions) arose naturally without restrictions or prior guidance on how users should speak with the robot. Different styles were found to produce different rates of miscommunication, and correlations were found between style differences and individual user variation, trust, and interaction experience with the robot. Understanding potential consequences and factors that influence style can inform design of dialogue systems that are robust to natural variation from human users. △ Less

Submitted 20 July, 2018; originally announced July 2018.

Comments: Originally published in the Proceedings of the 19th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2018

arXiv:1807.08074 [pdf, other]

ScoutBot: A Dialogue System for Collaborative Navigation

Authors: Stephanie M. Lukin, Felix Gervits, Cory J. Hayes, Anton Leuski, Pooja Moolchandani, John G. Rogers III, Carlos Sanchez Amaro, Matthew Marge, Clare R. Voss, David Traum

Abstract: ScoutBot is a dialogue interface to physical and simulated robots that supports collaborative exploration of environments. The demonstration will allow users to issue unconstrained spoken language commands to ScoutBot. ScoutBot will prompt for clarification if the user's instruction needs additional input. It is trained on human-robot dialogue collected from Wizard-of-Oz experiments, where robot r… ▽ More ScoutBot is a dialogue interface to physical and simulated robots that supports collaborative exploration of environments. The demonstration will allow users to issue unconstrained spoken language commands to ScoutBot. ScoutBot will prompt for clarification if the user's instruction needs additional input. It is trained on human-robot dialogue collected from Wizard-of-Oz experiments, where robot responses were initiated by a human wizard in previous interactions. The demonstration will show a simulated ground robot (Clearpath Jackal) in a simulated environment supported by ROS (Robot Operating System). △ Less

Submitted 20 July, 2018; originally announced July 2018.

Comments: Originally published in the Proceedings of the Association for Computational Linguistics (ACL) 2018, System Demonstrations, 93-98

arXiv:1805.01818 [pdf, other]

Object and Text-guided Semantics for CNN-based Activity Recognition

Authors: Sungmin Eum, Christopher Reale, Heesung Kwon, Claire Bonial, Clare Voss

Abstract: Many previous methods have demonstrated the importance of considering semantically relevant objects for carrying out video-based human activity recognition, yet none of the methods have harvested the power of large text corpora to relate the objects and the activities to be transferred into learning a unified deep convolutional neural network. We present a novel activity recognition CNN which co-l… ▽ More Many previous methods have demonstrated the importance of considering semantically relevant objects for carrying out video-based human activity recognition, yet none of the methods have harvested the power of large text corpora to relate the objects and the activities to be transferred into learning a unified deep convolutional neural network. We present a novel activity recognition CNN which co-learns the object recognition task in an end-to-end multitask learning scheme to improve upon the baseline activity recognition performance. We further improve upon the multitask learning approach by exploiting a text-guided semantic space to select the most relevant objects with respect to the target activities. To the best of our knowledge, we are the first to investigate this approach. △ Less

Submitted 4 May, 2018; originally announced May 2018.

Comments: Submitted to ICIP 2018

arXiv:1710.06406 [pdf, other]

Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue

Authors: Claire Bonial, Matthew Marge, Ron artstein, Ashley Foots, Felix Gervits, Cory J. Hayes, Cassidy Henry, Susan G. Hill, Anton Leuski, Stephanie M. Lukin, Pooja Moolchandani, Kimberly A. Pollard, David Traum, Clare R. Voss

Abstract: We describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the development of dialogue systems for virtual agents and video playback, we add templates with open param… ▽ More We describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the development of dialogue systems for virtual agents and video playback, we add templates with open parameters which allow the wizard to quickly produce a wide variety of utterances. Our research demonstrates that this approach to data collection is viable as an intermediate step in developing a dialogue system for physical robots in remote locations from their users - a domain in which the human and robot need to regularly verify and update a shared understanding of the physical environment. We show that our WoZ interface and the fixed set of utterances and templates therein provide for a natural pace of dialogue with good coverage of the navigation domain. △ Less

Submitted 17 October, 2017; originally announced October 2017.

Comments: 7 pages, 2 figures, accepted for oral presentation at the Symposium on Natural Communication for Human-Robot Collaboration, AAAI Fall Symposium Series, November 9-11, 2017, https://www.aaai.org/ocs/index.php/FSS/FSS17

arXiv:1710.03357 [pdf, other]

Proofs as Relational Invariants of Synthesized Execution Grammars

Authors: Caleb Voss, David Heath, William Harris

Abstract: The automatic verification of programs that maintain unbounded low-level data structures is a critical and open problem. Analyzers and verifiers developed in previous work can synthesize invariants that only describe data structures of heavily restricted forms, or require an analyst to provide predicates over program data and structure that are used in a synthesized proof of correctness. In this… ▽ More The automatic verification of programs that maintain unbounded low-level data structures is a critical and open problem. Analyzers and verifiers developed in previous work can synthesize invariants that only describe data structures of heavily restricted forms, or require an analyst to provide predicates over program data and structure that are used in a synthesized proof of correctness. In this work, we introduce a novel automatic safety verifier of programs that maintain low-level data structures, named LTTP. LTTP synthesizes proofs of program safety represented as a grammar of a given program's control paths, annotated with invariants that relate program state at distinct points within its path of execution. LTTP synthesizes such proofs completely automatically, using a novel inductive-synthesis algorithm. We have implemented LTTP as a verifier for JVM bytecode and applied it to verify the safety of a collection of verification benchmarks. Our results demonstrate that LTTP can be applied to automatically verify the safety of programs that are beyond the scope of previously-developed verifiers. △ Less

Submitted 9 October, 2017; originally announced October 2017.

arXiv:1707.01066 [pdf, other]

Zero-Shot Transfer Learning for Event Extraction

Authors: Lifu Huang, Heng Ji, Kyunghyun Cho, Clare R. Voss

Abstract: Most previous event extraction studies have relied heavily on features derived from annotated event mentions, thus cannot be applied to new event types without annotation effort. In this work, we take a fresh look at event extraction and model it as a grounding problem. We design a transferable neural architecture, mapping event mentions and types jointly into a shared semantic space using structu… ▽ More Most previous event extraction studies have relied heavily on features derived from annotated event mentions, thus cannot be applied to new event types without annotation effort. In this work, we take a fresh look at event extraction and model it as a grounding problem. We design a transferable neural architecture, mapping event mentions and types jointly into a shared semantic space using structural and compositional neural networks, where the type of each event mention can be determined by the closest of all candidate types . By leveraging (1)~available manual annotations for a small set of existing event types and (2)~existing event ontologies, our framework applies to new event types without requiring additional annotation. Experiments on both existing event types (e.g., ACE, ERE) and new event types (e.g., FrameNet) demonstrate the effectiveness of our approach. \textit{Without any manual annotations} for 23 new event types, our zero-shot framework achieved performance comparable to a state-of-the-art supervised model which is trained from the annotations of 500 event mentions. △ Less

Submitted 4 July, 2017; originally announced July 2017.

arXiv:1703.03714 [pdf]

Applying the Wizard-of-Oz Technique to Multimodal Human-Robot Dialogue

Authors: Matthew Marge, Claire Bonial, Brendan Byrne, Taylor Cassidy, A. William Evans, Susan G. Hill, Clare Voss

Abstract: Our overall program objective is to provide more natural ways for soldiers to interact and communicate with robots, much like how soldiers communicate with other soldiers today. We describe how the Wizard-of-Oz (WOz) method can be applied to multimodal human-robot dialogue in a collaborative exploration task. While the WOz method can help design robot behaviors, traditional approaches place the bu… ▽ More Our overall program objective is to provide more natural ways for soldiers to interact and communicate with robots, much like how soldiers communicate with other soldiers today. We describe how the Wizard-of-Oz (WOz) method can be applied to multimodal human-robot dialogue in a collaborative exploration task. While the WOz method can help design robot behaviors, traditional approaches place the burden of decisions on a single wizard. In this work, we consider two wizards to stand in for robot navigation and dialogue management software components. The scenario used to elicit data is one in which a human-robot team is tasked with exploring an unknown environment: a human gives verbal instructions from a remote location and the robot follows them, clarifying possible misunderstandings as needed via dialogue. We found the division of labor between wizards to be workable, which holds promise for future software development. △ Less

Submitted 10 March, 2017; originally announced March 2017.

Comments: Presented at the 2016 IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Interactive Session, August 26-31, 2016

arXiv:1702.04457 [pdf, other]

Automated Phrase Mining from Massive Text Corpora

Authors: Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han

Abstract: As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus. Phrase mining is important in various tasks such as information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers, and thus likely have unsatisfactory performance on text corpora of new domains and… ▽ More As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus. Phrase mining is important in various tasks such as information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers, and thus likely have unsatisfactory performance on text corpora of new domains and genres without extra but expensive adaption. Recently, a few data-driven methods have been developed successfully for extraction of phrases from massive domain-specific text. However, none of the state-of-the-art models is fully automated because they require human experts for designing rules or labeling phrases. Since one can easily obtain many quality phrases from public knowledge bases to a scale that is much larger than that produced by human experts, in this paper, we propose a novel framework for automated phrase mining, AutoPhrase, which leverages this large amount of high-quality phrases in an effective way and achieves better performance compared to limited human labeled phrases. In addition, we develop a POS-guided phrasal segmentation model, which incorporates the shallow syntactic information in part-of-speech (POS) tags to further enhance the performance, when a POS tagger is available. Note that, AutoPhrase can support any language as long as a general knowledge base (e.g., Wikipedia) in that language is available, while benefiting from, but not requiring, a POS tagger. Compared to the state-of-the-art methods, the new method has shown significant improvements in effectiveness on five real-world datasets across different domains and languages. △ Less

Submitted 11 March, 2017; v1 submitted 14 February, 2017; originally announced February 2017.

arXiv:1610.08763 [pdf, other]

CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

Authors: Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han

Abstract: Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In… ▽ More Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). As our algorithm for type labeling via distant supervision is context-agnostic, noisy training data poses unique challenges for the task. We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features and type labels into two low-dimensional spaces (for entity and relation mentions respectively), where, in each space, objects whose types are close will also have similar representations. CoType, then using these learned embeddings, estimates the types of test (unlinkable) mentions. We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object "translation" function to capture the cross-constraints of entities and relations on each other. Experiments on three public datasets demonstrate the effectiveness of CoType across different domains (e.g., news, biomedical), with an average of 25% improvement in F1 score compared to the next best method. △ Less

Submitted 2 June, 2017; v1 submitted 27 October, 2016; originally announced October 2016.

Comments: WWW 2017

arXiv:1606.04000 [pdf, other]

Using a Distributional Semantic Vector Space with a Knowledge Base for Reasoning in Uncertain Conditions

Authors: Douglas Summers-Stay, Clare Voss, Taylor Cassidy

Abstract: The inherent inflexibility and incompleteness of commonsense knowledge bases (KB) has limited their usefulness. We describe a system called Displacer for performing KB queries extended with the analogical capabilities of the word2vec distributional semantic vector space (DSVS). This allows the system to answer queries with information which was not contained in the original KB in any form. By perf… ▽ More The inherent inflexibility and incompleteness of commonsense knowledge bases (KB) has limited their usefulness. We describe a system called Displacer for performing KB queries extended with the analogical capabilities of the word2vec distributional semantic vector space (DSVS). This allows the system to answer queries with information which was not contained in the original KB in any form. By performing analogous queries on semantically related terms and mapping their answers back into the context of the original query using displacement vectors, we are able to give approximate answers to many questions which, if posed to the KB alone, would return no results. We also show how the hand-curated knowledge in a KB can be used to increase the accuracy of a DSVS in solving analogy problems. In these ways, a KB and a DSVS can make up for each other's weaknesses. △ Less

Submitted 13 June, 2016; originally announced June 2016.

Journal ref: Biologically Inspired Cognitive Architectures (2016), pp. 34-44

arXiv:1602.05307 [pdf, other]

Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

Authors: Xiang Ren, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Jiawei Han

Abstract: Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention's local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic identifica… ▽ More Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention's local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic identification of correct type labels (type-paths) for training examples, given the set of candidate type labels obtained by distant supervision with a given type hierarchy. The unknown type labels for individual entity mentions and the semantic similarity between entity types pose unique challenges for solving the LNR task. We propose a general framework, called PLE, to jointly embed entity mentions, text features and entity types into the same low-dimensional space where, in that space, objects whose types are semantically close have similar representations. Then we estimate the type-path for each training example in a top-down manner using the learned embeddings. We formulate a global objective for learning the embeddings from text corpora and knowledge bases, which adopts a novel margin-based loss that is robust to noisy labels and faithfully models type correlation derived from knowledge bases. Our experiments on three public typing datasets demonstrate the effectiveness and robustness of PLE, with an average of 25% improvement in accuracy compared to next best method. △ Less

Submitted 17 February, 2016; originally announced February 2016.

Comments: Submitted to KDD 2016. 11 pages

arXiv:1406.6312 [pdf, other]

Scalable Topical Phrase Mining from Text Corpora

Authors: Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han

Abstract: While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work either performs post processing to the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods ge… ▽ More While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work either performs post processing to the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability on even moderately-sized datasets. We propose a different approach that is both computationally efficient and effective. Our solution combines a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induced document partition. Our approach discovers high quality topical phrases with negligible extra cost to the bag-of-words topic model in a variety of datasets including research publication titles, abstracts, reviews, and news articles. △ Less

Submitted 18 November, 2014; v1 submitted 24 June, 2014; originally announced June 2014.

Journal ref: Proceedings of the VLDB Endowment, Vol. 8(3), pp. 305 - 316, 2014

Showing 1–31 of 31 results for author: Voss, C