Showing 1–50 of 83 results for author: Callison-Burch, C

  1. arXiv:2406.15586  [pdf, other]

    cs.CL

    TinyStyler: Efficient Few-Shot Text Style Transfer with Authorship Embeddings

    Authors: Zachary Horvitz, Ajay Patel, Kanishk Singh, Chris Callison-Burch, Kathleen McKeown, Zhou Yu

    Abstract: The goal of text style transfer is to transform the style of texts while preserving their original meaning, often with only a few examples of the target style. Existing style transfer methods generally rely on the few-shot capabilities of large language models or on complex controllable text generation approaches that are inefficient and underperform on fluency metrics. We introduce TinyStyler, a…

    Submitted 21 June, 2024; originally announced June 2024.

  2. Learning Translations via Matrix Completion

    Authors: Derry Wijaya, Brendan Callahan, John Hewitt, Jie Gao, Xiao Ling, Marianna Apidianaki, Chris Callison-Burch

    Abstract: Bilingual Lexicon Induction is the task of learning word translations without bilingual parallel corpora. We model this task as a matrix completion problem, and present an effective and extendable framework for completing the matrix. This method harnesses diverse bilingual and monolingual signals, each of which may be incomplete or noisy. Our model achieves state-of-the-art performance for both hi…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: This is a late posting of an old paper as Google Scholar somehow misses indexing the ACL anthology version of the paper

    ACM Class: I.2.7

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 1452-1463

  3. arXiv:2406.04331  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    PaCE: Parsimonious Concept Engineering for Large Language Models

    Authors: Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Darshan Thaker, Aditya Chattopadhyay, Chris Callison-Burch, René Vidal

    Abstract: Large Language Models (LLMs) are being used for a wide variety of tasks. While they are capable of generating human-like responses, they can also produce undesirable output including potentially harmful information, racist or sexist language, and hallucinations. Alignment methods are designed to reduce such undesirable output, via techniques such as fine-tuning, prompt engineering, and representat…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 26 pages, 17 figures, 5 tables, dataset and code at https://github.com/peterljq/Parsimonious-Concept-Engineering

  4. arXiv:2405.20309  [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Models Can Self-Improve At Web Agent Tasks

    Authors: Ajay Patel, Markus Hofmarcher, Claudiu Leoveanu-Condrei, Marius-Constantin Dinu, Chris Callison-Burch, Sepp Hochreiter

    Abstract: Training models to act as agents that can effectively navigate and perform actions in a complex environment, such as a web browser, has typically been challenging due to lack of training data. Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion, purely guided by natural language instructions as prompts.…

    Submitted 30 May, 2024; originally announced May 2024.

  5. arXiv:2405.19793  [pdf, other]

    cs.CL

    PDDLEGO: Iterative Planning in Textual Environments

    Authors: Li Zhang, Peter Jansen, Tianyi Zhang, Peter Clark, Chris Callison-Burch, Niket Tandon

    Abstract: Planning in textual environments has been shown to be a long-standing challenge even for current models. A recent, promising line of work uses LLMs to generate a formal representation of the environment that can be solved by a symbolic planner. However, existing methods rely on a fully-observed environment where all entity states are initially known, so a one-off representation can be constructed…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: In *SEM 2024

  6. arXiv:2405.19423  [pdf, other]

    cs.CV cs.AI

    Evaluating Vision-Language Models on Bistable Images

    Authors: Artemis Panagopoulou, Coby Melkin, Chris Callison-Burch

    Abstract: Bistable images, also known as ambiguous or reversible images, present visual stimuli that can be seen in two distinct interpretations, though not simultaneously by the observer. In this study, we conduct the most extensive examination of vision-language models using bistable images to date. We manually gathered a dataset of 29 bistable images, along with their associated labels, and subjected the…

    Submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.14839  [pdf, other]

    cs.CV cs.CL

    A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis

    Authors: Yue Yang, Mona Gandhi, Yufei Wang, Yifan Wu, Michael S. Yao, Chris Callison-Burch, James C. Gee, Mark Yatskar

    Abstract: While deep networks have achieved broad success in analyzing natural images, when applied to medical scans, they often fail in unexpected situations. We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex, race, etc., in the context of chest X-rays and skin lesion images. A…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 23 pages, 9 figures, 12 tables, project page: https://yueyang1996.github.io/knobo/

  8. arXiv:2405.07940  [pdf, other]

    cs.CL

    RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

    Authors: Liam Dugan, Alyssa Hwang, Filip Trhlik, Josh Magnus Ludan, Andrew Zhu, Hainiu Xu, Daphne Ippolito, Chris Callison-Burch

    Abstract: Many commercial and open-source models claim to detect machine-generated text with extremely high accuracy (99% or more). However, very few of these detectors are evaluated on shared benchmark datasets and even when they are, the datasets used for evaluation are insufficiently challenging, lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work…

    Submitted 10 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: ACL 2024

    ACM Class: I.2.7

  9. arXiv:2403.13900  [pdf, other]

    cs.CV

    CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

    Authors: Yiming Huang, Weilin Wan, Yue Yang, Chris Callison-Burch, Mark Yatskar, Lingjie Liu

    Abstract: Text-to-motion models excel at efficient human motion generation, but existing approaches lack fine-grained controllability over the generation process. Consequently, modifying subtle postures within a motion or inserting new actions at specific moments remains a challenge, limiting the applicability of these methods in diverse scenarios. In light of these challenges, we introduce CoMo, a Controll…

    Submitted 20 March, 2024; originally announced March 2024.

  10. arXiv:2403.00092  [pdf, other]

    cs.CL

    PROC2PDDL: Open-Domain Planning Representations from Texts

    Authors: Tianyi Zhang, Li Zhang, Zhaoyi Hou, Ziyu Wang, Yuling Gu, Peter Clark, Chris Callison-Burch, Niket Tandon

    Abstract: Planning in a text-based environment continues to be a major challenge for AI systems. Recent approaches have used language models to predict a planning domain definition (e.g., PDDL) but have only been evaluated in closed-domain simulated environments. To address this, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representation…

    Submitted 2 July, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: In NLRSE 2024, the 2nd Natural Language Reasoning and Structured Explanations Workshop

  11. arXiv:2402.14116  [pdf, other]

    cs.CL cs.AI

    FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models

    Authors: Andrew Zhu, Alyssa Hwang, Liam Dugan, Chris Callison-Burch

    Abstract: One type of question that is commonly found in day-to-day scenarios is "fan-out" questions, complex multi-hop, multi-document reasoning questions that require finding information about a large number of entities. However, there exist few resources to evaluate this type of question-answering capability among large language models. To evaluate complex reasoning in LLMs more fully, we present FanOu…

    Submitted 6 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 18 pages, 2 figures. ACL 2024

  12. arXiv:2402.13904  [pdf, other]

    cs.CL

    Calibrating Large Language Models with Sample Consistency

    Authors: Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan, Chris Callison-Burch

    Abstract: Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often uncalibrated inherently and elude conventional calibration techniques due to their proprietary nature and massive scale. In this work, we explore the potential of deriving confidence from the distribution of multiple randomly sampled model generati…

    Submitted 21 February, 2024; originally announced February 2024.

  13. arXiv:2402.10379  [pdf, other]

    cs.CL cs.LG

    DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

    Authors: Ajay Patel, Colin Raffel, Chris Callison-Burch

    Abstract: Large language models (LLMs) have become a dominant and important tool for NLP researchers in a wide range of tasks. Today, many researchers use LLMs in synthetic data generation, task evaluation, fine-tuning, distillation, and other model-in-the-loop research workflows. However, challenges arise when using these models that stem from their scale, their closed source nature, and the lack of standa…

    Submitted 27 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Published in ACL 2024

  14. arXiv:2312.09067  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    Holodeck: Language Guided Generation of 3D Embodied AI Environments

    Authors: Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark

    Abstract: 3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D environments to match a user-supplied prompt fully automatedly. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust the designs…

    Submitted 22 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Published in CVPR 2024, 21 pages, 27 figures, 2 tables

  15. arXiv:2311.06477  [pdf, other]

    cs.CY

    Report of the 1st Workshop on Generative AI and Law

    Authors: A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, et al. (10 additional authors not shown)

    Abstract: This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report…

    Submitted 2 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

  16. arXiv:2311.02069  [pdf, other]

    cs.CL

    Grounded Intuition of GPT-Vision's Abilities with Scientific Images

    Authors: Alyssa Hwang, Andrew Head, Chris Callison-Burch

    Abstract: GPT-Vision has impressed us on a range of vision-language tasks, but it comes with the familiar new challenge: we have little idea of its capabilities and limitations. In our study, we formalize a process that many have instinctively been trying already to develop "grounded intuition" of this new model. Inspired by the recent movement away from benchmarking in favor of example-driven qualitative e…

    Submitted 3 November, 2023; originally announced November 2023.

  17. arXiv:2310.19660  [pdf, other]

    cs.CL

    Interpretable-by-Design Text Understanding with Iteratively Generated Concept Bottleneck

    Authors: Josh Magnus Ludan, Qing Lyu, Yue Yang, Liam Dugan, Mark Yatskar, Chris Callison-Burch

    Abstract: Black-box deep neural networks excel in text classification, yet their application in high-stakes domains is hindered by their lack of interpretability. To address this, we propose Text Bottleneck Models (TBM), an intrinsically interpretable text classification framework that offers both global and local explanations. Rather than directly predicting the output label, TBM predicts categorical value…

    Submitted 3 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  18. arXiv:2310.10134  [pdf, other]

    cs.CL cs.AI cs.LG

    CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization

    Authors: Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark

    Abstract: Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. However, despite their zero-shot capabilities, these agents to date do not continually improve over time beyond performance refinement on a specific task. Here we present C…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Project page: https://allenai.github.io/clin/

  19. arXiv:2309.11737  [pdf, other]

    cs.AI

    Choice-75: A Dataset on Decision Branching in Script Learning

    Authors: Zhaoyi Joey Hou, Li Zhang, Chris Callison-Burch

    Abstract: Script learning studies how stereotypical events unfold, enabling machines to reason about narratives with implicit information. Previous works mostly consider a script as a linear sequence of events while ignoring the potential branches that arise due to people's circumstantial choices. We hence propose Choice-75, the first benchmark that challenges intelligent systems to make decisions given des…

    Submitted 17 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: To be published in LREC-COLING-2024

  20. arXiv:2309.05542  [pdf, other]

    cs.SE cs.AI cs.CL

    Kani: A Lightweight and Highly Hackable Framework for Building Language Model Applications

    Authors: Andrew Zhu, Liam Dugan, Alyssa Hwang, Chris Callison-Burch

    Abstract: Language model applications are becoming increasingly popular and complex, often including features like tool usage and retrieval augmentation. However, existing frameworks for such applications are often opinionated, deciding for developers how their prompts ought to be formatted and imposing limitations on customizability and reproducibility. To solve this we present Kani: a lightweight, flexibl…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: In submission to NLP-OSS

    ACM Class: I.2.7

  21. arXiv:2308.15459  [pdf, other]

    cs.CL cs.AI

    ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer

    Authors: Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, Kathleen McKeown

    Abstract: Textual style transfer is the task of transforming stylistic properties of text while preserving meaning. Target "styles" can be defined in numerous ways, ranging from single attributes (e.g., formality) to authorship (e.g., Shakespeare). Previous unsupervised style-transfer approaches generally rely on significant amounts of labeled data for only a fixed set of styles or require large language mode…

    Submitted 22 February, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  22. CALYPSO: LLMs as Dungeon Masters' Assistants

    Authors: Andrew Zhu, Lara J. Martin, Andrew Head, Chris Callison-Burch

    Abstract: The role of a Dungeon Master, or DM, in the game Dungeons & Dragons is to perform multiple tasks simultaneously. The DM must digest information about the game setting and monsters, synthesize scenes to present to other players, and respond to the players' interactions with the scene. Doing all of these tasks while maintaining consistency within the narrative and story world is no small feat of hum…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: 11 pages, 4 figures. AIIDE 2023

    Journal ref: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2023

  23. arXiv:2307.01972  [pdf, other]

    cs.CL

    Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification

    Authors: Sha Li, Ruining Zhao, Manling Li, Heng Ji, Chris Callison-Burch, Jiawei Han

    Abstract: Event schemas are a form of world knowledge about the typical progression of events. Recent methods for event schema induction use information extraction systems to construct a large number of event graph instances from documents, and then learn to generalize the schema from such instances. In contrast, we propose to treat event schemas as a form of commonsense knowledge that can be derived from l…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: Accepted to ACL 2023. 19 pages with appendix

  24. arXiv:2306.09992  [pdf, other]

    cs.HC cs.CL

    Rewriting the Script: Adapting Text Instructions for Voice Interaction

    Authors: Alyssa Hwang, Natasha Oza, Chris Callison-Burch, Andrew Head

    Abstract: Voice assistants have sharply risen in popularity in recent years, but their use has been limited mostly to simple applications like music, hands-free search, or control of internet-of-things devices. What would it take for voice assistants to guide people through more complex tasks? In our work, we study the limitations of the dominant approach voice assistants take to complex task guidance: read…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: To appear at Designing Interactive Systems 2023

  25. arXiv:2306.01201  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models

    Authors: Liam Dugan, Anshul Wadhawan, Kyle Spence, Chris Callison-Burch, Morgan McGuire, Victor Zordan

    Abstract: Recent work in speech-to-speech translation (S2ST) has focused primarily on offline settings, where the full input utterance is available before any output is given. This, however, is not reasonable in many real-world scenarios. In latency-sensitive applications, rather than waiting for the full utterance, translations should be spoken as soon as the information in the input is present. In this wo…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: To appear at INTERSPEECH 2023

  26. arXiv:2305.18657  [pdf, other]

    cs.CL

    Representation Of Lexical Stylistic Features In Language Models' Embedding Space

    Authors: Qing Lyu, Marianna Apidianaki, Chris Callison-Burch

    Abstract: The representation space of pretrained Language Models (LMs) encodes rich information about words and their relationships (e.g., similarity, hypernymy, polysemy) as well as abstract semantic notions (e.g., intensity). In this paper, we demonstrate that lexical stylistic notions such as complexity, formality, and figurativeness, can also be identified in this space. We show that it is possible to d…

    Submitted 31 May, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted at *SEM 2023

  27. arXiv:2305.14610  [pdf, other]

    cs.CL

    This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models

    Authors: Bryan Li, Samar Haider, Chris Callison-Burch

    Abstract: Do the Spratly Islands belong to China, the Philippines, or Vietnam? A pretrained large language model (LLM) may answer differently if asked in the languages of each claimant country: Chinese, Tagalog, or Vietnamese. This contrasts with a multilingual human, who would likely answer consistently. In this paper, we show that LLMs recall certain geographical knowledge inconsistently when queried in d…

    Submitted 1 April, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: NAACL 2024 main conference

  28. arXiv:2305.14603  [pdf, other]

    cs.CL

    OpenPI2.0: An Improved Dataset for Entity Tracking in Texts

    Authors: Li Zhang, Hainiu Xu, Abhinav Kommula, Chris Callison-Burch, Niket Tandon

    Abstract: Much text describes a changing world (e.g., procedures, stories, newswires), and understanding them requires tracking how entities change. An earlier dataset, OpenPI, provided crowdsourced annotations of entity state changes in text. However, a major limitation was that those annotations were free-form and did not identify salient changes, hampering model evaluation. To overcome these limitations,…

    Submitted 25 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: In EACL 2024

  29. arXiv:2305.12696  [pdf, other]

    cs.CL

    Learning Interpretable Style Embeddings via Prompting LLMs

    Authors: Ajay Patel, Delip Rao, Ansh Kothary, Kathleen McKeown, Chris Callison-Burch

    Abstract: Style representation learning builds content-independent representations of author style in text. Stylometry, the analysis of style in text, is often performed by expert forensic linguists and no large dataset of stylometric annotations exists for training. Current style representation learning uses neural methods to disentangle style from content to create style vectors, however, these approaches…

    Submitted 9 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  30. arXiv:2305.04990  [pdf, other]

    cs.CL cs.LG

    Explanation-based Finetuning Makes Models More Robust to Spurious Cues

    Authors: Josh Magnus Ludan, Yixuan Meng, Tai Nguyen, Saurabh Shah, Qing Lyu, Marianna Apidianaki, Chris Callison-Burch

    Abstract: Large Language Models (LLMs) are so powerful that they sometimes learn correlations between labels and features that are irrelevant to the task, leading to poor generalization on out-of-distribution data. We propose explanation-based finetuning as a general approach to mitigate LLMs' reliance on spurious correlations. Unlike standard finetuning where the model only predicts the answer given the in…

    Submitted 6 June, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

  31. FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information

    Authors: Andrew Zhu, Karmanya Aggarwal, Alexander Feng, Lara J. Martin, Chris Callison-Burch

    Abstract: Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) that have access to state information can generate higher quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created…

    Submitted 25 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 21 pages, 2 figures. Accepted at ACL 2023

    Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 4171-4193

  32. arXiv:2304.13250  [pdf, other]

    cs.CL

    Exploring the Curious Case of Code Prompts

    Authors: Li Zhang, Liam Dugan, Hainiu Xu, Chris Callison-Burch

    Abstract: Recent work has shown that prompting language models with code-like representations of natural language leads to performance improvements on structured reasoning tasks. However, such tasks comprise only a small subset of all natural language tasks. In our work, we seek to answer whether or not code-prompting is the preferred way of interacting with language models in general. We compare code and t…

    Submitted 25 April, 2023; originally announced April 2023.

  33. arXiv:2304.12206  [pdf, other]

    cs.CL

    PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale

    Authors: Bryan Li, Chris Callison-Burch

    Abstract: Existing question answering (QA) systems owe much of their success to large, high-quality training data. Such annotation efforts are costly, and the difficulty compounds in the cross-lingual setting. Therefore, prior cross-lingual QA work has focused on releasing evaluation datasets, and then applying zero-shot methods as baselines. This work proposes a synthetic data generation method for cross-l…

    Submitted 17 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: EMNLP 2023 (Findings)

  34. Human-in-the-Loop Schema Induction

    Authors: Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Liyang Zhou, Hainiu Xu, Li Zhang, Lara J. Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Brown, Reece Suchocki, Chris Callison-Burch

    Abstract: Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction (IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic el…

    Submitted 25 February, 2023; originally announced February 2023.

    Comments: 10 pages, ACL2023 demo track

  35. arXiv:2301.13379  [pdf, other]

    cs.CL

    Faithful Chain-of-Thought Reasoning

    Authors: Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch

    Abstract: While Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer (aka. faithfulness). We propose Faithful CoT, a reasoning framework involving two stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning chain) and Problem Solving…

    Submitted 20 September, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: IJCNLP-AACL 2023 camera-ready version

  36. arXiv:2301.10896  [pdf, other]

    cs.CL

    Causal Reasoning of Entities and Events in Procedural Texts

    Authors: Li Zhang, Hainiu Xu, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora, Chris Callison-Burch

    Abstract: Entities and events are crucial to natural language reasoning and common in procedural texts. Existing work has focused either exclusively on entity state tracking (e.g., whether a pan is hot) or on event reasoning (e.g., whether one would burn themselves by touching the pan), while these two tasks are often causally related. We propose CREPE, the first benchmark on causal reasoning of event plaus…

    Submitted 16 February, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: In Findings of EACL 2023

  37. arXiv:2301.01162  [pdf, other]

    cs.SD cs.CL eess.AS

    Language Models are Drummers: Drum Composition with Natural Language Pre-Training

    Authors: Li Zhang, Chris Callison-Burch

    Abstract: Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Accepted to the 1st workshop on Creative AI across Modalities in AAAI 2023

  38. arXiv:2212.12672  [pdf, other]

    cs.CL cs.AI cs.HC

    Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text

    Authors: Liam Dugan, Daphne Ippolito, Arun Kirubarajan, Sherry Shi, Chris Callison-Burch

    Abstract: As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a m…

    Submitted 24 December, 2022; originally announced December 2022.

    Comments: AAAI 2023 Long Paper. Code is available at https://github.com/liamdugan/human-detection

    ACM Class: I.2.7

  39. CoRRPUS: Code-based Structured Prompting for Neurosymbolic Story Understanding

    Authors: Yijiang River Dong, Lara J. Martin, Chris Callison-Burch

    Abstract: Story generation and understanding -- as with all NLG/NLU tasks -- has seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic means to be even better and to make up for any flaws that the neural networks might have. However, symbolic methods are extremely costly in terms of the amount of…

    Submitted 8 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to Findings of ACL 2023

    Journal ref: Findings of ACL 2023, pp. 13152-13168

  40. arXiv:2212.10060  [pdf, other]

    cs.CL cs.AI

    I Cast Detect Thoughts: Learning to Converse and Guide with Intents and Theory-of-Mind in Dungeons and Dragons

    Authors: Pei Zhou, Andrew Zhu, Jennifer Hu, Jay Pujara, Xiang Ren, Chris Callison-Burch, Yejin Choi, Prithviraj Ammanabrolu

    Abstract: We propose a novel task, G4C, to study teacher-student natural language interactions in a goal-driven and grounded environment. Dungeons and Dragons (D&D), a role-playing game, provides an ideal setting to investigate such interactions. Here, the Dungeon Master (DM), i.e., the teacher, guides the actions of several players -- students, each with their own personas and abilities -- to achieve share…

    Submitted 30 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023. 18 pages, 11 figures, 5 Tables

  41. arXiv:2212.08986  [pdf, other]

    cs.CL

    Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated?

    Authors: Ajay Patel, Nicholas Andrews, Chris Callison-Burch

    Abstract: Authorship style transfer involves altering text to match the style of a target author whilst preserving the original meaning. Existing unsupervised approaches like STRAP have largely focused on style transfer to target authors with many examples of their writing style in books, speeches, or other published works. This high-resource training data requirement (often greater than 100,000 words) make…

    Submitted 23 August, 2023; v1 submitted 17 December, 2022; originally announced December 2022.

  42. arXiv:2211.11158  [pdf, other]

    cs.CV cs.CL

    Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification

    Authors: Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, Mark Yatskar

    Abstract: Concept Bottleneck Models (CBM) are inherently interpretable models that factor model decisions into human-readable concepts. They allow people to easily understand why a model is failing, a critical feature for high-stakes applications. CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and…

    Submitted 25 April, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Published in CVPR 2023, 18 pages, 12 figures, 16 tables

  43. arXiv:2210.12905  [pdf, other]

    cs.CL

    Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction

    Authors: Yue Yang, Artemis Panagopoulou, Marianna Apidianaki, Mark Yatskar, Chris Callison-Burch

    Abstract: Neural language models encode rich knowledge about entities and their relationships which can be extracted from their representations using probing. Common properties of nouns (e.g., red strawberries, small ant) are, however, more challenging to extract compared to other types of knowledge because they are rarely explicitly stated in texts. We hypothesize this to mainly be the case for perceptual…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022; The first two authors contributed equally

    Journal ref: Findings of EMNLP 2022

  44. Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence

    Authors: Chris Callison-Burch, Gaurav Singh Tomar, Lara J. Martin, Daphne Ippolito, Suma Bailis, David Reitter

    Abstract: AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 9…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022

    Journal ref: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 9379-9393, Dec. 2022

  45. arXiv:2209.14500  [pdf, other]

    cs.LG cs.CL

    Bidirectional Language Models Are Also Few-shot Learners

    Authors: Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, Chris Callison-Burch

    Abstract: Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prom…

    Submitted 5 February, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: To appear at ICLR 2023

  46. arXiv:2209.11326  [pdf, other]

    cs.CL

    Towards Faithful Model Explanation in NLP: A Survey

    Authors: Qing Lyu, Marianna Apidianaki, Chris Callison-Burch

    Abstract: End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent years. One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction. In this survey, we review over 110 model explanation method…

    Submitted 12 January, 2024; v1 submitted 22 September, 2022; originally announced September 2022.

    Comments: Added acknowledgements; Accepted to the Computational Linguistics Journal (June 2024 issue)

  47. arXiv:2209.02821  [pdf, other]

    cs.CL

    Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation

    Authors: Bryan Li, Mohammad Sadegh Rasooli, Ajay Patel, Chris Callison-Burch

    Abstract: We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English. For the first stage, we initialize an encoder-decoder model to pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 40 languages to English. We find this model can generalize to zero-shot translations on unseen languages. For the second…

    Submitted 3 April, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

    Comments: LoResMT @ EACL 2023

  48. arXiv:2206.04812  [pdf, other]

    cs.CL

    The Case for a Single Model that can Both Generate Continuations and Fill in the Blank

    Authors: Daphne Ippolito, Liam Dugan, Emily Reif, Ann Yuan, Andy Coenen, Chris Callison-Burch

    Abstract: The task of inserting text into a specified position in a passage, known as fill in the blank (FitB), is useful for a variety of applications where writers interact with a natural language generation (NLG) system to craft text. While previous work has tackled this problem with models trained specifically to do the fill-in-the-blank task, a more useful model is one that can effectively perform _bot…

    Submitted 30 June, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: This version: fixed bug in the headers of Table 2

    Journal ref: NAACL 2022 Findings

  49. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  50. arXiv:2205.12698  [pdf, other]

    cs.CL

    Empathic Conversations: A Multi-level Dataset of Contextualized Conversations

    Authors: Damilola Omitaomu, Shabnam Tafreshi, Tingting Liu, Sven Buechel, Chris Callison-Burch, Johannes Eichstaedt, Lyle Ungar, João Sedoc

    Abstract: Empathy is a cognitive and emotional reaction to an observed situation of others. Empathy has recently attracted interest because it has numerous applications in psychology and AI, but it is unclear how different forms of empathy (e.g., self-report vs counterpart other-report, concern vs. distress) interact with other affective phenomena or demographics like gender and age. To better understand th…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: 21 pages