Showing 1–31 of 31 results for author: Kummerfeld, J K

  1. arXiv:2406.04643  [pdf, other]

    cs.CL

    More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play

    Authors: Wichayaporn Wongkamjan, Feng Gu, Yanze Wang, Ulf Hermjakob, Jonathan May, Brandon M. Stewart, Jonathan K. Kummerfeld, Denis Peskoff, Jordan Lee Boyd-Graber

    Abstract: The boardgame Diplomacy is a challenging setting for communicative and cooperative artificial intelligence. The most prominent communicative Diplomacy AI, Cicero, has excellent strategic abilities, exceeding human players. However, the best Diplomacy players master communication, not just tactics, which is why the game has received attention as an AI challenge. This work seeks to understand the de…

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2405.08447  [pdf, other]

    cs.HC

    AI-Resilient Interfaces

    Authors: Elena L. Glassman, Ziwei Gu, Jonathan K. Kummerfeld

    Abstract: AI is powerful, but it can make choices that result in objective errors, contextually inappropriate outputs, and disliked options. We need AI-resilient interfaces that help people be resilient to the AI choices that are not right, or not right for them. To support this goal, interfaces need to help users notice and have the context to appropriately judge those AI choices. Existing human-AI interac…

    Submitted 14 May, 2024; originally announced May 2024.

  3. arXiv:2405.06563  [pdf, other]

    cs.CL

    What Can Natural Language Processing Do for Peer Review?

    Authors: Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych

    Abstract: The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists and, ultimately, for the public good. In modern science, this process is largely delegated to peer review -- a distributed procedure in which each submission is evaluated by several independent experts in the field. Peer review is widely used, yet it is hard, time…

    Submitted 10 May, 2024; originally announced May 2024.

  4. arXiv:2401.13726  [pdf, other]

    cs.HC cs.LG

    Supporting Sensemaking of Large Language Model Outputs at Scale

    Authors: Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld, Elena L. Glassman

    Abstract: Large language models (LLMs) are capable of generating multiple responses to a single prompt, yet little effort has been expended to help end-users or system designers make use of this capability. In this paper, we explore how to present many LLM responses at once. We design five features, which include both pre-existing and novel methods for computing similarities and differences across textual d…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 34 pages, 13 figures, conditionally accepted to ACM Conference on Human Factors in Computing Systems 2024

  5. arXiv:2401.10873  [pdf, other]

    cs.HC

    An AI-Resilient Text Rendering Technique for Reading and Skimming Documents

    Authors: Ziwei Gu, Ian Arawjo, Kenneth Li, Jonathan K. Kummerfeld, Elena L. Glassman

    Abstract: Readers find text difficult to consume for many reasons. Summarization can address some of these difficulties, but introduces others, such as omitting, misrepresenting, or hallucinating information, which can be hard for a reader to notice. One approach to addressing this problem is to instead modify how the original text is rendered to make important information more salient. We introduce Grammar-…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Conditionally accepted to CHI 2024

  6. arXiv:2401.01967  [pdf, other]

    cs.CL cs.AI

    A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

    Authors: Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea

    Abstract: While alignment algorithms are now commonly used to tune pre-trained language models towards a user's preferences, we lack explanations for the underlying mechanisms in which models become "aligned", thus making it difficult to explain phenomena like jailbreaks. In this work we study a popular algorithm, direct preference optimization (DPO), and the mechanisms by which it reduces toxicity. Namel…

    Submitted 3 January, 2024; originally announced January 2024.

  7. arXiv:2305.07372  [pdf, other]

    cs.DB cs.CL

    Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

    Authors: Yuan Tian, Zheng Zhang, Zheng Ning, Toby Jia-Jun Li, Jonathan K. Kummerfeld, Tianyi Zhang

    Abstract: Relational databases play an important role in business, science, and more. However, many users cannot fully unleash the analytical power of relational databases, because they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly f…

    Submitted 4 January, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023

    ACM Class: I.2.7

  8. arXiv:2210.13344  [pdf, other]

    cs.CL cs.LG

    Augmenting Task-Oriented Dialogue Systems with Relation Extraction

    Authors: Andrew Lee, Zhenguo Chen, Kevin Leach, Jonathan K. Kummerfeld

    Abstract: The standard task-oriented dialogue pipeline uses intent classification and slot-filling to interpret user utterances. While this approach can handle a wide range of queries, it does not extract the information needed to handle more complex queries that contain relationships between slots. We propose integration of relation extraction into this pipeline as an effective way to expand the capabiliti…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: DSTC 10 AAAI22 Workshop Paper

  9. arXiv:2207.05553  [pdf, other]

    cs.CL

    Using Paraphrases to Study Properties of Contextual Embeddings

    Authors: Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

    Abstract: We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings. Using the Paraphrase Database's alignments, we study words within paraphrases as well as phrase representations. We find that contextual…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Published at NAACL 2022

  10. arXiv:2110.15724  [pdf, other]

    cs.CL cs.LG

    Learning to Learn End-to-End Goal-Oriented Dialog From Related Dialog Tasks

    Authors: Janarthanan Rajendran, Jonathan K. Kummerfeld, Satinder Singh

    Abstract: For each goal-oriented dialog task of interest, large amounts of data need to be collected for end-to-end learning of a neural dialog system. Collecting that data is a costly and time-consuming process. Instead, we show that we can use only a small amount of data, supplemented with data from a related dialog task. Naively learning from related data fails to improve performance as the related data…

    Submitted 10 October, 2021; originally announced October 2021.

    Comments: Workshop on NLP for Conversational AI, EMNLP 2021

  11. arXiv:2109.13770  [pdf, other]

    cs.CL

    Micromodels for Efficient, Explainable, and Reusable Systems: A Case Study on Mental Health

    Authors: Andrew Lee, Jonathan K. Kummerfeld, Lawrence C. An, Rada Mihalcea

    Abstract: Many statistical models have high accuracy on test benchmarks, but are not explainable, struggle in low-resource scenarios, cannot be reused for multiple tasks, and cannot easily integrate domain expertise. These factors limit their use, particularly in settings such as mental health, where it is difficult to annotate datasets and model outputs have significant impact. We introduce a micromodel ar…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: To appear in Findings of EMNLP 2021

  12. arXiv:2106.12976  [pdf, other]

    cs.CL

    Exploring Self-Identified Counseling Expertise in Online Support Forums

    Authors: Allison Lahnala, Yuntian Zhao, Charles Welch, Jonathan K. Kummerfeld, Lawrence An, Kenneth Resnicow, Rada Mihalcea, Verónica Pérez-Rosas

    Abstract: A growing number of people engage in online health forums, making it important to understand the quality of the advice they receive. In this paper, we explore the role of expertise in responses provided to help-seeking posts regarding mental health. We study the differences between (1) interactions with peers; and (2) interactions with self-identified mental health professionals. First, we show th…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Accepted to Findings of ACL 2021

  13. arXiv:2105.12762  [pdf, other]

    cs.CL cs.HC

    Quantifying and Avoiding Unfair Qualification Labour in Crowdsourcing

    Authors: Jonathan K. Kummerfeld

    Abstract: Extensive work has argued in favour of paying crowd workers a wage that is at least equivalent to the U.S. federal minimum wage. Meanwhile, research on collecting high quality annotations suggests using a qualification that requires workers to have previously completed a certain number of tasks. If most requesters who pay fairly require workers to have completed a large number of tasks already the…

    Submitted 26 May, 2021; originally announced May 2021.

    Comments: To appear at ACL 2021

    ACM Class: I.2.7

  14. arXiv:2102.02917  [pdf, other]

    cs.SD cs.AI cs.CL

    Chord Embeddings: Analyzing What They Capture and Their Role for Next Chord Prediction and Artist Attribute Prediction

    Authors: Allison Lahnala, Gauri Kambhatla, Jiajun Peng, Matthew Whitehead, Gillian Minnehan, Eric Guldan, Jonathan K. Kummerfeld, Anıl Çamcı, Rada Mihalcea

    Abstract: Natural language processing methods have been applied in a variety of music studies, drawing the connection between music and language. In this paper, we expand those approaches by investigating chord embeddings, which we apply in two case studies to address two key questions: (1) what musical information do chord embeddings capture?; and (2) how might musical applications benefit from th…

    Submitted 4 February, 2021; originally announced February 2021.

    Comments: 16 pages, accepted to EvoMUSART

    Journal ref: Computational Intelligence in Music, Sound, Art and Design, 10th International Conference, EvoMUSART 2021

  15. arXiv:2011.06057  [pdf, other]

    cs.CL

    Exploring the Value of Personalized Word Embeddings

    Authors: Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea

    Abstract: In this paper, we introduce personalized word embeddings, and examine their value for language modeling. We compare the performance of our proposed prediction model when using personalized versus generic word representations, and study how these representations can be leveraged for improved performance. We provide insight into what types of words can be more accurately predicted when building pers…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: COLING 2020

  16. arXiv:2010.02986  [pdf, other]

    cs.CL cs.AI cs.LG

    Compositional Demographic Word Embeddings

    Authors: Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea

    Abstract: Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations. While personalized embeddings can be useful to improve language model performance and other language processing tasks, they can only be computed for people with a large amount of longitudinal data, which is no…

    Submitted 29 October, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: To appear at EMNLP 2020

  17. arXiv:2009.14109  [pdf, other]

    cs.CL

    Improving Low Compute Language Modeling with In-Domain Embedding Initialisation

    Authors: Charles Welch, Rada Mihalcea, Jonathan K. Kummerfeld

    Abstract: Many NLP applications, such as biomedical data and technical support, have 10-100 million tokens of in-domain data and limited computational resources for learning from it. How should we train a language model in this scenario? Most language modeling research considers either a small dataset with a closed vocabulary (like the standard 1 million token Penn Treebank), or the whole web with byte-pair…

    Submitted 30 September, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: To appear at EMNLP 2020

    ACM Class: I.2.7

  18. arXiv:2004.14876  [pdf, other]

    cs.CL

    Analyzing the Surprising Variability in Word Embedding Stability Across Languages

    Authors: Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

    Abstract: Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g., overlap between the nearest neighbors of a word in different embedding spaces) in diverse languages. We discuss linguistic properties that are related to stabi…

    Submitted 9 September, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: Accepted to EMNLP 2021

  19. arXiv:1911.06394  [pdf, other]

    cs.CL

    The Eighth Dialog System Technology Challenge

    Authors: Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada, Minlie Huang, Luis Lastras, Jonathan K. Kummerfeld, Walter S. Lasecki, Chiori Hori, Anoop Cherian, Tim K. Marks, Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta

    Abstract: This paper introduces the Eighth Dialog System Technology Challenge. In line with recent challenges, the eighth edition focuses on applying end-to-end dialog technologies in a pragmatic way for multi-domain task-completion, noetic response selection, audio visual scene-aware dialog, and schema-guided dialog state tracking tasks. This paper describes the task definition, provided datasets, and eval…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: Submitted to NeurIPS 2019 3rd Conversational AI Workshop

  20. arXiv:1909.02128  [pdf, other]

    cs.AI cs.LG cs.MA

    No Press Diplomacy: Modeling Multi-Agent Gameplay

    Authors: Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, Aaron Courville

    Abstract: Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Reliance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy wher…

    Submitted 19 November, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: Accepted at NeurIPS 2019

  21. arXiv:1909.02027  [pdf, other]

    cs.CL cs.AI cs.LG

    An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction

    Authors: Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang, Jason Mars

    Abstract: Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope, i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: Accepted to EMNLP-IJCNLP 2019

  22. SLATE: A Super-Lightweight Annotation Tool for Experts

    Authors: Jonathan K. Kummerfeld

    Abstract: Many annotation tools have been developed, covering a wide variety of tasks and providing features like user management, pre-processing, and automatic labeling. However, all of these tools use Graphical User Interfaces, and often require substantial effort to install and configure. This paper presents a new annotation tool that is designed to fill the niche of a lightweight interface for users wit…

    Submitted 18 July, 2019; originally announced July 2019.

    Comments: To appear at ACL as a demo

    ACM Class: I.2.7

    Journal ref: ACL: Demonstrations (2019) 7-12

  23. arXiv:1904.11610  [pdf, other]

    cs.CL cs.AI

    Look Who's Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog

    Authors: Charles Welch, Verónica Pérez-Rosas, Jonathan K. Kummerfeld, Rada Mihalcea

    Abstract: We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages, across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers, namely: gender; relative age; family member; romantic partner; classmate; co-worker;…

    Submitted 25 April, 2019; originally announced April 2019.

    Comments: 15 pages accepted to CICLing 2019

    Journal ref: Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2019)

  24. arXiv:1904.03122  [pdf, other]

    cs.CL

    Outlier Detection for Improved Data Quality and Diversity in Dialog Systems

    Authors: Stefan Larson, Anish Mahendran, Andrew Lee, Jonathan K. Kummerfeld, Parker Hill, Michael A. Laurenzano, Johann Hauswald, Lingjia Tang, Jason Mars

    Abstract: In a corpus of data, outliers are either errors: mistakes in the data that are counterproductive, or are unique: informative samples that improve model robustness. Identifying outliers can lead to better datasets by (1) removing noise in datasets and (2) guiding collection of additional data to fill gaps. However, the problem of detecting both outlier types has received relatively little attention…

    Submitted 5 April, 2019; originally announced April 2019.

    Comments: Accepted as long paper to NAACL 2019

  25. arXiv:1901.03461  [pdf, ps, other]

    cs.CL

    Dialog System Technology Challenge 7

    Authors: Koichiro Yoshino, Chiori Hori, Julien Perez, Luis Fernando D'Haro, Lazaros Polymenakos, Chulaka Gunasekara, Walter S. Lasecki, Jonathan K. Kummerfeld, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan, Xiang Gao, Huda Alamari, Tim K. Marks, Devi Parikh, Dhruv Batra

    Abstract: This paper introduces the Seventh Dialog System Technology Challenges (DSTC), which use shared datasets to explore the problem of building dialog systems. Recently, end-to-end dialog modeling approaches have been applied to various dialog tasks. The seventh DSTC (DSTC7) focuses on developing technologies related to end-to-end dialog systems for (1) sentence selection, (2) sentence generation and (…

    Submitted 10 January, 2019; originally announced January 2019.

    Comments: This paper is presented at NIPS2018 2nd Conversational AI workshop

  26. A Large-Scale Corpus for Conversation Disentanglement

    Authors: Jonathan K. Kummerfeld, Sai R. Gouravajhala, Joseph Peper, Vignesh Athreya, Chulaka Gunasekara, Jatin Ganhotra, Siva Sankalp Patel, Lazaros Polymenakos, Walter S. Lasecki

    Abstract: Disentangling conversations mixed together in a single stream of messages is a difficult task, made harder by the lack of large manually annotated datasets. We created a new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure. Our dataset is 16 times larger than all previously released datasets com…

    Submitted 18 July, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: To appear at ACL

    ACM Class: I.2.7

    Journal ref: ACL (2019) 3846-3856

  27. arXiv:1806.09029  [pdf, other]

    cs.CL cs.AI cs.DB

    Improving Text-to-SQL Evaluation Methodology

    Authors: Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev

    Abstract: To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we re…

    Submitted 23 June, 2018; originally announced June 2018.

    Comments: To appear at ACL 2018

    ACM Class: I.2.7; I.2.1; H.2.3; H.3.4

    Journal ref: ACL (2018) 351-360

  28. Factors Influencing the Surprising Instability of Word Embeddings

    Authors: Laura Wendlandt, Jonathan K. Kummerfeld, Rada Mihalcea

    Abstract: Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability…

    Submitted 25 April, 2018; originally announced April 2018.

    Comments: NAACL HLT 2018

    Journal ref: NAACL-HLT (2018) 2092-2102

  29. Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

    Authors: Greg Durrett, Jonathan K. Kummerfeld, Taylor Berg-Kirkpatrick, Rebecca S. Portnoff, Sadia Afroz, Damon McCoy, Kirill Levchenko, Vern Paxson

    Abstract: One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotate d…

    Submitted 31 August, 2017; originally announced August 2017.

    Comments: To appear at EMNLP 2017

    ACM Class: I.2.7

    Journal ref: EMNLP (2017) 2598-2607

  30. Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation

    Authors: Jonathan K. Kummerfeld, Dan Klein

    Abstract: General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-c…

    Submitted 13 July, 2017; originally announced July 2017.

    Comments: To appear in Transactions of the Association for Computational Linguistics

    ACM Class: I.2.7; F.2.2; G.2.2

    Journal ref: TACL 5 (2017) 441-454

  31. Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

    Authors: Youxuan Jiang, Jonathan K. Kummerfeld, Walter S. Lasecki

    Abstract: Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts. Crowdsourcing the process of paraphrase generation is an effective means of expanding natural language datasets, but there has been limited analysis of the trade-offs that arise when designing tasks. In this paper, we present…

    Submitted 19 October, 2017; v1 submitted 19 April, 2017; originally announced April 2017.

    Comments: Published at ACL 2017

    ACM Class: I.2.7; H.5.0

    Journal ref: ACL (2017) 103-109