Showing 1–31 of 31 results for author: Voss, C

  1. arXiv:2310.17568  [pdf, other]

    cs.HC cs.CL cs.RO

    Navigating to Success in Multi-Modal Human-Robot Collaboration: Analysis and Corpus Release

    Authors: Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Taylor Hudson, Ron Artstein, Clare Voss, David Traum

    Abstract: Human-guided robotic exploration is a useful approach to gathering information at remote locations, especially those that might be too risky, inhospitable, or inaccessible for humans. Maintaining common ground between the remotely-located partners is a challenge, one that can be facilitated by multi-modal communication. In this paper, we explore how participants utilized multiple modalities to inv…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 7 pages, 3 figures

    Journal ref: Proceedings of the 2023 IEEE Robot and Human Interactive Communication Conference

  2. arXiv:2305.14331  [pdf, other]

    cs.CL cs.AI

    What Else Do I Need to Know? The Effect of Background Information on Users' Reliance on QA Systems

    Authors: Navita Goyal, Eleftheria Briakou, Amanda Liu, Connor Baumler, Claire Bonial, Jeffrey Micher, Clare R. Voss, Marine Carpuat, Hal Daumé III

    Abstract: NLP systems have shown impressive performance at answering questions by retrieving relevant context. However, with the increasingly large models, it is impossible and often undesirable to constrain models' knowledge or reasoning to only the retrieved context. This leads to a mismatch between the information that the models access to derive the answer and the information that is available to the us…

    Submitted 25 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  3. arXiv:2303.14337  [pdf, other]

    cs.CL

    SmartBook: AI-Assisted Situation Report Generation for Intelligence Analysts

    Authors: Revanth Gangi Reddy, Daniel Lee, Yi R. Fung, Khanh Duy Nguyen, Qi Zeng, Manling Li, Ziqi Wang, Clare Voss, Heng Ji

    Abstract: Timely and comprehensive understanding of emerging events is crucial for effective decision-making; automating situation report generation can significantly reduce the time, effort, and cost for intelligence analysts. In this work, we identify intelligence analysts' practices and preferences for AI assistance in situation report generation to guide the design strategies for an effective, trust-bui…

    Submitted 27 May, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Preprint

  4. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  5. arXiv:2210.12582  [pdf, other]

    cs.CL cs.AI

    Language Model Pre-Training with Sparse Latent Typing

    Authors: Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, Chengxiang Zhai, Heng Ji

    Abstract: Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most of the LM pre-training objectives only focus on text reconstruction, but have not sought to learn latent-level interpretable representations of sentences. In this paper, we manage to push the language models to obtain a deeper understanding of sentences by propo…

    Submitted 26 October, 2022; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 (Oral)

  6. arXiv:2104.06344  [pdf, other]

    cs.AI

    The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction

    Authors: Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han, Clare Voss

    Abstract: Event schemas encode knowledge of stereotypical structures of events and their connections. As events unfold, schemas are crucial to act as a scaffolding. Previous work on event schema induction focuses either on atomic events or linear temporal event sequences, ignoring the interplay between events via arguments and argument relations. We introduce a new concept of Temporal Complex Event Schema:…

    Submitted 29 April, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

  7. arXiv:2102.12092  [pdf, other]

    cs.CV cs.LG

    Zero-Shot Text-to-Image Generation

    Authors: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

    Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and…

    Submitted 26 February, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

  8. arXiv:2101.03477  [pdf]

    cs.CV cs.HC

    Training Affective Computer Vision Models by Crowdsourcing Soft-Target Labels

    Authors: Peter Washington, Onur Cezmi Mutlu, Emilie Leblanc, Aaron Kline, Cathy Hou, Brianna Chrisman, Nate Stockham, Kelley Paskov, Catalin Voss, Nick Haber, Dennis Wall

    Abstract: Emotion classifiers traditionally predict discrete emotions. However, emotion expressions are often subjective, thus requiring a method to handle subjective labels. We explore the use of crowdsourcing to acquire reliable soft-target labels and evaluate an emotion detection classifier trained with these labels. We center our study on the Child Affective Facial Expression (CAFE) dataset, a gold stan…

    Submitted 22 September, 2021; v1 submitted 10 January, 2021; originally announced January 2021.

  9. An Ownership Policy and Deadlock Detector for Promises

    Authors: Caleb Voss, Vivek Sarkar

    Abstract: Task-parallel programs often enjoy deadlock freedom under certain restrictions, such as the use of structured join operations, as in Cilk and X10, or the use of asynchronous task futures together with deadlock-avoiding policies such as Known Joins or Transitive Joins. However, the promise, a popular synchronization primitive for parallel tasks, does not enjoy deadlock-freedom guarantees. Promises…

    Submitted 4 January, 2021; originally announced January 2021.

    Journal ref: Principles and Practice of Parallel Programming, 2021, ACM, pp. 348-361

  10. arXiv:2012.08678  [pdf]

    cs.CV cs.CY cs.HC

    Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study

    Authors: Peter Washington, Haik Kalantarian, John Kent, Arman Husic, Aaron Kline, Emilie Leblanc, Cathy Hou, Onur Cezmi Mutlu, Kaitlyn Dunlap, Yordan Penev, Maya Varma, Nate Tyler Stockham, Brianna Chrisman, Kelley Paskov, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis Paul Wall

    Abstract: Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emot…

    Submitted 3 June, 2024; v1 submitted 15 December, 2020; originally announced December 2020.

    Journal ref: JMIR pediatrics and parenting 5.2 (2022): e26760

  11. arXiv:2009.01325  [pdf, other]

    cs.CL cs.AI cs.LG

    Learning to summarize from human feedback

    Authors: Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

    Abstract: As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about -- summary quality. In this work, we show that it is possible t…

    Submitted 15 February, 2022; v1 submitted 2 September, 2020; originally announced September 2020.

    Comments: NeurIPS 2020

  12. arXiv:2007.00576  [pdf, other]

    cs.CL cs.AI

    COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation

    Authors: Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Yi R. Fung, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, Jasmine Rah, David Liem, Ahmed Elsayed, Martha Palmer, Clare Voss, et al. (2 additional authors not shown)

    Abstract: To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG to extract fine-grained multimedia knowledge elements (entities and their visual chemical structures, relatio…

    Submitted 11 May, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: 12 pages, Accepted by Proceedings of 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics System Demonstrations, for resources see http://blender.cs.illinois.edu/covid19/, for video see http://159.89.180.81/demo/covid/Covid-KG_DemoVideo.mp4, for slides see https://eaglew.github.io/files/Covid-KG_DemoVideo_with_ethics.pdf

  13. arXiv:2004.14281  [pdf]

    cs.HC cs.CV cs.LG

    A Wearable Social Interaction Aid for Children with Autism

    Authors: Nick Haber, Catalin Voss, Jena Daniels, Peter Washington, Azar Fazel, Aaron Kline, Titas De, Terry Winograd, Carl Feinstein, Dennis P. Wall

    Abstract: With most recent estimates giving an incidence rate of 1 in 68 children in the United States, the autism spectrum disorder (ASD) is a growing public health crisis. Many of these children struggle to make eye contact, recognize facial expressions, and engage in social interactions. Today the standard for treatment of the core autism-related deficits focuses on a form of behavior training known as A…

    Submitted 19 April, 2020; originally announced April 2020.

  14. arXiv:2002.06581  [pdf]

    cs.HC

    Superpower Glass: Delivering Unobtrusive Real-time Social Cues in Wearable Systems

    Authors: Catalin Voss, Peter Washington, Nick Haber, Aaron Kline, Jena Daniels, Azar Fazel, Titas De, Beth McCarthy, Carl Feinstein, Terry Winograd, Dennis Wall

    Abstract: We have developed a system for automatic facial expression recognition, which runs on Google Glass and delivers real-time social cues to the wearer. We evaluate the system as a behavioral aid for children with Autism Spectrum Disorder (ASD), who can greatly benefit from real-time non-invasive emotional cues and are more sensitive to sensory input than neurotypically developing children. In additio…

    Submitted 16 February, 2020; originally announced February 2020.

    Comments: UbiComp ISWC 2016

  15. arXiv:2002.04263  [pdf]

    cs.HC

    Designing a Holistic At-Home Learning Aid for Autism

    Authors: Catalin Voss, Nick Haber, Peter Washington, Aaron Kline, Beth McCarthy, Jena Daniels, Azar Fazel, Titas De, Carl Feinstein, Terry Winograd, Dennis Wall

    Abstract: In recent years, much focus has been put on employing technology to make novel behavioural aids for those with autism. Most of these are digital adaptations of tools used in standard behavioural therapy to enforce normative skills. These digital counterparts are often used outside of both the larger therapeutic context and the real world, in which the learned skills might apply. To address this, w…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: Conference Workshop

    Journal ref: CHI 2016 - Autism Technology Workshop

  16. arXiv:1910.05624  [pdf, other]

    cs.RO cs.CL cs.HC

    A Research Platform for Multi-Robot Dialogue with Humans

    Authors: Matthew Marge, Stephen Nogar, Cory J. Hayes, Stephanie M. Lukin, Jesse Bloecker, Eric Holder, Clare Voss

    Abstract: This paper presents a research platform that supports spoken dialogue interaction with multiple robots. The demonstration showcases our crafted MultiBot testing scenario in which users can verbally issue search, navigate, and follow instructions to two robotic teammates: a simulated ground robot and an aerial robot. This flexible language and robotic platform takes advantage of existing tools for…

    Submitted 12 October, 2019; originally announced October 2019.

    Comments: Accepted for publication at NAACL 2019; also presented at AI-HRI 2019 (arXiv:1909.04812)

    Report number: AI-HRI/2019/05

  17. arXiv:1906.00038  [pdf, other]

    cs.CL cs.CV

    Visual Understanding and Narration: A Deeper Understanding and Explanation of Visual Scenes

    Authors: Stephanie M. Lukin, Claire Bonial, Clare R. Voss

    Abstract: We describe the task of Visual Understanding and Narration, in which a robot (or agent) generates text for the images that it collects when navigating its environment, by answering open-ended questions, such as 'what happens, or might have happened, here?'

    Submitted 23 September, 2019; v1 submitted 31 May, 2019; originally announced June 2019.

    Comments: 2-page extended abstract, presented at the Workshop on Shortcomings in Vision and Language (SiVL), 2019, at the North American Association for Computational Linguistics (NAACL)

  18. arXiv:1810.02017  [pdf, other]

    cs.RO cs.HC

    Balancing Efficiency and Coverage in Human-Robot Dialogue Collection

    Authors: Matthew Marge, Claire Bonial, Stephanie Lukin, Cory Hayes, Ashley Foots, Ron Artstein, Cassidy Henry, Kimberly Pollard, Carla Gordon, Felix Gervits, Anton Leuski, Susan Hill, Clare Voss, David Traum

    Abstract: We describe a multi-phased Wizard-of-Oz approach to collecting human-robot dialogue in a collaborative search and navigation task. The data is being used to train an initial automated robot dialogue system to support collaborative exploration tasks. In the first phase, a wizard freely typed robot utterances to human participants. For the second phase, this data was used to design a GUI that includ…

    Submitted 7 October, 2018; v1 submitted 3 October, 2018; originally announced October 2018.

    Comments: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606)

    Report number: AI-HRI/2018/01

  19. arXiv:1807.08077  [pdf, other]

    cs.CL

    A Pipeline for Creative Visual Storytelling

    Authors: Stephanie M. Lukin, Reginald Hobbs, Clare R. Voss

    Abstract: Computational visual storytelling produces a textual description of events and interpretations depicted in a sequence of images. These texts are made possible by advances and cross-disciplinary approaches in natural language processing, generation, and computer vision. We define a computational creative visual storytelling as one with the ability to alter the telling of a story along three aspects…

    Submitted 20 July, 2018; originally announced July 2018.

    Comments: Originally published in the Proceedings of the First Workshop on Storytelling (StoryNLP), 2018, at the North American Association for Computational Linguistics (NAACL)

  20. arXiv:1807.08076  [pdf, ps, other]

    cs.CL cs.HC cs.RO

    Consequences and Factors of Stylistic Differences in Human-Robot Dialogue

    Authors: Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Matthew Marge, Cassidy Henry, Ron Artstein, David Traum, Clare R. Voss

    Abstract: This paper identifies stylistic differences in instruction-giving observed in a corpus of human-robot dialogue. Differences in verbosity and structure (i.e., single-intent vs. multi-intent instructions) arose naturally without restrictions or prior guidance on how users should speak with the robot. Different styles were found to produce different rates of miscommunication, and correlations were fo…

    Submitted 20 July, 2018; originally announced July 2018.

    Comments: Originally published in the Proceedings of the 19th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2018

  21. arXiv:1807.08074  [pdf, other]

    cs.CL cs.HC

    ScoutBot: A Dialogue System for Collaborative Navigation

    Authors: Stephanie M. Lukin, Felix Gervits, Cory J. Hayes, Anton Leuski, Pooja Moolchandani, John G. Rogers III, Carlos Sanchez Amaro, Matthew Marge, Clare R. Voss, David Traum

    Abstract: ScoutBot is a dialogue interface to physical and simulated robots that supports collaborative exploration of environments. The demonstration will allow users to issue unconstrained spoken language commands to ScoutBot. ScoutBot will prompt for clarification if the user's instruction needs additional input. It is trained on human-robot dialogue collected from Wizard-of-Oz experiments, where robot r…

    Submitted 20 July, 2018; originally announced July 2018.

    Comments: Originally published in the Proceedings of the Association for Computational Linguistics (ACL) 2018, System Demonstrations, 93-98

  22. arXiv:1805.01818  [pdf, other]

    cs.CV

    Object and Text-guided Semantics for CNN-based Activity Recognition

    Authors: Sungmin Eum, Christopher Reale, Heesung Kwon, Claire Bonial, Clare Voss

    Abstract: Many previous methods have demonstrated the importance of considering semantically relevant objects for carrying out video-based human activity recognition, yet none of the methods have harvested the power of large text corpora to relate the objects and the activities to be transferred into learning a unified deep convolutional neural network. We present a novel activity recognition CNN which co-l…

    Submitted 4 May, 2018; originally announced May 2018.

    Comments: Submitted to ICIP 2018

  23. arXiv:1710.06406  [pdf, other]

    cs.CL cs.AI cs.HC cs.RO

    Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue

    Authors: Claire Bonial, Matthew Marge, Ron Artstein, Ashley Foots, Felix Gervits, Cory J. Hayes, Cassidy Henry, Susan G. Hill, Anton Leuski, Stephanie M. Lukin, Pooja Moolchandani, Kimberly A. Pollard, David Traum, Clare R. Voss

    Abstract: We describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the development of dialogue systems for virtual agents and video playback, we add templates with open param…

    Submitted 17 October, 2017; originally announced October 2017.

    Comments: 7 pages, 2 figures, accepted for oral presentation at the Symposium on Natural Communication for Human-Robot Collaboration, AAAI Fall Symposium Series, November 9-11, 2017, https://www.aaai.org/ocs/index.php/FSS/FSS17

  24. arXiv:1710.03357  [pdf, other]

    cs.PL

    Proofs as Relational Invariants of Synthesized Execution Grammars

    Authors: Caleb Voss, David Heath, William Harris

    Abstract: The automatic verification of programs that maintain unbounded low-level data structures is a critical and open problem. Analyzers and verifiers developed in previous work can synthesize invariants that only describe data structures of heavily restricted forms, or require an analyst to provide predicates over program data and structure that are used in a synthesized proof of correctness. In this…

    Submitted 9 October, 2017; originally announced October 2017.

  25. arXiv:1707.01066  [pdf, other]

    cs.CL

    Zero-Shot Transfer Learning for Event Extraction

    Authors: Lifu Huang, Heng Ji, Kyunghyun Cho, Clare R. Voss

    Abstract: Most previous event extraction studies have relied heavily on features derived from annotated event mentions, thus cannot be applied to new event types without annotation effort. In this work, we take a fresh look at event extraction and model it as a grounding problem. We design a transferable neural architecture, mapping event mentions and types jointly into a shared semantic space using structu…

    Submitted 4 July, 2017; originally announced July 2017.

  26. arXiv:1703.03714  [pdf]

    cs.CL cs.AI cs.HC cs.RO

    Applying the Wizard-of-Oz Technique to Multimodal Human-Robot Dialogue

    Authors: Matthew Marge, Claire Bonial, Brendan Byrne, Taylor Cassidy, A. William Evans, Susan G. Hill, Clare Voss

    Abstract: Our overall program objective is to provide more natural ways for soldiers to interact and communicate with robots, much like how soldiers communicate with other soldiers today. We describe how the Wizard-of-Oz (WOz) method can be applied to multimodal human-robot dialogue in a collaborative exploration task. While the WOz method can help design robot behaviors, traditional approaches place the bu…

    Submitted 10 March, 2017; originally announced March 2017.

    Comments: Presented at the 2016 IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Interactive Session, August 26-31, 2016

  27. arXiv:1702.04457  [pdf, other]

    cs.CL

    Automated Phrase Mining from Massive Text Corpora

    Authors: Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R. Voss, Jiawei Han

    Abstract: As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus. Phrase mining is important in various tasks such as information extraction/retrieval, taxonomy construction, and topic modeling. Most existing methods rely on complex, trained linguistic analyzers, and thus likely have unsatisfactory performance on text corpora of new domains and…

    Submitted 11 March, 2017; v1 submitted 14 February, 2017; originally announced February 2017.

  28. arXiv:1610.08763  [pdf, other]

    cs.CL cs.LG

    CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

    Authors: Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han

    Abstract: Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In…

    Submitted 2 June, 2017; v1 submitted 27 October, 2016; originally announced October 2016.

    Comments: WWW 2017

  29. arXiv:1606.04000  [pdf, other]

    cs.AI

    Using a Distributional Semantic Vector Space with a Knowledge Base for Reasoning in Uncertain Conditions

    Authors: Douglas Summers-Stay, Clare Voss, Taylor Cassidy

    Abstract: The inherent inflexibility and incompleteness of commonsense knowledge bases (KB) has limited their usefulness. We describe a system called Displacer for performing KB queries extended with the analogical capabilities of the word2vec distributional semantic vector space (DSVS). This allows the system to answer queries with information which was not contained in the original KB in any form. By perf…

    Submitted 13 June, 2016; originally announced June 2016.

    Journal ref: Biologically Inspired Cognitive Architectures (2016), pp. 34-44

  30. arXiv:1602.05307  [pdf, other]

    cs.CL cs.LG

    Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

    Authors: Xiang Ren, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Jiawei Han

    Abstract: Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention's local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic identifica…

    Submitted 17 February, 2016; originally announced February 2016.

    Comments: Submitted to KDD 2016. 11 pages

  31. arXiv:1406.6312  [pdf, other]

    cs.CL cs.IR cs.LG

    Scalable Topical Phrase Mining from Text Corpora

    Authors: Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han

    Abstract: While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work either performs post processing to the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods ge…

    Submitted 18 November, 2014; v1 submitted 24 June, 2014; originally announced June 2014.

    Journal ref: Proceedings of the VLDB Endowment, Vol. 8(3), pp. 305 - 316, 2014