
Aligning Large Language Models with Diverse Political Viewpoints

Authors: Dominik Stammbach, Philine Widmer, Eunjung Cho, Caglar Gulcehre, Elliott Ash

Abstract: Large language models such as ChatGPT often exhibit striking political biases. If users query them about political information, they might take a normative stance and reinforce such biases. To overcome this, we align LLMs with diverse political viewpoints from 100,000 comments written by candidates running for national parliament in Switzerland. Such aligned models are able to generate more accurate political viewpoints from Swiss parties compared to commercial models such as ChatGPT. We also propose a procedure to generate balanced overviews from multiple viewpoints using such models.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13474

Attention-aware Post-training Quantization without Backpropagation

Authors: Junhan Kim, Ho-young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon

Abstract: Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, regardless of whether it is post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated via recently proposed backpropagation-free PTQ methods; however, their performance is somewhat limited by their lack of consideration of inter-layer dependencies. In this paper, we thus propose a novel PTQ algorithm that considers inter-layer dependencies without relying on backpropagation. The fundamental concept involved is the development of attention-aware Hessian matrices, which facilitate the consideration of inter-layer dependencies within the attention module. Extensive experiments demonstrate that the proposed algorithm significantly outperforms conventional PTQ methods, particularly for low bit-widths.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 20 pages, under review
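The quantization entry above rests on Hessian-based, backpropagation-free PTQ. As a rough illustration only (not the authors' attention-aware algorithm, which additionally models inter-layer dependencies inside the attention module), the Python sketch below quantizes one weight row with plain round-to-nearest and scores the result with the layer-wise second-order loss dw^T H dw, where H = sum of x x^T over calibration inputs x. The function names, the toy weights, and the inputs are hypothetical.

```python
def quantize_uniform(w, bits):
    """Asymmetric round-to-nearest (RTN) quantization of a weight row to `bits` bits."""
    levels = 2 ** bits - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Snap each weight to the nearest of 2**bits evenly spaced grid points in [lo, hi].
    return [lo + round((x - lo) / scale) * scale for x in w]


def hessian_from_inputs(xs):
    """H = sum_x x x^T over calibration inputs (the layer-wise Hessian up to a factor of 2)."""
    d = len(xs[0])
    h = [[0.0] * d for _ in range(d)]
    for x in xs:
        for i in range(d):
            for j in range(d):
                h[i][j] += x[i] * x[j]
    return h


def proxy_loss(w, w_q, h):
    """dw^T H dw with dw = w_q - w: the summed squared output error of y = w . x."""
    dw = [q - a for q, a in zip(w_q, w)]
    n = len(dw)
    return sum(dw[i] * h[i][j] * dw[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    w = [0.31, -0.72, 0.05, 1.10]  # toy weight row (hypothetical)
    xs = [[1.0, 0.5, -0.2, 0.3], [0.2, -1.0, 0.4, 0.9], [0.7, 0.1, 0.0, -0.5]]
    h = hessian_from_inputs(xs)
    for bits in (2, 4, 8):
        print(bits, proxy_loss(w, quantize_uniform(w, bits), h))
```

For a linear output y = w . x this loss equals the summed squared output error over the calibration set, so it can be evaluated without any gradients. Hessian-based PTQ methods typically go further than plain round-to-nearest by using H to choose roundings and to adjust not-yet-quantized weights.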

arXiv:2406.13144

DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents

Authors: Jiho Kim, Woosog Chay, Hyeonji Hwang, Daeun Kyung, Hyunseung Chung, Eunbyeol Cho, Yohan Jo, Edward Choi

Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of the agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from popular TV shows, requiring it to respond to spontaneous questions using past dialogue information and to distinguish between known and unknown information. Key features of DialSim include evaluating the agent's ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swapping character names) to challenge the agent's reliance on pre-trained knowledge. We utilized this simulator to evaluate the latest conversational agents and analyze their limitations. Our experiments highlight both the strengths and weaknesses of these agents, providing valuable insights for future improvements in the field of conversational AI. DialSim is available at https://github.com/jiho283/Simulator.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.19598

Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models

Authors: Fujiao Ji, Kiho Lee, Hyungjoon Koo, Wenhao You, Euijin Choo, Hyoungshick Kim, Doowon Kim

Abstract: Phishing attacks pose a significant threat to Internet users, with cybercriminals elaborately replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios have remained unexplored. In this paper, we comprehensively scrutinize and evaluate state-of-the-art visual similarity-based anti-phishing models using a large-scale dataset of 450K real-world phishing websites. Our analysis reveals that while certain models maintain high accuracy, others exhibit notably lower performance than their results on curated datasets, highlighting the importance of real-world evaluation. In addition, we observe a real-world tactic that phishing attackers employ to circumvent detection systems: manipulating visual components. To assess the robustness of existing models against adversarial attacks, we apply visible and perturbation-based manipulations to website logos, which adversaries typically target. We then evaluate the models' robustness in handling these adversarial samples. Our findings reveal vulnerabilities in several models, emphasizing the need for more robust visual similarity techniques capable of withstanding sophisticated evasion attempts. We provide actionable insights for enhancing the security of phishing defense systems, encouraging proactive actions.
To the best of our knowledge, this work represents the first large-scale, systematic evaluation of visual similarity-based models for phishing detection in real-world settings, underscoring the need for more effective and robust defenses.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 12 pages
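As context for the phishing entry above, a visual similarity-based detector can be reduced to a simple decision rule: embed the page's logo, find the most similar protected brand, and flag the page if it matches a brand but is not served from one of that brand's domains. The sketch below assumes precomputed embeddings (toy vectors here), a hypothetical similarity threshold of 0.9, and a hypothetical brand-to-domain table; it is not any specific model evaluated in the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def check_page(logo_emb, page_domain, brand_refs, threshold=0.9):
    """Classify a page from its logo embedding.

    brand_refs maps brand name -> (reference logo embedding, set of legitimate domains).
    Returns (verdict, matched_brand)."""
    best_brand, best_sim = None, -1.0
    for brand, (ref_emb, _domains) in brand_refs.items():
        sim = cosine(logo_emb, ref_emb)
        if sim > best_sim:
            best_brand, best_sim = brand, sim
    if best_brand is None or best_sim < threshold:
        return "no_brand_match", None    # resembles no protected brand
    if page_domain in brand_refs[best_brand][1]:
        return "legitimate", best_brand  # brand logo on the brand's own domain
    return "phishing", best_brand        # brand logo on a foreign domain
```

A logo manipulation that pushes the best similarity below `threshold` turns a `phishing` verdict into `no_brand_match`, which is the kind of evasion the paper's robustness experiments probe.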

arXiv:2404.09041

Three Disclaimers for Safe Disclosure: A Cardwriter for Reporting the Use of Generative AI in Writing Process

Authors: Won Ik Cho, Eunjung Cho, Hyeonji Shin

Abstract: Generative artificial intelligence (AI) and large language models (LLMs) are increasingly being used in the academic writing process. This is despite the current lack of a unified framework for reporting the use of machine assistance. In this work, we propose "Cardwriter", an intuitive interface that produces a short report for authors to declare their use of generative AI in their writing process. The demo is available online at https://cardwriter.vercel.app

Submitted 13 April, 2024; originally announced April 2024.

Comments: 6 pages; an implementation of the PaperCard project

arXiv:2404.05687

Retrieval-Augmented Open-Vocabulary Object Detection

Authors: Jooyeon Kim, Eulrang Cho, Sehyung Kim, Hyunwoo J. Kim

Abstract: Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve generalization by expanding the knowledge of the detector using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval-Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL consists of two losses reflecting the semantic similarity with negative vocabularies. In addition, RAF augments visual features with the verbalized concepts from a large language model (LLM). Our experiments demonstrate the effectiveness of RALF on the COCO and LVIS benchmark datasets. We achieve improvements of up to 3.4 box AP$_{50}^{\text{N}}$ on novel categories of the COCO dataset and 3.6 mask AP$_{\text{r}}$ on the LVIS dataset. Code is available at https://github.com/mlvlab/RALF.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted paper at CVPR 2024
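The RALF entry retrieves 'negative' classes that are semantically related to a target class. A minimal sketch of that retrieval step follows, assuming class-name embeddings are already available (the 3-D vectors below are toy stand-ins) and using cosine similarity as the relatedness measure. The two RAL losses and the RAF feature augmentation are not reproduced, and `retrieve_negatives` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve_negatives(target, vocab, k=2):
    """Return the k classes in `vocab` (excluding `target`) whose embeddings are most
    similar to the target's, i.e. the hardest 'negative' classes."""
    scored = [(cosine(vocab[target], emb), name) for name, emb in vocab.items() if name != target]
    scored.sort(reverse=True)  # most similar first
    return [name for _, name in scored[:k]]


# Toy 3-D "text embeddings" (hypothetical); real ones would come from a VLM text encoder.
VOCAB = {
    "sock": [1.0, 0.2, 0.0],
    "shoe": [0.9, 0.3, 0.1],
    "glove": [0.8, 0.1, 0.3],
    "alligator": [0.0, 0.1, 1.0],
    "ipod": [0.1, 1.0, 0.0],
}

if __name__ == "__main__":
    print(retrieve_negatives("sock", VOCAB))  # ['shoe', 'glove']
```

Picking the most similar classes, rather than random ones, gives negatives that the detector is most likely to confuse with the target, which is why similarity-based retrieval is a natural fit for this step.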

arXiv:2404.05431

Simplifying MBA Expression Using E-Graphs

Authors: Seoksu Lee, Hyeongchang Jeon, Eun-Sun Cho

Abstract: Code obfuscation involves the addition of meaningless code or the complication of existing code in order to make a program difficult to reverse engineer. In recent years, Mixed Boolean Arithmetic (MBA) obfuscation has been applied to virus and malware code to impede expert analysis. Among the various obfuscation techniques, MBA obfuscation is considered the most challenging to decipher using existing code deobfuscation techniques. In this paper, we attempt to simplify MBA expressions. We use an e-graph data structure to efficiently hold multiple expressions with the same semantics, allowing us to systematically rewrite terms and find simpler expressions. Preliminary experimental results show that our e-graph-based MBA deobfuscation approach runs faster than other approaches while achieving reasonable performance.

Submitted 8 April, 2024; originally announced April 2024.
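To make the MBA entry above concrete: MBA obfuscation replaces a simple expression with an equivalent mix of bitwise and arithmetic operators. The sketch below checks two textbook MBA identities exhaustively over 8-bit words (arithmetic modulo 2^8). The word size and function names are illustrative assumptions, and this is not the paper's e-graph rule set; an e-graph-based simplifier would keep both sides of each identity in one equivalence class and extract the cheaper one.

```python
WIDTH = 8                   # word size in bits (illustrative)
MASK = (1 << WIDTH) - 1     # arithmetic wraps modulo 2**WIDTH


def mba_add(x, y):
    # x ^ y adds without carries; 2 * (x & y) supplies the carries, so this equals x + y.
    return ((x ^ y) + 2 * (x & y)) & MASK


def mba_xor(x, y):
    # x | y has every set bit once; subtracting x & y removes the bits set in both.
    return ((x | y) - (x & y)) & MASK


def equivalent(f, g):
    """Exhaustively compare two 2-input functions on all WIDTH-bit inputs."""
    return all(f(x, y) == g(x, y) for x in range(MASK + 1) for y in range(MASK + 1))


if __name__ == "__main__":
    print(equivalent(mba_add, lambda x, y: (x + y) & MASK))  # True
    print(equivalent(mba_xor, lambda x, y: x ^ y))           # True
```

Exhaustive checking costs 2^(2*WIDTH) evaluations per identity, so it is only practical for tiny word sizes; realistic deobfuscation needs symbolic rewriting instead.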

arXiv:2404.01954

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.15370

Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks

Authors: Aqeel Anwar, Tae Eun Choe, Zian Wang, Sanja Fidler, Minwoo Park

Abstract: Detecting a diverse range of objects under various driving scenarios is essential for the effectiveness of autonomous driving systems. However, the real-world data collected often lacks the necessary diversity, presenting a long-tail distribution. Although synthetic data has been utilized to overcome this issue by generating virtual scenes, it faces hurdles such as a significant domain gap and the substantial effort required from 3D artists to create realistic environments. To overcome these challenges, we present ARSim, a fully automated, comprehensive, modular framework designed to enhance real multi-view image data with 3D synthetic objects of interest. The proposed method integrates domain adaptation and randomization strategies to address covariate shift between real and simulated data by inferring essential domain attributes from real data and employing simulation-based randomization for other attributes. We construct a simplified virtual scene using real data and strategically place 3D synthetic assets within it. Illumination is achieved by estimating light distribution from multiple images capturing the surroundings of the vehicle. Camera parameters from real data are employed to render synthetic assets in each frame. The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles. Experimental results on various AV perception tasks demonstrate the superior performance of networks trained on the augmented dataset.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 17 pages, 15 figures, 7 tables
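The ARSim entry renders synthetic assets with camera parameters taken from real data. As a hedged illustration of why reusing real calibration keeps multi-view results consistent, the sketch below projects one 3-D point, placed once in a shared scene, into two cameras with a standard pinhole model (no lens distortion). The intrinsics (fx, fy, cx, cy), extrinsics (R, t), and the point are hypothetical values, not ARSim's pipeline.

```python
def project_point(p_world, R, t, fx, fy, cx, cy):
    """Pinhole projection of a 3-D world point to pixel coordinates (u, v).

    R (3x3 nested list) and t (3-vector) are extrinsics mapping world to camera
    coordinates, p_cam = R @ p_world + t; fx, fy, cx, cy are intrinsics in pixels.
    Returns None when the point is behind the camera."""
    p_cam = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)]
    x, y, z = p_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    asset = [1.0, 0.5, 10.0]                       # one point of a synthetic asset, in metres
    intrinsics = (1000.0, 1000.0, 960.0, 540.0)
    cam_a = project_point(asset, identity, [0.0, 0.0, 0.0], *intrinsics)
    cam_b = project_point(asset, identity, [-0.5, 0.0, 0.0], *intrinsics)  # camera centred at x = 0.5 m
    print(cam_a, cam_b)  # (1060.0, 590.0) (1010.0, 590.0)
```

Because every camera observes the same world-space point through its own calibration, the asset lands at geometrically consistent pixel locations in each view (u = 1060 in the first camera, u = 1010 in the second).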

arXiv:2403.02786

Semi-Supervised Graph Representation Learning with Human-centric Explanation for Predicting Fatty Liver Disease

Authors: So Yeon Kim, Sehee Wang, Eun Kyung Choe

Abstract: Addressing the challenge of limited labeled data in clinical settings, particularly in the prediction of fatty liver disease, this study explores the potential of graph representation learning within a semi-supervised learning framework. Leveraging graph neural networks (GNNs), our approach constructs a subject similarity graph to identify risk patterns from health checkup data. The effectiveness of various GNN approaches in this context is demonstrated, even with minimal labeled samples. Central to our methodology is the inclusion of human-centric explanations through explainable GNNs, providing personalized feature importance scores for enhanced interpretability and clinical relevance, thereby underscoring the potential of our approach in advancing healthcare practices with a keen focus on graph representation learning and human-centric explanation.

Submitted 5 March, 2024; originally announced March 2024.

Comments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024 (https://hcrl-workshop.github.io/2024/)
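The fatty-liver entry builds a subject similarity graph from health checkup data before applying GNNs. One common way to build such a graph, assumed here because the abstract does not specify the construction, is a k-nearest-neighbour graph over standardized feature vectors with Euclidean distance. The features below are hypothetical, and the GNN and explanation steps are not shown.

```python
import math


def knn_graph(features, k):
    """Undirected k-nearest-neighbour graph over subjects.

    features: one numeric feature vector per subject (assumed standardized).
    Returns sorted (i, j) edges with i < j; subject i is linked to its k closest subjects."""
    n = len(features)
    edges = set()
    for i in range(n):
        # Rank every other subject by Euclidean distance to subject i (math.dist needs Python 3.8+).
        ranked = sorted((math.dist(features[i], features[j]), j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return sorted(edges)


if __name__ == "__main__":
    feats = [[0.1, 0.2], [0.15, 0.22], [2.0, 1.9], [2.1, 2.0], [0.12, 0.18]]  # toy subjects
    print(knn_graph(feats, 1))  # [(0, 4), (1, 4), (2, 3)]
```

In a semi-supervised setting, the few labelled subjects and the many unlabelled ones share this graph, so a GNN can propagate label information along edges between similar subjects.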

arXiv:2311.08788

X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects

Authors: Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang

Abstract: Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment. However, multi-aspect evaluation remains challenging as it may require the evaluator to generalize to any given evaluation aspect even if it is absent during training. In this paper, we introduce X-Eval, a two-stage instruction tuning framework to evaluate the text in both seen and unseen aspects customized by end users. X-Eval consists of two learning stages: the vanilla instruction tuning stage that improves the model's ability to follow evaluation instructions, and an enhanced instruction tuning stage that exploits the connections between fine-grained evaluation aspects to better assess text quality. To support the training of X-Eval, we collect AspectInstruct, the first instruction tuning dataset tailored for multi-aspect NLG evaluation, spanning 27 diverse evaluation aspects with 65 tasks. To enhance task diversity, we devise an augmentation strategy that converts human rating annotations into diverse forms of NLG evaluation tasks, including scoring, comparison, ranking, and Boolean question answering. Extensive experiments across three essential categories of NLG tasks (dialogue generation, summarization, and data-to-text), coupled with 21 aspects in meta-evaluation, demonstrate that X-Eval enables even a lightweight language model to achieve a comparable if not higher correlation with human judgments compared to state-of-the-art NLG evaluators such as GPT-4.

Submitted 13 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: NAACL 2024 Main Conference. 20 pages, 6 figures, 17 tables
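The X-Eval entry converts human rating annotations into scoring, comparison, ranking, and Boolean QA tasks. A minimal sketch of that kind of augmentation for a single aspect follows; the task dictionary format, the Boolean `threshold`, and the function name `augment` are hypothetical, and AspectInstruct's actual templates are not reproduced.

```python
from itertools import combinations


def augment(aspect, rated, threshold):
    """Turn human ratings for one aspect into several evaluation-task formats.

    rated: list of (text_id, score) pairs; threshold: score at or above which the
    Boolean answer is True. Returns a list of task dicts."""
    tasks = []
    # 1) Scoring: predict the human score directly.
    for text_id, score in rated:
        tasks.append({"type": "scoring", "aspect": aspect, "text": text_id, "answer": score})
    # 2) Pairwise comparison: which of two texts was rated higher (ties carry no signal).
    for (a, sa), (b, sb) in combinations(rated, 2):
        if sa != sb:
            tasks.append({"type": "comparison", "aspect": aspect,
                          "texts": (a, b), "answer": a if sa > sb else b})
    # 3) Ranking: order all texts from best to worst.
    order = [t for t, _ in sorted(rated, key=lambda r: r[1], reverse=True)]
    tasks.append({"type": "ranking", "aspect": aspect,
                  "texts": [t for t, _ in rated], "answer": order})
    # 4) Boolean QA: "Is this text good with respect to <aspect>?"
    for text_id, score in rated:
        tasks.append({"type": "boolean", "aspect": aspect, "text": text_id,
                      "answer": score >= threshold})
    return tasks


if __name__ == "__main__":
    for task in augment("coherence", [("r1", 4), ("r2", 2), ("r3", 5)], threshold=3):
        print(task)
```

Three ratings already yield ten training tasks here, which illustrates how one set of annotations can be reused across several task formats.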

arXiv:2310.18652

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Authors: Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi

Abstract: Electronic Health Records (EHRs) contain patients' medical histories in various multi-modal formats, yet the potential for joint reasoning across imaging and table modalities remains underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop our dataset, we first construct two uni-modal resources: 1) the MIMIC-CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset. By integrating these two uni-modal resources, we successfully construct a multi-modal EHR QA dataset that necessitates both uni-modal and cross-modal reasoning. To address the unique challenges of multi-modal questions within EHRs, we propose a NeuralSQL-based strategy equipped with an external VQA API. This pioneering endeavor enhances engagement with multi-modal EHR sources, and we believe that our dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research. EHRXQA is available at https://github.com/baeseongsu/ehrxqa.

Submitted 25 December, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted at NeurIPS 2023 Datasets and Benchmarks Track (10 pages for main text, 4 pages for references, 39 pages for supplementary materials)
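The EHRXQA entry answers multi-modal questions with a NeuralSQL-based strategy that calls an external VQA API. One way to sketch the idea with Python's standard `sqlite3` module is to register a Python function as an SQL function, so a single query can combine table filters with questions about images. The table schema, the function name `func_vqa`, and the canned answers standing in for a VQA model are hypothetical; this is not the NeuralSQL grammar from the paper.

```python
import sqlite3

# Canned answers standing in for an external VQA model (hypothetical).
FAKE_VQA_ANSWERS = {
    ("img1", "is there pleural effusion?"): "yes",
    ("img2", "is there pleural effusion?"): "no",
}


def func_vqa(image_id, question):
    """Hypothetical VQA hook: a real system would run an image model on image_id here."""
    return FAKE_VQA_ANSWERS.get((image_id, question.lower()), "unknown")


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE studies (subject_id INTEGER, image_id TEXT, study_year INTEGER)")
    conn.executemany(
        "INSERT INTO studies VALUES (?, ?, ?)",
        [(10, "img1", 2019), (10, "img2", 2021), (11, "img3", 2021)],
    )
    # Register the Python function so SQL queries can call func_vqa(image_id, question).
    conn.create_function("func_vqa", 2, func_vqa)
    return conn


# Table condition (study_year) and image condition (VQA answer) in one query.
QUERY = """
SELECT subject_id, image_id FROM studies
WHERE study_year >= 2019 AND func_vqa(image_id, 'Is there pleural effusion?') = 'yes'
ORDER BY subject_id, image_id
"""

if __name__ == "__main__":
    print(build_db().execute(QUERY).fetchall())  # [(10, 'img1')]
```

Keeping the image question inside the SQL query lets the database handle the structured filtering while the model is consulted only for the rows that need visual evidence.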

arXiv:2310.05791

Problem-Solving Guide: Predicting the Algorithm Tags and Difficulty for Competitive Programming Problems

Authors: Juntae Kim, Eunjung Cho, Dongwoo Kim, Dongbin Na

Abstract: The program development industry has increasingly required problem-solving abilities from engineers, especially application developers. However, AI-based education systems that help solve algorithm problems have not yet attracted attention, even though most big tech companies, including Google, Meta, and Amazon, require the ability to solve algorithm problems. The most useful guide to solving an algorithm problem might be guessing the category (tag) of the problem at hand. Therefore, our study addresses the task of predicting the algorithm tag as a useful tool for engineers and developers. Moreover, we also consider predicting the difficulty levels of algorithm problems, which can serve as guidance for estimating the time required to solve a problem. In this paper, we present a real-world algorithm problem multi-task dataset, AMT, by mainly collecting problem samples from Codeforces, the most famous and largest competitive programming website. To the best of our knowledge, our proposed dataset is the largest dataset for predicting algorithm tags compared to previous studies. Moreover, our work is the first to address predicting the difficulty levels of algorithm problems. We present a novel deep learning-based method for simultaneously predicting the algorithm tags and difficulty level of a given algorithm problem. All datasets and source code are available at https://github.com/sronger/PSG_Predicting_Algorithm_Tags_and_Difficulty.

Submitted 9 October, 2023; originally announced October 2023.

Comments: 8 pages

arXiv:2310.04824

PaperCard for Reporting Machine Assistance in Academic Writing

Authors: Won Ik Cho, Eunjung Cho, Kyunghyun Cho

Abstract: The academic writing process has benefited from various technological developments over the years, including search engines, automatic translators, and editing tools that review grammar and spelling mistakes. They have enabled human writers to become more efficient in writing academic papers, for example by helping them find relevant literature more effectively and polish texts. While these developments have so far played a relatively assistive role, recent advances in large-scale language models (LLMs) have enabled LLMs to play a more significant role in the writing process, such as coming up with research questions and generating key content. This raises critical questions surrounding the concept of authorship in academia. ChatGPT, a question-answering system released by OpenAI in November 2022, has demonstrated a range of capabilities that could be utilised in producing academic papers. The academic community will have to address relevant pressing questions, including whether Artificial Intelligence (AI) should be granted authorship if it made significant contributions in the writing process, or whether its use should be restricted such that human authorship would not be undermined. In this paper, we aim to address such questions, and propose a framework we name "PaperCard", a documentation format for human authors to transparently declare the use of AI in their writing process.

Submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted at EAAMO'23 as a poster presentation

arXiv:2309.03406

Distribution-Aware Prompt Tuning for Vision-Language Models

Authors: Eulrang Cho, Jooyeon Kim, Hyunwoo J. Kim

Abstract: Pre-trained vision-language models (VLMs) have shown impressive performance on various downstream tasks by utilizing knowledge learned from large-scale data. In general, the performance of VLMs on target tasks can be further improved by prompt tuning, which adds context to the input image or text. By leveraging data from target tasks, various prompt-tuning methods have been studied in the literature. A key to prompt tuning is the feature space alignment between two modalities via learnable vectors with model parameters fixed. We observed that the alignment becomes more effective when embeddings of each modality are 'well-arranged' in the latent space. Inspired by this observation, we proposed distribution-aware prompt tuning (DAPT) for vision-language models, which is simple yet effective. Specifically, the prompts are learned by maximizing inter-dispersion, the distance between classes, as well as minimizing the intra-dispersion measured by the distance between embeddings from the same class. Our extensive experiments on 11 benchmark datasets demonstrate that our method significantly improves generalizability. The code is available at https://github.com/mlvlab/DAPT.

Submitted 6 September, 2023; originally announced September 2023.

Comments: Accepted to ICCV2023
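The DAPT entry learns prompts by maximizing inter-dispersion and minimizing intra-dispersion. The sketch below computes both quantities for labelled embeddings, taking intra-dispersion as the mean Euclidean distance from each embedding to its class prototype (the class mean) and inter-dispersion as the mean distance between prototypes. These exact definitions, the weights `w_inter` and `w_intra`, and the 2-D toy embeddings are assumptions, and the prompt optimization itself is not shown.

```python
import math
from itertools import combinations


def prototypes(emb_by_class):
    """Class prototype = mean embedding of each class."""
    return {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in emb_by_class.items()}


def intra_dispersion(emb_by_class):
    """Mean distance from each embedding to its own class prototype (lower = tighter classes)."""
    protos = prototypes(emb_by_class)
    dists = [math.dist(v, protos[c]) for c, vs in emb_by_class.items() for v in vs]
    return sum(dists) / len(dists)


def inter_dispersion(emb_by_class):
    """Mean distance between class prototypes (higher = better-separated classes); needs 2+ classes."""
    pairs = list(combinations(prototypes(emb_by_class).values(), 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def dispersion_loss(emb_by_class, w_inter=1.0, w_intra=1.0):
    """Objective to minimise: small intra-dispersion, large inter-dispersion."""
    return w_intra * intra_dispersion(emb_by_class) - w_inter * inter_dispersion(emb_by_class)


if __name__ == "__main__":
    tight = {"cat": [[0.0, 0.0], [0.0, 0.2]], "dog": [[3.0, 0.0], [3.0, 0.2]]}
    loose = {"cat": [[0.0, 0.0], [0.0, 2.0]], "dog": [[1.0, 0.0], [1.0, 2.0]]}
    print(dispersion_loss(tight), dispersion_loss(loose))
```

The compact, well-separated `tight` configuration scores lower than `loose`, matching the abstract's intuition that 'well-arranged' embeddings are preferable.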

arXiv:2309.00237

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

Authors: Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi

Abstract: The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources, including weights, code, and data used in the development of Asclepius, are made publicly accessible for future research. (https://github.com/starmpcc/Asclepius)

Submitted 13 June, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: ACL 2024 (Findings)

arXiv:2308.14296

RecMind: Large Language Model Powered Agent For Recommendation

Authors: Yancheng Wang, Ziyan Jiang, Zheng Chen, Fan Yang, Yingxue Zhou, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu, Yingzhen Yang

Abstract: While the recommendation system (RS) has advanced significantly through deep learning, current RS approaches usually train and fine-tune models on task-specific datasets, limiting their generalizability to new recommendation tasks and their ability to leverage external knowledge due to model scale and data size constraints. Thus, we designed an LLM-powered autonomous recommender agent, RecMind, which is capable of leveraging external knowledge, utilizing tools with careful planning to provide zero-shot personalized recommendations. We propose a Self-Inspiring algorithm to improve the planning ability. At each intermediate step, the LLM self-inspires to consider all previously explored states to plan for the next step. This mechanism greatly improves the model's ability to comprehend and utilize historical information in planning for recommendation. We evaluate RecMind's performance in various recommendation scenarios. Our experiment shows that RecMind outperforms existing zero/few-shot LLM-based recommendation baseline methods in various tasks and achieves comparable performance to a fully trained recommendation model P5.

Submitted 20 March, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted by NAACL 2024 (Findings)

arXiv:2305.14449 [pdf, other]

Graph Meets LLM: A Novel Approach to Collaborative Filtering for Robust Conversational Understanding

Authors: Zheng Chen, Ziyan Jiang, Fan Yang, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu, Aram Galstyan

Abstract: Conversational AI systems such as Alexa need to understand defective queries to ensure robust conversational understanding and reduce user friction. These defective queries often arise from user ambiguities, mistakes, or errors in automatic speech recognition (ASR) and natural language understanding (NLU). Personalized query rewriting is an approach that focuses on reducing defects in queries by taking into account the user's individual behavior and preferences. It typically relies on an index of past successful user interactions with the conversational AI. However, unseen interactions within the user's history present additional challenges for personalized query rewriting. This paper presents our "Collaborative Query Rewriting" approach, which specifically addresses the task of rewriting new user interactions that have not been previously observed in the user's history. This approach builds a "User Feedback Interaction Graph" (FIG) of historical user-entity interactions and leverages multi-hop graph traversal to enrich each user's index to cover future unseen defective queries. The enriched user index is called a Collaborative User Index and contains hundreds of additional entries. To counteract precision degradation from the enlarged index, we add additional transformer layers to the L1 retrieval model and incorporate graph-based and guardrail features into the L2 ranking model.
Since the user index can be pre-computed, we further investigate the utilization of a Large Language Model (LLM) to enhance the FIG for user-entity link prediction in the Video/Music domains. Specifically, this paper investigates the Dolly-V2 7B model. We found that the user index augmented by the fine-tuned Dolly-V2 generation significantly enhanced the coverage of future unseen user interactions, thereby boosting QR performance on unseen queries compared with the graph traversal only approach.
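To make the multi-hop enrichment idea concrete, here is a minimal sketch, assuming the interaction graph is a plain bipartite user-entity graph given as `(user, entity)` pairs. The function name `collaborative_index`, the 3-hop default, and the toy data are hypothetical; the paper's FIG construction, edge weighting, and guardrail features are not modeled.

```python
from collections import deque


def collaborative_index(edges, user, max_hops=3):
    """Breadth-first traversal of a bipartite user-entity graph (hypothetical sketch).

    Returns (own, extra): the entities the user interacted with directly, and
    additional entities reachable within `max_hops` edges that could be added
    to the user's index to cover interactions not yet seen in their history.
    """
    # Build an undirected adjacency map; nodes are tagged to keep the two sides apart.
    adj = {}
    for u, e in edges:
        adj.setdefault(("user", u), set()).add(("entity", e))
        adj.setdefault(("entity", e), set()).add(("user", u))

    start = ("user", user)
    seen = {start}
    frontier = deque([(start, 0)])
    found = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt[0] == "entity":
                found.add(nxt[1])
            frontier.append((nxt, depth + 1))

    own = {e for _, e in adj.get(start, ())}
    return own, found - own


edges = [("alice", "song_a"), ("bob", "song_a"), ("bob", "song_b"), ("carol", "song_c")]
print(collaborative_index(edges, "alice"))
```

With three hops, the path user → entity → other user → entity lets "alice" pick up `song_b` through "bob", who shares `song_a` with her; "carol" is unreachable, so `song_c` is not added.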

Submitted 19 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

ACM Class: F.2.2; I.2.7

arXiv:2305.07622 [pdf, other]

PALR: Personalization Aware LLMs for Recommendation

Authors: Fan Yang, Zheng Chen, Ziyan Jiang, Eunah Cho, Xiaojiang Huang, Yanbin Lu

Abstract: Large language models (LLMs) have recently received significant attention for their exceptional capabilities. Despite extensive efforts in developing general-purpose LLMs that can be utilized in various natural language processing (NLP) tasks, there has been less research exploring their potential in recommender systems. In this paper, we propose a novel framework, named PALR, which aims to combine user history behaviors (such as clicks, purchases, ratings, etc.) with LLMs to generate user-preferred items. Specifically, we first use user/item interactions as guidance for candidate retrieval. Then we adopt an LLM-based ranking model to generate recommended items. Unlike existing approaches that typically adopt general-purpose LLMs for zero/few-shot recommendation testing or train small language models (with fewer than 1 billion parameters), which cannot fully elicit LLMs' reasoning abilities or leverage rich item-side parametric knowledge, we fine-tune a 7-billion-parameter LLM for the ranking purpose. This model takes retrieval candidates in natural language format as input, together with an instruction that explicitly asks it to select results from the input candidates during inference. Our experimental results demonstrate that our solution outperforms state-of-the-art models on various sequential recommendation tasks.
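The retrieve-then-rank step above feeds candidates to the LLM as text and asks it to choose among them. The sketch below shows one way to build such a prompt and to keep only outputs that are actual candidates. The prompt wording, the function names, and the one-item-per-line output format are hypothetical illustrations, not PALR's actual template, and the model call itself is left out.

```python
def build_ranking_prompt(history, candidates, k=3):
    """Render user history and retrieved candidates as a natural-language instruction
    (hypothetical template)."""
    lines = ["The user has interacted with the following items, in order:"]
    lines += [f"- {item}" for item in history]
    lines.append(
        f"Select the {k} items the user is most likely to interact with next. "
        "Choose ONLY from these candidates:"
    )
    lines += [f"- {c}" for c in candidates]
    return "\n".join(lines)


def parse_selection(llm_output, candidates, k=3):
    """Keep up to k distinct items from the model output that appear in the candidate
    list, discarding anything the model invented."""
    allowed = set(candidates)
    picks = []
    for line in llm_output.splitlines():
        item = line.strip().lstrip("-").strip()
        if item in allowed and item not in picks:
            picks.append(item)
        if len(picks) == k:
            break
    return picks


candidates = ["Heat", "Inception", "Toy Story", "Up"]
print(build_ranking_prompt(["Finding Nemo", "Cars"], candidates))
print(parse_selection("- Toy Story\n- Inception\n- Some Hallucinated Film\n- Heat", candidates))
```

Filtering against the candidate set enforces, after the fact, the constraint that the instruction asks the model to respect: any title that was not retrieved is dropped.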

Submitted 7 June, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

ACM Class: I.2.6; I.2.7

arXiv:2304.02260 [pdf, ps, other]

Feature Engineering Using File Layout for Malware Detection

Authors: Jeongwoo Kim, Eun-Sun Cho, Joon-Young Paik

Abstract: Malware detection on binary executables remains applicable even to binaries that have not been disassembled or decompiled. However, a binary-level approach can cause ambiguity problems. In this paper, we propose a new feature engineering technique that uses minimal knowledge about the internal layout of a binary. The proposed feature avoids the ambiguity problems by integrating information about the layout with structural entropy. The experimental results show that our feature improves accuracy and F1-score by 3.3% and 0.07, respectively, on a CNN-based malware detector with realistic benign and malicious samples.
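Structural entropy is usually computed as a sequence of Shannon entropies over fixed-size windows of the raw bytes. The sketch below computes that sequence and, to hint at the layout integration, computes it separately per section. The 256-byte window and the `(name, offset, size)` section tuples are assumptions; in a real PE file they would come from the section table, and the paper's exact feature construction may differ.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    # Sum of p * log2(1/p) over the byte values that occur in the chunk.
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values())


def structural_entropy(data: bytes, window: int = 256) -> list:
    """Entropy of each consecutive, non-overlapping window of the binary."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


def layout_entropy_profile(data: bytes, sections, window: int = 256) -> dict:
    """Per-section structural entropy; `sections` is a hypothetical list of
    (name, offset, size) tuples describing the file layout."""
    return {name: structural_entropy(data[off:off + size], window) for name, off, size in sections}


data = bytes(256) + bytes(range(256))  # a zero-filled region followed by maximally varied bytes
print(structural_entropy(data))
print(layout_entropy_profile(data, [(".text", 0, 256), (".data", 256, 256)]))
```

Computing entropy inside section boundaries, rather than over a blind sliding window, is one way to ensure that a window never straddles two regions with different roles.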

Submitted 5 April, 2023; originally announced April 2023.

Comments: 2 pages, no figures. This manuscript was presented in the poster session of The Annual Computer Security Applications Conference (ACSAC) 2020

arXiv:2303.08290 [pdf, other]

Rediscovery of CNN's Versatility for Text-based Encoding of Raw Electronic Health Records

Authors: Eunbyeol Cho, Min Jae Lee, Kyunghoon Hur, Jiyoun Kim, Jinsung Yoon, Edward Choi

Abstract: Making the most of the abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds entire features in raw EHR data regardless of its form and medical code standards. The framework, however, only focuses on encoding EHR with minimal preprocessing and fails to consider how to learn efficient EHR representation in terms of computation and memory usage. In this paper, we search for a versatile encoder that not only reduces the large data to a manageable size but also preserves the core information of patients well enough to perform diverse clinical tasks. We found that a hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, it turns out that making use of the inherent hierarchy of EHR data can boost the performance of any kind of backbone model and clinical task performed. Through extensive experiments, we present concrete evidence to generalize our research findings into real-world practice. We give a clear guideline on building the encoder based on the research findings captured while exploring numerous settings.

Submitted 10 May, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted to CHIL 2023

arXiv:2303.07547 [pdf, other]

HazardNet: Road Debris Detection by Augmentation of Synthetic Models

Authors: Tae Eun Choe, Jane Wu, Xiaolin Lin, Karen Kwon, Minwoo Park

Abstract: We present an algorithm to detect unseen road debris using a small set of synthetic models. Early detection of road debris is critical for safe autonomous or assisted driving, yet the development of a robust road debris detection model has not been widely discussed. There are two main challenges to building a road debris detector: first, data collection of road debris is challenging since hazardous objects on the road are rare to encounter in real driving scenarios; second, the variability of road debris is broad, ranging from a very small brick to a large fallen tree. To overcome these challenges, we propose a novel approach to few-shot learning of road debris that uses semantic augmentation and domain randomization to augment real road images with synthetic models. We constrain the problem domain to uncommon objects on the road and allow the deep neural network, HazardNet, to learn the semantic meaning of road debris to eventually detect unseen road debris. Our results demonstrate that HazardNet is able to accurately detect real road debris when only trained on synthetic objects in augmented images.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 11 pages

MSC Class: ACM-class: I.1.4

arXiv:2302.10454 [pdf, other]

KG-ECO: Knowledge Graph Enhanced Entity Correction for Query Rewriting

Authors: Jinglun Cai, Mingda Li, Ziyan Jiang, Eunah Cho, Zheng Chen, Yang Liu, Xing Fan, Chenlei Guo

Abstract: Query Rewriting (QR) plays a critical role in large-scale dialogue systems for reducing friction. When there is an entity error, it imposes extra challenges for a dialogue system to produce satisfactory responses. In this work, we propose KG-ECO: Knowledge Graph enhanced Entity COrrection for query rewriting, an entity correction system with corrupt entity span detection and entity retrieval/re-ranking functionalities. To boost the model performance, we incorporate a Knowledge Graph (KG) to provide entity structural information (neighboring entities encoded by graph neural networks) and textual information (KG entity descriptions encoded by RoBERTa). Experimental results show that our approach yields a clear performance gain over two baselines: utterance level QR and entity correction without utilizing KG information. The proposed system is particularly effective for few-shot learning cases where target entities are rarely seen in training or there is a KG relation between the target entity and other contextual entities in the query.

Submitted 22 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2302.06819 [pdf, other]

L4 Pointer: An efficient pointer extension for spatial memory safety support without hardware extension

Authors: Seong-Kyun Mok, Eun-Sun Cho

Abstract: Since buffer overflow has long been a frequently occurring, high-risk vulnerability, various methods have been developed to support spatial memory safety and prevent buffer overflow. However, every proposed method, although effective in part, has its limitations. Due to expensive bound-checking or large memory consumption for metadata, software-only support for spatial memory safety inherently entails runtime overhead. In contrast, hardware-assisted methods are not available without specific hardware support. To mitigate these limitations, we propose L4 Pointer, a 128-bit pointer extended from a normal 64-bit virtual address. By using the extra bits and widespread SIMD operations, L4 Pointer incurs less slowdown and achieves higher performance than existing methods, without requiring hardware extensions.
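The general idea of widening a 64-bit address into a 128-bit pointer can be illustrated with a fat-pointer model in which the upper 64 bits carry bounds metadata checked before each access. The layout below (a 32-bit object size and a 32-bit offset from the object base packed into the high half) is a hypothetical encoding chosen for illustration. It is not L4 Pointer's actual format, and it does not model the SIMD operations the abstract mentions.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

# Hypothetical 128-bit layout: [size:32 | offset:32 | address:64]
# Assumes objects smaller than 4 GiB so size and offset fit in 32 bits.


def make_ptr(base: int, size: int) -> int:
    """Create a pointer to the start of an object of `size` bytes at `base`."""
    return (size << 96) | (0 << 64) | (base & MASK64)


def ptr_add(p: int, delta: int) -> int:
    """Pointer arithmetic: move the address and the tracked offset together."""
    addr = p & MASK64
    offset = (p >> 64) & MASK32
    size = (p >> 96) & MASK32
    # A negative offset wraps to a huge value, so a later bounds check rejects it.
    return (size << 96) | (((offset + delta) & MASK32) << 64) | ((addr + delta) & MASK64)


def check_access(p: int, nbytes: int) -> int:
    """Return the raw 64-bit address if an nbytes-wide access stays in bounds."""
    offset = (p >> 64) & MASK32
    size = (p >> 96) & MASK32
    if offset + nbytes > size:
        raise IndexError(f"out-of-bounds access: offset {offset} + {nbytes} > size {size}")
    return p & MASK64


p = make_ptr(0x1000, 16)                    # 16-byte object at 0x1000
print(hex(check_access(ptr_add(p, 8), 8)))  # last 8 bytes: allowed
try:
    check_access(ptr_add(p, 12), 8)         # would read past the end
except IndexError as err:
    print(err)
```

Keeping the bounds inside the pointer value itself, instead of in a separate shadow table, means a check needs no extra lookup. Software-only schemes typically pay for such lookups with the runtime overhead the abstract describes.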

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2211.08082 [pdf, other]

UniHPF : Universal Healthcare Predictive Framework with Zero Domain Knowledge

Authors: Kyunghoon Hur, Jungwoo Oh, Junu Kim, Jiyoun Kim, Min Jae Lee, Eunbyeol Cho, Seong-Eun Moon, Young-Hak Kim, Edward Choi

Abstract: Despite the abundance of Electronic Healthcare Records (EHR), its heterogeneity restricts the utilization of medical data in building predictive models. To address this challenge, we propose Universal Healthcare Predictive Framework (UniHPF), which requires no medical domain knowledge and minimal pre-processing for multiple prediction tasks. Experimental results demonstrate that UniHPF is capable of building large-scale EHR models that can process any form of medical data from distinct EHR systems. We believe that our findings can provide helpful insights for further research on the multi-source learning of EHRs.

Submitted 15 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 19 pages (main paper 6 pages). arXiv admin note: substantial text overlap with arXiv:2207.09858

arXiv:2209.02903 [pdf]

doi 10.1145/3555600

Taking a Language Detour: How International Migrants Speaking a Minority Language Seek COVID-Related Information in Their Host Countries

Authors: Ge Gao, Jian Zheng, Eun Kyoung Choe, Naomi Yamashita

Abstract: Information seeking is crucial for people's self-care and wellbeing in times of public crises. Extensive research has investigated empirical understandings as well as technical solutions to facilitate information seeking by domestic citizens of affected regions. However, limited knowledge is established to support international migrants who need to survive a crisis in their host countries. The current paper presents an interview study with two cohorts of Chinese migrants living in Japan (N=14) and the United States (N=14). Participants reflected on their information seeking experiences during the COVID pandemic. The reflection was supplemented by two weeks of self-tracking where participants maintained records of their COVID-related information seeking practice. Our data indicated that participants often took language detours, or visits to Mandarin resources for information about the COVID outbreak in their host countries. They also made strategic use of the Mandarin information to perform selective reading, cross-checking, and contextualized interpretation of COVID-related information in Japanese or English. While such practices enhanced participants' perceived effectiveness of COVID-related information gathering and sensemaking, they disadvantaged people in sometimes incognizant ways. Further, participants lacked the awareness or preference to review migrant-oriented information that was issued by the host country's public authorities despite its availability.
Building upon these findings, we discussed solutions to improve international migrants' COVID-related information seeking in their non-native language and cultural environment. We advocated inclusive crisis infrastructures that would engage people with diverse levels of local language fluency, information literacy, and experience in leveraging public services.

Submitted 27 September, 2022; v1 submitted 6 September, 2022; originally announced September 2022.

Journal ref: PACM on Human-Computer Interaction, Vol.6, No.CSCW2, Article 542, Publication date: November 2022

arXiv:2208.05612 [pdf, other]

SSLEM: A Simplifier for MBA Expressions based on Semi-linear MBA Expressions and Program Synthesis

Authors: Seong-Kyun Mok, Seoyeon Kang, Jeongwoo Kim, Eun-Sun Cho, Seokwoo Choi

Abstract: MBA (mixed boolean and arithmetic) expressions are hard to simplify, so they are used in malware obfuscation to hinder analysts' diagnosis. Some MBA simplification methods with high performance have been developed, but they narrow the target to "linear" MBA expressions, which allow efficient solutions based on logic/term-rewriting. However, such restrictions are not appropriate for the general forms of MBA expressions that usually appear in malware. To overcome this limitation, we introduce a "semi-linear" MBA expression, a new class of MBA expression extended from a linear MBA expression, and propose a new MBA simplifier called "SSLEM", based on a simplification idea of semi-linear MBA expressions and program synthesis.
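For the "linear" MBA expressions the abstract contrasts against, there is a classical truth-table check: a linear combination of bitwise functions is the same function on n-bit integers if it agrees on every 1-bit input assignment. The sketch below applies that check to the standard identity x + y == (x ^ y) + 2*(x & y) and confirms it by brute force over 8-bit values. It illustrates the linear case only and is not the SSLEM algorithm. The `(coefficient, bitwise_fn)` term encoding is an assumption made for this example.

```python
from itertools import product


def evaluate(terms, values, bits=8):
    """Evaluate sum(coef * f(values)) modulo 2**bits."""
    return sum(c * f(*values) for c, f in terms) & ((1 << bits) - 1)


def signature(terms, nvars):
    """Value of the linear MBA expression on every 1-bit input assignment."""
    return tuple(
        sum(c * (f(*bs) & 1) for c, f in terms)
        for bs in product((0, 1), repeat=nvars)
    )


def same_by_signature(a, b, nvars):
    """Equal signatures are sufficient for two linear MBA expressions to coincide
    on all n-bit inputs."""
    return signature(a, nvars) == signature(b, nvars)


def exhaustive_equal(a, b, nvars, bits=8):
    """Brute-force cross-check over all bits-wide inputs (feasible only for small widths)."""
    return all(
        evaluate(a, vs, bits) == evaluate(b, vs, bits)
        for vs in product(range(1 << bits), repeat=nvars)
    )


add = [(1, lambda x, y: x), (1, lambda x, y: y)]                # x + y
mba = [(1, lambda x, y: x ^ y), (2, lambda x, y: x & y)]        # (x ^ y) + 2*(x & y)
print(signature(add, 2), same_by_signature(add, mba, 2), exhaustive_equal(add, mba, 2))
```

The signature check costs 2^k evaluations for k variables, independent of bit width, which is why linear MBA admits efficient simplification. Once expressions leave that class, this shortcut no longer applies directly.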

Submitted 15 August, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2207.09858 [pdf, ps, other]

doi 10.1109/JBHI.2023.3327951

GenHPF: General Healthcare Predictive Framework with Multi-task Multi-source Learning

Authors: Kyunghoon Hur, Jungwoo Oh, Junu Kim, Jiyoun Kim, Min Jae Lee, Eunbyeol Cho, Seong-Eun Moon, Young-Hak Kim, Louis Atallah, Edward Choi

Abstract: Despite the remarkable progress in the development of predictive models for healthcare, applying these algorithms on a large scale has been challenging. Algorithms trained on a particular task, based on specific data formats available in a set of medical records, tend to not generalize well to other tasks or databases in which the data fields may differ. To address this challenge, we propose General Healthcare Predictive Framework (GenHPF), which is applicable to any EHR with minimal preprocessing for multiple prediction tasks. GenHPF resolves heterogeneity in medical codes and schemas by converting EHRs into a hierarchical textual representation while incorporating as many features as possible. To evaluate the efficacy of GenHPF, we conduct multi-task learning experiments with single-source and multi-source settings, on three publicly available EHR datasets with different schemas for 12 clinically meaningful prediction tasks. Our framework significantly outperforms baseline models that utilize domain knowledge in multi-source learning, improving average AUROC by 1.2%P in pooled learning and 2.6%P in transfer learning while also showing comparable results when trained on a single EHR dataset. Furthermore, we demonstrate that self-supervised pretraining using multi-source datasets is effective when combined with GenHPF, resulting in a 0.6%P AUROC improvement compared to models without pretraining.
By eliminating the need for preprocessing and feature engineering, we believe that this work offers a solid framework for multi-task and multi-source learning that can be leveraged to speed up the scaling and usage of predictive algorithms in healthcare.
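The "hierarchical textual representation" can be pictured as two levels: each EHR event becomes a short text built from its table name and column-value pairs, and a patient becomes the time-ordered list of those event texts. The sketch below shows that flattening. The table and column names are illustrative MIMIC-style placeholders, and the exact serialization GenHPF uses (token order, handling of numbers and units) may differ.

```python
def event_to_text(table, fields):
    """Serialize one event as 'table col val col val ...', skipping missing values."""
    parts = [table]
    for col, val in fields.items():
        if val is None:
            continue
        parts.extend([str(col), str(val)])
    return " ".join(parts)


def patient_to_text(events):
    """Patient level of the hierarchy: event texts in chronological order.

    `events` is a list of (timestamp, table_name, {column: value}) tuples
    (hypothetical input format).
    """
    return [event_to_text(table, fields) for _, table, fields in sorted(events, key=lambda e: e[0])]


events = [
    (2, "labevents", {"itemid": "Glucose", "valuenum": 98, "valueuom": "mg/dL"}),
    (1, "prescriptions", {"drug": "Heparin", "dose_val_rx": None}),
]
for line in patient_to_text(events):
    print(line)
```

Because the schema itself (table and column names) becomes part of the text, records from databases with different schemas can share a single encoder without hand-built code mappings.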

Submitted 15 November, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted by IEEE Journal of Biomedical and Health Informatics

Journal ref: IEEE Journal of Biomedical and Health Informatics 2024

arXiv:2205.13155 [pdf, other]

A Large Scale Study and Classification of VirusTotal Reports on Phishing and Malware URLs

Authors: Euijin Choo, Mohamed Nabeel, Ravindu De Silva, Ting Yu, Issa Khalil

Abstract: VirusTotal (VT) provides aggregated threat intelligence on various entities including URLs, IP addresses, and binaries. It is widely used by researchers and practitioners to collect ground truth and evaluate the maliciousness of entities. In this work, we provide a comprehensive analysis of VT URL scanning reports containing the results of 95 scanners for 1.577 billion URLs over two years. Individual VT scanners are known to be noisy in terms of their detection and attack type classification. To obtain high quality ground truth of URLs and actively take proper actions to mitigate different types of attacks, there are two challenges: (1) how to decide whether a given URL is malicious given noisy reports and (2) how to determine attack types (e.g., phishing or malware hosting) that the URL is involved in, given conflicting attack labels from different scanners. In this work, we provide a systematic comparative study on the behavior of VT scanners for different attack types of URLs. A common practice to decide the maliciousness is to use a cut-off threshold of scanners that report the URL as malicious. However, in this work, we show that using a fixed threshold is suboptimal for several reasons: (1) correlations between scanners; (2) lead/lag behavior; (3) the specialty of scanners; (4) the quality and reliability of scanners. A common practice to determine an attack type is to use majority voting. However, we show that majority voting cannot accurately classify the attack type of a URL due to the bias from correlated scanners.
Instead, we propose a machine learning-based approach to assign an attack type to URLs given the VT reports.
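The two common practices the abstract critiques, a fixed positive-count threshold and per-scanner majority voting, are easy to state in code. The sketch also shows how a block of correlated scanners can swing a majority vote. The `grouped_majority` variant, where each cluster of correlated scanners casts one vote, is a hypothetical illustration of the bias. It is not the paper's proposed machine learning approach, and the report format (scanner name mapped to an attack label, or `None` for clean) is assumed.

```python
from collections import Counter


def threshold_label(report, t):
    """Fixed cut-off: malicious if at least t scanners flag the URL."""
    return sum(1 for label in report.values() if label is not None) >= t


def majority_attack_type(report):
    """Plain majority vote over the attack labels of all flagging scanners."""
    counts = Counter(label for label in report.values() if label is not None)
    return counts.most_common(1)[0][0] if counts else None


def grouped_majority(report, groups):
    """Hypothetical variant: scanners in the same correlation cluster share one vote.
    `groups` maps scanner -> cluster id; unlisted scanners form their own cluster."""
    per_cluster = {}
    for scanner, label in report.items():
        if label is None:
            continue
        per_cluster.setdefault(groups.get(scanner, scanner), Counter())[label] += 1
    tally = Counter(c.most_common(1)[0][0] for c in per_cluster.values())
    return tally.most_common(1)[0][0] if tally else None


report = {"A": "phishing", "B": "phishing", "C": "phishing", "D": "malware", "E": "malware", "F": None}
groups = {"A": "g1", "B": "g1", "C": "g1"}  # assume A, B, C behave as one correlated engine
print(threshold_label(report, 5), majority_attack_type(report), grouped_majority(report, groups))
```

In this toy report, plain voting picks "phishing" only because three correlated scanners repeat the same verdict. When each cluster gets a single vote, the outcome flips to "malware".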

Submitted 26 May, 2022; originally announced May 2022.

arXiv:2205.08290 [pdf, other]

Literature Review to Collect Conceptual Variables of Scenario Methods for Establishing a Conceptual Scenario Framework

Authors: Young-Min Baek, Esther Cho, Donghwan Shin, Doo-Hwan Bae

Abstract: Over recent decades, scenarios and scenario-based software/system engineering have been actively employed as essential tools to handle intricate problems, validate requirements, and support stakeholders' communication. However, despite the widespread use of scenarios, several challenges have discouraged engineers from more readily adopting scenario-based engineering approaches (i.e., scenario methods) in their projects. First, the term scenario has numerous published definitions, so there is no well-established shared understanding of scenarios and scenario methods. Second, the conceptual basis for engineers developing or employing scenarios is missing. To establish shared understanding and to find common denominators of scenario methods, this study leverages well-defined metamodeling and conceptualization that systematically investigate the concepts under analysis and define core entities and their relations. By conducting a semi-systematic literature review, conceptual variables are collected and conceptualized as a conceptual meta-model. As a result, this study introduces scenario variables (SVs) that represent constructs/semantics of scenario descriptions, according to 4 levels of constructs of a scenario method. To evaluate the comprehensibility and applicability of the defined variables, we analyze five existing scenario methods and their instances in automated driving system (ADS) domains.
The results showed that our conceptual model and its constituent scenario variables adequately support the understanding of a scenario method and provide a means for comparative analysis between different scenario methods.

Submitted 17 May, 2022; originally announced May 2022.

Comments: 22 pages, 7 figures

MSC Class: 68M99 ACM Class: D.2.1

arXiv:2204.10191 [pdf, other]

doi 10.1145/3555164

Alexa as an Active Listener: How Backchanneling Can Elicit Self-Disclosure and Promote User Experience

Authors: Eugene Cho, Nasim Motalebi, S. Shyam Sundar, Saeed Abdullah

Abstract: Active listening is a well-known skill applied in human communication to build intimacy and elicit self-disclosure to support a wide variety of cooperative tasks. When applied to conversational UIs, active listening from machines can also elicit greater self-disclosure by signaling to the users that they are being heard, which can have positive outcomes. However, it takes considerable engineering effort and training to embed active listening skills in machines at scale, given the need to personalize active-listening cues to individual users and their specific utterances. A more generic solution is needed given the increasing use of conversational agents, especially by the growing number of socially isolated individuals. With this in mind, we developed an Amazon Alexa skill that provides privacy-preserving and pseudo-random backchanneling to indicate active listening. User study (N = 40) data show that backchanneling improves perceived degree of active listening by smart speakers. It also results in more emotional disclosure, with participants using more positive words. Perception of smart speakers as active listeners is positively associated with perceived emotional support. Interview data corroborate the feasibility of using smart speakers to provide emotional support. These findings have important implications for smart speaker interaction design in several domains of cooperative work and social computing.

Submitted 22 September, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: To appear in Proceedings of the ACM on Human-Computer Interaction (PACM HCI). The paper will be presented in CSCW 2022 (https://cscw.acm.org/2022)

arXiv:2204.00145 [pdf, other]

doi 10.1145/3491102.3517457

MyMove: Facilitating Older Adults to Collect In-Situ Activity Labels on a Smartwatch with Speech

Authors: Young-Ho Kim, Diana Chou, Bongshin Lee, Margaret Danilovich, Amanda Lazar, David E. Conroy, Hernisa Kacorri, Eun Kyoung Choe

Abstract: Current activity tracking technologies are largely trained on younger adults' data, which can lead to solutions that are not well-suited for older adults. To build activity trackers for older adults, it is crucial to collect training data with them. To this end, we examine the feasibility and challenges with older adults in collecting activity labels by leveraging speech. Specifically, we built MyMove, a speech-based smartwatch app to facilitate the in-situ labeling with a low capture burden. We conducted a 7-day deployment study, where 13 older adults collected their activity labels and smartwatch sensor data, while wearing a thigh-worn activity monitor. Participants were highly engaged, capturing 1,224 verbal reports in total. We extracted 1,885 activities with corresponding effort level and timespan, and examined the usefulness of these reports as activity labels. We discuss the implications of our approach and the collected dataset in supporting older adults through personalized activity tracking technologies.

Submitted 31 March, 2022; originally announced April 2022.

Comments: To appear at ACM CHI 2022. 21 pages, 3 figures, 7 tables. For the NSF funded project, visit https://mymove-collective.github.io

ACM Class: H.5.2; H.5.1; I.2.1

arXiv:2109.04655 [pdf, other]

Zero-Shot Dialogue State Tracking via Cross-Task Transfer

Authors: Zhaojiang Lin, Bing Liu, Andrea Madotto, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Eunjoon Cho, Rajen Subba, Pascale Fung

Abstract: Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data. In this work, we propose to transfer the cross-task knowledge from general question answering (QA) corpora for the zero-shot DST task. Specifically, we propose TransferQA, a transferable generative QA model that seamlessly combines extractive QA and multi-choice QA via a text-to-text transformer framework, and tracks both categorical slots and non-categorical slots in DST. In addition, we introduce two effective ways to construct unanswerable questions, namely, negative question sampling and context truncation, which enable our model to handle "none" value slots in the zero-shot DST setting. The extensive experiments show that our approaches substantially improve the existing zero-shot and few-shot results on MultiWOZ. Moreover, compared to the fully trained baseline on the Schema-Guided Dialogue dataset, our approach shows better generalization ability in unseen domains.

Submitted 9 September, 2021; originally announced September 2021.

Comments: EMNLP 2021

arXiv:2105.04222 [pdf, other]

Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking

Authors: Zhaojiang Lin, Bing Liu, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Andrea Madotto, Eunjoon Cho, Rajen Subba

Abstract: Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot description enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes dialogue context and slots with a pre-trained self-attentive encoder, and generates slot values in an auto-regressive manner. In addition, we incorporate Slot Type Informed Descriptions that capture the shared information across slots to facilitate cross-domain knowledge transfer. Experimental results on the MultiWOZ dataset show that our proposed method significantly improves existing state-of-the-art results in the zero-shot cross-domain setting.

Submitted 10 May, 2021; originally announced May 2021.

Comments: NAACL 2021

arXiv:2104.05979 [pdf, other]

Investigating Opportunities to Support Kids' Agency and Well-being: A Review of Kids' Wearables

Authors: Rachael Zehrung, Lily Huang, Bongshin Lee, Eun Kyoung Choe

Abstract: Wearable devices hold great potential for promoting children's health and well-being. However, research on kids' wearables is sparse and often focuses on their use in the context of parental surveillance. To gain insight into the current landscape of kids' wearables, we surveyed 47 wearable devices marketed for children. We collected rich data on the functionality of these devices and assessed how different features satisfy parents' information needs, and identified opportunities for wearables to support children's needs and interests. We found that many kids' wearables are technologically sophisticated devices that focus on parents' ability to communicate with their children and keep them safe, as well as encourage physical activity and nurture good habits. We discuss how our findings could inform the design of wearables that serve as more than monitoring devices, and instead support children and parents as equal stakeholders, providing implications for kids' agency, long-term development, and overall well-being. Finally, we identify future research efforts related to designing for kids' self-tracking and collaborative tracking with parents.

Submitted 13 April, 2021; originally announced April 2021.

Comments: 20 pages, 1 figure, 5 tables

arXiv:2101.06283 [pdf, other]

doi 10.1145/3411764.3445421

Data@Hand: Fostering Visual Exploration of Personal Data on Smartphones Leveraging Speech and Touch Interaction

Authors: Young-Ho Kim, Bongshin Lee, Arjun Srinivasan, Eun Kyoung Choe

Abstract: Most mobile health apps employ data visualization to help people view their health and activity data, but these apps provide limited support for visual data exploration. Furthermore, despite its huge potential benefits, mobile visualization research in the personal data context is sparse. This work aims to empower people to easily navigate and compare their personal health data on smartphones by enabling flexible time manipulation with speech. We designed and developed Data@Hand, a mobile app that leverages the synergy of two complementary modalities: speech and touch. Through an exploratory study with 13 long-term Fitbit users, we examined how multimodal interaction helps participants explore their own health data. Participants successfully adopted multimodal interaction (i.e., speech and touch) for convenient and fluid data exploration. Based on the quantitative and qualitative findings, we discuss design implications and opportunities with multimodal interaction for better supporting visual data exploration on mobile devices.

Submitted 15 January, 2021; originally announced January 2021.

Comments: To appear in ACM CHI 2021 Conference on Human Factors in Computing Systems; 16 pages, 6 figures, 5 tables

ACM Class: H.5.2

Journal ref: In CHI Conference on Human Factors in Computing Systems (CHI '21), May 8-13, 2021, Yokohama, Japan

arXiv:2012.15504 [pdf, other]

Continual Learning in Task-Oriented Dialogue Systems

Authors: Andrea Madotto, Zhaojiang Lin, Zhenpeng Zhou, Seungwhan Moon, Paul Crook, Bing Liu, Zhou Yu, Eunjoon Cho, Zhiguang Wang

Abstract: Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining. In this paper, we propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings, namely intent recognition, state tracking, natural language generation, and end-to-end. Moreover, we implement and compare multiple existing continual learning baselines, and we propose a simple yet effective architectural method based on residual adapters. Our experiments demonstrate that the proposed architectural method and a simple replay-based strategy perform comparably well, but they both achieve inferior performance to the multi-task learning baseline, in which all the data are shown at once, showing that continual learning in task-oriented dialogue systems is a challenging task. Furthermore, we reveal several trade-offs between different continual learning methods in terms of parameter usage and memory size, which are important in the design of a task-oriented dialogue system. The proposed benchmark is released together with several baselines to promote more research in this direction.

Submitted 31 December, 2020; originally announced December 2020.

Comments: 9 pages

arXiv:2012.13971 [pdf, other]

Time-Window Group-Correlation Support vs. Individual Features: A Detection of Abnormal Users

Authors: Lun-Pin Yuan, Euijin Choo, Ting Yu, Issa Khalil, Sencun Zhu

Abstract: Autoencoder-based anomaly detection methods have been used in identifying anomalous users from large-scale enterprise logs with the assumption that adversarial activities do not follow past habitual patterns. Most existing approaches typically build models by reconstructing single-day and individual-user behaviors. However, without capturing long-term signals and group-correlation signals, the models cannot identify low-signal yet long-lasting threats, and will wrongly report many normal users as anomalies on busy days, which, in turn, leads to a high false positive rate. In this paper, we propose ACOBE, an Anomaly detection method based on COmpound BEhavior, which takes into consideration long-term patterns and group behaviors. ACOBE leverages a novel behavior representation and an ensemble of deep autoencoders and produces an ordered investigation list. Our evaluation shows that ACOBE outperforms prior work by a large margin in terms of precision and recall, and our case study demonstrates that ACOBE is applicable in practice for cyberattack detection.

Submitted 27 December, 2020; originally announced December 2020.

arXiv:2010.12757 [pdf, other]

Adding Chit-Chat to Enhance Task-Oriented Dialogues

Authors: Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie

Abstract: Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations. In this work, we propose to integrate both types of systems by Adding Chit-Chat to ENhance Task-ORiented dialogues (ACCENTOR), with the goal of making virtual assistant conversations more engaging and interactive. Specifically, we propose a Human <-> AI collaborative data collection approach for generating diverse chit-chat responses to augment task-oriented dialogues with minimal annotation effort. We then present our new chit-chat-based annotations to 23.8K dialogues from two popular task-oriented datasets (Schema-Guided Dialogue and MultiWOZ 2.1) and demonstrate their advantage over the originals via human evaluation. Lastly, we propose three new models for adding chit-chat to task-oriented dialogues, explicitly trained to predict user goals and to generate contextually relevant chit-chat responses. Automatic and human evaluations show that, compared with the state-of-the-art task-oriented baseline, our models can code-switch between task and chit-chat to be more engaging, interesting, knowledgeable, and humanlike, while maintaining competitive task performance.

Submitted 1 May, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: To appear in NAACL-HLT 2021

arXiv:2006.01460 [pdf, other]

Situated and Interactive Multimodal Conversations

Authors: Seungwhan Moon, Satwik Kottur, Paul A. Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard

Abstract: Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context in addition to the dialog history. We provide two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances) using a multimodal Wizard-of-Oz (WoZ) setup, on two shopping domains: (a) furniture (grounded in a shared virtual environment) and, (b) fashion (grounded in an evolving set of images). We also provide logs of the items appearing in each scene, and contextual NLU and coreference annotations, using a novel and unified framework of SIMMC conversational acts for both user and assistant utterances. Finally, we present several tasks within SIMMC as objective evaluation protocols, such as Structural API Prediction and Response Generation. We benchmark a collection of existing models on these SIMMC tasks as strong baselines, and demonstrate rich multimodal conversational interactions. Our data, annotations, code, and models are publicly available.

Submitted 10 November, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: 20 pages, 5 figures, 11 tables, accepted to COLING 2020

arXiv:2003.10656 [pdf, other]

doi 10.1007/978-3-030-58589-1_40

Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection

Authors: Yuliang Guo, Guang Chen, Peitao Zhao, Weide Zhang, Jinghao Miao, Jingao Wang, Tae Eun Choe

Abstract: We present a generalized and scalable method, called Gen-LaneNet, to detect 3D lanes from a single image. The method, inspired by the latest state-of-the-art 3D-LaneNet, is a unified framework solving image encoding, spatial transform of features and 3D lane prediction in a single network. However, we propose unique designs for Gen-LaneNet in two respects. First, we introduce a new geometry-guided lane anchor representation in a new coordinate frame and apply a specific geometric transformation to directly calculate real 3D lane points from the network output. We demonstrate that aligning the lane points with the underlying top-view features in the new coordinate frame is critical towards a generalized method in handling unfamiliar scenes. Second, we present a scalable two-stage framework that decouples the learning of image segmentation subnetwork and geometry encoding subnetwork. Compared to 3D-LaneNet, the proposed Gen-LaneNet drastically reduces the amount of 3D lane labels required to achieve a robust solution in real-world application. Moreover, we release a new synthetic dataset and its construction strategy to encourage the development and evaluation of 3D lane detection methods. In experiments, we conduct an extensive ablation study to substantiate that the proposed Gen-LaneNet significantly outperforms 3D-LaneNet in average precision (AP) and F-score.

Submitted 24 March, 2020; originally announced March 2020.
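The abstract mentions a geometric transformation that relates real 3D lane points to a new coordinate frame aligned with top-view features. As a rough illustration only, the sketch below projects a 3D point onto the flat ground plane along the ray from the camera, assuming a camera centre at (0, 0, h), x lateral, y forward, z up, and a perfectly flat road. These conventions and the function names are assumptions; this is not claimed to be Gen-LaneNet's exact formulation.

```python
def to_virtual_top_view(x: float, y: float, z: float, h: float) -> tuple[float, float]:
    """Project a 3D point onto the ground plane z = 0 along the ray from an
    assumed camera centre at (0, 0, h). By similar triangles, both ground
    coordinates are scaled by h / (h - z)."""
    if z >= h:
        raise ValueError("point must lie below the camera height h")
    scale = h / (h - z)
    return x * scale, y * scale


def from_virtual_top_view(xv: float, yv: float, z: float, h: float) -> tuple[float, float]:
    """Invert the projection when the point's height z is known."""
    if z >= h:
        raise ValueError("point must lie below the camera height h")
    scale = (h - z) / h
    return xv * scale, yv * scale


if __name__ == "__main__":
    # Camera 1.5 m above the ground; a lane point 0.5 m above the ground plane.
    print(to_virtual_top_view(1.0, 10.0, 0.5, 1.5))  # (1.5, 15.0)
```

Points above the ground land farther out in the top view than their true (x, y), which is why lane points and top-view features misalign on slopes unless a correction of this kind is applied.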

arXiv:2003.09891 [pdf, other]

Low Latency ASR for Simultaneous Speech Translation

Authors: Thai Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Muller, Matthias Sperber, Sebastian Stueker, Alex Waibel

Abstract: User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We therefore have worked on several techniques for reducing the latency for both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we focused on word latency. We used it to analyze the performance of our current system and to identify opportunities for improvements. In order to minimize the latency we combined run-on decoding with a technique for identifying stable partial hypotheses when stream decoding and a protocol for dynamic output update that allows revising the most recent parts of the transcription. This combination reduces the latency at word level, where the words are final and will never be updated again in the future, from 18.1s to 1.1s without sacrificing performance in terms of word error rate.

Submitted 22 March, 2020; originally announced March 2020.
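To make the word-level latency figure (18.1s to 1.1s) concrete, here is a minimal sketch that assumes a word's latency is the time at which the system marks it final minus the time at which the word ends in the input audio. The `WordEvent` record and its fields are hypothetical, and the paper's precise definition may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WordEvent:
    """Hypothetical record for one output word."""
    word: str
    audio_end: float  # seconds: when the word finishes in the input audio
    finalized: float  # seconds: when the system commits the word as final


def word_latency(events: list[WordEvent]) -> float:
    """Average delay between a word ending in the audio and being finalized."""
    if not events:
        raise ValueError("need at least one word")
    return mean(e.finalized - e.audio_end for e in events)


if __name__ == "__main__":
    words = [
        WordEvent("hello", audio_end=0.5, finalized=1.5),
        WordEvent("world", audio_end=1.0, finalized=2.5),
    ]
    print(word_latency(words))  # 1.25
```

Because only finalized words count, showing stable partial hypotheses early lowers this number, while words that may still be revised do not.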

arXiv:2003.02245 [pdf, other]

Data Augmentation using Pre-trained Transformer Models

Authors: Varun Kumar, Ashutosh Choudhary, Eunah Cho

Abstract: Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of transformer based pre-trained models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. Additionally, on three classification benchmarks, the pre-trained seq2seq model outperforms other data augmentation methods in a low-resource setting. Further, we explore how data augmentation based on different pre-trained models differs in terms of data diversity, and how well such methods preserve the class-label information.

Submitted 31 January, 2021; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: In Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems @ AACL 2020; Code: https://github.com/varinf/TransformersDataAugmentation
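The conditioning idea, prepending the class label to each text sequence, amounts to simple preprocessing before fine-tuning and prompting. The sketch below shows only that string handling; the separator token and function names are assumptions for illustration, not the format used in the linked repository, and the model call itself is omitted.

```python
def label_prefixed(pairs: list[tuple[str, str]], sep: str = " SEP ") -> list[str]:
    """Build fine-tuning strings of the form '<label><sep><text>'."""
    return [f"{label}{sep}{text}" for label, text in pairs]


def generation_prompt(label: str, sep: str = " SEP ") -> str:
    """Prefix used to ask a fine-tuned model for a new example of `label`."""
    return f"{label}{sep}"


def strip_prefix(generated: str, label: str, sep: str = " SEP ") -> str:
    """Remove the label prefix from a generated sequence, if present."""
    prefix = generation_prompt(label, sep)
    return generated[len(prefix):] if generated.startswith(prefix) else generated


if __name__ == "__main__":
    train = [("positive", "a wonderful film"), ("negative", "a dull plot")]
    print(label_prefixed(train))
    # ['positive SEP a wonderful film', 'negative SEP a dull plot']
```

After fine-tuning on the prefixed strings, feeding `generation_prompt(label)` to the model and stripping the prefix from its output yields a synthetic example that is intended to carry that label.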

arXiv:1911.12080 [pdf, other]

DeviceWatch: Identifying Compromised Mobile Devices through Network Traffic Analysis and Graph Inference

Authors: Euijin Choo, Mohamed Nabeel, Mashael Alsabah, Issa Khalil, Ting Yu, Wei Wang

Abstract: In this paper, we propose to identify compromised mobile devices from a network administrator's point of view. Intuitively, inadvertent users (and thus their devices) who download apps through untrustworthy markets are often lured into installing malicious apps through in-app advertisement or phishing. We thus hypothesize that devices sharing a similar set of apps will have a similar probability of being compromised, resulting in the association between a device being compromised and apps in the device. Our goal is to leverage such associations to identify unknown compromised devices (i.e., devices that may have malicious apps, none of which are currently known) using the guilt-by-association principle. Admittedly, such associations could be quite weak as it is often hard, if not impossible, for an app to automatically download and install other apps without explicit initiation from a user. We describe how we can magnify such weak associations between devices and apps by carefully choosing parameters when applying graph-based inferences. We empirically show the effectiveness of our approach with a comprehensive study on the mobile network traffic provided by a major mobile service provider. Concretely, we achieve nearly 98% accuracy in terms of AUC (area under the ROC curve). Given the relatively weak nature of association, we further conduct in-depth analysis of the different behavior of a graph-inference approach, by comparing it to active DNS data. Moreover, we validate our results by showing that detected compromised devices indeed present undesirable behavior in terms of their privacy leakage and network infrastructure accessed.

Submitted 27 November, 2019; originally announced November 2019.
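The guilt-by-association principle can be illustrated with a simple score propagation over a device-app bipartite graph. The sketch below is a plain label-propagation stand-in, not the paper's graph-inference method: scores flow from known-compromised devices to their apps and back, and the `decay` parameter (an assumption) is an example of the kind of tuning knob that controls how strongly weak associations spread.

```python
def guilt_by_association(
    device_apps: dict[str, set[str]],
    known_bad: set[str],
    decay: float = 0.5,
    iterations: int = 20,
) -> dict[str, float]:
    """Score devices on a device-app bipartite graph.

    App score    = mean score of the devices that have the app installed.
    Device score = decay * mean score of its apps (known-bad devices stay at 1.0).
    A decay below 1 stops one shared app from pulling a device all the way to 1.0.
    """
    app_devices: dict[str, list[str]] = {}
    for device, apps in device_apps.items():
        for app in apps:
            app_devices.setdefault(app, []).append(device)

    scores = {d: 1.0 if d in known_bad else 0.0 for d in device_apps}
    for _ in range(iterations):
        app_scores = {
            app: sum(scores[d] for d in devs) / len(devs)
            for app, devs in app_devices.items()
        }
        scores = {
            d: 1.0 if d in known_bad
            else (decay * sum(app_scores[a] for a in apps) / len(apps) if apps else 0.0)
            for d, apps in device_apps.items()
        }
    return scores


if __name__ == "__main__":
    graph = {"d1": {"a", "b"}, "d2": {"a"}, "d3": {"c"}}
    print(guilt_by_association(graph, known_bad={"d1"}))
```

In this toy graph, d2 shares app "a" with the known-bad d1 and converges to a score near 1/3, while d3 shares nothing and stays at 0.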

arXiv:1910.04196 [pdf, other]

Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity

Authors: Eunah Cho, He Xie, John P. Lalor, Varun Kumar, William M. Campbell

Abstract: Expanding new functionalities efficiently is an ongoing challenge for single-turn task-oriented dialogue systems. In this work, we explore functionality-specific semi-supervised learning via self-training. We consider methods that augment training data automatically from unlabeled data sets in a functionality-targeted manner. In addition, we examine multiple techniques for efficient selection of augmented utterances to reduce training time and increase diversity. First, we consider paraphrase detection methods that attempt to find utterance variants of labeled training data with good coverage. Second, we explore sub-modular optimization based on n-gram features for utterance selection. Experiments show that functionality-specific self-training is very effective for improving system performance. In addition, methods optimizing diversity can reduce training data in many cases to 50% with little impact on performance.

Submitted 9 October, 2019; originally announced October 2019.
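Submodular selection over n-gram features can be sketched with the standard greedy rule: repeatedly add the utterance that covers the most not-yet-covered n-grams. Because distinct-n-gram coverage is monotone and submodular, this greedy rule carries the classic (1 - 1/e) approximation guarantee under a cardinality budget. The bigram features and whitespace tokenization below are assumptions; the paper's exact objective may differ.

```python
def ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Distinct word n-grams of a lowercased, whitespace-tokenized utterance."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def greedy_select(utterances: list[str], budget: int, n: int = 2) -> list[str]:
    """Pick up to `budget` utterances, each time taking the one that adds the
    most n-grams not yet covered by the current selection."""
    covered: set[tuple[str, ...]] = set()
    chosen: list[str] = []
    remaining = list(utterances)
    for _ in range(budget):
        best, best_gain = None, 0
        for u in remaining:
            gain = len(ngrams(u, n) - covered)
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:  # nothing left adds new coverage
            break
        chosen.append(best)
        covered |= ngrams(best, n)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    pool = [
        "play some jazz music",
        "play some jazz music please",
        "set an alarm for seven",
    ]
    print(greedy_select(pool, budget=2))
    # ['play some jazz music please', 'set an alarm for seven']
```

The near-duplicate "play some jazz music" is never chosen because it adds no new bigrams, which is how diversity-driven selection can shrink the augmented training set.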

arXiv:1907.03919 [pdf, other]

doi 10.1109/TVCG.2019.2934397

A Comparative Evaluation of Animation and Small Multiples for Trend Visualization on Mobile Phones

Authors: Matthew Brehmer, Bongshin Lee, Petra Isenberg, Eun Kyoung Choe

Abstract: We compare the efficacy of animated and small multiples variants of scatterplots on mobile phones for comparing trends in multivariate datasets. Visualization is increasingly prevalent in mobile applications and mobile-first websites, yet there is little prior visualization research dedicated to small displays. In this paper, we build upon previous experimental research carried out on larger displays that assessed animated and non-animated variants of scatterplots. Incorporating similar experimental stimuli and tasks, we conducted an experiment where 96 crowdworker participants performed nine trend comparison tasks using their mobile phones. We found that those using a small multiples design consistently completed tasks in less time, albeit with slightly less confidence than those using an animated design. The accuracy results were more task-dependent, and we further interpret our results according to the characteristics of the individual tasks, with a specific focus on the trajectories of target and distractor data items in each task. We identify cases that appear to favor either animation or small multiples, providing new questions for further experimental research and implications for visualization design on mobile devices. Lastly, we provide a reflection on our evaluation methodology.

Submitted 12 October, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

Comments: Accepted for presentation at IEEE VIS 2019, October 20-25 in Vancouver, Canada. To appear in IEEE Transactions on Visualization and Computer Graphics

arXiv:1708.00993 [pdf, other]

Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning

Authors: Jan Niehues, Eunah Cho

Abstract: Linguistic resources such as part-of-speech (POS) tags have been extensively used in statistical machine translation (SMT) frameworks and have yielded better performances. However, usage of such linguistic annotations in neural machine translation (NMT) systems has been left under-explored. In this work, we show that multi-task learning is a successful and easy approach to introduce additional knowledge into an end-to-end neural attentional model. By jointly training several natural language processing (NLP) tasks in one system, we are able to leverage common information and improve the performance of the individual tasks. We analyze the impact of three design decisions in multi-task learning: the tasks used in training, the training schedule, and the degree of parameter sharing across the tasks, which is defined by the network architecture. The experiments are conducted on a German-to-English translation task. As additional linguistic resources, we exploit POS information and named entities (NE). Experiments show that the translation quality can be improved by up to 1.5 BLEU points under the low-resource condition. The performance of the POS tagger is also improved using the multi-task learning scheme.

Submitted 3 August, 2017; originally announced August 2017.

Comments: 9 pages, Second Conference on Machine Translation (WMT17)

arXiv:1708.00563 [pdf, other]

Analyzing Neural MT Search and Model Performance

Authors: Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel

Abstract: In this paper, we offer an in-depth analysis of the modeling and search performance. We address the question of whether a more complex search algorithm is necessary. Furthermore, we investigate whether more complex models, which might only be applicable during rescoring, are promising. By separating the search space and the modeling using $n$-best list reranking, we analyze the influence of both parts of an NMT system independently. By comparing differently performing NMT systems, we show that the better translation is already in the search space of the translation systems with lower performance. These results indicate that the current search algorithms are sufficient for the NMT systems. Furthermore, we could show that even a relatively small $n$-best list of $50$ hypotheses already contains notably better translations.

Submitted 1 August, 2017; originally announced August 2017.

Comments: 7 pages, First Workshop on Neural Machine Translation
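The separation of search from modeling via $n$-best lists can be illustrated by comparing the model's top-scored hypothesis with the oracle hypothesis chosen by a quality metric against the reference. If the oracle differs from the model's pick, the better translation was found by search but ranked lower by the model. The sketch uses a crude unigram F1 as a stand-in for a sentence-level BLEU; that metric and the function names are assumptions.

```python
def unigram_f1(hyp: str, ref: str) -> float:
    """Crude sentence-level quality proxy (a stand-in for BLEU)."""
    h, r = hyp.split(), ref.split()
    common = sum(min(h.count(w), r.count(w)) for w in set(h))
    if common == 0:
        return 0.0
    precision, recall = common / len(h), common / len(r)
    return 2 * precision * recall / (precision + recall)


def model_vs_oracle(nbest: list[tuple[str, float]], ref: str) -> tuple[str, str]:
    """nbest holds (hypothesis, model_score) pairs.
    Returns (the model's top hypothesis, the oracle hypothesis by unigram F1)."""
    model_pick = max(nbest, key=lambda item: item[1])[0]
    oracle_pick = max(nbest, key=lambda item: unigram_f1(item[0], ref))[0]
    return model_pick, oracle_pick


if __name__ == "__main__":
    nbest = [("the cat sat on a mat", -1.0), ("the cat sat on the mat", -1.2)]
    print(model_vs_oracle(nbest, "the cat sat on the mat"))
    # ('the cat sat on a mat', 'the cat sat on the mat')
```

Here the exact reference is in the list, so search succeeded and the error lies with the model's scoring, the kind of case the abstract's conclusion about search sufficiency rests on.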

arXiv:1706.00180 [pdf, ps, other]

A spectral characterisation of t-designs and its applications

Authors: Eun-Kyung Cho, Cunsheng Ding, Jong Yoon Hyun

Abstract: There are two standard approaches to the construction of $t$-designs. The first one is based on permutation group actions on certain base blocks. The second one is based on coding theory. The objective of this paper is to give a spectral characterisation of all $t$-designs by introducing a characteristic Boolean function of a $t$-design. The spectra of the characteristic functions of $(n-2)/2$-$(n, n/2, 1)$ Steiner systems are determined and properties of such designs are proved. Delsarte's characterisations of orthogonal arrays and $t$-designs, which are two special cases of Delsarte's characterisation of $T$-designs in association schemes, are slightly extended into two spectral characterisations. Another characterisation of $t$-designs by Delsarte and Seidel is also extended into a spectral one. These spectral characterisations are then compared with the new spectral characterisation of this paper.

Submitted 9 June, 2018; v1 submitted 1 June, 2017; originally announced June 2017.

MSC Class: 05B05; 51E10; 94B15
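Any characterisation of $t$-designs has to capture the defining combinatorial property: in a $t$-$(v, k, \lambda)$ design, every $t$-subset of the $v$ points lies in exactly $\lambda$ blocks. The sketch below checks that definition directly by brute force. It does not reproduce the paper's spectral characterisation or its characteristic Boolean function. The Fano plane, a 2-(7, 3, 1) design built here from the planar difference set {0, 1, 3} mod 7, serves as the test case.

```python
from itertools import combinations


def design_lambda(points, blocks, t):
    """Return lambda if every t-subset of `points` lies in exactly lambda
    blocks, otherwise None (the block set is not a t-design)."""
    block_sets = [set(b) for b in blocks]
    counts = set()
    for subset in combinations(sorted(points), t):
        s = set(subset)
        counts.add(sum(1 for b in block_sets if s <= b))
        if len(counts) > 1:  # two t-subsets with different block counts
            return None
    return counts.pop() if counts else None


# Fano plane: translates of the difference set {0, 1, 3} mod 7, relabelled to points 1..7.
FANO = [{(i + d) % 7 + 1 for d in (0, 1, 3)} for i in range(7)]

if __name__ == "__main__":
    print(design_lambda(range(1, 8), FANO, 2))  # 1, so a 2-(7, 3, 1) design
```

The same blocks form a 1-design with $\lambda = 3$ (each point lies on three lines) but not a 3-design, since some triples are blocks and others are not.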

arXiv:1610.05243 [pdf, other]

Pre-Translation for Neural Machine Translation

Authors: Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel

Abstract: Recently, the development of neural machine translation (NMT) has significantly improved the translation quality of automatic machine translation. While most sentences are more accurate and fluent than translations by statistical machine translation (SMT)-based systems, in some cases, the NMT system produces translations that have a completely different meaning. This is especially the case when rare words occur. When using statistical machine translation, it has already been shown that significant gains can be achieved by simplifying the input in a preprocessing step. A commonly used example is the pre-reordering approach. In this work, we used phrase-based machine translation to pre-translate the input into the target language. Then a neural machine translation system generates the final hypothesis using the pre-translation. Thereby, we use either only the output of the phrase-based machine translation (PBMT) system or a combination of the PBMT output and the source sentence. We evaluate the technique on the English to German translation task. Using this approach we are able to outperform the PBMT system as well as the baseline neural MT system by up to 2 BLEU points. We analyzed the influence of the quality of the initial system on the final result.

Submitted 17 October, 2016; originally announced October 2016.

Comments: 9 pages. To appear in COLING 2016
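The two input variants described in this abstract (PBMT output alone, or PBMT output combined with the source sentence) can be sketched as a small preprocessing step. The abstract does not say how the combination is formed, so concatenation with a separator token is an assumption made here for illustration, and the function name and `sep` token are hypothetical.

```python
def build_nmt_input(source: str, pbmt_output: str, mode: str = "mixed",
                    sep: str = " <SEP> ") -> str:
    """Assemble the NMT encoder input for pre-translation.

    mode="pre":   use only the PBMT pre-translation.
    mode="mixed": concatenate the source sentence and the PBMT output
                  (the concatenation format is an assumption).
    """
    if mode == "pre":
        return pbmt_output
    if mode == "mixed":
        return f"{source}{sep}{pbmt_output}"
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(build_nmt_input("the house is small", "das Haus ist klein"))
    # the house is small <SEP> das Haus ist klein
```

Keeping the source alongside the pre-translation gives the NMT system a way to recover from PBMT errors, while the "pre" variant relies entirely on the PBMT output.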

Showing 1–50 of 54 results for author: Choo, E