Showing 1–50 of 59 results for author: Yeo, J

  1. arXiv:2407.03103  [pdf, other]

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Under Review

  2. arXiv:2406.14703  [pdf, other]

    cs.CL cs.AI

    Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

    Authors: Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

    Abstract: The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliabilit… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint; Under review

  3. arXiv:2406.12269  [pdf, other]

    cs.CL

    Unveiling Implicit Table Knowledge with Question-Then-Pinpoint Reasoner for Insightful Table Summarization

    Authors: Kwangwook Seo, Jinyoung Yeo, Dongha Lee

    Abstract: Implicit knowledge hidden within the explicit table cells, such as data insights, is the key to generating a high-quality table summary. However, unveiling such implicit knowledge is a non-trivial task. Due to the complex nature of structured tables, it is challenging even for large language models (LLMs) to mine the implicit knowledge in an insightful and faithful manner. To address this challeng… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: work in progress

  4. arXiv:2406.11275  [pdf, other]

    cs.CL

    Self-training Large Language Models through Knowledge Detection

    Authors: Wei Jie Yeo, Teddy Ferdinan, Przemyslaw Kazienko, Ranjan Satapathy, Erik Cambria

    Abstract: Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks. This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples identified through a reference-free consistency method. Empirical evaluations demonstrate significant i… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  5. arXiv:2406.10996  [pdf, other]

    cs.CL

    THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

    Authors: Seo Hyun Kim, Kai Tzu-iunn Ong, Taeyoon Kwon, Namyoung Kim, Keummin Ka, SeongHyeon Bae, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo

    Abstract: Large language models (LLMs) are capable of processing lengthy dialogue histories during prolonged interaction with users without additional memory modules; however, their responses tend to overlook or incorrectly recall information from the past. In this paper, we revisit memory-augmented response generation in the era of LLMs. While prior work focuses on getting rid of outdated memories, we argu… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Under Review

  6. arXiv:2406.07867  [pdf, other]

    cs.CV cs.AI cs.HC

    Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    Authors: Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro

    Abstract: In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corp… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  7. arXiv:2404.02575  [pdf, other]

    cs.CL

    Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

    Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

    Abstract: Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance in other reasoning tasks. Within this context, some recent studies use progra… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 38 pages, 4 figures

  8. arXiv:2403.04787  [pdf, other]

    cs.CL cs.AI

    Ever-Evolving Memory by Blending and Refining the Past

    Authors: Seo Hyun Kim, Keummin Ka, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo

    Abstract: For a human-like chatbot, constructing a long-term memory is crucial. However, current large language models often lack this capability, leading to instances of missing important user information or redundantly asking for the same information, thereby diminishing conversation quality. To effectively construct memory, it is crucial to seamlessly connect past and present information, while also poss… ▽ More

    Submitted 7 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 17 pages, 4 figures, 7 tables

  9. arXiv:2403.04460  [pdf, other]

    cs.CL

    Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset

    Authors: Minjin Kim, Minju Kim, Hana Kim, Beong-woo Kwak, Soyeon Chun, Hyunseo Kim, SeongKu Kang, Youngjae Yu, Jinyoung Yeo, Dongha Lee

    Abstract: Conversational recommender systems are an emerging area that has garnered increasing interest in the community, especially with the advancements in large language models (LLMs) that enable diverse reasoning over conversational input. Despite the progress, the field has many aspects left to explore. The currently available public datasets for conversational recommendation lack specific user prefer… ▽ More

    Submitted 8 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Published at ACL 2024 Findings

  10. arXiv:2403.02966  [pdf, other]

    cs.CL cs.AI cs.LG

    Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering

    Authors: Sungho Ko, Hyunjin Cho, Hyungjoo Chae, Jinyoung Yeo, Dongha Lee

    Abstract: Recent studies have investigated utilizing Knowledge Graphs (KGs) to enhance Question Answering (QA) performance of Large Language Models (LLMs), yet structured KG verbalization remains challenging. Existing methods, such as triple-form or free-form textual conversion of triple-form facts, encounter several issues. These include reduced evidence density due to duplicated entities or relationships,… ▽ More

    Submitted 19 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  11. arXiv:2403.00354  [pdf, other]

    cs.CL

    Self-Consistent Reasoning-based Aspect-Sentiment Quad Prediction with Extract-Then-Assign Strategy

    Authors: Jieyong Kim, Ryang Heo, Yongsik Seo, SeongKu Kang, Jinyoung Yeo, Dongha Lee

    Abstract: In the task of aspect sentiment quad prediction (ASQP), generative methods for predicting sentiment quads have shown promising results. However, they still suffer from imprecise predictions and limited interpretability, caused by data scarcity and inadequate modeling of the quadruplet composition process. In this paper, we propose Self-Consistent Reasoning-based Aspect-sentiment quadruple Predicti… ▽ More

    Submitted 8 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  12. arXiv:2402.18374  [pdf, other]

    cs.CL

    VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models

    Authors: Seoyeon Kim, Kwangwook Seo, Hyungjoo Chae, Jinyoung Yeo, Dongha Lee

    Abstract: Recent approaches in domain-specific named entity recognition (NER), such as biomedical NER, have shown remarkable advances. However, they still lack faithfulness, producing erroneous predictions. We assume that knowledge of entities can be useful in verifying the correctness of the predictions. Despite the usefulness of knowledge, resolving such errors with knowledge is nontrivial, since the k… ▽ More

    Submitted 8 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  13. arXiv:2402.17546  [pdf, other]

    cs.AI cs.CL

    COCOA: CBT-based Conversational Counseling Agent using Memory Specialized in Cognitive Distortions and Dynamic Prompt

    Authors: Suyeon Lee, Jieun Kang, Harim Kim, Kyoung-Mee Chung, Dongha Lee, Jinyoung Yeo

    Abstract: The demand for conversational agents that provide mental health care is consistently increasing. In this work, we develop a psychological counseling agent, referred to as CoCoA, that applies Cognitive Behavioral Therapy (CBT) techniques to identify and address cognitive distortions inherent in the client's statements. Specifically, we construct a memory system to efficiently manage information nec… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 4 pages, 2 figures

  14. arXiv:2402.15151  [pdf, other]

    cs.CV cs.CL eess.AS eess.IV

    Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

    Authors: Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

    Abstract: In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM),… ▽ More

    Submitted 13 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: An Erratum was added on the last page of this paper

  15. arXiv:2402.13211  [pdf, other]

    cs.CL

    Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

    Authors: Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Emotional Support Conversation (ESC) is a task aimed at alleviating individuals' emotional distress through daily conversation. Given its inherent complexity and non-intuitive nature, the ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Recently, despite the remarkable conversational ability of large language models (LLMs), previous studies have sug… ▽ More

    Submitted 5 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  16. arXiv:2402.11863  [pdf, other]

    cs.CL

    How Interpretable are Reasoning Explanations from Prompting Large Language Models?

    Authors: Wei Jie Yeo, Ranjan Satapathy, Rick Siow Mong Goh, Erik Cambria

    Abstract: Prompt Engineering has garnered significant attention for enhancing the performance of large language models across a multitude of tasks. Techniques such as the Chain-of-Thought not only bolster task performance but also delineate a clear trajectory of reasoning steps, offering a tangible form of explanation for the audience. Prior works on interpretability assess the reasoning chains yielded by C… ▽ More

    Submitted 1 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: NAACL Findings 2024

  17. arXiv:2402.08479  [pdf, other]

    cs.CL

    Plausible Extractive Rationalization through Semi-Supervised Entailment Signal

    Authors: Wei Jie Yeo, Ranjan Satapathy, Erik Cambria

    Abstract: The increasing use of complex and opaque black box models requires the adoption of interpretable measures; one such option is extractive rationalizing models, which serve as a more interpretable alternative. These models, also known as Explain-Then-Predict models, employ an explainer model to extract rationales and subsequently condition the predictor with the extracted information. Their primary… ▽ More

    Submitted 25 February, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Under review

  18. arXiv:2401.16732  [pdf, other]

    cs.CR

    Flash: A Hybrid Private Inference Protocol for Deep CNNs with High Accuracy and Low Latency on CPU

    Authors: Hyeri Roh, Jinsu Yeo, Yeongil Ko, Gu-Yeon Wei, David Brooks, Woo-Seok Choi

    Abstract: This paper presents Flash, an optimized private inference (PI) hybrid protocol utilizing both homomorphic encryption (HE) and secure two-party computation (2PC), which can reduce the end-to-end PI latency for deep CNN models to less than 1 minute on CPU. To this end, first, Flash proposes a low-latency convolution algorithm built upon a fast slot rotation operation and a novel data encoding scheme,… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

  19. arXiv:2401.14215  [pdf, other]

    cs.CL cs.AI

    Commonsense-augmented Memory Construction and Management in Long-term Conversations via Context-aware Persona Refinement

    Authors: Hana Kim, Kai Tzu-iunn Ong, Seoyeon Kim, Dongha Lee, Jinyoung Yeo

    Abstract: Memorizing and utilizing speakers' personas is a common practice for response generation in long-term conversations. Yet, human-authored datasets often provide uninformative persona sentences that hinder response quality. This paper presents a novel framework that leverages commonsense-based persona expansion to address such issues in long-term conversation. While prior work focuses on not produci… ▽ More

    Submitted 12 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted to EACL 2024

  20. arXiv:2401.09802  [pdf, other]

    eess.AS cs.CV cs.SD

    Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

    Authors: Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro

    Abstract: This paper explores sentence-level Multilingual Visual Speech Recognition with a single model for the first time. As the massive multilingual modeling of visual data requires huge computational costs, we propose a novel strategy, processing with visual speech units. Motivated by the recent success of the audio speech unit, the proposed visual speech unit is obtained by discretizing the visual spee… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

  21. arXiv:2312.07399  [pdf, other]

    cs.CL cs.AI

    Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

    Authors: Taeyoon Kwon, Kai Tzu-iunn Ong, Dongjin Kang, Seungjun Moon, Jeong Ryong Lee, Dosik Hwang, Yongsik Sim, Beomseok Sohn, Dongha Lee, Jinyoung Yeo

    Abstract: Machine reasoning has made great progress in recent years owing to large language models (LLMs). In the clinical domain, however, most NLP-driven projects mainly focus on clinical classification or reading comprehension, and under-explore clinical reasoning for disease diagnosis due to the expensive rationale annotation with clinicians. In this work, we present a "reasoning-aware" diagnosis framew… ▽ More

    Submitted 10 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  22. arXiv:2311.07215  [pdf, other]

    cs.CL cs.SE

    Coffee: Boost Your Code LLMs by Fixing Bugs with Feedback

    Authors: Seungjun Moon, Hyungjoo Chae, Yongho Song, Taeyoon Kwon, Dongjin Kang, Kai Tzu-iunn Ong, Seung-won Hwang, Jinyoung Yeo

    Abstract: Code editing is an essential step towards reliable program synthesis to automatically correct critical errors generated from code LLMs. Recent studies have demonstrated that closed-source LLMs (i.e., ChatGPT and GPT-4) are capable of generating corrective feedback to edit erroneous inputs. However, it remains challenging for open-source code LLMs to generate feedback for code editing, since these… ▽ More

    Submitted 23 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Work in progress

  23. arXiv:2310.13895  [pdf, other]

    cs.CL cs.LG

    RTSUM: Relation Triple-based Interpretable Summarization with Multi-level Salience Visualization

    Authors: Seonglae Cho, Yonggi Cho, HoonJae Lee, Myungha Jang, Jinyoung Yeo, Dongha Lee

    Abstract: In this paper, we present RTSUM, an unsupervised summarization framework that utilizes relation triples as the basic unit for summarization. Given an input document, RTSUM first selects salient relation triples via multi-level salience scoring and then generates a concise summary from the selected relation triples by using a text-to-text language model. On the basis of RTSUM, we also develop a web… ▽ More

    Submitted 25 March, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 8 pages, 2 figures

  24. arXiv:2310.09343  [pdf, other]

    cs.CL cs.AI

    Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

    Authors: Hyungjoo Chae, Yongho Song, Kai Tzu-iunn Ong, Taeyoon Kwon, Minjin Kim, Youngjae Yu, Dongha Lee, Dongyeop Kang, Jinyoung Yeo

    Abstract: Human-like chatbots necessitate the use of commonsense reasoning in order to effectively comprehend and respond to implicit information present within conversations. Achieving such coherence and informativeness in responses, however, is a non-trivial task. Even for large language models (LLMs), the task of identifying and aggregating key evidence within a single hop presents a substantial challeng… ▽ More

    Submitted 22 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: 25 pages, 8 figures, Accepted to EMNLP 2023

  25. arXiv:2309.11960  [pdf, other]

    cs.AI cs.CE q-fin.CP

    A Comprehensive Review on Financial Explainable AI

    Authors: Wei Jie Yeo, Wihan van der Heever, Rui Mao, Erik Cambria, Ranjan Satapathy, Gianmarco Mengaldo

    Abstract: The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare, where decision-making… ▽ More

    Submitted 21 September, 2023; originally announced September 2023.

  26. arXiv:2309.08535  [pdf, other]

    cs.CV cs.AI eess.AS

    Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

    Authors: Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro

    Abstract: This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have a limited number of labeled data. Different from previous methods that tried to improve the VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the… ▽ More

    Submitted 12 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  27. arXiv:2309.08531  [pdf, other]

    cs.CV cs.CL eess.AS eess.IV

    Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

    Authors: Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

    Abstract: In this paper, we propose methods to build a powerful and efficient Image-to-Speech captioning (Im2Sp) model. To this end, we start with importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp. We set the output of the proposed Im2Sp as discretized speech units, i.e., the quantized speech features of a self-s… ▽ More

    Submitted 15 September, 2023; originally announced September 2023.

  28. arXiv:2308.09311  [pdf, other]

    cs.CV cs.CL cs.SD eess.AS eess.IV

    Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

    Authors: Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro

    Abstract: This paper proposes a novel lip reading framework, especially for low-resource languages, which has not been well addressed in the previous literature. Since low-resource languages do not have enough video-text paired data to train the model to have sufficient power to model lip movements and language, it is regarded as challenging to develop lip reading models for low-resource languages. In order… ▽ More

    Submitted 12 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  29. arXiv:2308.07593  [pdf, other]

    cs.CV cs.MM eess.AS eess.IV

    AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

    Authors: Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro

    Abstract: Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information on lip movements. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of visual modality by using audio modality. Different fro… ▽ More

    Submitted 11 January, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE Transactions on Multimedia

  30. arXiv:2306.14568  [pdf, other]

    cond-mat.mes-hall cs.LG physics.comp-ph physics.data-an

    Elucidating Interfacial Dynamics of Ti-Al Systems Using Molecular Dynamics Simulation and Markov State Modeling

    Authors: Tianjiao Li, Chenxi Tian, Atieh Moridi, Jingjie Yeo

    Abstract: Due to their remarkable mechanical and chemical properties, Ti-Al based materials are attracting considerable interest in numerous fields of engineering, such as automotive, aerospace, and defense. With their low density, high strength, and resistance to corrosion and oxidation, these intermetallic alloys and compound metal-metallic composites have found diverse applications. The present study del… ▽ More

    Submitted 3 July, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

    Journal ref: ACS Appl. Mater. Interfaces 2023, 15, 43, 50489-50498

  31. arXiv:2306.10821  [pdf, other]

    cs.CL cs.SD eess.AS

    Comparison of L2 Korean pronunciation error patterns from five L1 backgrounds by using automatic phonetic transcription

    Authors: Eun Jung Yeo, Hyungshin Ryu, Jooyoung Lee, Sunhee Kim, Minhwa Chung

    Abstract: This paper presents a large-scale analysis of L2 Korean pronunciation error patterns from five different language backgrounds, Chinese, Vietnamese, Japanese, Thai, and English, by using automatic phonetic transcription. For the analysis, confusion matrices are generated for each L1, by aligning canonical phone sequences and automatically transcribed phone sequences obtained from fine-tuned Wav2Vec… ▽ More

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, accepted to ICPhS 2023

  32. arXiv:2305.18392  [pdf, other]

    cs.SD cs.LG eess.AS

    Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification

    Authors: Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung

    Abstract: This paper proposes an improved Goodness of Pronunciation (GoP) that utilizes Uncertainty Quantification (UQ) for automatic speech intelligibility assessment for dysarthric speech. Current GoP methods rely heavily on neural network-driven overconfident predictions, which is unsuitable for assessing dysarthric speech due to its significant acoustic differences from healthy speech. To alleviate the… ▽ More

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  33. arXiv:2305.04542  [pdf, other]

    cs.CV eess.AS

    Multi-Temporal Lip-Audio Memory for Visual Speech Recognition

    Authors: Jeong Hun Yeo, Minsu Kim, Yong Man Ro

    Abstract: Visual Speech Recognition (VSR) is a task to predict a sentence or word from lip movements. Some works have been recently presented which use audio signals to supplement visual information. However, existing methods utilize only limited information such as phoneme-level features and soft labels of Automatic Speech Recognition (ASR) networks. In this paper, we present a Multi-Temporal Lip-Audio Mem… ▽ More

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Presented at ICASSP 2023

  34. arXiv:2304.03031  [pdf, other]

    cs.AI

    Evidentiality-aware Retrieval for Overcoming Abstractiveness in Open-Domain Question Answering

    Authors: Yongho Song, Dahyun Lee, Myungha Jang, Seung-won Hwang, Kyungjae Lee, Dongha Lee, Jinyeong Yeo

    Abstract: The long-standing goal of dense retrievers in abstractive open-domain question answering (ODQA) tasks is to learn to capture evidence passages among relevant passages for any given query, such that the reader produces factually correct outputs from evidence passages. One of the key challenges is the insufficient amount of training data with the supervision of the answerability of the passages. Recent… ▽ More

    Submitted 1 February, 2024; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: Findings of EACL 2024

  35. arXiv:2303.03628  [pdf, other]

    cs.CL cs.LG

    CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification

    Authors: Seungone Kim, Se June Joo, Yul Jang, Hyungjoo Chae, Jinyoung Yeo

    Abstract: Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite its promising ability, a critical downside of CoT prompting is that the performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models wi… ▽ More

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted at EACL 2023 Demo

  36. arXiv:2303.01105  [pdf, other]

    eess.IV cs.CV cs.LG

    Evidence-empowered Transfer Learning for Alzheimer's Disease

    Authors: Kai Tzu-iunn Ong, Hana Kim, Minjin Kim, Jinseong Jang, Beomseok Sohn, Yoon Seong Choi, Dosik Hwang, Seong Jae Hwang, Jinyoung Yeo

    Abstract: Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification. However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we pres… ▽ More

    Submitted 17 April, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2023. The authorship was changed from co-first authors to a single first author, which was authorized by the adviser/corresponding author Jinyoung Yeo (Apr 18th, 2023)

  37. arXiv:2302.12623  [pdf, other]

    cs.AI cs.CL

    TUTORING: Instruction-Grounded Conversational Agent for Language Learners

    Authors: Hyungjoo Chae, Minjin Kim, Chaehyeong Kim, Wonseok Jeong, Hyejoong Kim, Junmyung Lee, Jinyoung Yeo

    Abstract: In this paper, we propose Tutoring bot, a generative chatbot trained on a large scale of tutor-student conversations for English-language learning. To mimic a human tutor's behavior in language education, the tutor bot leverages diverse educational instructions and grounds to each instruction as additional input context for the tutor response generation. As a single instruction generally involves… ▽ More

    Submitted 24 February, 2023; originally announced February 2023.

  38. arXiv:2210.15387  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with Multi-task Learning

    Authors: Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung

    Abstract: Automatic assessment of dysarthric speech is essential for sustained treatments and rehabilitation. However, obtaining atypical speech is challenging, often leading to data scarcity issues. To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech, using the self-supervised model in conjunction with multi-task learning. Wav2vec 2.0 XLS-R is jointly traine… ▽ More

    Submitted 28 April, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted to ICASSP 2023

  39. arXiv:2210.15386  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Opening the Black Box of wav2vec Feature Encoder

    Authors: Kwanghee Choi, Eun Jung Yeo

    Abstract: Self-supervised models, namely, wav2vec and its variants, have shown promising results in various downstream tasks in the speech domain. However, their inner workings are poorly understood, calling for in-depth analyses on what the model learns. In this paper, we concentrate on the convolutional feature encoder where its latent space is often speculated to represent discrete acoustic units. To ana… ▽ More

    Submitted 27 October, 2022; originally announced October 2022.

  40. arXiv:2210.12687  [pdf, other]

    cs.CL cs.AI

    BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets

    Authors: Minju Kim, Chaehyeong Kim, Yongho Song, Seung-won Hwang, Jinyoung Yeo

    Abstract: To build open-domain chatbots that are able to use diverse communicative skills, we propose a novel framework BotsTalk, where multiple agents grounded to the specific target skills participate in a conversation to automatically annotate multi-skill dialogues. We further present Blended Skill BotsTalk (BSBT), a large-scale multi-skill dialogue dataset comprising 300K conversations. Through extensiv… ▽ More

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP2022. Code and data are available at https://github.com/convei-lab/BotsTalk

  41. arXiv:2209.13260  [pdf, other]

    cs.CL

    Multilingual analysis of intelligibility classification using English, Korean, and Tamil dysarthric speech datasets

    Authors: Eun Jung Yeo, Sunhee Kim, Minhwa Chung

    Abstract: This paper analyzes dysarthric speech datasets from three languages with different prosodic systems: English, Korean, and Tamil. We inspect 39 acoustic measurements which reflect three speech dimensions including voice quality, pronunciation, and prosody. As a multilingual analysis, an examination of the mean values of acoustic measurements by intelligibility levels is conducted. Further, automatic int…

    Submitted 2 November, 2022; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: 6 pages, 1 figure, O-COCOSDA 2022

  42. arXiv:2209.12942  [pdf]

    cs.CL cs.SD eess.AS

    Cross-lingual Dysarthria Severity Classification for English, Korean, and Tamil

    Authors: Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung

    Abstract: This paper proposes a cross-lingual classification method for English, Korean, and Tamil, which employs both language-independent features and language-unique features. First, we extract thirty-nine features from diverse speech dimensions such as voice quality, pronunciation, and prosody. Second, feature selections are applied to identify the optimal feature set for each language. A set of shared…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 9 pages, 4 figures, APSIPA 2022

  43. arXiv:2209.00930  [pdf, other]

    cs.CL cs.AI cs.LG

    Mind the Gap! Injecting Commonsense Knowledge for Abstractive Dialogue Summarization

    Authors: Seungone Kim, Se June Joo, Hyungjoo Chae, Chaehyeong Kim, Seung-won Hwang, Jinyoung Yeo

    Abstract: In this paper, we propose to leverage the unique characteristics of dialogues sharing commonsense knowledge across participants, to resolve the difficulties in summarizing them. We present SICK, a framework that uses commonsense inferences as additional context. Compared to previous work that solely relies on the input dialogue, SICK uses an external knowledge model to generate a rich set of commo…

    Submitted 2 September, 2022; originally announced September 2022.

    Comments: Accepted at COLING 2022

  44. arXiv:2209.00055  [pdf, other]

    physics.bio-ph cs.LG stat.CO

    Computational design of antimicrobial active surfaces via automated Bayesian optimization

    Authors: Hanfeng Zhai, Jingjie Yeo

    Abstract: Biofilms pose significant problems for engineers in diverse fields, such as marine science, bioenergy, and biomedicine, where effective biofilm control is a long-term goal. The adhesion and surface mechanics of biofilms play crucial roles in generating and removing biofilm. Designing customized nano-surfaces with different surface topologies can alter the adhesive properties to remove biofilms mor…

    Submitted 31 August, 2022; originally announced September 2022.

    Journal ref: ACS Biomater. Sci. Eng. 2022

  45. arXiv:2206.03715  [pdf, other]

    cs.AI cs.CL cs.LG

    Modularized Transfer Learning with Multiple Knowledge Graphs for Zero-shot Commonsense Reasoning

    Authors: Yu Jin Kim, Beong-woo Kwak, Youngwook Kim, Reinald Kim Amplayo, Seung-won Hwang, Jinyoung Yeo

    Abstract: Commonsense reasoning systems should be able to generalize to diverse reasoning cases. However, most state-of-the-art approaches depend on expensive data annotations and overfit to a specific benchmark without learning how to perform general semantic reasoning. To overcome these drawbacks, zero-shot QA systems have shown promise as a robust learning scheme by transforming a commonsense knowledge g…

    Submitted 22 June, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted to NAACL2022

  46. arXiv:2204.01725  [pdf, other]

    cs.CV

    Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading

    Authors: Minsu Kim, Jeong Hun Yeo, Yong Man Ro

    Abstract: Recognizing speech from silent lip movement, which is called lip reading, is a challenging task due to 1) the inherent information insufficiency of lip movement to fully represent the speech, and 2) the existence of homophenes that have similar lip movement with different pronunciations. In this paper, we try to alleviate the aforementioned two challenges in lip reading by proposing a Multi-head V…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Published at AAAI 2022

  47. arXiv:2202.05435  [pdf, other]

    cs.CL cs.AI cs.LG

    Dual Task Framework for Improving Persona-grounded Dialogue Dataset

    Authors: Minju Kim, Beong-woo Kwak, Youngwook Kim, Hong-in Lee, Seung-won Hwang, Jinyoung Yeo

    Abstract: This paper introduces a simple yet effective data-centric approach for the task of improving persona-conditioned dialogue agents. Prior model-centric approaches unquestioningly depend on the raw crowdsourced benchmark datasets such as Persona-Chat. In contrast, we aim to fix annotation artifacts in benchmarking, which is orthogonally applicable to any dialogue model. Specifically, we augment relev…

    Submitted 16 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: Accepted to AAAI2022

  48. arXiv:2201.11661  [pdf, other]

    cs.LG cs.AI

    TrustAL: Trustworthy Active Learning using Knowledge Distillation

    Authors: Beong-woo Kwak, Youngwook Kim, Yu Jin Kim, Seung-won Hwang, Jinyoung Yeo

    Abstract: Active learning can be defined as iterations of data labeling, model training, and data acquisition, until sufficient labels are acquired. A traditional view of data acquisition is that, through iterations, knowledge from human labels and models is implicitly distilled to monotonically increase the accuracy and label consistency. Under this assumption, the most recently trained model is a good sur…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: Accepted to AAAI2022

  49. arXiv:2010.08924   

    cs.LG cs.AI

    Meta-path Free Semi-supervised Learning for Heterogeneous Networks

    Authors: Shin-woo Park, Byung Jun Bae, Jinyoung Yeo, Seung-won Hwang

    Abstract: Graph neural networks (GNNs) have been widely used in representation learning on graphs and achieved superior performance in tasks such as node classification. However, analyzing heterogeneous graphs with different types of nodes and links still poses great challenges for injecting the heterogeneity into a graph neural network. A general remedy is to manually or automatically design meta-paths to tr…

    Submitted 6 January, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: The technical description of [Proposed Models] section has an error. Especially, the training process

  50. arXiv:2008.02897  [pdf, other]

    cs.LG stat.ML

    Iterative Compression of End-to-End ASR Model using AutoML

    Authors: Abhinav Mehrotra, Łukasz Dudziak, Jinsu Yeo, Young-yoon Lee, Ravichander Vipperla, Mohamed S. Abdelfattah, Sourav Bhattacharya, Samin Ishtiaq, Alberto Gil C. P. Ramos, SangJeong Lee, Daehyun Kim, Nicholas D. Lane

    Abstract: Increasing demand for on-device Automatic Speech Recognition (ASR) systems has resulted in renewed interest in developing automatic model compression techniques. Past research has shown that the AutoML-based Low Rank Factorization (LRF) technique, when applied to an end-to-end Encoder-Attention-Decoder style ASR model, can achieve a speedup of up to 3.7x, outperforming laborious manual rank-selectio…

    Submitted 6 August, 2020; originally announced August 2020.

    Journal ref: INTERSPEECH 2020