Implicit Regression in Subspace for High-Sensitivity CEST Imaging

Authors: Chu Chen, Yang Liu, Se Weon Park, Jizhou Li, Kannie W. Y. Chan, Raymond H. F. Chan

Abstract: Chemical Exchange Saturation Transfer (CEST) MRI demonstrates its capability in significantly enhancing the detection of proteins and metabolites with low concentrations through exchangeable protons. The clinical application of CEST, however, is constrained by its low contrast and low signal-to-noise ratio (SNR) in the acquired data. Denoising, as one of the post-processing stages for CEST data, can effectively improve the accuracy of CEST quantification. In this work, by modeling spatial variant z-spectrums into low-dimensional subspace, we introduce Implicit Regression in Subspace (IRIS), which is an unsupervised denoising algorithm utilizing the excellent property of implicit neural representation for continuous mapping. Experiments conducted on both synthetic and in-vivo data demonstrate that our proposed method surpasses other CEST denoising methods regarding both qualitative and quantitative performance.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.15524 [pdf, other]

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization

Authors: Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee

Abstract: This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The way it is done is by divide and conquer: split the model into submodels, sequentially prune them, and reconstruct predictions of the dense counterparts on small calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can significantly reduce this error by more than $90\%$. Unwittingly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in rather increased language perplexity and poor performance at downstream tasks. We find out that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization, suggesting new directions in the presence of both benefits and pitfalls of reconstruction for pruning LLMs.

Submitted 21 June, 2024; originally announced June 2024.
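
Illustrative note: the reconstruction error discussed in the abstract above can be made concrete with a toy example. The sketch below is not the authors' method. It assumes a single linear layer, simple magnitude pruning as the sparsification rule, and a squared-error measure over a tiny calibration set; the names `magnitude_prune` and `reconstruction_error` are hypothetical.

```python
def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (assumed criterion)."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def reconstruction_error(dense, sparse, calibration):
    """Sum of squared output differences over the calibration inputs."""
    total = 0.0
    for x in calibration:
        dense_out = matvec(dense, x)
        sparse_out = matvec(sparse, x)
        total += sum((d - s) ** 2 for d, s in zip(dense_out, sparse_out))
    return total


# Toy dense layer and calibration inputs (made-up numbers for illustration).
dense_w = [[0.9, -0.1, 0.4], [0.05, -0.7, 0.2]]
calib = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

sparse_w = magnitude_prune(dense_w, sparsity=0.5)
error = reconstruction_error(dense_w, sparse_w, calib)
print(sparse_w, error)
```

Because the error is measured only on the calibration inputs, driving it toward zero says nothing about inputs outside that set, which is the overfitting risk the abstract describes.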

arXiv:2406.12016 [pdf, other]

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization

Authors: Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee

Abstract: Despite recent advances in LLM quantization, activation quantization remains challenging due to activation outliers. Conventional remedies, e.g., mixing precisions for different channels, introduce extra overhead and reduce the speedup. In this work, we develop a simple yet effective strategy to facilitate per-tensor activation quantization by preventing the generation of problematic tokens. Precisely, we propose a method to find a set of key-value cache, coined CushionCache, which mitigates outliers in subsequent tokens when inserted as a prefix. CushionCache works in two steps: First, we greedily search for a prompt token sequence that minimizes the maximum activation values in subsequent tokens. Then, we further tune the token cache to regularize the activations of subsequent tokens to be more quantization-friendly. The proposed method successfully addresses activation outliers of LLMs, providing a substantial performance boost for per-tensor activation quantization methods. We thoroughly evaluate our method over a wide range of models and benchmarks and find that it significantly surpasses the established baseline of per-tensor W8A8 quantization and can be seamlessly integrated with the recent activation quantization method.

Submitted 17 June, 2024; originally announced June 2024.
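
Illustrative note: a small sketch can show why a single activation outlier hurts per-tensor quantization, the problem the abstract above targets. It does not implement CushionCache. It assumes symmetric 8-bit quantization with one scale for the whole tensor, and the activation values are made up.

```python
def quantize_per_tensor(values, bits=8):
    """Symmetric per-tensor quantization: one scale shared by all values."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale


def mean_abs_error(reference, approx):
    """Average absolute difference between two equally long lists."""
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


# Hypothetical activations: typical values, then the same values plus one outlier.
typical = [0.5, -0.3, 0.8, -0.6, 0.1, 0.4]
with_outlier = typical + [60.0]

_, deq_clean, scale_clean = quantize_per_tensor(typical)
_, deq_out, scale_out = quantize_per_tensor(with_outlier)

# Compare the error on the typical values only, with and without the outlier present.
err_clean = mean_abs_error(typical, deq_clean)
err_outlier = mean_abs_error(typical, deq_out[:len(typical)])
print(scale_clean, scale_out, err_clean, err_outlier)
```

The outlier forces a much larger shared scale, so the ordinary values collapse onto a handful of integer levels and their rounding error grows. Lowering the maximum activation value, as the abstract describes, shrinks that scale again.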

arXiv:2405.18239 [pdf, other]

doi 10.1145/3643834.3661507

The CoExplorer Technology Probe: A Generative AI-Powered Adaptive Interface to Support Intentionality in Planning and Running Video Meetings

Authors: Gun Woo Warren Park, Payod Panda, Lev Tankelevitch, Sean Rintel

Abstract: Effective meetings are effortful, but traditional videoconferencing systems offer little support for reducing this effort across the meeting lifecycle. Generative AI (GenAI) has the potential to radically redefine meetings by augmenting intentional meeting behaviors. CoExplorer, our novel adaptive meeting prototype, preemptively generates likely phases that meetings would undergo, tools that allow capturing attendees' thoughts before the meeting, and for each phase, window layouts, and appropriate applications and files. Using CoExplorer as a technology probe in a guided walkthrough, we studied its potential in a sample of participants from a global technology company. Our findings suggest that GenAI has the potential to help meetings stay on track and reduce workload, although concerns were raised about users' agency, trust, and possible disruption to traditional meeting norms. We discuss these concerns and their design implications for the development of GenAI meeting technology.

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: DIS 2024

arXiv:2405.17633 [pdf, other]

HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs

Authors: Jocelyn Shen, Joel Mire, Hae Won Park, Cynthia Breazeal, Maarten Sap

Abstract: Empathy serves as a cornerstone in enabling prosocial behaviors, and can be evoked through sharing of personal experiences in stories. While empathy is influenced by narrative content, intuitively, people respond to the way a story is told as well, through narrative style. Yet the relationship between empathy and narrative style is not fully understood. In this work, we empirically examine and quantify this relationship between style and empathy using LLMs and large-scale crowdsourcing studies. We introduce a novel, theory-based taxonomy, HEART (Human Empathy and Narrative Taxonomy) that delineates elements of narrative style that can lead to empathy with the narrator of a story. We establish the performance of LLMs in extracting narrative elements from HEART, showing that prompting with our taxonomy leads to reasonable, human-level annotations beyond what prior lexicon-based methods can do. To show empirical use of our taxonomy, we collect a dataset of empathy judgments of stories via a large-scale crowdsourcing study with N=2,624 participants. We show that narrative elements extracted via LLMs, in particular, vividness of emotions and plot volume, can elucidate the pathways by which narrative style cultivates empathy towards personal stories. Our work suggests that such models can be used for narrative analyses that lead to human-centered social and behavioral insights.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15708 [pdf, other]

EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences

Authors: Jocelyn Shen, Yubin Kim, Mohit Hulse, Wazeer Zulfikar, Sharifa Alghowinem, Cynthia Breazeal, Hae Won Park

Abstract: Modeling empathy is a complex endeavor that is rooted in interpersonal and experiential dimensions of human interaction, and remains an open problem within AI. Existing empathy datasets fall short in capturing the richness of empathy responses, often being confined to in-lab or acted scenarios, lacking longitudinal data, and missing self-reported labels. We introduce a new multimodal dataset for empathy during personal experience sharing: the EmpathicStories++ dataset (https://mitmedialab.github.io/empathic-stories-multimodal/) containing 53 hours of video, audio, and text data of 41 participants sharing vulnerable experiences and reading empathically resonant stories with an AI agent. EmpathicStories++ is the first longitudinal dataset on empathy, collected over a month-long deployment of social robots in participants' homes, as participants engage in natural, empathic storytelling interactions with AI agents. We then introduce a novel task of predicting individuals' empathy toward others' stories based on their personal experiences, evaluated in two contexts: participants' own personal shared story context and their reflections on stories they read. We benchmark this task using state-of-the-art models to pave the way for future improvements in contextualized and longitudinal empathy modeling. Our work provides a valuable resource for further research in developing empathetic AI systems and understanding the intricacies of human empathy within genuine, real-world settings.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted to ACL 2024 Findings

arXiv:2405.10536 [pdf, other]

Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control

Authors: Jaeik Jeong, Tai-Yeon Ku, Wan-Ki Park

Abstract: Energy storage devices, such as batteries, thermal energy storages, and hydrogen systems, can help mitigate climate change by ensuring a more stable and sustainable power supply. To maximize the effectiveness of such energy storage, determining the appropriate charging and discharging amounts for each time period is crucial. Reinforcement learning is preferred over traditional optimization for the control of energy storage due to its ability to adapt to dynamic and complex environments. However, the continuous nature of charging and discharging levels in energy storage poses limitations for discrete reinforcement learning, and time-varying feasible charge-discharge range based on state of charge (SoC) variability also limits the conventional continuous reinforcement learning. In this paper, we propose a continuous reinforcement learning approach that takes into account the time-varying feasible charge-discharge range. An additional objective function was introduced for learning the feasible action range for each time period, supplementing the objectives of training the actor for policy learning and the critic for value learning. This actively promotes the utilization of energy storage by preventing them from getting stuck in suboptimal states, such as continuous full charging or discharging. This is achieved through the enforcement of the charging and discharging levels into the feasible action range. The experimental results demonstrated that the proposed method further maximized the effectiveness of energy storage by actively enhancing its utilization.

Submitted 17 May, 2024; originally announced May 2024.

Comments: ICLR 2024 Workshop: Tackling Climate Change with Machine Learning
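
Illustrative note: the time-varying feasible charge-discharge range mentioned in the abstract above can be sketched for a toy battery. This is not the paper's learning objective, which trains the agent to respect the range. Here the range is computed directly and the action is clipped into it. The capacity, rate limit, and function names are hypothetical.

```python
def feasible_range(soc, capacity, max_rate):
    """Return (lowest, highest) feasible energy change for one time step.

    Negative values discharge, positive values charge. The range depends on
    the current state of charge (soc), so it changes from step to step.
    """
    max_discharge = min(max_rate, soc)            # cannot go below empty
    max_charge = min(max_rate, capacity - soc)    # cannot go above full
    return -max_discharge, max_charge


def project_action(raw_action, soc, capacity, max_rate):
    """Clip a raw continuous action into the current feasible range."""
    low, high = feasible_range(soc, capacity, max_rate)
    return max(low, min(high, raw_action))


def step(soc, raw_action, capacity=10.0, max_rate=3.0):
    """Apply one control step and return (new_soc, applied_action)."""
    action = project_action(raw_action, soc, capacity, max_rate)
    return soc + action, action


# Nearly full battery: a request to charge 3.0 is limited to the remaining 1.0.
print(feasible_range(9.0, 10.0, 3.0))
print(step(9.0, 3.0))
# Nearly empty battery: a request to discharge 3.0 is limited to 0.5.
print(step(0.5, -3.0))
```

Simple clipping like this keeps the state of charge valid, but an agent that keeps proposing infeasible actions receives no signal about the limits. That gap is one motivation for learning the feasible range with an extra objective, as the abstract describes.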

arXiv:2404.15155 [pdf, other]

Adaptive Collaboration Strategy for LLMs in Medical Decision Making

Authors: Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, Hae Won Park

Abstract: Foundation models have become invaluable in advancing the medical field. Despite their promise, the strategic deployment of LLMs for effective utility in complex medical tasks remains an open question. Our novel framework, Medical Decision-making Agents (MDAgents) aims to address this gap by automatically assigning the effective collaboration structure for LLMs. Assigned solo or group collaboration structure is tailored to the complexity of the medical task at hand, emulating real-world medical decision making processes. We evaluate our framework and baseline methods with state-of-the-art LLMs across a suite of challenging medical benchmarks: MedQA, MedMCQA, PubMedQA, DDXPlus, PMC-VQA, Path-VQA, and MedVidQA, achieving the best performance in 5 out of 7 benchmarks that require an understanding of multi-modal medical reasoning. Ablation studies reveal that MDAgents excels in adapting the number of collaborating agents to optimize efficiency and accuracy, showcasing its robustness in diverse scenarios. We also explore the dynamics of group consensus, offering insights into how collaborative agents could behave in complex clinical team dynamics. Our code can be found at https://github.com/mitmedialab/MDAgents.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.07554 [pdf, other]

CAT: Contrastive Adapter Training for Personalized Image Generation

Authors: Jae Wan Park, Sang Hyun Park, Jun Young Koh, Junha Lee, Min Song

Abstract: The emergence of various adapters, including Low-Rank Adaptation (LoRA) applied from the field of natural language processing, has allowed diffusion models to personalize image generation at a low cost. However, due to the various challenges including limited datasets and shortage of regularization and computation resources, adapter training often results in unsatisfactory outcomes, leading to the corruption of the backbone model's prior knowledge. One of the well known phenomena is the loss of diversity in object generation, especially within the same class which leads to generating almost identical objects with minor variations. This poses challenges in generation capabilities. To solve this issue, we present Contrastive Adapter Training (CAT), a simple yet effective strategy to enhance adapter training through the application of CAT loss. Our approach facilitates the preservation of the base model's original knowledge when the model initiates adapters. Furthermore, we introduce the Knowledge Preservation Score (KPS) to evaluate CAT's ability to keep the former information. We qualitatively and quantitatively compare CAT's improvement. Finally, we mention the possibility of CAT in the aspects of multi-concept adapter and optimization.

Submitted 11 April, 2024; originally announced April 2024.

Comments: CVPRW 2024

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.14155 [pdf, other]

Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization

Authors: Yeji Song, Jimyeong Kim, Wonhark Park, Wonsik Shin, Wonjong Rhee, Nojun Kwak

Abstract: In a surge of text-to-image (T2I) models and their customization methods that generate new images of a user-provided subject, current works focus on alleviating the costs incurred by a lengthy per-subject optimization. These zero-shot customization methods encode the image of a specified subject into a visual embedding which is then utilized alongside the textual embedding for diffusion guidance. The visual embedding incorporates intrinsic information about the subject, while the textual embedding provides a new, transient context. However, the existing methods often 1) are significantly affected by the input images, e.g., generating images with the same pose, and 2) exhibit deterioration in the subject's identity. We first pin down the problem and show that redundant pose information in the visual embedding interferes with the textual embedding containing the desired pose information. To address this issue, we propose orthogonal visual embedding which effectively harmonizes with the given textual embedding. We also adopt the visual-only embedding and inject the subject's clear features utilizing a self-attention swap. Our results demonstrate the effectiveness and robustness of our method, which offers highly flexible zero-shot generation while effectively maintaining the subject's identity.

Submitted 21 March, 2024; originally announced March 2024.

Comments: Project page: https://ldynx.github.io/harmony-zero-t2i/

arXiv:2403.11330 [pdf, other]

Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

Authors: Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency

Abstract: We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals. At a high level, our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to crossmodally shape the reward decomposition step. This decomposed reward model is then used as part of the standard RLHF pipeline to improve an LLM-based dialog agent. We run quantitative and qualitative human studies to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.

Submitted 22 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: 10 pages, 3 figures, 2 tables

arXiv:2403.04982 [pdf, other]

A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image Generation with Patch Similarity-based Sparsity Augmentation and Text-based Mixed-Precision

Authors: Jiwon Choi, Wooyoung Jo, Seongyon Hong, Beomseok Kwon, Wonhoon Park, Hoi-Jun Yoo

Abstract: This paper presents an energy-efficient stable diffusion processor for text-to-image generation. While stable diffusion has attracted attention for its high-quality image synthesis results, its inherent characteristics hinder its deployment on mobile platforms. The proposed processor achieves high throughput and energy efficiency with three key features as solutions: 1) Patch similarity-based sparsity augmentation (PSSA) to reduce external memory access (EMA) energy of self-attention score by 60.3%, leading to 37.8% total EMA energy reduction. 2) Text-based important pixel spotting (TIPS) to allow 44.8% of the FFN layer workload to be processed with low-precision activation. 3) Dual-mode bit-slice core (DBSC) architecture to enhance energy efficiency in FFN layers by 43.0%. The proposed processor is implemented in 28 nm CMOS technology and achieves 3.84 TOPS peak throughput with 225.6 mW average power consumption. In sum, a highly energy-efficient text-to-image generation processor consuming 28.6 mJ/iteration is achieved on the MS-COCO dataset.

Submitted 14 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted at 2024 IEEE International Symposium on Circuits and Systems (ISCAS)

arXiv:2403.01469 [pdf, other]

KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations

Authors: Sunjun Kweon, Byungjin Choi, Minkyu Kim, Rae Woong Park, Edward Choi

Abstract: We introduce KorMedMCQA, the first Korean multiple-choice question answering (MCQA) benchmark derived from Korean healthcare professional licensing examinations, covering the years 2012 to 2023. This dataset consists of a selection of questions from the license examinations for doctors, nurses, and pharmacists, featuring a diverse array of subjects. We conduct baseline experiments on various large language models, including proprietary/open-source, multilingual/Korean-additional pretrained, and clinical context pretrained models, highlighting the potential for further enhancements. We make our data publicly available on HuggingFace (https://huggingface.co/datasets/sean0042/KorMedMCQA) and provide an evaluation script via LM-Harness, inviting further exploration and advancement in Korean healthcare environments.

Submitted 5 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.02005 [pdf, other]

Topology-Informed Graph Transformer

Authors: Yun Young Choi, Sun Woo Park, Minho Lee, Youngho Woo

Abstract: Transformers have revolutionized performance in Natural Language Processing and Vision, paving the way for their integration with Graph Neural Networks (GNNs). One key challenge in enhancing graph transformers is strengthening the discriminative power of distinguishing isomorphisms of graphs, which plays a crucial role in boosting their predictive performances. To address this challenge, we introduce 'Topology-Informed Graph Transformer (TIGT)', a novel transformer enhancing both discriminative power in detecting graph isomorphisms and the overall performance of Graph Transformers. TIGT consists of four components: a topological positional embedding layer using non-isomorphic universal covers based on cyclic subgraphs of graphs to ensure unique graph representation; a dual-path message-passing layer to explicitly encode topological characteristics throughout the encoder layers; a global attention mechanism; and a graph information layer to recalibrate channel-wise graph features for better feature representation. TIGT outperforms previous Graph Transformers in classifying a synthetic dataset aimed at distinguishing isomorphism classes of graphs. Additionally, mathematical analysis and empirical evaluations highlight our model's competitive edge over state-of-the-art Graph Transformers across various benchmark datasets.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.15894 [pdf, other]

A Gated MLP Architecture for Learning Topological Dependencies in Spatio-Temporal Graphs

Authors: Yun Young Choi, Minho Lee, Sun Woo Park, Seunghwan Lee, Joohwan Ko

Abstract: Graph Neural Networks (GNNs) and Transformers have been increasingly adopted to learn the complex vector representations of spatio-temporal graphs, capturing intricate spatio-temporal dependencies crucial for applications such as traffic datasets. Although many existing methods utilize multi-head attention mechanisms and message-passing neural networks (MPNNs) to capture both spatial and temporal relations, these approaches encode temporal and spatial relations independently, and reflect the graph's topological characteristics in a limited manner. In this work, we introduce the Cycle to Mixer (Cy2Mixer), a novel spatio-temporal GNN based on topological non-trivial invariants of spatio-temporal graphs with gated multi-layer perceptrons (gMLP). The Cy2Mixer is composed of three blocks based on MLPs: a message-passing block for encapsulating spatial information, a cycle message-passing block for enriching topological information through cyclic subgraphs, and a temporal block for capturing temporal properties. We bolster the effectiveness of Cy2Mixer with mathematical evidence emphasizing that our cycle message-passing block is capable of offering differentiated information to the deep learning model compared to the message-passing block. Furthermore, empirical evaluations substantiate the efficacy of the Cy2Mixer, demonstrating state-of-the-art performances across various traffic benchmark datasets.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.06866 [pdf, other]

Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data

Authors: Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, Hae Won Park

Abstract: Large language models (LLMs) are capable of many natural language tasks, yet they are far from perfect. In health applications, grounding and interpreting domain-specific and non-linguistic data is crucial. This paper investigates the capacity of LLMs to make inferences about health based on contextual information (e.g. user demographics, health knowledge) and physiological data (e.g. resting heart rate, sleep minutes). We present a comprehensive evaluation of 12 state-of-the-art LLMs with prompting and fine-tuning techniques on four public health datasets (PMData, LifeSnaps, GLOBEM and AW_FB). Our experiments cover 10 consumer health prediction tasks in mental health, activity, metabolic, and sleep assessment. Our fine-tuned model, HealthAlpaca exhibits comparable performance to much larger models (GPT-3.5, GPT-4 and Gemini-Pro), achieving the best performance in 8 out of 10 tasks. Ablation studies highlight the effectiveness of context enhancement strategies. Notably, we observe that our context enhancement can yield up to 23.8% improvement in performance. While constructing contextually rich prompts (combining user context, health knowledge and temporal information) exhibits synergistic improvement, the inclusion of health knowledge context in prompts significantly enhances overall performance.

Submitted 27 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.02833 [pdf]

doi 10.1145/3610977.3634945

Integrating Flow Theory and Adaptive Robot Roles: A Conceptual Model of Dynamic Robot Role Adaptation for the Enhanced Flow Experience in Long-term Multi-person Human-Robot Interactions

Authors: Huili Chen, Sharifa Alghowinem, Cynthia Breazeal, Hae Won Park

Abstract: In this paper, we introduce a novel conceptual model for a robot's behavioral adaptation in its long-term interaction with humans, integrating dynamic robot role adaptation with principles of flow experience from psychology. This conceptualization introduces a hierarchical interaction objective grounded in the flow experience, serving as the overarching adaptation goal for the robot. This objective intertwines both cognitive and affective sub-objectives and incorporates individual and group-level human factors. The dynamic role adaptation approach is a cornerstone of our model, highlighting the robot's ability to fluidly adapt its support roles - from leader to follower - with the aim of maintaining equilibrium between activity challenge and user skill, thereby fostering the user's optimal flow experiences. Moreover, this work delves into a comprehensive exploration of the limitations and potential applications of our proposed conceptualization. Our model places a particular emphasis on the multi-person HRI paradigm, a dimension of HRI that is both under-explored and challenging. In doing so, we aspire to extend the applicability and relevance of our conceptualization within the HRI field, contributing to the future development of adaptive social robots capable of sustaining long-term interactions with humans.

Submitted 5 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI24), March 11-14, 2024, Boulder, CO, USA

arXiv:2311.04035 [pdf, other]

doi 10.1007/s10994-023-06452-4

Discordance Minimization-based Imputation Algorithms for Missing Values in Rating Data

Authors: Young Woong Park, Jinhak Kim, Dan Zhu

Abstract: Ratings are frequently used to evaluate and compare subjects in various applications, from education to healthcare, because ratings provide succinct yet credible measures for comparing subjects. However, when multiple rating lists are combined or considered together, subjects often have missing ratings, because most rating lists do not rate every subject in the combined list. In this study, we propose analyses on missing value patterns using six real-world data sets in various applications, as well as the conditions for applicability of imputation algorithms. Based on the special structures and properties derived from the analyses, we propose optimization models and algorithms that minimize the total rating discordance across rating providers to impute missing ratings in the combined rating lists, using only the known rating information. The total rating discordance is defined as the sum of the pairwise discordance metric, which can be written as a quadratic function. Computational experiments based on real-world and synthetic rating data sets show that the proposed methods outperform the state-of-the-art general imputation methods in the literature in terms of imputation accuracy.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.20187 [pdf, other]

Self-Supervised Pre-Training for Precipitation Post-Processor

Authors: Sojung An, Junha Lee, Jiyeon Jang, Inchae Na, Wooyeon Park, Sujeong You

Abstract: Obtaining a sufficient forecast lead time for local precipitation is essential in preventing hazardous weather events. Global warming-induced climate change increases the challenge of accurately predicting severe precipitation events, such as heavy rainfall. In this paper, we propose a deep learning-based precipitation post-processor for numerical weather prediction (NWP) models. The precipitation post-processor consists of (i) employing self-supervised pre-training, where the parameters of the encoder are pre-trained on the reconstruction of the masked variables of the atmospheric physics domain; and (ii) conducting transfer learning on precipitation segmentation tasks (the target domain) from the pre-trained encoder. In addition, we introduced a heuristic labeling approach to effectively train class-imbalanced datasets. Our experiments on precipitation correction for regional NWP show that the proposed method outperforms other approaches.

Submitted 19 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: 7 pages, 3 figures, 1 table, accepted to NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning: https://www.climatechange.ai/papers/neurips2023/18

arXiv:2310.16112 [pdf, other]

Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge

Authors: Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng

Abstract: Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" – there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.

Submitted 1 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Update after major revision

arXiv:2310.00926 [pdf, other]

Integration of Graph Neural Network and Neural-ODEs for Tumor Dynamic Prediction

Authors: Omid Bazgir, Zichen Wang, Ji Won Park, Marc Hafner, James Lu

Abstract: In anti-cancer drug development, a major scientific challenge is disentangling the complex relationships between high-dimensional genomics data from patient tumor samples, the corresponding tumor's organ of origin, the drug targets associated with given treatments and the resulting treatment response. Furthermore, to realize the aspirations of precision medicine in identifying and adjusting treatments for patients depending on the therapeutic response, there is a need for building tumor dynamic models that can integrate both longitudinal tumor size as well as multimodal, high-content data. In this work, we take a step towards enhancing personalized tumor dynamic predictions by proposing a heterogeneous graph encoder that utilizes a bipartite Graph Convolutional Neural network (GCN) combined with Neural Ordinary Differential Equations (Neural-ODEs). We applied the methodology to a large collection of patient-derived xenograft (PDX) data, spanning a wide variety of treatments (as well as their combinations) on tumors that originated from a number of different organs. We first show that the methodology is able to discover a tumor dynamic model that significantly improves upon an empirical model which is in current use. Additionally, we show that the graph encoder is able to effectively utilize multimodal data to enhance tumor predictions. Our findings indicate that the methodology holds significant promise and offers potential applications in pre-clinical settings.

Submitted 27 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.01961 [pdf, other]

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, et al. (17 additional authors not shown)

Abstract: In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. There was no specific training data provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.

Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Tech report, project page https://nice.lgresearch.ai/

arXiv:2309.01670 [pdf, other]

Blind Biological Sequence Denoising with Self-Supervised Set Learning

Authors: Nathan Ng, Ji Won Park, Jae Hyeon Lee, Ryan Lewis Kelly, Stephen Ra, Kyunghyun Cho

Abstract: Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are available or error rates are too high. In this paper, we propose a novel method for blindly denoising sets of sequences without directly observing clean source sequence labels. Our method, Self-Supervised Set Learning (SSSL), gathers subreads together in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces. This set embedding represents the "average" of the subreads and can be decoded into a prediction of the clean sequence. In experiments on simulated long-read DNA data, SSSL methods denoise small reads of $\leq 6$ subreads with 17% fewer errors and large reads of $>6$ subreads with 8% fewer errors compared to the best baseline. On a real dataset of antibody sequences, SSSL improves over baselines on two self-supervised metrics, with a significant improvement on difficult small reads that comprise over 60% of the test set. By accurately denoising these reads, SSSL promises to better realize the potential of high-throughput DNA sequencing data for downstream scientific applications.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2308.05770 [pdf, other]

Fine-Grained Self-Supervised Learning with Jigsaw Puzzles for Medical Image Classification

Authors: Wongi Park, Jongbin Ryu

Abstract: Classifying fine-grained lesions is challenging due to minor and subtle differences in medical images. This is because learning features of fine-grained lesions with highly minor differences is very difficult in training deep neural networks. Therefore, in this paper, we introduce the Fine-Grained Self-Supervised Learning (FG-SSL) method for classifying subtle lesions in medical images. The proposed method progressively learns the model through a hierarchical block such that the cross-correlation between the fine-grained Jigsaw puzzle and regularized original images is close to the identity matrix. We also apply the hierarchical block for progressive fine-grained learning, which extracts different information in each step, to supervised learning for discovering subtle differences. Our method requires neither an asymmetric model nor a negative sampling strategy, and is not sensitive to batch size. We evaluate the proposed fine-grained self-supervised learning method on comprehensive experiments using various medical image recognition datasets. In our experiments, the proposed method performs favorably compared to existing state-of-the-art approaches on the widely-used ISIC2018, APTOS2019, and ISIC2017 datasets.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.05542 [pdf, other]

Robust Asymmetric Loss for Multi-Label Long-Tailed Learning

Authors: Wongi Park, Inhyuk Park, Sungeun Kim, Jongbin Ryu

Abstract: In real medical data, training samples typically show long-tailed distributions with multiple labels. Class distribution of the medical data has a long-tailed shape, in which the incidence of different diseases is quite varied, and at the same time, it is not unusual for images taken from symptomatic patients to be multi-label diseases. Therefore, in this paper, we concurrently address these two issues by putting forth a robust asymmetric loss on the polynomial function. Since our loss tackles both long-tailed and multi-label classification problems simultaneously, it leads to a complex design of the loss function with a large number of hyper-parameters. Although a model can be highly fine-tuned due to a large number of hyper-parameters, it is difficult to optimize all hyper-parameters at the same time, and there might be a risk of overfitting a model. Therefore, we regularize the loss function using the Hill loss approach, which makes it less sensitive to the numerous hyper-parameters and thereby reduces the risk of overfitting the model. For this reason, the proposed loss is a generic method that can be applied to most medical image classification tasks and does not make the training process more time-consuming. We demonstrate that the proposed robust asymmetric loss performs favorably on long-tailed, multi-label medical image classification in addition to the various long-tailed single-label datasets. Notably, our method achieves Top-5 results on the CXR-LT dataset of the ICCV CVAMD 2023 competition. We open-source our implementation of the robust asymmetric loss in the public repository: https://github.com/kalelpark/RAL.

Submitted 10 August, 2023; originally announced August 2023.

Journal ref: ICCVW 2023

arXiv:2307.15017 [pdf, other]

Samplable Anonymous Aggregation for Private Federated Data Analysis

Authors: Kunal Talwar, Shan Wang, Audra McMillan, Vojta Jina, Vitaly Feldman, Bailey Basile, Aine Cahill, Yi Sheng Chan, Mike Chatzidakis, Junye Chen, Oliver Chick, Mona Chitnis, Suman Ganta, Yusuf Goren, Filip Granqvist, Kristine Guo, Frederic Jacobs, Omid Javidbakht, Albert Liu, Richard Low, Dan Mascenik, Steve Myers, David Park, Wonhee Park, Gianni Parsa, et al. (11 additional authors not shown)

Abstract: We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Our first contribution is to propose a simple primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust assumptions it entails. Second, we propose a system architecture that implements this primitive and perform a security analysis of the proposed system.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 24 pages

arXiv:2306.00344 [pdf, other]

BOtied: Multi-objective Bayesian optimization with tied multivariate ranks

Authors: Ji Won Park, Nataša Tagasovska, Michael Maser, Stephen Ra, Kyunghyun Cho

Abstract: Many scientific and industrial applications require the joint optimization of multiple, potentially competing objectives. Multi-objective Bayesian optimization (MOBO) is a sample-efficient framework for identifying Pareto-optimal solutions. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function (CDF). Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied. BOtied inherits desirable invariance properties of the CDF, and an efficient implementation with copulas allows it to scale to many objectives. Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions while being computationally efficient for many objectives.

Submitted 7 June, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 12 pages (+9 appendix), 13 figures. Accepted at ICML 2024

arXiv:2305.19473 [pdf, other]

Chain of Log-Concave Markov Chains

Authors: Saeed Saremi, Ji Won Park, Francis Bach

Abstract: We introduce a theoretical framework for sampling from unnormalized densities based on a smoothing scheme that uses an isotropic Gaussian kernel with a single fixed noise scale. We prove one can decompose sampling from a density (minimal assumptions made on the density) into a sequence of sampling from log-concave conditional densities via accumulation of noisy measurements with equal noise levels. Our construction is unique in that it keeps track of a history of samples, making it non-Markovian as a whole, but it is lightweight algorithmically as the history only shows up in the form of a running empirical mean of samples. Our sampling algorithm generalizes walk-jump sampling (Saremi & Hyvärinen, 2019). The "walk" phase becomes a (non-Markovian) chain of (log-concave) Markov chains. The "jump" from the accumulated measurements is obtained by empirical Bayes. We study our sampling algorithm quantitatively using the 2-Wasserstein metric and compare it with various Langevin MCMC algorithms. We also report a remarkable capacity of our algorithm to "tunnel" between modes of a distribution.

Submitted 28 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.14246 [pdf, other]

Modeling Empathic Similarity in Personal Narratives

Authors: Jocelyn Shen, Maarten Sap, Pedro Colon-Hernandez, Hae Won Park, Cynthia Breazeal

Abstract: The most meaningful connections between people are often fostered through expression of shared vulnerability and emotional experiences in personal narratives. We introduce a new task of identifying similarity in personal stories based on empathic resonance, i.e., the extent to which two people empathize with each other's experiences, as opposed to raw semantic or lexical similarity, as has predominantly been studied in NLP. Using insights from social psychology, we craft a framework that operationalizes empathic similarity in terms of three key features of stories: main events, emotional trajectories, and overall morals or takeaways. We create EmpathicStories, a dataset of 1,500 personal stories annotated with our empathic similarity features, and 2,000 pairs of stories annotated with empathic similarity scores. Using our dataset, we fine-tune a model to compute empathic similarity of story pairs, and show that this outperforms semantic similarity models on automated correlation and retrieval metrics. Through a user study with 150 participants, we also assess the effect our model has on retrieving stories that users empathize with, compared to naive semantic similarity-based retrieval, and find that participants empathized significantly more with stories retrieved by our model. Our work has strong implications for the use of empathy-aware models to foster human connection and empathy between people.

Submitted 6 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Published at EMNLP 2023

arXiv:2305.12369 [pdf, other]

HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer

Authors: Yubin Kim, Dong Won Lee, Paul Pu Liang, Sharifa Alghowinem, Cynthia Breazeal, Hae Won Park

Abstract: Accurately modeling affect dynamics, which refers to the changes and fluctuations in emotions and affective displays during human conversations, is crucial for understanding human interactions. By analyzing affect dynamics, we can gain insights into how people communicate, respond to different situations, and form relationships. However, modeling affect dynamics is challenging due to contextual factors, such as the complex and nuanced nature of interpersonal relationships, the situation, and other factors that influence affective displays. To address this challenge, we propose a Cross-person Memory Transformer (CPM-T) framework which is able to explicitly model affective dynamics (intrapersonal and interpersonal influences) by identifying verbal and non-verbal cues, and with a large language model to utilize the pre-trained knowledge and perform verbal reasoning. The CPM-T framework maintains memory modules to store and update the contexts within the conversation window, enabling the model to capture dependencies between earlier and later parts of a conversation. Additionally, our framework employs cross-modal attention to effectively align information from multi-modalities and leverages cross-person attention to align behaviors in multi-party interactions. We evaluate the effectiveness and generalizability of our approach on three publicly available datasets for joint engagement, rapport, and human beliefs prediction tasks. Remarkably, the CPM-T framework outperforms baseline models in average F1-scores by up to 7.3%, 9.3%, and 2.0% respectively. Finally, we demonstrate the importance of each component in the framework via ablation studies with respect to multimodal temporal behavior.

Submitted 21 May, 2023; originally announced May 2023.

arXiv:2304.14082 [pdf, other]

JaxPruner: A concise library for sparsity research

Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks.

Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

arXiv:2304.12204 [pdf, other]

Multipar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations

Authors: Dong Won Lee, Yubin Kim, Rosalind Picard, Cynthia Breazeal, Hae Won Park

Abstract: As we move closer to real-world AI systems, AI agents must be able to deal with multiparty (group) conversations. Recognizing and interpreting multiparty behaviors is challenging, as the system must recognize individual behavioral cues, deal with the complexity of multiple streams of data from multiple people, and recognize the subtle contingent social exchanges that take place amongst group members. To tackle this challenge, we propose the Multiparty-Transformer (Multipar-T), a transformer model for multiparty behavior modeling. The core component of our proposed approach is the Crossperson Attention, which is specifically designed to detect contingent behavior between pairs of people. We verify the effectiveness of Multipar-T on a publicly available video-based group engagement detection benchmark, where it outperforms state-of-the-art approaches in average F-1 scores by 5.2% and individual class F-1 scores by up to 10.0%. Through qualitative analysis, we show that our Crossperson Attention module is able to discover contingent behavior.

Submitted 19 April, 2023; originally announced April 2023.

Comments: 7 pages, 4 figures, IJCAI

arXiv:2302.07754 [pdf, other]

SupSiam: Non-contrastive Auxiliary Loss for Learning from Molecular Conformers

Authors: Michael Maser, Ji Won Park, Joshua Yao-Yu Lin, Jae Hyeon Lee, Nathan C. Frey, Andrew Watkins

Abstract: We investigate Siamese networks for learning related embeddings for augmented samples of molecular conformers. We find that a non-contrastive (positive-pair only) auxiliary task aids in supervised training of Euclidean neural networks (E3NNs) and increases manifold smoothness (MS) around point-cloud geometries. We demonstrate this property for multiple drug-activity prediction tasks while maintaining relevant performance metrics, and propose an extension of MS to probabilistic and regression settings. We provide an analysis of representation collapse, finding substantial effects of task-weighting, latent dimension, and regularization. We expect the presented protocol to aid in the development of reliable E3NNs from molecular conformers, even for small-data drug discovery programs.

Submitted 15 February, 2023; originally announced February 2023.

Comments: Submitted to the MLDD workshop, ICLR 2023

arXiv:2301.06858 [pdf, other]

doi 10.1109/LRA.2023.3300248

Design, Modeling and Control of a Top-loading Fully-Actuated Cargo Transportation Multirotor

Authors: Wooyong Park, Xiangyu Wu, Dongjae Lee, Seung Jae Lee

Abstract: Existing multirotor-based cargo transportation does not maintain a constant cargo attitude due to underactuation; however, fragile payloads may require a consistent posture. The conventional method is also cumbersome when loading cargo, and the size of the cargo to be loaded is limited. To overcome these issues, we propose a new fully-actuated multirotor unmanned aerial vehicle platform capable of translational motion while maintaining a constant attitude. Our newly developed platform has a cubic exterior and can freely place cargo at any point on the flat top surface. However, the center-of-mass (CoM) position changes when cargo is loaded, leading to undesired attitudinal motion due to unwanted torque generation. To address this problem, we introduce a new model-free center-of-mass position estimation method inspired by the extremum-seeking control (ESC) technique. Experimental results are presented to validate the performance of the proposed estimation method, effectively estimating the CoM position and showing satisfactory constant-attitude flight performance.

Submitted 17 January, 2023; originally announced January 2023.

Comments: 8 pages, 8 figures, submitted to IEEE RA-L

arXiv:2212.14128 [pdf, other]

Joint Engagement Classification using Video Augmentation Techniques for Multi-person Human-robot Interaction

Authors: Yubin Kim, Huili Chen, Sharifa Alghowinem, Cynthia Breazeal, Hae Won Park

Abstract: Affect understanding capability is essential for social robots to autonomously interact with a group of users in an intuitive and reciprocal way. However, the challenge of multi-person affect understanding comes from not only the accurate perception of each user's affective state (e.g., engagement) but also the recognition of the affect interplay between the members (e.g., joint engagement) that presents as complex, but subtle, nonverbal exchanges between them. Here we present a novel hybrid framework for identifying a parent-child dyad's joint engagement by combining a deep learning framework with various video augmentation techniques. Using a dataset of parent-child dyads reading storybooks together with a social robot at home, we first train RGB frame- and skeleton-based joint engagement recognition models with four video augmentation techniques (General Aug, DeepFake, CutOut, and Mixed) applied datasets to improve joint engagement classification performance. Second, we demonstrate experimental results on the use of trained models in the robot-parent-child interaction context. Third, we introduce a behavior-based metric for evaluating the learned representation of the models to investigate the model interpretability when recognizing joint engagement. This work serves as the first step toward fully unlocking the potential of end-to-end video understanding models pre-trained on large public datasets and augmented with data augmentation and visualization techniques for affect recognition in the multi-person human-robot interaction in the wild.

Submitted 28 December, 2022; originally announced December 2022.

arXiv:2212.10504 [pdf, other]

Can Current Task-oriented Dialogue Models Automate Real-world Scenarios in the Wild?

Authors: Sang-Woo Lee, Sungdong Kim, Donghyeon Ko, Donghoon Ham, Youngki Hong, Shin Ah Oh, Hyunhoon Jung, Wangkyo Jung, Kyunghyun Cho, Donghyun Kwak, Hyungsuk Noh, Woomyoung Park

Abstract: Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-world scenarios and that the current TOD models are still a long way to cover the scenarios. In this position paper, we first identify current status and limitations of SF-TOD systems. After that, we explore the WebTOD framework, the alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model.

Submitted 24 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.02421 [pdf, other]

Score-based denoising for atomic structure identification

Authors: Tim Hsu, Babak Sadigh, Nicolas Bertin, Cheol Woo Park, James Chapman, Vasily Bulatov, Fei Zhou

Abstract: We propose an effective method for removing thermal vibrations that complicate the task of analyzing complex dynamics in atomistic simulation of condensed matter. Our method iteratively subtracts thermal noises or perturbations in atomic positions using a denoising score function trained on synthetically noised but otherwise perfect crystal lattices. The resulting denoised structures clearly reveal underlying crystal order while retaining disorder associated with crystal defects. Purely geometric, agnostic to interatomic potentials, and trained without inputs from explicit simulations, our denoiser can be applied to simulation data generated from vastly different interatomic interactions. The denoiser is shown to improve existing classification methods such as common neighbor analysis and polyhedral template matching, reaching perfect classification accuracy on a recent benchmark dataset of thermally perturbed structures up to the melting point. Demonstrated here in a wide variety of atomistic simulation contexts, the denoiser is general, robust, and readily extendable to delineate order from disorder in structurally and chemically complex materials.

Submitted 3 May, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.10082 [pdf, other]

Private Federated Statistics in an Interactive Setting

Authors: Audra McMillan, Omid Javidbakht, Kunal Talwar, Elliot Briggs, Mike Chatzidakis, Junye Chen, John Duchi, Vitaly Feldman, Yusuf Goren, Michael Hesse, Vojta Jina, Anil Katti, Albert Liu, Cheney Lyford, Joey Meyer, Alex Palmer, David Park, Wonhee Park, Gianni Parsa, Paul Pelzl, Rehan Rishi, Congzheng Song, Shan Wang, Shundong Zhou

Abstract: Privately learning statistics of events on devices can enable improved user experience. Differentially private algorithms for such problems can benefit significantly from interactivity. We argue that an aggregation protocol can enable an interactive private federated statistics system where user's devices maintain control of the privacy assurance. We describe the architecture of such a system, and analyze its security properties.

Submitted 18 November, 2022; originally announced November 2022.

arXiv:2211.07807 [pdf, other]

Hierarchical Inference of the Lensing Convergence from Photometric Catalogs with Bayesian Graph Neural Networks

Authors: Ji Won Park, Simon Birrer, Madison Ueland, Miles Cranmer, Adriano Agnello, Sebastian Wagner-Carena, Philip J. Marshall, Aaron Roodman, the LSST Dark Energy Science Collaboration

Abstract: We present a Bayesian graph neural network (BGNN) that can estimate the weak lensing convergence ($κ$) from photometric measurements of galaxies along a given line of sight. The method is of particular interest in strong gravitational time delay cosmography (TDC), where characterizing the "external convergence" ($κ_{\rm ext}$) from the lens environment and line of sight is necessary for precise inference of the Hubble constant ($H_0$). Starting from a large-scale simulation with a $κ$ resolution of $\sim$1$'$, we introduce fluctuations on galaxy-galaxy lensing scales of $\sim$1$''$ and extract random sightlines to train our BGNN. We then evaluate the model on test sets with varying degrees of overlap with the training distribution. For each test set of 1,000 sightlines, the BGNN infers the individual $κ$ posteriors, which we combine in a hierarchical Bayesian model to yield constraints on the hyperparameters governing the population. For a test field well sampled by the training set, the BGNN recovers the population mean of $κ$ precisely and without bias, resulting in a contribution to the $H_0$ error budget well under 1\%. In the tails of the training set with sparse samples, the BGNN, which can ingest all available information about each sightline, extracts more $κ$ signal compared to a simplified version of the traditional method based on matching galaxy number counts, which is limited by sample variance. Our hierarchical inference pipeline using BGNNs promises to improve the $κ_{\rm ext}$ characterization for precision TDC. The implementation of our pipeline is available as a public Python package, Node to Joy.

Submitted 14 November, 2022; originally announced November 2022.

Comments: 15 pages, 8 figures (+ 6 pages, 2 figures in Appendix). Submitted to ApJ. Code at https://github.com/jiwoncpark/node-to-joy

arXiv:2210.11068 [pdf, other]

Frequency of Interest-based Noise Attenuation Method to Improve Anomaly Detection Performance

Authors: YeongHyeon Park, Myung Jin Kim, Won Seok Park

Abstract: Accurately extracting driving events is the way to maximize computational efficiency and anomaly detection performance in the tire frictional noise-based anomaly detection task. This study proposes a concise and highly useful method for improving the precision of the event extraction that is hindered by extra noise such as wind noise, which is difficult to characterize clearly due to its randomness. The core of the proposed method is based on the identification of the road friction sound corresponding to the frequency of interest and removing the opposite characteristics with several frequency filters. Our method enables precision maximization of driving event extraction while improving anomaly detection performance by an average of 8.506%. Therefore, we conclude our method is a practical solution suitable for road surface anomaly detection purposes in outdoor edge computing environments.

Submitted 2 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: 5 pages, 4 figures, 4 tables

arXiv:2210.09394 [pdf]

Review Learning: Alleviating Catastrophic Forgetting with Generative Replay without Generator

Authors: Jaesung Yoo, Sunghyuk Choi, Ye Seul Yang, Suhyeon Kim, Jieun Choi, Dongkyeong Lim, Yaeji Lim, Hyung Joon Joo, Dae Jung Kim, Rae Woong Park, Hyeong-Jin Yoon, Kwangsoo Kim

Abstract: When a deep learning model is sequentially trained on different datasets, it forgets the knowledge acquired from previous data, a phenomenon known as catastrophic forgetting. It deteriorates performance of the deep learning model on diverse datasets, which is critical in privacy-preserving deep learning (PPDL) applications based on transfer learning (TL). To overcome this, we propose review learning (RL), a generative-replay-based continual learning technique that does not require a separate generator. Data samples are generated from the memory stored within the synaptic weights of the deep learning model which are used to review knowledge acquired from previous datasets. The performance of RL was validated through PPDL experiments. Simulations and real-world medical multi-institutional experiments were conducted using three types of binary classification electronic health record data. In the real-world experiments, the global area under the receiver operating curve was 0.710 for RL and 0.655 for TL. Thus, RL was highly effective in retaining previously learned knowledge.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.08750 [pdf, other]

Keep Me Updated! Memory Management in Long-term Conversations

Authors: Sanghwan Bae, Donghyun Kwak, Soyoung Kang, Min Young Lee, Sungdong Kim, Yuin Jeong, Hyeri Kim, Sang-Woo Lee, Woomyoung Park, Nako Sung

Abstract: Remembering important information from the past and continuing to talk about it in the present are crucial in long-term conversations. However, previous literature does not deal with cases where the memorized information is outdated, which may cause confusion in later conversations. To address this issue, we present a novel task and a corresponding dataset of memory management in long-term conversations, in which bots keep track of and bring up the latest information about users while conversing through multiple sessions. In order to support more precise and interpretable memory, we represent memory as unstructured text descriptions of key information and propose a new mechanism of memory management that selectively eliminates invalidated or redundant information. Experimental results show that our approach outperforms the baselines that leave the stored memory unchanged in terms of engagingness and humanness, with larger performance gap especially in the later sessions.

Submitted 17 October, 2022; originally announced October 2022.

Comments: Accepted to EMNLP2022 Findings

arXiv:2210.04096 [pdf, other]

PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design

Authors: Ji Won Park, Samuel Stanton, Saeed Saremi, Andrew Watkins, Henri Dwyer, Vladimir Gligorijevic, Richard Bonneau, Stephen Ra, Kyunghyun Cho

Abstract: Bayesian optimization offers a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. Whereas it is possible to optimize the various properties of interest jointly using a multi-objective acquisition function, such as the expected hypervolume improvement (EHVI), this approach does not account for objectives with a hierarchical dependency structure. We consider a common use case where some regions of the Pareto frontier are prioritized over others according to a specified $\textit{partial ordering}$ in the objectives. For instance, when designing antibodies, we would like to maximize the binding affinity to a target antigen only if it can be expressed in live cell culture -- modeling the experimental dependency in which affinity can only be measured for antibodies that can be expressed and thus produced in viable quantities. In general, we may want to confer a partial ordering to the properties such that each property is optimized conditioned on its parent properties satisfying some feasibility condition. To this end, we present PropertyDAG, a framework that operates on top of the traditional multi-objective BO to impose this desired ordering on the objectives, e.g. expression $\rightarrow$ affinity. We demonstrate its performance over multiple simulated active learning iterations on a penicillin production task, toy numerical problem, and a real-world antibody design task.

Submitted 8 October, 2022; originally announced October 2022.

Comments: 9 pages, 7 figures. Submitted to NeurIPS 2022 AI4Science Workshop

arXiv:2208.13427 [pdf, other]

The PWLR Graph Representation: A Persistent Weisfeiler-Lehman scheme with Random Walks for Graph Classification

Authors: Sun Woo Park, Yun Young Choi, Dosang Joe, U Jin Choi, Youngho Woo

Abstract: This paper presents the Persistent Weisfeiler-Lehman Random walk scheme (abbreviated as PWLR) for graph representations, a novel mathematical framework which produces a collection of explainable low-dimensional representations of graphs with discrete and continuous node features. The proposed scheme effectively incorporates normalized Weisfeiler-Lehman procedure, random walks on graphs, and persistent homology. We thereby integrate three distinct properties of graphs, which are local topological features, node degrees, and global topological invariants, while preserving stability from graph perturbations. This generalizes many variants of Weisfeiler-Lehman procedures, which are primarily used to embed graphs with discrete node labels. Empirical results suggest that these representations can be efficiently utilized to produce comparable results to state-of-the-art techniques in classifying graphs with discrete node labels, and enhanced performances in classifying those with continuous node features.

Submitted 29 August, 2022; originally announced August 2022.

Comments: Accepted to the ICML 2022 Workshop on Topology, Algebra, and Geometry in Machine Learning

arXiv:2208.11258 [pdf, other]

Applying Eigencontours to PolarMask-Based Instance Segmentation

Authors: Wonhui Park, Dongkwon Jin, Chang-Su Kim

Abstract: Eigencontours are the first data-driven contour descriptors based on singular value decomposition. Based on the implementation of ESE-Seg, eigencontours were applied to the instance segmentation task successfully. In this report, we incorporate eigencontours into the PolarMask network for instance segmentation. Experimental results demonstrate that the proposed algorithm yields better results than PolarMask on two instance segmentation datasets of COCO2017 and SBD. Also, we analyze the characteristics of eigencontours qualitatively. Our codes are available at https://github.com/dnjs3594/Eigencontours.

Submitted 23 August, 2022; originally announced August 2022.

arXiv:2206.07018 [pdf, other]

Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free Backdoor Removal via Stabilized Model Inversion

Authors: Si Chen, Yi Zeng, Jiachen T. Wang, Won Park, Xun Chen, Lingjuan Lyu, Zhuoqing Mao, Ruoxi Jia

Abstract: Many backdoor removal techniques in machine learning models require clean in-distribution data, which may not always be available due to proprietary datasets. Model inversion techniques, often considered privacy threats, can reconstruct realistic training samples, potentially eliminating the need for in-distribution data. Prior attempts to combine backdoor removal and model inversion yielded limited results. Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers. We establish that relying solely on perceptual similarity is insufficient for robust defenses, and the stability of model predictions in response to input and parameter perturbations is also crucial. To tackle this, we introduce a novel bi-level optimization-based framework for model inversion, promoting stability and visual quality. Interestingly, we discover that reconstructed samples from a pre-trained generator's latent space are backdoor-free, even when utilizing signals from a backdoored model. We provide a theoretical analysis to support this finding. Our evaluation demonstrates that our stabilized model inversion technique achieves state-of-the-art backdoor removal performance without clean in-distribution data, matching or surpassing performance using the same amount of clean samples.

Submitted 23 March, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Because of an equation and author informational error, this paper has been withdrawn by the submitter

arXiv:2205.04259 [pdf, other]

Multi-segment preserving sampling for deep manifold sampler

Authors: Daniel Berenberg, Jae Hyeon Lee, Simon Kelow, Ji Won Park, Andrew Watkins, Vladimir Gligorijević, Richard Bonneau, Stephen Ra, Kyunghyun Cho

Abstract: Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences by exploiting the gradients from a function predictor. We introduce an alternative approach to this guided sampling procedure, multi-segment preserving sampling, that enables the direct inclusion of domain-specific knowledge by designating preserved and non-preserved segments along the input sequence, thereby restricting variation to only select regions. We present its effectiveness in the context of antibody design by training two models: a deep manifold sampler and a GPT-2 language model on nearly six million heavy chain sequences annotated with the IGHV1-18 gene. During sampling, we restrict variation to only the complementarity-determining region 3 (CDR3) of the input. We obtain log probability scores from a GPT-2 model for each sampled CDR3 and demonstrate that multi-segment preserving sampling generates reasonable designs while maintaining the desired, preserved regions.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2205.00176 [pdf, other]

Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models

Authors: Sanghwan Bae, Donghyun Kwak, Sungdong Kim, Donghoon Ham, Soyoung Kang, Sang-Woo Lee, Woomyoung Park

Abstract: Recent open-domain dialogue models have brought numerous breakthroughs. However, building a chat system is not scalable since it often requires a considerable volume of human-human dialogue data, especially when enforcing features such as persona, style, or safety. In this work, we study the challenge of imposing roles on open-domain dialogue systems, with the goal of making the systems maintain consistent roles while conversing naturally with humans. To accomplish this, the system must satisfy a role specification that includes certain conditions on the stated features as well as a system policy on whether or not certain types of utterances are allowed. For this, we propose an efficient data collection framework leveraging in-context few-shot learning of large-scale language models for building role-satisfying dialogue dataset from scratch. We then compare various architectures for open-domain dialogue systems in terms of meeting role specifications while maintaining conversational abilities. Automatic and human evaluations show that our models return few out-of-bounds utterances, keeping competitive performance on general metrics. We release a Korean dialogue dataset we built for further research.

Submitted 30 April, 2022; originally announced May 2022.

Comments: Accepted to NAACL2022 as a long paper

arXiv:2204.13509 [pdf, other]

On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model

Authors: Seongjin Shin, Sang-Woo Lee, Hwijeen Ahn, Sungdong Kim, HyoungSeok Kim, Boseop Kim, Kyunghyun Cho, Gichang Lee, Woomyoung Park, Jung-Woo Ha, Nako Sung

Abstract: Many recent studies on large-scale language models have reported successful in-context zero- and few-shot learning ability. However, the in-depth analysis of when in-context learning occurs is still lacking. For example, it is unknown how in-context learning performance changes as the training corpus varies. Here, we investigate the effects of the source and size of the pretraining corpus on in-context learning in HyperCLOVA, a Korean-centric GPT-3 model. From our in-depth investigation, we introduce the following observations: (1) in-context learning performance heavily depends on the corpus domain source, and the size of the pretraining corpus does not necessarily determine the emergence of in-context learning, (2) in-context learning ability can emerge when a language model is trained on a combination of multiple corpora, even when each corpus does not result in in-context learning on its own, (3) pretraining with a corpus related to a downstream task does not always guarantee the competitive in-context learning performance of the downstream task, especially in the few-shot setting, and (4) the relationship between language modeling (measured in perplexity) and in-context learning does not always correlate: e.g., low perplexity does not always imply high in-context few-shot learning performance.

Submitted 8 May, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted to NAACL2022 as a long paper. Camera-ready version

Showing 1–50 of 103 results for author: Park, W