subscribe to arXiv mailings

EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Authors: Yeonsu Kwon, Jiho Kim, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi

Abstract: Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system design… ▽ More Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system designs and human errors, posing serious risks to patient safety. To address this, we developed EHRCon, a new dataset and task specifically designed to ensure data consistency between structured tables and unstructured notes in EHRs. EHRCon was crafted in collaboration with healthcare professionals using the MIMIC-III EHR dataset, and includes manual annotations of 3,943 entities across 105 clinical notes checked against database entries for consistency. EHRCon has two versions, one using the original MIMIC-III schema, and another using the OMOP CDM schema, in order to increase its applicability and generalizability. Furthermore, leveraging the capabilities of large language models, we introduce CheckEHR, a novel framework for verifying the consistency between clinical notes and database tables. CheckEHR utilizes an eight-stage process and shows promising results in both few-shot and zero-shot settings. The code is available at https://github.com/dustn1259/EHRCon. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.09047 [pdf, other]

DeepJEB: 3D Deep Learning-based Synthetic Jet Engine Bracket Dataset

Authors: Seongjun Hong, Yongmin Kwon, Dongju Shin, Jangseop Park, Namwoo Kang

Abstract: Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be… ▽ More Recent advancements in artificial intelligence (AI) have significantly influenced various fields, including mechanical engineering. Nonetheless, the development of high-quality, diverse datasets for structural analysis still needs to be improved. Although traditional datasets, such as simulated jet engine bracket dataset, are useful, they are constrained by a small number of samples, which must be improved for developing robust data-driven surrogate models. This study presents the DeepJEB dataset, which has been created using deep generative models and automated engineering simulation pipelines, to overcome these challenges. Moreover, this study provides comprehensive 3D geometries and their corresponding structural analysis data. Key experiments validated the effectiveness of the DeepJEB dataset, demonstrating significant improvements in the prediction accuracy and reliability of surrogate models trained on this data. The enhanced dataset showed a broader design space and better generalization capabilities than traditional datasets. These findings highlight the potential of DeepJEB as a benchmark dataset for developing reliable surrogate models in structural engineering. The DeepJEB dataset supports advanced modeling techniques, such as graph neural networks (GNNs) and high-dimensional convolutional networks (CNNs), leveraging node-level field data for precise predictions. This dataset is set to drive innovation in engineering design applications, enabling more accurate and efficient structural performance predictions. The DeepJEB dataset is publicly accessible at: https://www.narnia.ai/dataset △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06947 [pdf, other]

CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only

Authors: Junhee Cho, Jihoon Kim, Daseul Bae, Jinho Choo, Youngjune Gwon, Yeong-Dae Kwon

Abstract: Software robots have long been deployed in Robotic Process Automation (RPA) to automate mundane and repetitive computer tasks. The advent of Large Language Models (LLMs) with advanced reasoning capabilities has set the stage for these agents to now undertake more complex and even previously unseen tasks. However, the LLM-based automation techniques in recent literature frequently rely on HTML sour… ▽ More Software robots have long been deployed in Robotic Process Automation (RPA) to automate mundane and repetitive computer tasks. The advent of Large Language Models (LLMs) with advanced reasoning capabilities has set the stage for these agents to now undertake more complex and even previously unseen tasks. However, the LLM-based automation techniques in recent literature frequently rely on HTML source codes for input, limiting their application to web environments. Moreover, the information contained in HTML codes is often inaccurate or incomplete, making the agent less reliable for practical applications. We propose an LLM-based agent that functions solely on the basis of screenshots for recognizing environments, while leveraging in-context learning to eliminate the need for collecting large datasets of human demonstration. Our strategy, named Context-Aware Action Planning (CAAP) prompting encourages the agent to meticulously review the context in various angles. Through our proposed methodology, we achieve a success rate of 94.4% on 67~types of MiniWoB++ problems, utilizing only 1.48~demonstrations per problem type. Our method offers the potential for broader applications, especially for tasks that require inter-application coordination on computers or smartphones, showcasing a significant advancement in the field of automation agents. Codes and models are accessible at https://github.com/caap-agent/caap-agent. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 10 pages, 5 figures; (19 pages and 6 figures more in appendix)

arXiv:2406.06650 [pdf, other]

Predicting the risk of early-stage breast cancer recurrence using H\&E-stained tissue images

Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la… ▽ More Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images labeled with the risk prediction via genomics assays were used, and we obtained sensitivity of 0.857, 0.746, and 0.529 for predicting low, intermediate, and high risk, and specificity of 0.816, 0.803, and 0.972. When compared to the expert pathologist's regional histology grade information, a Pearson's correlation coefficient of 0.61 was obtained. When we checked the model learned through these studies through the class activation map, we found that it actually considered tubule formation and mitotic rate when predicting different risk groups. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 12 pages, 7 figures

arXiv:2406.01339 [pdf, other]

Recover as It is Designed to Be: Recovering from Compatibility Mobile App Crashes by Reusing User Flows

Authors: Donghwi Kim, Hyungjun Yoon, Chang Min Park, Sujin Han, Youngjin Kwon, Steven Y. Ko, Sung-Ju Lee

Abstract: Android OS is severely fragmented by API updates and device vendors' OS customization, creating a market condition where vastly different OS versions coexist. This gives rise to compatibility crash problems where Android apps crash on certain Android versions but not on others. Although well-known, this problem is extremely challenging for app developers to overcome due to the sheer number of Andr… ▽ More Android OS is severely fragmented by API updates and device vendors' OS customization, creating a market condition where vastly different OS versions coexist. This gives rise to compatibility crash problems where Android apps crash on certain Android versions but not on others. Although well-known, this problem is extremely challenging for app developers to overcome due to the sheer number of Android versions in the market that must be tested. We present RecoFlow, a framework for enabling app developers to automatically recover an app from a crash by programming user flows with our API and visual tools. RecoFlow tracks app feature usage with the user flows on user devices and recovers an app from a crash by replaying UI actions of the app feature disrupted by the crash. To prevent recurring compatibility crashes, RecoFlow executes a previously crashed app in compatibility mode that is enabled by our novel Android OS virtualization technique. Our evaluation with professional Android developers shows that our API and tools are easy to use and effective in recovering from compatibility crashes. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.18253 [pdf, other]

Truthful Dataset Valuation by Pointwise Mutual Information

Authors: Shuran Zheng, Yongchan Kwon, Xuan Qi, James Zou

Abstract: A common way to evaluate a dataset in ML involves training a model on this dataset and assessing the model's performance on a test set. However, this approach has two issues: (1) it may incentivize undesirable data manipulation in data marketplaces, as the self-interested data providers seek to modify the dataset to maximize their evaluation scores; (2) it may select datasets that overfit to poten… ▽ More A common way to evaluate a dataset in ML involves training a model on this dataset and assessing the model's performance on a test set. However, this approach has two issues: (1) it may incentivize undesirable data manipulation in data marketplaces, as the self-interested data providers seek to modify the dataset to maximize their evaluation scores; (2) it may select datasets that overfit to potentially small test sets. We propose a new data valuation method that provably guarantees the following: data providers always maximize their expected score by truthfully reporting their observed data. Any manipulation of the data, including but not limited to data duplication, adding random data, data removal, or re-weighting data from different groups, cannot increase their expected score. Our method, following the paradigm of proper scoring rules, measures the pointwise mutual information (PMI) of the test dataset and the evaluated dataset. However, computing the PMI of two datasets is challenging. We introduce a novel PMI measuring method that greatly improves tractability within Bayesian machine learning contexts. This is accomplished through a new characterization of PMI that relies solely on the posterior probabilities of the model parameter at an arbitrarily selected value. Finally, we support our theoretical results with simulations and further test the effectiveness of our data valuation method in identifying the top datasets among multiple data providers. Interestingly, our method outperforms the standard approach of selecting datasets based on the trained model's test performance, suggesting that our truthful valuation score can also be more robust to overfitting. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.06424 [pdf, other]

Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

Authors: JoonHo Lee, Jae Oh Woo, Juree Seok, Parisa Hassanzadeh, Wooseok Jang, JuYoun Son, Sima Didari, Baruch Gutow, Heng Hao, Hankyu Moon, Wenjun Hu, Yeong-Dae Kwon, Taehee Lee, Seungjai Min

Abstract: Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for t… ▽ More Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses based on Bayesian approximation. Trained with preference datasets, our uncertainty-enabled proxy not only scores rewards for responses but also evaluates their inherent uncertainty. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training. Our method boosts the instruction following capability of language models by refining data curation for training and improving policy optimization objectives, thereby surpassing existing methods by a large margin on benchmarks such as Vicuna and MT-bench. These findings highlight that our proposed approach substantially advances language model training and paves a new way of harnessing uncertainty within language models. △ Less

Submitted 19 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: Accepted to ICML 2024

arXiv:2405.03875 [pdf, other]

Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

Authors: Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia

Abstract: Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis te… ▽ More Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis testing framework and show that Data Shapley's performance can be no better than random selection without specific constraints on utility functions. We identify a class of utility functions, monotonically transformed modular functions, within which Data Shapley optimally selects data. Based on this insight, we propose a heuristic for predicting Data Shapley's effectiveness in data selection tasks. Our experiments corroborate these findings, adding new insights into when Data Shapley may or may not succeed. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2404.10933 [pdf, other]

LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs

Authors: Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha

Abstract: Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPU. However, determining the most effective method for achieving rapid fine-tuning while preventing GPU out-of-memory issues in a given environment remains unclear. To address thi… ▽ More Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPU. However, determining the most effective method for achieving rapid fine-tuning while preventing GPU out-of-memory issues in a given environment remains unclear. To address this challenge, we introduce LLMem, a solution that estimates the GPU memory consumption when applying distributed fine-tuning methods across multiple GPUs and identifies the optimal method. We conduct GPU memory usage estimation prior to fine-tuning, leveraging the fundamental structure of transformer-based decoder models and the memory usage distribution of each method. Experimental results show that LLMem accurately estimates peak GPU memory usage on a single GPU, with error rates of up to 1.6%. Additionally, it shows an average error rate of 3.0% when applying distributed fine-tuning methods to LLMs with more than a billion parameters on multi-GPU setups. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 9 pages, 9 figures, accepted to IJCAI 2024

arXiv:2404.08847 [pdf, other]

doi 10.1145/3620665.3640384

LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models

Authors: Juntaek Lim, Youngeun Kwon, Ranggi Hwang, Kiwan Maeng, G. Edward Suh, Minsoo Rhu

Abstract: Differential privacy (DP) is widely being employed in the industry as a practical standard for privacy protection. While private training of computer vision or natural language processing applications has been studied extensively, the computational challenges of training of recommender systems (RecSys) with DP have not been explored. In this work, we first present our detailed characterization of… ▽ More Differential privacy (DP) is widely being employed in the industry as a practical standard for privacy protection. While private training of computer vision or natural language processing applications has been studied extensively, the computational challenges of training of recommender systems (RecSys) with DP have not been explored. In this work, we first present our detailed characterization of private RecSys training using DP-SGD, root-causing its several performance bottlenecks. Specifically, we identify DP-SGD's noise sampling and noisy gradient update stage to suffer from a severe compute and memory bandwidth limitation, respectively, causing significant performance overhead in training private RecSys. Based on these findings, we propose LazyDP, an algorithm-software co-design that addresses the compute and memory challenges of training RecSys with DP-SGD. Compared to a state-of-the-art DP-SGD training system, we demonstrate that LazyDP provides an average 119x training throughput improvement while also ensuring mathematically equivalent, differentially private RecSys models to be trained. △ Less

Submitted 12 April, 2024; originally announced April 2024.

Journal ref: Published at 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-29), 2024

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.16049 [pdf, other]

doi 10.1016/j.chaos.2024.115032

Improving Demand Forecasting in Open Systems with Cartogram-Enhanced Deep Learning

Authors: Sangjoon Park, Yongsung Kwon, Hyungjoon Soh, Mi Jin Lee, Seung-Woo Son

Abstract: Predicting temporal patterns across various domains poses significant challenges due to their nuanced and often nonlinear trajectories. To address this challenge, prediction frameworks have been continuously refined, employing data-driven statistical methods, mathematical models, and machine learning. Recently, as one of the challenging systems, shared transport systems such as public bicycles hav… ▽ More Predicting temporal patterns across various domains poses significant challenges due to their nuanced and often nonlinear trajectories. To address this challenge, prediction frameworks have been continuously refined, employing data-driven statistical methods, mathematical models, and machine learning. Recently, as one of the challenging systems, shared transport systems such as public bicycles have gained prominence due to urban constraints and environmental concerns. Predicting rental and return patterns at bicycle stations remains a formidable task due to the system's openness and imbalanced usage patterns across stations. In this study, we propose a deep learning framework to predict rental and return patterns by leveraging cartogram approaches. The cartogram approach facilitates the prediction of demand for newly installed stations with no training data as well as long-period prediction, which has not been achieved before. We apply this method to public bicycle rental-and-return data in Seoul, South Korea, employing a spatial-temporal convolutional graph attention network. Our improved architecture incorporates batch attention and modified node feature updates for better prediction accuracy across different time scales. We demonstrate the effectiveness of our framework in predicting temporal patterns and its potential applications. △ Less

Submitted 26 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: 11 pages, 7 figures

arXiv:2403.12098 [pdf, other]

Deep Generative Design for Mass Production

Authors: Jihoon Kim, Yongmin Kwon, Namwoo Kang

Abstract: Generative Design (GD) has evolved as a transformative design approach, employing advanced algorithms and AI to create diverse and innovative solutions beyond traditional constraints. Despite its success, GD faces significant challenges regarding the manufacturability of complex designs, often necessitating extensive manual modifications due to limitations in standard manufacturing processes and t… ▽ More Generative Design (GD) has evolved as a transformative design approach, employing advanced algorithms and AI to create diverse and innovative solutions beyond traditional constraints. Despite its success, GD faces significant challenges regarding the manufacturability of complex designs, often necessitating extensive manual modifications due to limitations in standard manufacturing processes and the reliance on additive manufacturing, which is not ideal for mass production. Our research introduces an innovative framework addressing these manufacturability concerns by integrating constraints pertinent to die casting and injection molding into GD, through the utilization of 2D depth images. This method simplifies intricate 3D geometries into manufacturable profiles, removing unfeasible features such as non-manufacturable overhangs and allowing for the direct consideration of essential manufacturing aspects like thickness and rib design. Consequently, designs previously unsuitable for mass production are transformed into viable solutions. We further enhance this approach by adopting an advanced 2D generative model, which offer a more efficient alternative to traditional 3D shape generation methods. Our results substantiate the efficacy of this framework, demonstrating the production of innovative, and, importantly, manufacturable designs. This shift towards integrating practical manufacturing considerations into GD represents a pivotal advancement, transitioning from purely inspirational concepts to actionable, production-ready solutions. Our findings underscore usefulness and potential of GD for broader industry adoption, marking a significant step forward in aligning GD with the demands of manufacturing challenges. △ Less

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.11513 [pdf, other]

Visual Preference Inference: An Image Sequence-Based Preference Reasoning in Tabletop Object Manipulation

Authors: Joonhyung Lee, Sangbeom Park, Yongin Kwon, Jemin Lee, Minwook Ahn, Sungjoon Choi

Abstract: In robotic object manipulation, human preferences can often be influenced by the visual attributes of objects, such as color and shape. These properties play a crucial role in operating a robot to interact with objects and align with human intention. In this paper, we focus on the problem of inferring underlying human preferences from a sequence of raw visual observations in tabletop manipulation… ▽ More In robotic object manipulation, human preferences can often be influenced by the visual attributes of objects, such as color and shape. These properties play a crucial role in operating a robot to interact with objects and align with human intention. In this paper, we focus on the problem of inferring underlying human preferences from a sequence of raw visual observations in tabletop manipulation environments with a variety of object types, named Visual Preference Inference (VPI). To facilitate visual reasoning in the context of manipulation, we introduce the Chain-of-Visual-Residuals (CoVR) method. CoVR employs a prompting mechanism that describes the difference between the consecutive images (i.e., visual residuals) and incorporates such texts with a sequence of images to infer the user's preference. This approach significantly enhances the ability to understand and adapt to dynamic changes in its visual environment during manipulation tasks. Furthermore, we incorporate such texts along with a sequence of images to infer the user's preferences. Our method outperforms baseline methods in terms of extracting human preferences from visual sequences in both simulation and real-world environments. Code and videos are available at: \href{https://joonhyung-lee.github.io/vpi/}{https://joonhyung-lee.github.io/vpi/} △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: 8 pages

arXiv:2402.09264 [pdf, other]

UR2M: Uncertainty and Resource-Aware Event Detection on Microcontrollers

Authors: Hong Jia, Young D. Kwon, Dong Ma, Nhat Pham, Lorena Qendro, Tam Vu, Cecilia Mascolo

Abstract: Traditional machine learning techniques are prone to generating inaccurate predictions when confronted with shifts in the distribution of data between the training and testing phases. This vulnerability can lead to severe consequences, especially in applications such as mobile healthcare. Uncertainty estimation has the potential to mitigate this issue by assessing the reliability of a model's outp… ▽ More Traditional machine learning techniques are prone to generating inaccurate predictions when confronted with shifts in the distribution of data between the training and testing phases. This vulnerability can lead to severe consequences, especially in applications such as mobile healthcare. Uncertainty estimation has the potential to mitigate this issue by assessing the reliability of a model's output. However, existing uncertainty estimation techniques often require substantial computational resources and memory, making them impractical for implementation on microcontrollers (MCUs). This limitation hinders the feasibility of many important on-device wearable event detection (WED) applications, such as heart attack detection. In this paper, we present UR2M, a novel Uncertainty and Resource-aware event detection framework for MCUs. Specifically, we (i) develop an uncertainty-aware WED based on evidential theory for accurate event detection and reliable uncertainty estimation; (ii) introduce a cascade ML framework to achieve efficient model inference via early exits, by sharing shallower model layers among different event models; (iii) optimize the deployment of the model and MCU library for system efficiency. We conducted extensive experiments and compared UR2M to traditional uncertainty baselines using three wearable datasets. Our results demonstrate that UR2M achieves up to 864% faster inference speed, 857% energy-saving for uncertainty estimation, 55% memory saving on two popular MCUs, and a 22% improvement in uncertainty quantification performance. UR2M can be deployed on a wide range of MCUs, significantly expanding real-time and reliable WED applications. △ Less

Submitted 12 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2401.16875 [pdf, other]

doi 10.1364/OE.520492

Quantum Circuit Mapping for Universal and Scalable Computing in MZI-based Integrated Photonics

Authors: Yong Kwon, Alessio Baldazzi, Lorenzo Pavesi, Byung-Soo Choi

Abstract: Linear optical quantum computing (LOQC) offers a quantum computation paradigm based on well-established and robust technology and flexible environmental conditions following DiVincenzo's criteria. Within this framework, integrated photonics can be utilized to achieve gate-based quantum computing, defining qubits by path-encoding, quantum gates through the use of Mach-Zehnder interferometers (MZIs)… ▽ More Linear optical quantum computing (LOQC) offers a quantum computation paradigm based on well-established and robust technology and flexible environmental conditions following DiVincenzo's criteria. Within this framework, integrated photonics can be utilized to achieve gate-based quantum computing, defining qubits by path-encoding, quantum gates through the use of Mach-Zehnder interferometers (MZIs) as fundamental building blocks, and measurements through single-photon detectors. In particular, universal two-qubit gates can be achieved by suitable structures of MZIs together with post-selection or heralding. The most resource-efficient choice is given by the post-selected CZ gate. However, this implementation is characterized by a design which has a non-regular structure and cannot be cascaded. This limits the implementation of large-scale LOQC. Starting from these issues, we suggest an approach to move toward a universal and scalable LOQC on the integrated photonic platform. First of all, choosing the post-selected CZ as universal two-qubit gate, we extend the path-encoded dual-rail qubit to a triplet of waveguides, composed of an auxiliary waveguide and the pair of waveguides corresponding to the qubit basis states. Additionally, we introduce a swap photonic network that maps the regularly-labeled structure of the new path-encoded qubits to the structure needed for the post-selected CZ. We also discuss the optical swap gate that allows the connection of non-nearest neighbor path-encoded qubits. In this way, we can deterministically exchange the locations of the qubits and execute controlled quantum gates between any path-encoded qubits. Next, by truncating the auxiliary waveguides after any post-selected CZ, we find that it is possible to cascade this optical gate when it acts on different pairs that share only one qubit. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Journal ref: Opt. Express 32, 12852-12881 (2024)

arXiv:2312.08400 [pdf, other]

Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction

Authors: Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Abstract: Large language models (LLMs) finetuned to follow human instruction have recently exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC), especially on languages other than English, remains significantly unexplored. In this work, we evaluate the abilities of instruction finetuned LLMs in Arabic GEC, a complex task due to Ara… ▽ More Large language models (LLMs) finetuned to follow human instruction have recently exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC), especially on languages other than English, remains significantly unexplored. In this work, we evaluate the abilities of instruction finetuned LLMs in Arabic GEC, a complex task due to Arabic's rich morphology. Our findings suggest that various prompting methods, coupled with (in-context) few-shot learning, demonstrate considerable effectiveness, with GPT-4 achieving up to $65.49$ F$_{1}$ score under expert prompting (approximately $5$ points higher than our established baseline). Despite these positive results, we find that instruction finetuned models, regardless of their size, are still outperformed by fully finetuned ones, even if they are significantly smaller in size. This disparity highlights substantial room for improvements for LLMs. Inspired by methods used in low-resource machine translation, we also develop a method exploiting synthetic data that significantly outperforms previous models on two standard Arabic benchmarks. Our best model achieves a new SOTA on Arabic GEC, with $73.29$ and $73.26$ F$_{1}$ on the 2014 and 2015 QALB datasets, respectively, compared to peer-reviewed published baselines. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2308.04492

arXiv:2312.01511 [pdf, other]

SoK: The Gap Between Data Rights Ideals and Reality

Authors: Yujin Kwon, Ella Corren, Gonzalo Munilla Garrido, Chris Hoofnagle, Dawn Song

Abstract: As information economies burgeon, they unlock innovation and economic wealth while posing novel threats to civil liberties and altering power dynamics between individuals, companies, and governments. Legislatures have reacted with privacy laws designed to empower individuals over their data. These laws typically create rights for "data subjects" (individuals) to make requests of data collectors (c… ▽ More As information economies burgeon, they unlock innovation and economic wealth while posing novel threats to civil liberties and altering power dynamics between individuals, companies, and governments. Legislatures have reacted with privacy laws designed to empower individuals over their data. These laws typically create rights for "data subjects" (individuals) to make requests of data collectors (companies and governments). The European Union General Data Protection Regulation (GDPR) exemplifies this, granting extensive data rights to data subjects, a model embraced globally. However, the question remains: do these rights-based privacy laws effectively empower individuals over their data? This paper scrutinizes these approaches by reviewing 201 interdisciplinary empirical studies, news articles, and blog posts. We pinpoint 15 key questions concerning the efficacy of rights allocations. The literature often presents conflicting results regarding the effectiveness of rights-based frameworks, but it generally emphasizes their limitations. We offer recommendations to policymakers and Computer Science (CS) groups committed to these frameworks, and suggest alternative privacy regulation approaches. △ Less

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.13712 [pdf, other]

Data Acquisition: A New Frontier in Data-centric AI

Authors: Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou

Abstract: As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative. There is limited study on the challenges of data acquisition due to ad-hoc processes and lack of consistent methodologies. We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets, transparent prici… ▽ More As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative. There is limited study on the challenges of data acquisition due to ad-hoc processes and lack of consistent methodologies. We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets, transparent pricing, standardized data formats. With the objective of inciting participation from the data-centric AI community, we then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers. The benchmark was released as a part of DataPerf. Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in ML. △ Less

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.11420 [pdf, other]

LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms

Authors: Young D. Kwon, Jagmohan Chauhan, Hong Jia, Stylianos I. Venieris, Cecilia Mascolo

Abstract: Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context. This is an important feature when context, actions, and users change. However, enabling CL on resource-constrained embedded systems is challenging due to the limited labeled data, memory, and computing capacity. In this paper, we propose LifeLearner, a hardware-aw… ▽ More Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context. This is an important feature when context, actions, and users change. However, enabling CL on resource-constrained embedded systems is challenging due to the limited labeled data, memory, and computing capacity. In this paper, we propose LifeLearner, a hardware-aware meta continual learning system that drastically optimizes system resources (lower memory, latency, energy consumption) while ensuring high accuracy. Specifically, we (1) exploit meta-learning and rehearsal strategies to explicitly cope with data scarcity issues and ensure high accuracy, (2) effectively combine lossless and lossy compression to significantly reduce the resource requirements of CL and rehearsal samples, and (3) developed hardware-aware system on embedded and IoT platforms considering the hardware characteristics. As a result, LifeLearner achieves near-optimal CL performance, falling short by only 2.8% on accuracy compared to an Oracle baseline. With respect to the state-of-the-art (SOTA) Meta CL method, LifeLearner drastically reduces the memory footprint (by 178.7x), end-to-end latency by 80.8-94.2%, and energy consumption by 80.9-94.2%. In addition, we successfully deployed LifeLearner on two edge devices and a microcontroller unit, thereby enabling efficient CL on resource-constrained platforms where it would be impractical to run SOTA methods and the far-reaching deployment of adaptable CL in a ubiquitous manner. Code is available at https://github.com/theyoungkwon/LifeLearner. △ Less

Submitted 19 November, 2023; originally announced November 2023.

Comments: Accepted for publication at SenSys 2023

arXiv:2310.11449 [pdf, other]

DELIFFAS: Deformable Light Fields for Fast Avatar Synthesis

Authors: Youngjoong Kwon, Lingjie Liu, Henry Fuchs, Marc Habermann, Christian Theobalt

Abstract: Generating controllable and photorealistic digital human avatars is a long-standing and important problem in Vision and Graphics. Recent methods have shown great progress in terms of either photorealism or inference speed while the combination of the two desired properties still remains unsolved. To this end, we propose a novel method, called DELIFFAS, which parameterizes the appearance of the hum… ▽ More Generating controllable and photorealistic digital human avatars is a long-standing and important problem in Vision and Graphics. Recent methods have shown great progress in terms of either photorealism or inference speed while the combination of the two desired properties still remains unsolved. To this end, we propose a novel method, called DELIFFAS, which parameterizes the appearance of the human as a surface light field that is attached to a controllable and deforming human mesh model. At the core, we represent the light field around the human with a deformable two-surface parameterization, which enables fast and accurate inference of the human appearance. This allows perceptual supervision on the full image compared to previous approaches that could only supervise individual pixels or small patches due to their slow runtime. Our carefully designed human representation and supervision strategy leads to state-of-the-art synthesis results and inference time. The video results and code are available at https://vcai.mpi-inf.mpg.de/projects/DELIFFAS. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.11220 [pdf, other]

KG-GPT: A General Framework for Reasoning on Knowledge Graphs Using Large Language Models

Authors: Jiho Kim, Yeonsu Kwon, Yohan Jo, Edward Choi

Abstract: While large language models (LLMs) have made considerable advancements in understanding and generating unstructured text, their application in structured data remains underexplored. Particularly, using LLMs for complex reasoning tasks on knowledge graphs (KGs) remains largely untouched. To address this, we propose KG-GPT, a multi-purpose framework leveraging LLMs for tasks employing KGs. KG-GPT co… ▽ More While large language models (LLMs) have made considerable advancements in understanding and generating unstructured text, their application in structured data remains underexplored. Particularly, using LLMs for complex reasoning tasks on knowledge graphs (KGs) remains largely untouched. To address this, we propose KG-GPT, a multi-purpose framework leveraging LLMs for tasks employing KGs. KG-GPT comprises three steps: Sentence Segmentation, Graph Retrieval, and Inference, each aimed at partitioning sentences, retrieving relevant graph components, and deriving logical conclusions, respectively. We evaluate KG-GPT using KG-based fact verification and KGQA benchmarks, with the model showing competitive and robust performance, even outperforming several fully-supervised models. Our work, therefore, marks a significant step in unifying structured and unstructured data processing within the realm of LLMs. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 Findings

arXiv:2310.00902 [pdf, other]

DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models

Authors: Yongchan Kwon, Eric Wu, Kevin Wu, James Zou

Abstract: Quantifying the impact of training data points is crucial for understanding the outputs of machine learning models and for improving the transparency of the AI pipeline. The influence function is a principled and popular data attribution method, but its computational cost often makes it challenging to use. This issue becomes more pronounced in the setting of large language models and text-to-image… ▽ More Quantifying the impact of training data points is crucial for understanding the outputs of machine learning models and for improving the transparency of the AI pipeline. The influence function is a principled and popular data attribution method, but its computational cost often makes it challenging to use. This issue becomes more pronounced in the setting of large language models and text-to-image models. In this work, we propose DataInf, an efficient influence approximation method that is practical for large-scale generative AI models. Leveraging an easy-to-compute closed-form expression, DataInf outperforms existing influence computation algorithms in terms of computational and memory efficiency. Our theoretical analysis shows that DataInf is particularly well-suited for parameter-efficient fine-tuning techniques such as LoRA. Through systematic empirical evaluations, we show that DataInf accurately approximates influence scores and is orders of magnitude faster than existing methods. In applications to RoBERTa-large, Llama-2-13B-chat, and stable-diffusion-v1.5 models, DataInf effectively identifies the most influential fine-tuning examples better than other approximate influence scores. Moreover, it can help to identify which data points are mislabeled. △ Less

Submitted 13 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: ICLR 2024

arXiv:2309.14741 [pdf, other]

Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

Authors: Hee-Soo Heo, KiHyun Nam, Bong-Jin Lee, Youngki Kwon, Minjae Lee, You Jin Kim, Joon Son Chung

Abstract: In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remain… ▽ More In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remains fixed in this training process. This results in two similarity scores: one for the speakers information and one for the session information. The latter score acts as a compensator for the former that might be skewed due to session variations. Our extensive experiments demonstrate that session information can be effectively compensated without retraining of the embedding extractor. △ Less

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.04655 [pdf]

Intelligent upper-limb exoskeleton integrated with soft wearable bioelectronics and deep-learning for human intention-driven strength augmentation based on sensory feedback

Authors: Jinwoo Lee, Kangkyu Kwon, Ira Soltis, Jared Matthews, Yoonjae Lee, Hojoong Kim, Lissette Romero, Nathan Zavanelli, Youngjin Kwon, Shinjae Kwon, Jimin Lee, Yewon Na, Sung Hoon Lee, Ki Jun Yu, Minoru Shinohara, Frank L. Hammond, Woon-Hong Yeo

Abstract: The age and stroke-associated decline in musculoskeletal strength degrades the ability to perform daily human tasks using the upper extremities. Although there are a few examples of exoskeletons, they need manual operations due to the absence of sensor feedback and no intention prediction of movements. Here, we introduce an intelligent upper-limb exoskeleton system that uses cloud-based deep learn… ▽ More The age and stroke-associated decline in musculoskeletal strength degrades the ability to perform daily human tasks using the upper extremities. Although there are a few examples of exoskeletons, they need manual operations due to the absence of sensor feedback and no intention prediction of movements. Here, we introduce an intelligent upper-limb exoskeleton system that uses cloud-based deep learning to predict human intention for strength augmentation. The embedded soft wearable sensors provide sensory feedback by collecting real-time muscle signals, which are simultaneously computed to determine the user's intended movement. The cloud-based deep-learning predicts four upper-limb joint motions with an average accuracy of 96.2% at a 200-250 millisecond response rate, suggesting that the exoskeleton operates just by human intention. In addition, an array of soft pneumatics assists the intended movements by providing 897 newton of force and 78.7 millimeter of displacement at maximum. Collectively, the intent-driven exoskeleton can augment human strength by 5.15 times on average compared to the unassisted exoskeleton. This report demonstrates an exoskeleton robot that augments the upper-limb joint movements by human intention based on a machine-learning cloud computing and sensory feedback. △ Less

Submitted 26 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: 15 pages, 6 figures, 1 table, published in npj flexible electronics journals

MSC Class: 68T40 (Primary) 92C55; 68T99 (Secondary)

arXiv:2309.01310 [pdf, other]

ExMobileViT: Lightweight Classifier Extension for Mobile Vision Transformer

Authors: Gyeongdong Yang, Yungwook Kwon, Hyunjin Kim

Abstract: The paper proposes an efficient structure for enhancing the performance of mobile-friendly vision transformer with small computational overhead. The vision transformer (ViT) is very attractive in that it reaches outperforming results in image classification, compared to conventional convolutional neural networks (CNNs). Due to its need of high computational resources, MobileNet-based ViT models su… ▽ More The paper proposes an efficient structure for enhancing the performance of mobile-friendly vision transformer with small computational overhead. The vision transformer (ViT) is very attractive in that it reaches outperforming results in image classification, compared to conventional convolutional neural networks (CNNs). Due to its need of high computational resources, MobileNet-based ViT models such as MobileViT-S have been developed. However, their performance cannot reach the original ViT model. The proposed structure relieves the above weakness by storing the information from early attention stages and reusing it in the final classifier. This paper is motivated by the idea that the data itself from early attention stages can have important meaning for the final classification. In order to reuse the early information in attention stages, the average pooling results of various scaled features from early attention stages are used to expand channels in the fully-connected layer of the final classifier. It is expected that the inductive bias introduced by the averaged features can enhance the final performance. Because the proposed structure only needs the average pooling of features from the attention stages and channel expansions in the final classifier, its computational and storage overheads are very small, keeping the benefits of low-cost MobileNet-based ViT (MobileViT). Compared with the original MobileViTs on the ImageNet dataset, the proposed ExMobileViT has noticeable accuracy enhancements, having only about 5% additional parameters. △ Less

Submitted 3 September, 2023; originally announced September 2023.

Comments: Under Review

arXiv:2308.04492 [pdf, other]

ChatGPT for Arabic Grammatical Error Correction

Authors: Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoud, Muhammad Abdul-Mageed

Abstract: Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC) tasks, particularly in non-English languages, remains significantly unexplored. In this paper, we delve into abilities of instruction fine-tuned LLMs in Arabic GEC, a task made complex du… ▽ More Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC) tasks, particularly in non-English languages, remains significantly unexplored. In this paper, we delve into abilities of instruction fine-tuned LLMs in Arabic GEC, a task made complex due to Arabic's rich morphology. Our findings suggest that various prompting methods, coupled with (in-context) few-shot learning, demonstrate considerable effectiveness, with GPT-4 achieving up to $65.49$ F\textsubscript{1} score under expert prompting (approximately $5$ points higher than our established baseline). This highlights the potential of LLMs in low-resource settings, offering a viable approach for generating useful synthetic data for model training. Despite these positive results, we find that instruction fine-tuned models, regardless of their size, significantly underperform compared to fully fine-tuned models of significantly smaller sizes. This disparity highlights a substantial room for improvements for LLMs. Inspired by methods from low-resource machine translation, we also develop a method exploiting synthetic data that significantly outperforms previous models on two standard Arabic benchmarks. Our work sets new SoTA for Arabic GEC, with $72.19\%$ and $73.26$ F$_{1}$ on the 2014 and 2015 QALB datasets, respectively. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2307.11754 [pdf, other]

What Drives the (In)stability of a Stablecoin?

Authors: Yujin Kwon, Kornrapat Pongmala, Kaihua Qin, Ariah Klages-Mundt, Philipp Jovanovic, Christine Parlour, Arthur Gervais, Dawn Song

Abstract: In May 2022, an apparent speculative attack, followed by market panic, led to the precipitous downfall of UST, one of the most popular stablecoins at that time. However, UST is not the only stablecoin to have been depegged in the past. Designing resilient and long-term stable coins, therefore, appears to present a hard challenge. To further scrutinize existing stablecoin designs and ultimately l… ▽ More In May 2022, an apparent speculative attack, followed by market panic, led to the precipitous downfall of UST, one of the most popular stablecoins at that time. However, UST is not the only stablecoin to have been depegged in the past. Designing resilient and long-term stable coins, therefore, appears to present a hard challenge. To further scrutinize existing stablecoin designs and ultimately lead to more robust systems, we need to understand where volatility emerges. Our work provides a game-theoretical model aiming to help identify why stablecoins suffer from a depeg. This game-theoretical model reveals that stablecoins have different price equilibria depending on the coin's architecture and mechanism to minimize volatility. Moreover, our theory is supported by extensive empirical data, spanning $1$ year. To that end, we collect daily prices for 22 stablecoins and on-chain data from five blockchains including the Ethereum and the Terra blockchain. △ Less

Submitted 25 July, 2023; v1 submitted 14 June, 2023; originally announced July 2023.

arXiv:2307.09988 [pdf, other]

TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge

Authors: Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, Cecilia Mascolo

Abstract: On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), o… ▽ More On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), or induce substantial accuracy loss (>10%). In this paper, we propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that dynamically selects the layer/channel to update based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0% in accuracy, while reducing the backward-pass memory and computation cost by up to 1,098x and 7.68x, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5x faster and 3.5x more energy-efficient training over status-quo approaches, and 2.23x smaller memory footprint than SOTA methods, while remaining within the 1 MB memory envelope of MCU-grade platforms. △ Less

Submitted 10 June, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: Accepted by ICML 2024

arXiv:2307.02784 [pdf, other]

On the Spatial-Wideband Effects in Millimeter-Wave Cell-Free Massive MIMO

Authors: Seyoung Ahn, Soohyeong Kim, Yongseok Kwon, Joohan Park, Jiseung Youn, Sunghyun Cho

Abstract: In this paper, we investigate the spatial-wideband effects in cell-free massive MIMO (CF-mMIMO) systems in mmWave bands. The utilization of mmWave frequencies brings challenges such as signal attenuation and the need for denser networks like ultra-dense networks (UDN) to maintain communication performance. CF-mMIMO is introduced as a solution, where distributed access points (APs) transmit signals… ▽ More In this paper, we investigate the spatial-wideband effects in cell-free massive MIMO (CF-mMIMO) systems in mmWave bands. The utilization of mmWave frequencies brings challenges such as signal attenuation and the need for denser networks like ultra-dense networks (UDN) to maintain communication performance. CF-mMIMO is introduced as a solution, where distributed access points (APs) transmit signals to a central processing unit (CPU) for joint processing. CF-mMIMO offers advantages in reducing non-line-of-sight (NLOS) conditions and overcoming signal blockage. We investigate the synchronization problem in CF-mMIMO due to time delays between APs. It proposes a minimum cyclic prefix length to mitigate inter-symbol interference (ISI) in OFDM systems. Furthermore, the spatial correlations of channel responses are analyzed in the frequency-phase domain. The impact of these correlations on system performance is examined. The findings contribute to improving the performance of CF-mMIMO systems and enhancing the effective utilization of mmWave communication. △ Less

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.10577 [pdf, other]

OpenDataVal: a Unified Benchmark for Data Valuation

Authors: Kevin Fu Jiang, Weixin Liang, James Zou, Yongchan Kwon

Abstract: Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset. Several data valuation algorithms have been proposed to quantify data quality, however, there lacks a systemic and standardized benchmarking system for data valuation. In this paper, we introduce OpenDataVal, an easy-to-use and unifie… ▽ More Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset. Several data valuation algorithms have been proposed to quantify data quality, however, there lacks a systemic and standardized benchmarking system for data valuation. In this paper, we introduce OpenDataVal, an easy-to-use and unified benchmark framework that empowers researchers and practitioners to apply and compare various data valuation algorithms. OpenDataVal provides an integrated environment that includes (i) a diverse collection of image, natural language, and tabular datasets, (ii) implementations of eleven different state-of-the-art data valuation algorithms, and (iii) a prediction model API that can import any models in scikit-learn. Furthermore, we propose four downstream machine learning tasks for evaluating the quality of data values. We perform benchmarking analysis using OpenDataVal, quantifying and comparing the efficacy of state-of-the-art data valuation approaches. We find that no single algorithm performs uniformly best across all tasks, and an appropriate algorithm should be employed for a user's downstream task. OpenDataVal is publicly available at https://opendataval.github.io with comprehensive documentation. Furthermore, we provide a leaderboard where researchers can evaluate the effectiveness of their own data valuation algorithms. △ Less

Submitted 13 October, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 25 pages, NeurIPS 2023 Track on Datasets and Benchmarks

arXiv:2306.00680 [pdf, other]

Encoder-decoder multimodal speaker change detection

Authors: Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee

Abstract: The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies solved the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise text modality in addition to audio, have shown improved performance. In this study, the proposed model are bui… ▽ More The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies solved the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise text modality in addition to audio, have shown improved performance. In this study, the proposed model are built upon two main proposals, a novel mechanism for modality fusion and the adoption of a encoder-decoder architecture. Different to previous MMSCD works that extract speaker embeddings from extremely short audio segments, aligned to a single word, we use a speaker embedding extracted from 1.5s. A transformer decoder layer further improves the performance of an encoder-only MMSCD model. The proposed model achieves state-of-the-art results among studies that report SCD performance and is also on par with recent work that combines SCD with automatic speech recognition via human transcription. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 5 pages, accepted for presentation at INTERSPEECH 2023

arXiv:2305.13114 [pdf, other]

Exploring User Perspectives on ChatGPT: Applications, Perceptions, and Implications for AI-Integrated Education

Authors: Reza Hadi Mogavi, Chao Deng, Justin Juho Kim, Pengyuan Zhou, Young D. Kwon, Ahmed Hosny Saleh Metwally, Ahmed Tlili, Simone Bassanelli, Antonio Bucchiarone, Sujit Gujar, Lennart E. Nacke, Pan Hui

Abstract: To foster the development of pedagogically potent and ethically sound AI-integrated learning landscapes, it is pivotal to critically explore the perceptions and experiences of the users immersed in these contexts. In this study, we perform a thorough qualitative content analysis across four key social media platforms. Our goal is to understand the user experience (UX) and views of early adopters o… ▽ More To foster the development of pedagogically potent and ethically sound AI-integrated learning landscapes, it is pivotal to critically explore the perceptions and experiences of the users immersed in these contexts. In this study, we perform a thorough qualitative content analysis across four key social media platforms. Our goal is to understand the user experience (UX) and views of early adopters of ChatGPT across different educational sectors. The results of our research show that ChatGPT is most commonly used in the domains of higher education, K-12 education, and practical skills training. In social media dialogues, the topics most frequently associated with ChatGPT are productivity, efficiency, and ethics. Early adopters' attitudes towards ChatGPT are multifaceted. On one hand, some users view it as a transformative tool capable of amplifying student self-efficacy and learning motivation. On the other hand, there is a degree of apprehension among concerned users. They worry about a potential overdependence on the AI system, which they fear might encourage superficial learning habits and erode students' social and critical thinking skills. This dichotomy of opinions underscores the complexity of Human-AI Interaction in educational contexts. Our investigation adds depth to this ongoing discourse, providing crowd-sourced insights for educators and learners who are considering incorporating ChatGPT or similar generative AI tools into their pedagogical strategies. △ Less

Submitted 25 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: This is the authors' preprint version of the paper accepted by the Journal of Computers in Human Behavior: Artificial Humans (doi: https://doi.org/10.1016/j.chbah.2023.100027)

arXiv:2305.07288 [pdf, other]

Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table

Authors: Sunjun Kweon, Yeonsu Kwon, Seonhee Cho, Yohan Jo, Edward Choi

Abstract: Despite recent interest in open domain question answering (ODQA) over tables, many studies still rely on datasets that are not truly optimal for the task with respect to utilizing structural nature of table. These datasets assume answers reside as a single cell value and do not necessitate exploring over multiple cells such as aggregation, comparison, and sorting. Thus, we release Open-WikiTable,… ▽ More Despite recent interest in open domain question answering (ODQA) over tables, many studies still rely on datasets that are not truly optimal for the task with respect to utilizing structural nature of table. These datasets assume answers reside as a single cell value and do not necessitate exploring over multiple cells such as aggregation, comparison, and sorting. Thus, we release Open-WikiTable, the first ODQA dataset that requires complex reasoning over tables. Open-WikiTable is built upon WikiSQL and WikiTableQuestions to be applicable in the open-domain setting. As each question is coupled with both textual answers and SQL queries, Open-WikiTable opens up a wide range of possibilities for future research, as both reader and parser methods can be applied. The dataset and code are publicly available. △ Less

Submitted 12 May, 2023; originally announced May 2023.

Comments: ACL 2023 (Findings)

arXiv:2305.06590 [pdf, other]

FactKG: Fact Verification via Reasoning on Knowledge Graphs

Authors: Jiho Kim, Sungjin Park, Yeonsu Kwon, Yohan Jo, James Thorne, Edward Choi

Abstract: In real world applications, knowledge graphs (KG) are widely used in various domains (e.g. medical applications and dialogue agents). However, for fact verification, KGs have not been adequately utilized as a knowledge source. KGs can be a valuable knowledge source in fact verification due to their reliability and broad applicability. A KG consists of nodes and edges which makes it clear how conce… ▽ More In real world applications, knowledge graphs (KG) are widely used in various domains (e.g. medical applications and dialogue agents). However, for fact verification, KGs have not been adequately utilized as a knowledge source. KGs can be a valuable knowledge source in fact verification due to their reliability and broad applicability. A KG consists of nodes and edges which makes it clear how concepts are linked together, allowing machines to reason over chains of topics. However, there are many challenges in understanding how these machine-readable concepts map to information in text. To enable the community to better use KGs, we introduce a new dataset, FactKG: Fact Verification via Reasoning on Knowledge Graphs. It consists of 108k natural language claims with five types of reasoning: One-hop, Conjunction, Existence, Multi-hop, and Negation. Furthermore, FactKG contains various linguistic patterns, including colloquial style claims as well as written style claims to increase practicality. Lastly, we develop a baseline approach and analyze FactKG over these reasoning types. We believe FactKG can advance both reliability and practicality in KG-based fact verification. △ Less

Submitted 18 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2305.02995 [pdf, other]

Accuracy on the Curve: On the Nonlinear Correlation of ML Performance Between Data Subpopulations

Authors: Weixin Liang, Yining Mao, Yongchan Kwon, Xinyu Yang, James Zou

Abstract: Understanding the performance of machine learning (ML) models across diverse data distributions is critically important for reliable applications. Despite recent empirical studies positing a near-perfect linear correlation between in-distribution (ID) and out-of-distribution (OOD) accuracies, we empirically demonstrate that this correlation is more nuanced under subpopulation shifts. Through rigor… ▽ More Understanding the performance of machine learning (ML) models across diverse data distributions is critically important for reliable applications. Despite recent empirical studies positing a near-perfect linear correlation between in-distribution (ID) and out-of-distribution (OOD) accuracies, we empirically demonstrate that this correlation is more nuanced under subpopulation shifts. Through rigorous experimentation and analysis across a variety of datasets, models, and training epochs, we demonstrate that OOD performance often has a nonlinear correlation with ID performance in subpopulation shifts. Our findings, which contrast previous studies that have posited a linear correlation in model performance during distribution shifts, reveal a "moon shape" correlation (parabolic uptrend curve) between the test performance on the majority subpopulation and the minority subpopulation. This non-trivial nonlinear correlation holds across model architectures, hyperparameters, training durations, and the imbalance between subpopulations. Furthermore, we found that the nonlinearity of this "moon shape" is causally influenced by the degree of spurious correlations in the training data. Our controlled experiments show that stronger spurious correlation in the training data creates more nonlinear performance correlation. We provide complementary experimental and theoretical analyses for this phenomenon, and discuss its implications for ML reliability and fairness. Our work highlights the importance of understanding the nonlinear effects of model improvement on performance in different subpopulations, and has the potential to inform the development of more equitable and responsible machine learning models. △ Less

Submitted 31 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: Accepted to the main conference of ICML 2023

arXiv:2304.13292 [pdf, other]

Zero-Shot Slot and Intent Detection in Low-Resource Languages

Authors: Sang Yun Kwon, Gagan Bhatia, El Moatez Billah Nagoudi, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed

Abstract: Intent detection and slot filling are critical tasks in spoken and natural language understanding for task-oriented dialog systems. In this work we describe our participation in the slot and intent detection for low-resource language varieties (SID4LR; Aepli et al. (2023)). We investigate the slot and intent detection (SID) tasks using a wide range of models and settings. Given the recent success… ▽ More Intent detection and slot filling are critical tasks in spoken and natural language understanding for task-oriented dialog systems. In this work we describe our participation in the slot and intent detection for low-resource language varieties (SID4LR; Aepli et al. (2023)). We investigate the slot and intent detection (SID) tasks using a wide range of models and settings. Given the recent success of multitask-prompted finetuning of large language models, we also test the generalization capability of the recent encoder-decoder model mT0 (Muennighoff et al., 2022) on new tasks (i.e., SID) in languages they have never intentionally seen. We show that our best model outperforms the baseline by a large margin (up to +30 F1 points) in both SID tasks △ Less

Submitted 26 April, 2023; originally announced April 2023.

Comments: VarDial @ EACL

arXiv:2304.09822 [pdf, other]

Unpacking How Decentralized Autonomous Organizations (DAOs) Work in Practice

Authors: Tanusree Sharma, Yujin Kwon, Kornrapat Pongmala, Henry Wang, Andrew Miller, Dawn Song, Yang Wang

Abstract: Decentralized Autonomous Organizations (DAOs) have emerged as a novel way to coordinate a group of (pseudonymous) entities towards a shared vision (e.g., promoting sustainability), utilizing self-executing smart contracts on blockchains to support decentralized governance and decision-making. In just a few years, over 4,000 DAOs have been launched in various domains, such as investment, education,… ▽ More Decentralized Autonomous Organizations (DAOs) have emerged as a novel way to coordinate a group of (pseudonymous) entities towards a shared vision (e.g., promoting sustainability), utilizing self-executing smart contracts on blockchains to support decentralized governance and decision-making. In just a few years, over 4,000 DAOs have been launched in various domains, such as investment, education, health, and research. Despite such rapid growth and diversity, it is unclear how these DAOs actually work in practice and to what extent they are effective in achieving their goals. Given this, we aim to unpack how (well) DAOs work in practice. We conducted an in-depth analysis of a diverse set of 10 DAOs of various categories and smart contracts, leveraging on-chain (e.g., voting results) and off-chain data (e.g., community discussions) as well as our interviews with DAO organizers/members. Specifically, we defined metrics to characterize key aspects of DAOs, such as the degrees of decentralization and autonomy. We observed CompoundDAO, AssangeDAO, Bankless, and Krausehouse having poor decentralization in voting, while decentralization has improved over time for one-person-one-vote DAOs (e.g., Proof of Humanity). Moreover, the degree of autonomy varies among DAOs, with some (e.g., Compound and Krausehouse) relying more on third parties than others. Lastly, we offer a set of design implications for future DAO systems based on our findings. △ Less

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2304.07718 [pdf, other]

Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value

Authors: Yongchan Kwon, James Zou

Abstract: Data valuation is a powerful framework for providing statistical insights into which data are beneficial or detrimental to model training. Many Shapley-based data valuation methods have shown promising results in various downstream tasks, however, they are well known to be computationally challenging as it requires training a large number of models. As a result, it has been recognized as infeasibl… ▽ More Data valuation is a powerful framework for providing statistical insights into which data are beneficial or detrimental to model training. Many Shapley-based data valuation methods have shown promising results in various downstream tasks, however, they are well known to be computationally challenging as it requires training a large number of models. As a result, it has been recognized as infeasible to apply to large datasets. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. The proposed method is computationally efficient and can scale to millions of data by reusing trained weak learners. Specifically, Data-OOB takes less than 2.25 hours on a single CPU processor when there are $10^6$ samples to evaluate and the input dimension is 100. Furthermore, Data-OOB has solid theoretical interpretations in that it identifies the same important data point as the infinitesimal jackknife influence function when two different points are compared. We conduct comprehensive experiments using 12 classification datasets, each with thousands of sample sizes. We demonstrate that the proposed method significantly outperforms existing state-of-the-art data valuation methods in identifying mislabeled data and finding a set of helpful (or harmful) data points, highlighting the potential for applying data values in real-world applications. △ Less

Submitted 1 June, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

Comments: 18 pages, to be published at ICML 2023

arXiv:2304.04897 [pdf, other]

Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling

Authors: Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

Abstract: We present a method that enables synthesizing novel views and novel poses of arbitrary human performers from sparse multi-view images. A key ingredient of our method is a hybrid appearance blending module that combines the advantages of the implicit body NeRF representation and image-based rendering. Existing generalizable human NeRF methods that are conditioned on the body model have shown robust… ▽ More We present a method that enables synthesizing novel views and novel poses of arbitrary human performers from sparse multi-view images. A key ingredient of our method is a hybrid appearance blending module that combines the advantages of the implicit body NeRF representation and image-based rendering. Existing generalizable human NeRF methods that are conditioned on the body model have shown robustness against the geometric variation of arbitrary human performers. Yet they often exhibit blurry results when generalized onto unseen identities. Meanwhile, image-based rendering shows high-quality results when sufficient observations are available, whereas it suffers artifacts in sparse-view settings. We propose Neural Image-based Avatars (NIA) that exploits the best of those two methods: to maintain robustness under new articulations and self-occlusions while directly leveraging the available (sparse) source view colors to preserve appearance details of new subject identities. Our hybrid design outperforms recent methods on both in-domain identity generalization as well as challenging cross-dataset generalization settings. Also, in terms of the pose generalization, our method outperforms even the per-subject optimized animatable NeRF methods. The video results are available at https://youngjoongunc.github.io/nia △ Less

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2304.03013 [pdf, other]

doi 10.1016/j.jpdc.2022.12.008

Tensor Slicing and Optimization for Multicore NPUs

Authors: Rafael Sousa, Marcio Pereira, Yongin Kwon, Taeho Kim, Namsoon Jung, Chang Soo Kim, Michael Frank, Guido Araujo

Abstract: Although code generation for Convolution Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly-constrai\-ned Multicore Neural Processor Units (NPUs) is still a challenging problem. Given the size of convolutions' input/output tensors and the small footprint of NPU on-chip memories, minimizing memory transactions while maximizing… ▽ More Although code generation for Convolution Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly-constrai\-ned Multicore Neural Processor Units (NPUs) is still a challenging problem. Given the size of convolutions' input/output tensors and the small footprint of NPU on-chip memories, minimizing memory transactions while maximizing parallelism and MAC utilization are central to any effective solution. This paper proposes a TensorFlow XLA/LLVM compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO), which: (a) maximizes convolution parallelism and memory usage across NPU cores; and (b) reduces data transfers between host and NPU on-chip memories by using DRAM memory burst time estimates to guide tensor slicing. To evaluate the proposed approach, a set of experiments was performed using the NeuroMorphic Processor (NMP), a multicore NPU containing 32 RISC-V cores extended with novel CNN instructions. Experimental results show that TSO is capable of identifying the best tensor slicing that minimizes execution time for a set of CNN models. Speed-ups of up to 21.7\% result when comparing the TSO burst-based technique to a no-burst data slicing approach. To validate the generality of the TSO approach, the algorithm was also ported to the Glow Machine Learning framework. The performance of the models were measured on both Glow and TensorFlow XLA/LLVM compilers, revealing similar results. △ Less

Submitted 6 April, 2023; originally announced April 2023.

Journal ref: Journal of Parallel and Distributed Computing Journal of Parallel and Distributed Computing, Volume 175, May 2023, Pages 66-79

arXiv:2304.01197 [pdf, other]

Bringing Telepresence to Every Desk

Authors: Shengze Wang, Ziheng Wang, Ryan Schmelzle, Liujie Zheng, YoungJoong Kwon, Soumyadip Sengupta, Henry Fuchs

Abstract: In this paper, we work to bring telepresence to every desktop. Unlike commercial systems, personal 3D video conferencing systems must render high-quality videos while remaining financially and computationally viable for the average consumer. To this end, we introduce a capturing and rendering system that only requires 4 consumer-grade RGBD cameras and synthesizes high-quality free-viewpoint videos… ▽ More In this paper, we work to bring telepresence to every desktop. Unlike commercial systems, personal 3D video conferencing systems must render high-quality videos while remaining financially and computationally viable for the average consumer. To this end, we introduce a capturing and rendering system that only requires 4 consumer-grade RGBD cameras and synthesizes high-quality free-viewpoint videos of users as well as their environments. Experimental results show that our system renders high-quality free-viewpoint videos without using object templates or heavy pre-processing. While not real-time, our system is fast and does not require per-video optimizations. Moreover, our system is robust to complex hand gestures and clothing, and it can generalize to new users. This work provides a strong basis for further optimization, and it will help bring telepresence to every desk in the near future. The code and dataset will be made available on our website https://mcmvmc.github.io/PersonalTelepresence/. △ Less

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.17807 [pdf]

doi 10.1371/journal.pdig.0000416

GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors

Authors: Dongyeop Jang, Tae-Rim Yun, Choong-Yeol Lee, Young-Kyu Kwon, Chang-Eop Kim

Abstract: Traditional Korean medicine (TKM) emphasizes individualized diagnosis and treatment. This uniqueness makes AI modeling difficult due to limited data and implicit processes. Large language models (LLMs) have demonstrated impressive medical inference, even without advanced training in medical texts. This study assessed the capabilities of GPT-4 in TKM, using the Korean National Licensing Examination… ▽ More Traditional Korean medicine (TKM) emphasizes individualized diagnosis and treatment. This uniqueness makes AI modeling difficult due to limited data and implicit processes. Large language models (LLMs) have demonstrated impressive medical inference, even without advanced training in medical texts. This study assessed the capabilities of GPT-4 in TKM, using the Korean National Licensing Examination for Korean Medicine Doctors (K-NLEKMD) as a benchmark. The K-NLEKMD, administered by a national organization, encompasses 12 major subjects in TKM. We optimized prompts with Chinese-term annotation, English translation for questions and instruction, exam-optimized instruction, and self-consistency. GPT-4 with optimized prompts achieved 66.18% accuracy, surpassing both the examination's average pass mark of 60% and the 40% minimum for each subject. The gradual introduction of language-related prompts and prompting techniques enhanced the accuracy from 51.82% to its maximum accuracy. GPT-4 showed low accuracy in subjects including public health & medicine-related law, internal medicine (2) which are localized in Korea and TKM. The model's accuracy was lower for questions requiring TKM-specialized knowledge. It exhibited higher accuracy in diagnosis-based and recall-based questions than in intervention-based questions. A positive correlation was observed between the consistency and accuracy of GPT-4's responses. This study unveils both the potential and challenges of applying LLMs to TKM. These findings underline the potential of LLMs like GPT-4 in culturally adapted medicine, especially TKM, for tasks such as clinical assistance, medical education, and research. But they also point towards the necessity for the development of methods to mitigate cultural bias inherent in large language models and validate their efficacy in real-world clinical settings. △ Less

Submitted 16 November, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

Comments: 23 pages, 4 figures

ACM Class: J.3

arXiv:2303.12557 [pdf, other]

Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems

Authors: Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song

Abstract: Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread implementation. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers w… ▽ More Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread implementation. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers with optimized attention computation of linear complexity. Additionally, post-training quantization has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this paper, we discover that applying existing post-training quantization (PTQ) methods for ViTs to efficient hybrid transformers leads to a drastic accuracy drop, attributed to the four following challenges: (i) highly dynamic ranges, (ii) zero-point overflow, (iii) diverse normalization, and (iv) limited model parameters ($<$5M). To overcome these challenges, we propose a new post-training quantization method, which is the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, EfficientFormerV2). We achieve a significant improvement of 17.73% for 8-bit and 29.75% for 6-bit on average, respectively, compared with existing PTQ methods (EasyQuant, FQ-ViT, PTQ4ViT, and RepQ-ViT)}. We plan to release our code at https://gitlab.com/ones-ai/q-hyvit. △ Less

Submitted 16 May, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: 14 pages, 9 figures, accepted in IEEE Internet of Things Journal

arXiv:2302.13750 [pdf, other]

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

Authors: Yoohwan Kwon, Soo-Whan Chung

Abstract: Multi-lingual speech recognition aims to distinguish linguistic expressions in different languages and integrate acoustic processing simultaneously. In contrast, current multi-lingual speech recognition research follows a language-aware paradigm, mainly targeted to improve recognition performance rather than discriminate language characteristics. In this paper, we present a multi-lingual speech re… ▽ More Multi-lingual speech recognition aims to distinguish linguistic expressions in different languages and integrate acoustic processing simultaneously. In contrast, current multi-lingual speech recognition research follows a language-aware paradigm, mainly targeted to improve recognition performance rather than discriminate language characteristics. In this paper, we present a multi-lingual speech recognition network named Mixture-of-Language-Expert(MoLE), which digests speech in a variety of languages. Specifically, MoLE analyzes linguistic expression from input speech in arbitrary languages, activating a language-specific expert with a lightweight language tokenizer. The tokenizer not only activates experts, but also estimates the reliability of the activation. Based on the reliability, the activated expert and the language-agnostic expert are aggregated to represent language-conditioned embedding for efficient speech recognition. Our proposed model is evaluated in 5 languages scenario, and the experimental results show that our structure is advantageous on multi-lingual recognition, especially for speech in low-resource language. △ Less

Submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.09765 [pdf, other]

ENInst: Enhancing Weakly-supervised Low-shot Instance Segmentation

Authors: Moon Ye-Bin, Dongmin Choi, Yongjin Kwon, Junsik Kim, Tae-Hyun Oh

Abstract: We address a weakly-supervised low-shot instance segmentation, an annotation-efficient training method to deal with novel classes effectively. Since it is an under-explored problem, we first investigate the difficulty of the problem and identify the performance bottleneck by conducting systematic analyses of model components and individual sub-tasks with a simple baseline model. Based on the analy… ▽ More We address a weakly-supervised low-shot instance segmentation, an annotation-efficient training method to deal with novel classes effectively. Since it is an under-explored problem, we first investigate the difficulty of the problem and identify the performance bottleneck by conducting systematic analyses of model components and individual sub-tasks with a simple baseline model. Based on the analyses, we propose ENInst with sub-task enhancement methods: instance-wise mask refinement for enhancing pixel localization quality and novel classifier composition for improving classification accuracy. Our proposed method lifts the overall performance by enhancing the performance of each sub-task. We demonstrate that our ENInst is 7.5 times more efficient in achieving comparable performance to the existing fully-supervised few-shot models and even outperforms them at times. △ Less

Submitted 30 July, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Accepted at Pattern Recognition (PR)

arXiv:2302.09716 [pdf, other]

Seeing the Fruit for the Leaves: Towards Automated Apple Fruitlet Thinning

Authors: Ans Qureshi, Neville Loh, Young Min Kwon, David Smith, Trevor Gee, Oliver Bachelor, Josh McCulloch, Mahla Nejati, JongYoon Lim, Richard Green, Ho Seok Ahn, Bruce MacDonald, Henry Williams

Abstract: Following a global trend, the lack of reliable access to skilled labour is causing critical issues for the effective management of apple orchards. One of the primary challenges is maintaining skilled human operators capable of making precise fruitlet thinning decisions. Thinning requires accurately measuring the true crop load for individual apple trees to provide optimal thinning decisions on an… ▽ More Following a global trend, the lack of reliable access to skilled labour is causing critical issues for the effective management of apple orchards. One of the primary challenges is maintaining skilled human operators capable of making precise fruitlet thinning decisions. Thinning requires accurately measuring the true crop load for individual apple trees to provide optimal thinning decisions on an individual basis. A challenging task due to the dense foliage obscuring the fruitlets within the tree structure. This paper presents the initial design, implementation, and evaluation details of the vision system for an automatic apple fruitlet thinning robot to meet this need. The platform consists of a UR5 robotic arm and stereo cameras which enable it to look around the leaves to map the precise number and size of the fruitlets on the apple branches. We show that this platform can measure the fruitlet load on the apple tree to with 84% accuracy in a real-world commercial apple orchard while being 87% precise. △ Less

Submitted 19 February, 2023; originally announced February 2023.

Comments: Accepted and Presented at the Australasian Conference on Robotics and Automation (ACRA 2022)

arXiv:2302.07352 [pdf, other]

Reachability-based Trajectory Design with Neural Implicit Safety Constraints

Authors: Jonathan Michaux, Qingyi Chen, Yongseok Kwon, Ram Vasudevan

Abstract: Generating safe motion plans in real-time is a key requirement for deploying robot manipulators to assist humans in collaborative settings. In particular, robots must satisfy strict safety requirements to avoid self-damage or harming nearby humans. Satisfying these requirements is particularly challenging if the robot must also operate in real-time to adjust to changes in its environment.This pape… ▽ More Generating safe motion plans in real-time is a key requirement for deploying robot manipulators to assist humans in collaborative settings. In particular, robots must satisfy strict safety requirements to avoid self-damage or harming nearby humans. Satisfying these requirements is particularly challenging if the robot must also operate in real-time to adjust to changes in its environment.This paper addresses these challenges by proposing Reachability-based Signed Distance Functions (RDFs) as a neural implicit representation for robot safety. RDF, which can be constructed using supervised learning in a tractable fashion, accurately predicts the distance between the swept volume of a robot arm and an obstacle. RDF's inference and gradient computations are fast and scale linearly with the dimension of the system; these features enable its use within a novel real-time trajectory planning framework as a continuous-time collision-avoidance constraint. The planning method using RDF is compared to a variety of state-of-the-art techniques and is demonstrated to successfully solve challenging motion planning tasks for high-dimensional systems faster and more reliably than all tested methods. △ Less

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.01493 [pdf]

Deep Learning (DL)-based Automatic Segmentation of the Internal Pudendal Artery (IPA) for Reduction of Erectile Dysfunction in Definitive Radiotherapy of Localized Prostate Cancer

Authors: Anjali Balagopal, Michael Dohopolski, Young Suk Kwon, Steven Montalvo, Howard Morgan, Ti Bai, Dan Nguyen, Xiao Liang, Xinran Zhong, Mu-Han Lin, Neil Desai, Steve Jiang

Abstract: Background and purpose: Radiation-induced erectile dysfunction (RiED) is commonly seen in prostate cancer patients. Clinical trials have been developed in multiple institutions to investigate whether dose-sparing to the internal-pudendal-arteries (IPA) will improve retention of sexual potency. The IPA is usually not considered a conventional organ-at-risk (OAR) due to segmentation difficulty. In t… ▽ More Background and purpose: Radiation-induced erectile dysfunction (RiED) is commonly seen in prostate cancer patients. Clinical trials have been developed in multiple institutions to investigate whether dose-sparing to the internal-pudendal-arteries (IPA) will improve retention of sexual potency. The IPA is usually not considered a conventional organ-at-risk (OAR) due to segmentation difficulty. In this work, we propose a deep learning (DL)-based auto-segmentation model for the IPA that utilizes CT and MRI or CT alone as the input image modality to accommodate variation in clinical practice. Materials and methods: 86 patients with CT and MRI images and noisy IPA labels were recruited in this study. We split the data into 42/14/30 for model training, testing, and a clinical observer study, respectively. There were three major innovations in this model: 1) we designed an architecture with squeeze-and-excite blocks and modality attention for effective feature extraction and production of accurate segmentation, 2) a novel loss function was used for training the model effectively with noisy labels, and 3) modality dropout strategy was used for making the model capable of segmentation in the absence of MRI. Results: The DSC, ASD, and HD95 values for the test dataset were 62.2%, 2.54mm, and 7mm, respectively. AI segmented contours were dosimetrically equivalent to the expert physician's contours. The observer study showed that expert physicians' scored AI contours (mean=3.7) higher than inexperienced physicians' contours (mean=3.1). When inexperienced physicians started with AI contours, the score improved to 3.7. Conclusion: The proposed model achieved good quality IPA contours to improve uniformity of segmentation and to facilitate introduction of standardized IPA segmentation into clinical trials and practice. △ Less

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.07695 [pdf, other]

EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

Authors: Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, Edward Choi

Abstract: We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and used the responses to create seed questions. We then manually linked these questions… ▽ More We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and used the responses to create seed questions. We then manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset, which were also collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, including simple retrieval and complex operations such as calculating survival rate, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable. We believe our dataset, EHRSQL, can serve as a practical benchmark for developing and assessing QA models on structured EHR data and take a step further towards bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL. △ Less

Submitted 25 December, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

Comments: Published as a conference paper at NeurIPS 2022 (Track on Datasets and Benchmarks)

Showing 1–50 of 133 results for author: Kwon, Y