-
MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models
Authors:
Chengguang Gan,
Qingyu Yin,
Xinyang He,
Hanjun Wei,
Yunhao Liang,
Younghun Lim,
Shijian Wang,
Hexiang Huang,
Qinghao Zhang,
Shiwen Ni,
Tatsunori Mori
Abstract:
The Mutual Reinforcement Effect (MRE) represents a promising avenue in information extraction and multitasking research. Nevertheless, its applicability has been constrained due to the exclusive availability of MRE mix datasets in Japanese, thereby limiting comprehensive exploration by the global research community. To address this limitation, we introduce a Multilingual MRE mix dataset (MMM) that encompasses 21 sub-datasets in English, Japanese, and Chinese. In this paper, we also propose a method for dataset translation assisted by Large Language Models (LLMs), which significantly reduces the manual annotation time required for dataset construction by leveraging LLMs to translate the original Japanese datasets. Additionally, we have enriched the dataset by incorporating open-domain Named Entity Recognition (NER) and sentence classification tasks. Utilizing this expanded dataset, we developed a unified input-output framework to train an Open-domain Information Extraction Large Language Model (OIELLM). The OIELLM model demonstrates the capability to effectively process novel MMM datasets, exhibiting significant improvements in performance.
Submitted 15 July, 2024;
originally announced July 2024.
-
Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval
Authors:
Youngsun Lim,
Hyunjung Shim
Abstract:
Text-to-image generation has shown remarkable progress with the emergence of diffusion models. However, these models often generate factually inconsistent images, failing to accurately reflect the factual information and common sense conveyed by the input text prompts. We refer to this issue as Image hallucination. Drawing from studies on hallucinations in language models, we classify this problem into three types and propose a methodology that uses factual images retrieved from external sources to generate realistic images. Depending on the nature of the hallucination, we employ off-the-shelf image editing tools, either InstructPix2Pix or IP-Adapter, to leverage factual information from the retrieved image. This approach enables the generation of images that accurately reflect the facts and common sense.
Submitted 15 July, 2024;
originally announced July 2024.
-
DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection
Authors:
Yewon Lim,
Changyeon Lee,
Aerin Kim,
Oren Etzioni
Abstract:
A dramatic influx of diffusion-generated images has marked recent years, posing unique challenges to current detection technologies. While the task of identifying these images falls under binary classification, a seemingly straightforward category, the computational load is significant when employing the "reconstruction then compare" technique. This approach, known as DIRE (Diffusion Reconstruction Error), not only identifies diffusion-generated images but also detects those produced by GANs, highlighting the technique's broad applicability. To address the computational challenges and improve efficiency, we propose distilling the knowledge embedded in diffusion models to develop rapid deepfake detection models. Our approach, aimed at creating a small, fast, cheap, and lightweight diffusion synthesized deepfake detector, maintains robust performance while significantly reducing operational demands. Maintaining performance, our experimental results indicate an inference speed 3.2 times faster than the existing DIRE framework. This advance not only enhances the practicality of deploying these systems in real-world settings but also paves the way for future research endeavors that seek to leverage diffusion model knowledge.
Submitted 2 June, 2024;
originally announced June 2024.
-
Enhancing Security and Privacy in Federated Learning using Update Digests and Voting-Based Defense
Authors:
Wenjie Li,
Kai Fan,
Jingyuan Zhang,
Hui Li,
Wei Yang Bryan Lim,
Qiang Yang
Abstract:
Federated Learning (FL) is a promising privacy-preserving machine learning paradigm that allows data owners to collaboratively train models while keeping their data localized. Despite its potential, FL faces challenges related to the trustworthiness of both clients and servers, especially in the presence of curious or malicious adversaries. In this paper, we introduce a novel framework named Federated Learning with Update Digest (FLUD), which addresses the critical issues of privacy preservation and resistance to Byzantine attacks within distributed learning environments. FLUD utilizes an innovative approach, the $\mathsf{LinfSample}$ method, allowing clients to compute the $l_{\infty}$ norm across sliding windows of updates as an update digest. This digest enables the server to calculate a shared distance matrix, significantly reducing the overhead associated with Secure Multi-Party Computation (SMPC) by three orders of magnitude while effectively distinguishing between benign and malicious updates. Additionally, FLUD integrates a privacy-preserving, voting-based defense mechanism that employs optimized SMPC protocols to minimize communication rounds. Our comprehensive experiments demonstrate FLUD's effectiveness in countering Byzantine adversaries while incurring low communication and runtime overhead. FLUD offers a scalable framework for secure and reliable FL in distributed environments, facilitating its application in scenarios requiring robust data management and security.
Submitted 29 May, 2024;
originally announced May 2024.
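As a rough illustration of the update digest mentioned in the FLUD abstract above, the following minimal sketch computes the maximum absolute value over sliding windows of a flattened update and then builds a distance matrix between client digests. The function names, the window size, the stride, and the choice of Euclidean distance are all assumptions made for illustration; the abstract does not specify them, and the actual framework computes distances on secret shares under SMPC rather than in plaintext.

```python
from math import sqrt


def linf_digest(update, window=4, stride=4):
    """Return the l-infinity norm (max absolute value) of each sliding window.

    `window` and `stride` are hypothetical parameters chosen for this sketch.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(update) < window:
        raise ValueError("update is shorter than one window")
    starts = range(0, len(update) - window + 1, stride)
    return [max(abs(v) for v in update[s:s + window]) for s in starts]


def distance_matrix(digests):
    """Plaintext pairwise Euclidean distances between equally sized digests.

    Euclidean distance is an assumption; the abstract only says that the
    server calculates a shared distance matrix from the digests.
    """
    return [
        [sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in digests]
        for p in digests
    ]


if __name__ == "__main__":
    # Two similar updates and one outlier, each flattened to 8 values.
    updates = [
        [1, -5, 2, 0, 3, 3, -1, 2],
        [1, -5, 1, 0, 2, 3, -1, 1],
        [0, 1, 0, -1, 0, 0, 0, 0],
    ]
    digests = [linf_digest(u) for u in updates]
    for row in distance_matrix(digests):
        print(row)
```

The digest is much shorter than the update itself, which is the intuition behind the reduced SMPC overhead the abstract reports: any secure comparison has to touch far fewer values per client.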
-
Securing Monolithic Kernels using Compartmentalization
Authors:
Soo Yee Lim,
Sidhartha Agrawal,
Xueyuan Han,
David Eyers,
Dan O'Keeffe,
Thomas Pasquier
Abstract:
Monolithic operating systems, where all kernel functionality resides in a single, shared address space, are the foundation of most mainstream computer systems. However, a single flaw, even in a non-essential part of the kernel (e.g., device drivers), can cause the entire operating system to fall under an attacker's control. Kernel hardening techniques might prevent certain types of vulnerabilities, but they fail to address a fundamental weakness: the lack of intra-kernel security that safely isolates different parts of the kernel. We survey kernel compartmentalization techniques that define and enforce intra-kernel boundaries and propose a taxonomy that allows the community to compare and discuss future work. We also identify factors that complicate comparisons among compartmentalized systems, suggest new ways to compare future approaches with existing work meaningfully, and discuss emerging research directions.
Submitted 12 April, 2024;
originally announced April 2024.
-
Incremental XAI: Memorable Understanding of AI with Incremental Explanations
Authors:
Jessica Y. Bo,
Pan Hao,
Brian Y. Lim
Abstract:
Many explainable AI (XAI) techniques strive for interpretability by providing concise salient information, such as sparse linear factors. However, users see either inaccurate global explanations or highly varying local explanations. We propose to provide more detailed explanations by leveraging the human cognitive capacity to accumulate knowledge by incrementally receiving more details. Focusing on linear factor explanations (factors $\times$ values = outcome), we introduce Incremental XAI to automatically partition explanations for general and atypical instances by providing Base + Incremental factors to help users read and remember more faithful explanations. Memorability is improved by reusing base factors and reducing the number of factors shown in atypical cases. In modeling, formative, and summative user studies, we evaluated the faithfulness, memorability and understandability of Incremental XAI against baseline explanation methods. This work contributes towards more usable explanations that users can better ingrain to facilitate intuitive engagement with AI.
Submitted 10 April, 2024;
originally announced April 2024.
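One possible reading of the "factors × values = outcome" form with Base + Incremental factors from the Incremental XAI abstract above is sketched below. The function name, the dictionary layout, and the example numbers are hypothetical, and the abstract does not describe how instances are partitioned into general and atypical cases, so this only shows how a small set of incremental adjustments could be layered on reused base factors.

```python
def explain(values, base, incremental=None):
    """Return (outcome, per-feature contributions) for a linear factor explanation.

    `base` holds the factors shown for general instances. `incremental` is an
    optional sparse set of adjustments for an atypical instance (an assumption
    about how Base + Incremental factors combine, not the authors' stated method).
    """
    incremental = incremental or {}
    factors = {name: base[name] + incremental.get(name, 0.0) for name in base}
    contributions = {name: factors[name] * values[name] for name in base}
    return sum(contributions.values()), contributions


if __name__ == "__main__":
    base = {"area": 2.0, "rooms": 10.0}     # hypothetical base factors
    values = {"area": 50.0, "rooms": 3.0}   # hypothetical instance

    general, _ = explain(values, base)
    atypical, _ = explain(values, base, incremental={"rooms": -4.0})
    print(general, atypical)
```

Keeping the adjustment sparse means an atypical case changes only one number relative to what the user already remembers, which is the memorability argument the abstract makes.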
-
ForzaETH Race Stack -- Scaled Autonomous Head-to-Head Racing on Fully Commercial off-the-Shelf Hardware
Authors:
Nicolas Baumann,
Edoardo Ghignone,
Jonas Kühne,
Niklas Bastuck,
Jonathan Becker,
Nadine Imholz,
Tobias Kränzlin,
Tian Yi Lim,
Michael Lötscher,
Luca Schwarzenbach,
Luca Tognoni,
Christian Vogt,
Andrea Carron,
Michele Magno
Abstract:
Autonomous racing in robotics combines high-speed dynamics with the necessity for reliability and real-time decision-making. While such racing pushes software and hardware to their limits, many existing full-system solutions necessitate complex, custom hardware and software, and usually focus on Time-Trials rather than full unrestricted Head-to-Head racing, due to financial and safety constraints. This limits their reproducibility, making advancements and replication feasible mostly for well-resourced laboratories with comprehensive expertise in mechanical, electrical, and robotics fields. Researchers interested in the autonomy domain but with only partial experience in one of these fields need to spend significant time on familiarization and integration. The ForzaETH Race Stack addresses this gap by providing an autonomous racing software platform designed for F1TENTH, a 1:10 scaled Head-to-Head autonomous racing competition, which simplifies replication by using commercial off-the-shelf hardware. This approach enhances the competitive aspect of autonomous racing and provides an accessible platform for research and development in the field. The ForzaETH Race Stack is designed with modularity and operational ease of use in mind, allowing customization and adaptability to various environmental conditions, such as track friction and layout. Capable of handling both Time-Trials and Head-to-Head racing, the stack has demonstrated its effectiveness, robustness, and adaptability in the field by winning the official F1TENTH international competition multiple times.
Submitted 18 March, 2024;
originally announced March 2024.
-
Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties
Authors:
Jasmine Chiat Ling Ong,
Liyuan Jin,
Kabilan Elangovan,
Gilbert Yong San Lim,
Daniel Yan Zheng Lim,
Gerald Gui Ren Sng,
Yuhe Ke,
Joshua Yi Min Tung,
Ryan Jian Zhong,
Christopher Ming Yao Koh,
Keane Zhi Hao Lee,
Xiang Chen,
Jack Kian Chng,
Aung Than,
Ken Junyang Goh,
Daniel Shu Wei Ting
Abstract:
Importance: We introduce a novel Retrieval Augmented Generation (RAG)-Large Language Model (LLM) framework as a Clinical Decision Support Systems (CDSS) to support safe medication prescription.
Objective: To evaluate the efficacy of LLM-based CDSS in correctly identifying medication errors in different patient case vignettes from diverse medical and surgical sub-disciplines, against a human expert panel derived ground truth. We compared performance under 2 different CDSS practical healthcare integration modalities: LLM-based CDSS alone (fully autonomous mode) vs junior pharmacist + LLM-based CDSS (co-pilot, assistive mode).
Design, Setting, and Participants: Utilizing a RAG model with state-of-the-art medically-related LLMs (GPT-4, Gemini Pro 1.0 and Med-PaLM 2), this study used 61 prescribing error scenarios embedded into 23 complex clinical vignettes across 12 different medical and surgical specialties. A multidisciplinary expert panel assessed these cases for Drug-Related Problems (DRPs) using the PCNE classification and graded severity / potential for harm using the revised NCC MERP medication error index. We compared.
Results: RAG-LLM performed better compared to LLM alone. When employed in a co-pilot mode, accuracy, recall, and F1 scores were optimized, indicating effectiveness in identifying moderate to severe DRPs. The accuracy of DRP detection with RAG-LLM improved in several categories but at the expense of lower precision.
Conclusions: This study established that a RAG-LLM based CDSS significantly boosts the accuracy of medication error identification when used alongside junior pharmacists (co-pilot), with notable improvements in detecting severe DRPs. This study also illuminates the comparative performance of current state-of-the-art LLMs in RAG-based CDSS systems.
Submitted 17 February, 2024; v1 submitted 29 January, 2024;
originally announced February 2024.
-
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias
Authors:
Yu He Ke,
Rui Yang,
Sui An Lie,
Taylor Xin Yi Lim,
Hairil Rizal Abdullah,
Daniel Shu Wei Ting,
Nan Liu
Abstract:
Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field.
Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy.
Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 to facilitate interactions among four simulated agents to replicate clinical team dynamics. Each agent has a distinct role: 1) to make the final diagnosis after considering the discussions, 2) to act as the devil's advocate and correct confirmation and anchoring bias, 3) to act as the tutor and facilitator of the discussion to reduce premature closure bias, and 4) to record and summarize the findings. A total of 80 simulations were evaluated for the accuracy of initial diagnosis, top differential diagnosis and final two differential diagnoses.
Results: In a total of 80 responses evaluating both initial and final diagnoses, the initial diagnosis had an accuracy of 0% (0/80), but following multi-agent discussions, the accuracy for the top differential diagnosis increased to 71.3% (57/80), and for the final two differential diagnoses, to 80.0% (64/80).
Conclusions: The framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. The LLM-driven multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.
Submitted 12 May, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Robustness Evaluation of Localization Techniques for Autonomous Racing
Authors:
Tian Yi Lim,
Edoardo Ghignone,
Nicolas Baumann,
Michele Magno
Abstract:
This work introduces SynPF, an MCL-based algorithm tailored for high-speed racing environments. Benchmarked against Cartographer, a state-of-the-art pose-graph SLAM algorithm, SynPF leverages synergies from previous particle-filtering methods and synthesizes them for the high-performance racing domain. Our extensive in-field evaluations reveal that while Cartographer excels under nominal conditions, it struggles when subjected to wheel-slip, a common phenomenon in a racing scenario due to varying grip levels and aggressive driving behaviour. Conversely, SynPF demonstrates robustness in these challenging conditions and a low-latency computation time of 1.25 ms on on-board computers without a GPU. Using the F1TENTH platform, a 1:10 scaled autonomous racing vehicle, this work not only highlights the vulnerabilities of existing algorithms in high-speed scenarios, tested at speeds of up to 7.6 m/s, but also emphasizes the potential of SynPF as a viable alternative, especially in deteriorating odometry conditions.
Submitted 26 March, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
RainSD: Rain Style Diversification Module for Image Synthesis Enhancement using Feature-Level Style Distribution
Authors:
Hyeonjae Jeon,
Junghyun Seo,
Taesoo Kim,
Sungho Son,
Jungki Lee,
Gyeungho Choi,
Yongseob Lim
Abstract:
Autonomous driving technology now targets level 4 or beyond, but researchers face limitations in developing reliable driving algorithms for diverse challenges. For autonomous vehicles to spread widely, it is important to address the safety issues of this technology. Among various safety concerns, sensor blockage caused by severe weather conditions can be one of the most frequent threats to multi-task learning based perception algorithms during autonomous driving. To handle this problem, generating proper datasets is becoming increasingly important. In this paper, a synthetic road dataset with sensor blockage, generated from the real road dataset BDD100K, is proposed in the BDD100K annotation format. Rain streaks for each frame were made by an experimentally established equation and translated using an image-to-image translation network based on style transfer. Using this dataset, the degradation of diverse multi-task networks for autonomous driving, such as lane detection, driving area segmentation, and traffic object detection, has been thoroughly evaluated and analyzed. The tendency of the performance degradation of deep neural network-based perception systems for autonomous vehicles has been analyzed in depth. Finally, we discuss the limitations and future directions of deep neural network-based perception algorithms and of autonomous driving dataset generation based on image-to-image translation.
Submitted 31 December, 2023;
originally announced January 2024.
-
Challenges of YOLO Series for Object Detection in Extremely Heavy Rain: CALRA Simulator based Synthetic Evaluation Dataset
Authors:
T. Kim,
H. Jeon,
Y. Lim
Abstract:
Recently, as many studies of autonomous vehicles have been carried out for levels 4 and 5, there has also been increasing interest in the advancement of perception, decision, and control technologies, which are the three major aspects of autonomous vehicles. As for the perception technologies that enable reliable maneuvering of autonomous vehicles, object detection using diverse sensors (e.g., LiDAR, radar, and camera) should be prioritized. These sensors need to detect objects accurately and quickly in diverse weather conditions, but they tend to struggle to detect objects consistently in bad weather conditions with rain, snow, or fog. Thus, in this study, based on experimentally obtained raindrop data from precipitation conditions, we constructed a novel dataset that can test diverse network models in various precipitation conditions through the CARLA simulator. Consequently, based on our novel dataset, the YOLO series, a family of one-stage detectors, was used to quantitatively verify how much object detection performance decreases under various precipitation conditions, from normal to extremely heavy rain.
Submitted 14 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Integrated Path Tracking with DYC and MPC using LSTM Based Tire Force Estimator for Four-wheel Independent Steering and Driving Vehicle
Authors:
Sungjin Lim,
Bilal Sadiq,
Yongsik Jin,
Sangho Lee,
Gyeungho Choi,
Kanghyun Nam,
Yongseob Lim
Abstract:
Active collision avoidance systems play a crucial role in ensuring the lateral safety of autonomous vehicles, and they are primarily related to path planning and tracking control algorithms. In particular, the direct yaw-moment control (DYC) system can significantly improve the lateral stability of a vehicle in environments with sudden changes in road conditions. In order to apply the DYC algorithm, it is very important to accurately account for the highly nonlinear properties of tire forces in the controller to ensure the lateral stability of the vehicle. In this study, longitudinal and lateral tire forces for safe path tracking were simultaneously estimated using a long short-term memory (LSTM) neural network based estimator. Furthermore, to improve path tracking performance in case of sudden changes in road conditions, a system has been developed by combining 4-wheel independent steering (4WIS) model predictive control (MPC) and 4-wheel independent drive (4WID) direct yaw-moment control (DYC). The estimation performance was compared with that of the extended Kalman filter (EKF), which is commonly used for tire force estimation. In addition, the estimated longitudinal and lateral tire forces of each wheel were applied to the proposed system, and system verification was performed through simulation using a vehicle dynamics simulator. Consequently, the proposed method, the integrated path tracking algorithm with DYC and MPC using the LSTM based estimator, was validated to significantly improve vehicle stability in suddenly changing road conditions.
Submitted 12 December, 2023;
originally announced December 2023.
-
CARTOS: A Charging-Aware Real-Time Operating System for Intermittent Batteryless Devices
Authors:
Mohsen Karimi,
Yidi Wang,
Youngbin Kim,
Yoojin Lim,
Hyoseung Kim
Abstract:
This paper presents CARTOS, a charging-aware real-time operating system designed to enhance the functionality of intermittently-powered batteryless devices (IPDs) for various Internet of Things (IoT) applications. While IPDs offer significant advantages such as extended lifespan and operability in extreme environments, they pose unique challenges, including the need to ensure forward progress of program execution amidst variable energy availability and to maintain reliable real-time behavior during power disruptions. To address these challenges, CARTOS introduces a mixed-preemption scheduling model that classifies tasks into computational and peripheral tasks, and ensures their efficient and timely execution by adopting just-in-time checkpointing for divisible computation tasks and uninterrupted execution for indivisible peripheral tasks. CARTOS also supports processing chains of tasks with precedence constraints and adapts its scheduling in response to environmental changes to offer continuous execution under diverse conditions. CARTOS is implemented with new APIs and components added to FreeRTOS but is designed for portability to other embedded RTOSs. Through real hardware experiments and simulations, CARTOS exhibits superior performance over state-of-the-art methods, demonstrating that it can serve as a practical platform for developing resilient, real-time sensing applications on IPDs.
Submitted 13 November, 2023;
originally announced November 2023.
-
Improving Korean NLP Tasks with Linguistically Informed Subword Tokenization and Sub-character Decomposition
Authors:
Taehee Jeon,
Bongseok Yang,
Changhwan Kim,
Yoonseob Lim
Abstract:
We introduce a morpheme-aware subword tokenization method that utilizes sub-character decomposition to address the challenges of applying Byte Pair Encoding (BPE) to Korean, a language characterized by its rich morphology and unique writing system. Our approach balances linguistic accuracy with computational efficiency in Pre-trained Language Models (PLMs). Our evaluations show that this technique achieves good performance overall, notably improving results in the syntactic task of NIKL-CoLA. This suggests that integrating morpheme type information can enhance language models' syntactic and semantic capabilities, indicating that adopting more linguistic insights can further improve performance beyond standard morphological analysis.
Submitted 7 November, 2023;
originally announced November 2023.
-
Thermal-Infrared Remote Target Detection System for Maritime Rescue based on Data Augmentation with 3D Synthetic Data
Authors:
Sungjin Cheong,
Wonho Jung,
Yoon Seop Lim,
Yong-Hwa Park
Abstract:
This paper proposes a thermal-infrared (TIR) remote target detection system for maritime rescue using deep learning and data augmentation. We established a self-collected TIR dataset consisting of multiple scenes imitating human rescue situations using a TIR camera (FLIR). Additionally, to address dataset scarcity and improve model robustness, a synthetic dataset from a 3D game (ARMA3) is further collected to augment the data. However, a significant domain gap exists between synthetic TIR and real TIR images, so a proper domain adaptation algorithm is essential to overcome the gap. We therefore suggest a domain adaptation algorithm, based on a generative model, that translates 3D game images to real ones in a target-background separated manner. Furthermore, a segmentation network with fixed-weight kernels at the head is proposed to improve the signal-to-noise ratio (SNR) and provide weak attention, as remote TIR targets inherently suffer from unclear boundaries. Experimental results reveal that the network trained on augmented data consisting of translated synthetic and real TIR data outperforms that trained on only real TIR data by a large margin. Furthermore, the proposed segmentation model surpasses the performance of state-of-the-art segmentation methods.
Submitted 31 October, 2023;
originally announced October 2023.
-
General-Purpose Retrieval-Enhanced Medical Prediction Model Using Near-Infinite History
Authors:
Junu Kim,
Chaeeun Shim,
Bosco Seong Kyu Yang,
Chami Im,
Sung Yoon Lim,
Han-Gil Jeong,
Edward Choi
Abstract:
Developing clinical prediction models (e.g., mortality prediction) based on electronic health records (EHRs) typically relies on expert opinion for feature selection and adjusting observation window size. This burdens experts and creates a bottleneck in the development process. We propose Retrieval-Enhanced Medical prediction model (REMed) to address such challenges. REMed can essentially evaluate an unlimited number of clinical events, select the relevant ones, and make predictions. This approach effectively eliminates the need for manual feature selection and enables an unrestricted observation window. We verified these properties through experiments on 27 clinical tasks and two independent cohorts from publicly available EHR datasets, where REMed outperformed other contemporary architectures that aim to handle as many events as possible. Notably, we found that the preferences of REMed align closely with those of medical experts. We expect our approach to significantly expedite the development of EHR prediction models by minimizing clinicians' need for manual involvement.
Submitted 20 March, 2024; v1 submitted 31 October, 2023;
originally announced October 2023.
-
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Authors:
Daehee Kim,
Yoonsik Kim,
DongHyun Kim,
Yumin Lim,
Geewook Kim,
Taeho Kil
Abstract:
Inspired by the great success of language model (LM)-based pre-training, recent studies in visual document understanding have explored LM-based pre-training methods for modeling text within document images. Among them, pre-training that reads all text from an image has shown promise, but often exhibits instability and even fails when applied to broader domains, such as those involving both visual documents and scene text images. This is a substantial limitation for real-world scenarios, where the processing of text image inputs in diverse domains is essential. In this paper, we investigate effective pre-training tasks in the broader domains and also propose a novel pre-training method called SCOB that leverages character-wise supervised contrastive learning with online text rendering to effectively pre-train document and scene text domains by bridging the domain gap. Moreover, SCOB enables weakly supervised learning, significantly reducing annotation costs. Extensive benchmarks demonstrate that SCOB generally improves vanilla pre-training methods and achieves comparable performance to state-of-the-art methods. Our findings suggest that SCOB can be applied generally and effectively to read-type pre-training methods. The code will be available at https://github.com/naver-ai/scob.
Submitted 21 September, 2023;
originally announced September 2023.
-
Parallelizing non-linear sequential models over the sequence length
Authors:
Yi Heng Lim,
Qi Zhu,
Joshua Selfridge,
Muhammad Firmansyah Kasim
Abstract:
Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought sequential models could not be parallelized. We challenge this long-held belief with our parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude without compromising output accuracy. The algorithm does not need any special structure in the sequential models' architecture, making it applicable to a wide range of architectures. Using our method, training sequential models can be more than 10 times faster than the common sequential method without any meaningful difference in the training results. Leveraging this accelerated training, we discovered the efficacy of the Gated Recurrent Unit in a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step to unlock the potential of non-linear sequential models for long sequence problems.
Submitted 16 January, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Feeding the Coffee Habit: A Longitudinal Study of a Robo-Barista
Authors:
Mei Yii Lim,
David A. Robb,
Bruce W. Wilson,
Helen Hastie
Abstract:
Studying Human-Robot Interaction over time can provide insights into what really happens when a robot becomes part of people's everyday lives. "In the Wild" studies inform the design of social robots, such as for the service industry, to enable them to remain engaging and useful beyond the novelty effect and initial adoption. This paper presents an "In the Wild" experiment where we explored the evolution of interaction between users and a Robo-Barista. We show that perceived trust and prior attitudes are both important factors associated with the usefulness, adaptability and likeability of the Robo-Barista. A combination of interaction features and user attributes is used to predict user satisfaction. Qualitative insights illuminate users' Robo-Barista experience and contribute to a number of lessons learned for future long-term studies.
Submitted 6 September, 2023;
originally announced September 2023.
-
Short-Term Stock Price Forecasting using exogenous variables and Machine Learning Algorithms
Authors:
Albert Wong,
Steven Whang,
Emilio Sagre,
Niha Sachin,
Gustavo Dutra,
Yew-Wei Lim,
Gaetan Hains,
Youry Khmelevsky,
Frank Zhang
Abstract:
Creating accurate predictions in the stock market has always been a significant challenge in finance. With the rise of machine learning as the next level in the forecasting area, this research paper compares four machine learning models and their accuracy in forecasting three well-known stocks traded on the NYSE in the short term from March 2020 to May 2022. We develop, tune, and deploy XGBoost, Random Forest, Multi-layer Perceptron, and Support Vector Regression models. We report the models that produce the highest accuracies from our evaluation metrics: RMSE, MAPE, MTT, and MPE. Using a training data set of 240 trading days, we find that XGBoost gives the highest accuracy despite running longer (up to 10 seconds). Results from this study may improve by further tuning the individual parameters or introducing more exogenous variables.
Submitted 17 May, 2023;
originally announced September 2023.
-
Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot Interaction
Authors:
Kaiqi Chen,
Jing Yu Lim,
Kingsley Kuan,
Harold Soh
Abstract:
Perspective-taking is the ability to perceive or understand a situation or concept from another individual's point of view, and is crucial in daily human interactions. Enabling robots to perform perspective-taking remains an unsolved problem; existing approaches that use deterministic or handcrafted methods are unable to accurately account for uncertainty in partially-observable settings. This work proposes to address this limitation via a deep world model that enables a robot to perform both perceptual and conceptual perspective-taking, i.e., the robot is able to infer what a human sees and believes. The key innovation is a decomposed multi-modal latent state space model able to generate and augment fictitious observations/emissions. Optimizing the ELBO that arises from this probabilistic graphical model enables the learning of uncertainty in latent space, which facilitates uncertainty estimation from high-dimensional observations. We tasked our model to predict human observations and beliefs on three partially-observable HRI tasks. Experiments show that our method significantly outperforms existing baselines and is able to infer the visual observations available to other agents and their internal beliefs.
Submitted 12 August, 2023;
originally announced August 2023.
-
Unleashing Unprivileged eBPF Potential with Dynamic Sandboxing
Authors:
Soo Yee Lim,
Xueyuan Han,
Thomas Pasquier
Abstract:
For safety reasons, unprivileged users today have only limited ways to customize the kernel through the extended Berkeley Packet Filter (eBPF). This is unfortunate, especially since the eBPF framework itself has seen an increase in scope over the years. We propose SandBPF, a software-based kernel isolation technique that dynamically sandboxes eBPF programs to allow unprivileged users to safely extend the kernel, unleashing eBPF's full potential. Our early proof-of-concept shows that SandBPF can effectively prevent exploits missed by eBPF's native safety mechanism (i.e., static verification) while incurring 0%-10% overhead on web server benchmarks.
Submitted 15 August, 2023; v1 submitted 3 August, 2023;
originally announced August 2023.
-
We are all Individuals: The Role of Robot Personality and Human Traits in Trustworthy Interaction
Authors:
Mei Yii Lim,
José David Aguas Lopes,
David A. Robb,
Bruce W. Wilson,
Meriam Moujahid,
Emanuele De Pellegrin,
Helen Hastie
Abstract:
As robots take on roles in our society, it is important that their appearance, behaviour and personality are appropriate for the job they are given and are perceived favourably by the people with whom they interact. Here, we provide an extensive quantitative and qualitative study exploring robot personality but, importantly, with respect to individual human traits. Firstly, we show that we can accurately portray personality in a social robot, in terms of extroversion-introversion using vocal cues and linguistic features. Secondly, through garnering preferences and trust ratings for these different robot personalities, we establish that, for a Robo-Barista, an extrovert robot is preferred and trusted more than an introvert robot, regardless of the subject's own personality. Thirdly, we find that individual attitudes and predispositions towards robots do impact trust in the Robo-Baristas, and are therefore important considerations in addition to robot personality, roles and interaction context when designing any human-robot interaction study.
Submitted 28 July, 2023;
originally announced July 2023.
-
A Study on Quantifying Sim2Real Image Gap in Autonomous Driving Simulations Using Lane Segmentation Attention Map Similarity
Authors:
Seongjeong Park,
Jinu Pahk,
Lennart Lorenz Freimuth Jahn,
Yongseob Lim,
Jinung An,
Gyeungho Choi
Abstract:
Autonomous driving simulations require highly realistic images. Our preliminary study found that when CARLA Simulator images were made more realistic using DCLGAN, the performance of the lane recognition model improved to levels comparable to real-world driving. It was also confirmed that the vehicle's ability to return to the center of the lane after deviating from it improved significantly. However, there is currently no agreed-upon metric for quantitatively evaluating the realism of simulation images. To address this issue, based on the idea that FID (Fréchet Inception Distance) measures the feature vector distribution distance using a pre-trained model, this paper proposes a metric that measures the similarity of simulation road images using the attention map from the self-attention distillation process of ENet-SAD. Finally, this paper verifies the suitability of the measurement method by applying it to images of a CARLA map that reproduces a real-world autonomous driving test road.
Submitted 18 June, 2023;
originally announced June 2023.
-
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Authors:
Hwaran Lee,
Seokhee Hong,
Joonsuk Park,
Takyoung Kim,
Meeyoung Cha,
Yejin Choi,
Byoung Pil Kim,
Gunhee Kim,
Eun-Ju Lee,
Yong Lim,
Alice Oh,
Sangchul Park,
Jung-Woo Ha
Abstract:
The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation significantly improves for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.
Submitted 28 May, 2023;
originally announced May 2023.
-
RPLKG: Robust Prompt Learning with Knowledge Graph
Authors:
Yewon Kim,
YongTaek Lim,
Dokyung Yoon,
KyungWoo Song
Abstract:
Large-scale pre-trained models are known to be transferable and to generalize well to unseen datasets. Recently, multimodal pre-trained models such as CLIP have shown significant performance improvements in diverse experiments. However, when the labeled dataset is limited, generalization to a new dataset or domain is still challenging. To improve generalization performance in few-shot learning, there have been diverse efforts, such as prompt learning and adapters. However, current few-shot adaptation methods are not interpretable, and they require a high computation cost for adaptation. In this study, we propose a new method, robust prompt learning with knowledge graph (RPLKG). Based on the knowledge graph, we automatically design diverse, interpretable, and meaningful prompt sets. Our model obtains cached embeddings of the prompt sets after one forward pass through a large pre-trained model. After that, the model optimizes the prompt selection process with Gumbel-Softmax. In this way, our model is trained using relatively little memory and learning time. Also, RPLKG selects the optimal interpretable prompt automatically, depending on the dataset. In summary, RPLKG i) is interpretable, ii) requires few computational resources, and iii) makes it easy to incorporate prior human knowledge. To validate RPLKG, we provide comprehensive experimental results in few-shot learning, domain generalization, and new class generalization settings. RPLKG shows a significant performance improvement compared to zero-shot learning and competitive performance against several prompt learning methods while using far fewer resources.
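The prompt-selection step in this abstract relies on the Gumbel-Softmax relaxation. A minimal NumPy sketch of that relaxation follows, assuming each candidate prompt has a learnable score (logit). The function name `gumbel_softmax` and the temperature `tau` are illustrative choices, not taken from the RPLKG code.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw one relaxed one-hot sample over the entries of `logits`.

    Hypothetical helper for illustration: `logits` stands in for
    per-prompt selection scores, `tau` is the softmax temperature.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via inverse transform; clip to keep both logs finite.
    u = np.clip(rng.uniform(size=logits.shape), 1e-12, 1.0 - 1e-12)
    gumbel = -np.log(-np.log(u))
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    z = (logits + gumbel) / tau
    z = z - z.max()
    weights = np.exp(z)
    return weights / weights.sum()
```

Lower values of `tau` push the sample toward a hard one-hot choice, while higher values give a smoother mixture over prompts.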
Submitted 21 April, 2023;
originally announced April 2023.
-
Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach
Authors:
Siyue Zhang,
Minrui Xu,
Wei Yang Bryan Lim,
Dusit Niyato
Abstract:
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption. Scheduling training jobs among geographically distributed cloud data centers unveils the opportunity to optimize the usage of computing capacity powered by inexpensive and low-carbon energy and address the issue of workload imbalance. To tackle the challenge of multi-objective scheduling, i.e., maximizing GPU utilization while reducing operational costs, we propose an algorithm based on multi-agent reinforcement learning and actor-critic methods to learn the optimal collaborative scheduling strategy through interacting with a cloud system built with real-life workload patterns, energy prices, and carbon intensities. Compared with other algorithms, our proposed method improves the system utility by up to 28.6% attributable to higher GPU utilization, lower energy cost, and less carbon emission.
Submitted 16 April, 2023;
originally announced April 2023.
-
Visual based Tomato Size Measurement System for an Indoor Farming Environment
Authors:
Andy Kweon,
Vishnu Hu,
Jong Yoon Lim,
Trevor Gee,
Edmond Liu,
Henry Williams,
Bruce A. MacDonald,
Mahla Nejati,
Inkyu Sa,
Ho Seok Ahn
Abstract:
As technology progresses, smart automated systems will serve an increasingly important role in the agricultural industry. Existing vision systems for yield estimation face difficulties with occlusion and scalability because they use large and expensive camera systems, which are unsuitable for orchard environments. To overcome these problems, this paper presents a size measurement method combining a machine learning model and depth images captured from three low-cost RGBD cameras to detect and measure the height and width of tomatoes. The performance of the presented system is evaluated in a lab environment with real tomato fruits and fake leaves to simulate occlusion in a real farm environment. By addressing fruit occlusion, our three-camera system achieved a height measurement accuracy of 0.9114 and a width accuracy of 0.9443.
Submitted 12 April, 2023;
originally announced April 2023.
-
That's What I Said: Fully-Controllable Talking Face Generation
Authors:
Youngjoon Jang,
Kyeongha Rho,
Jong-Bin Woo,
Hyeongkeun Lee,
Jihwan Park,
Youshin Lim,
Byeong-Yeol Kim,
Joon Son Chung
Abstract:
The goal of this paper is to synthesise talking faces with controllable facial motions. To achieve this goal, we propose two key ideas. The first is to establish a canonical space where every face has the same motion patterns but different identities. The second is to navigate a multimodal motion space that only represents motion-related features while eliminating identity information. To disentangle identity and motion, we introduce an orthogonality constraint between the two different latent spaces. From this, our method can generate natural-looking talking faces with fully controllable facial attributes and accurate lip synchronisation. Extensive experiments demonstrate that our method achieves state-of-the-art results in terms of both visual quality and lip-sync score. To the best of our knowledge, we are the first to develop a talking face generation framework that can accurately manifest full target facial motions including lip, head pose, and eye movements in the generated video without any additional supervision beyond RGB video with audio.
Submitted 18 September, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
-
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
Authors:
Changdae Oh,
Hyeji Hwang,
Hee-young Lee,
YongTaek Lim,
Geunyoung Jung,
Jiyoung Jung,
Hosik Choi,
Kyungwoo Song
Abstract:
With the surge of large-scale pre-trained models (PTMs), fine-tuning these models for numerous downstream tasks has become a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has attracted considerable attention. While recent PETL methods showcase impressive performance, they rely on optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) sufficiently large memory capacity for fine-tuning is available. However, in most real-world applications, PTMs are served as a black-box API or proprietary software without explicit parameter accessibility. Besides, it is hard to meet the large memory requirements of modern PTMs. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of model architectures and parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent image-shaped visual prompts, which improve few-shot adaptation and robustness to distribution/location shift. SPSA-GC efficiently estimates the gradient of a target model to update the Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without accessing PTMs' parameters, with minimal memory requirements. Code: https://github.com/changdaeoh/BlackVIP
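For readers unfamiliar with SPSA, the sketch below shows the standard two-query gradient estimate that lets a black-box model be optimized without backpropagation. It is plain SPSA only. The gradient-correction part of SPSA-GC is not reproduced here, and `spsa_gradient`, `loss_fn`, and the step size `c` are illustrative names rather than the BlackVIP API.

```python
import numpy as np

def spsa_gradient(loss_fn, params, c=0.01, rng=None):
    """Estimate the gradient of `loss_fn` at `params` from two evaluations.

    Assumption for illustration: `loss_fn` maps a parameter vector to a
    scalar and can only be queried, not differentiated.
    """
    rng = np.random.default_rng() if rng is None else rng
    params = np.asarray(params, dtype=float)
    # Random +/-1 (Rademacher) perturbation applied to all coordinates at once.
    delta = rng.choice([-1.0, 1.0], size=params.shape)
    loss_plus = loss_fn(params + c * delta)
    loss_minus = loss_fn(params - c * delta)
    # Central difference along delta, divided elementwise by the perturbation.
    return (loss_plus - loss_minus) / (2.0 * c) / delta
```

A single estimate is noisy in more than one dimension, so in practice estimates are averaged or fed into a momentum-style optimizer.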
Submitted 8 July, 2023; v1 submitted 26 March, 2023;
originally announced March 2023.
-
IRIS: Interpretable Rubric-Informed Segmentation for Action Quality Assessment
Authors:
Hitoshi Matsuyama,
Nobuo Kawaguchi,
Brian Y. Lim
Abstract:
AI-driven Action Quality Assessment (AQA) of sports videos can mimic Olympic judges to help score performances as a second opinion or for training. However, these AI methods are uninterpretable and do not justify their scores, which is important for algorithmic accountability. Indeed, to account for their decisions, instead of scoring subjectively, sports judges use a consistent set of criteria (a rubric) on multiple actions in each performance sequence. Therefore, we propose IRIS to perform Interpretable Rubric-Informed Segmentation on action sequences for AQA. We investigated IRIS for scoring videos of figure skating performance. IRIS predicts (1) action segments, (2) technical element score differences of each segment relative to base scores, (3) multiple program component scores, and (4) the summed final score. In a modeling study, we found that IRIS performs better than non-interpretable, state-of-the-art models. In a formative user study, practicing figure skaters agreed with the rubric-informed explanations, found them useful, and trusted AI judgments more. This work highlights the importance of using judgment rubrics to account for AI decisions.
Submitted 16 March, 2023;
originally announced March 2023.
-
Scalable Object Detection on Embedded Devices Using Weight Pruning and Singular Value Decomposition
Authors:
Dohyun Ham,
Jaeyeop Jeong,
June-Kyoo Park,
Raehyeon Jeong,
Seungmin Jeon,
Hyeongjun Jeon,
Yewon Lim
Abstract:
This paper presents a method for optimizing object detection models by combining weight pruning and singular value decomposition (SVD). The proposed method was evaluated on a custom dataset of street work images obtained from https://universe.roboflow.com/roboflow-100/street-work. The dataset consists of 611 training images, 175 validation images, and 87 test images with 7 classes. We compared the performance of the optimized models with the original unoptimized model in terms of frame rate, mean average precision (mAP@50), and weight size. The results show that the weight pruning + SVD model achieved a 0.724 mAP@50 with a frame rate of 1.48 FPS and a weight size of 12.1 MB, outperforming the original model (0.717 mAP@50, 1.50 FPS, and 12.3 MB). Precision-recall curves were also plotted for all models. Our work demonstrates that the proposed method can effectively optimize object detection models while balancing accuracy, speed, and model size.
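The two compression steps named in this abstract can be sketched for a single weight matrix as follows. This is a generic NumPy illustration, not the authors' pipeline: the helper names, the magnitude-based pruning criterion, and applying the steps to one dense matrix are all assumptions.

```python
import numpy as np

def magnitude_prune(weight, sparsity):
    """Zero out roughly the `sparsity` fraction of smallest-magnitude entries."""
    weight = np.asarray(weight, dtype=float)
    k = int(sparsity * weight.size)
    if k == 0:
        return weight.copy()
    # k-th smallest absolute value; ties at the threshold are pruned as well.
    threshold = np.partition(np.abs(weight).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weight) <= threshold, 0.0, weight)

def truncated_svd_factors(weight, rank):
    """Return (left, right) such that left @ right is the best rank-`rank` approximation."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # fold the singular values into the left factor
    right = vt[:rank, :]
    return left, right
```

Replacing an m x n matrix with its two factors stores rank * (m + n) values instead of m * n, which only saves space when the chosen rank is small relative to both dimensions.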
Submitted 17 March, 2023; v1 submitted 5 March, 2023;
originally announced March 2023.
-
RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions
Authors:
Yunlong Wang,
Shuyuan Shen,
Brian Y. Lim
Abstract:
Generative AI models have shown impressive ability to produce images with text prompts, which could benefit creativity in visual art creation and self-expression. However, it is unclear how precisely the generated images express contexts and emotions from the input texts. We explored the emotional expressiveness of AI-generated images and developed RePrompt, an automatic method to refine text prompts toward precise expression of the generated images. Inspired by crowdsourced editing strategies, we curated intuitive text features, such as the number and concreteness of nouns, and trained a proxy model to analyze the feature effects on the AI-generated image. With model explanations of the proxy model, we curated a rubric to adjust text prompts to optimize image generation for precise emotion expression. We conducted simulation and user studies, which showed that RePrompt significantly improves the emotional expressiveness of AI-generated images, especially for negative emotions.
Submitted 19 March, 2023; v1 submitted 18 February, 2023;
originally announced February 2023.
-
Diagrammatization: Rationalizing with diagrammatic AI explanations for abductive-deductive reasoning on hypotheses
Authors:
Brian Y. Lim,
Joseph P. Cahaly,
Chester Y. F. Sng,
Adam Chew
Abstract:
Many visualizations have been developed for explainable AI (XAI), but they often require further reasoning by users to interpret. We argue that XAI should support diagrammatic and abductive reasoning for the AI to perform hypothesis generation and evaluation to reduce the interpretability gap. We propose Diagrammatization to i) perform Peircean abductive-deductive reasoning, ii) follow domain conventions, and iii) explain with diagrams visually or verbally. We implemented DiagramNet for a clinical application to predict cardiac diagnoses from heart auscultation, and explain with shape-based murmur diagrams. In modeling studies, we found that DiagramNet not only provides faithful murmur shape explanations, but also has better prediction performance than baseline models. We further demonstrate the interpretability and trustworthiness of diagrammatic explanations in a qualitative user study with medical students, showing that clinically-relevant, diagrammatic explanations are preferred over technical saliency map explanations. This work contributes insights into providing domain-conventional abductive explanations for user-centric XAI.
Submitted 12 July, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Enhanced artificial intelligence-based diagnosis using CBCT with internal denoising: Clinical validation for discrimination of fungal ball, sinusitis, and normal cases in the maxillary sinus
Authors:
Kyungsu Kim,
Chae Yeon Lim,
Joong Bo Shin,
Myung Jin Chung,
Yong Gi Jung
Abstract:
The cone-beam computed tomography (CBCT) provides 3D volumetric imaging of a target with low radiation dose and cost compared with conventional computed tomography, and it is widely used in the detection of paranasal sinus disease. However, it lacks the sensitivity to detect soft tissue lesions owing to reconstruction constraints. Consequently, only physicians with expertise in CBCT reading can distinguish between inherent artifacts or noise and diseases, restricting the use of this imaging modality. The development of artificial intelligence (AI)-based computer-aided diagnosis methods for CBCT to overcome the shortage of experienced physicians has attracted substantial attention. However, advanced AI-based diagnosis addressing intrinsic noise in CBCT has not been devised, discouraging the practical use of AI solutions for CBCT. To address this issue, we propose an AI-based computer-aided diagnosis method using CBCT with a denoising module. This module is implemented before diagnosis to reconstruct the internal ground-truth full-dose scan corresponding to an input CBCT image and thereby improve the diagnostic performance. The external validation results for the unified diagnosis of sinus fungal ball, chronic rhinosinusitis, and normal cases show that the proposed method improves the micro-, macro-average AUC, and accuracy by 7.4, 5.6, and 9.6% (from 86.2, 87.0, and 73.4 to 93.6, 92.6, and 83.0%), respectively, compared with a baseline while improving human diagnosis accuracy by 11% (from 71.7 to 83.0%), demonstrating technical differentiation and clinical effectiveness. This pioneering study on AI-based diagnosis using CBCT indicates denoising can improve diagnostic performance and reader interpretability in images from the sinonasal area, thereby providing a new approach and direction to radiographic image reconstruction regarding the development of AI-based diagnostic solutions.
Submitted 29 November, 2022;
originally announced November 2022.
-
Effects of Sim2Real Image Translation on Lane Keeping Assist System in CARLA Simulator
Authors:
Jinu Pahk,
Jungseok Shim,
MinHyeok Baek,
Yongseob Lim,
Gyeungho Choi
Abstract:
Autonomous vehicle simulation has the advantage of testing algorithms under various environment variables and scenarios without wasting time and resources; however, there is a visual gap between simulation and the real world. In this paper, we trained DCLGAN to realistically convert images from the CARLA simulator and evaluated the effect of the Sim2Real conversion, focusing on the LKAS (Lane Keeping Assist System) algorithm. To avoid cases where DCLGAN distorts the lane during translation, we found the optimal training hyperparameter using FSIM (feature similarity). After training, we built a system that connected the DCLGAN model with CARLA and the AV in real time. Then, we collected data (e.g., images, GPS) and analyzed them using the following four methods. First, image realism was measured with FID, which we verified quantitatively reflects the lane characteristics. CARLA images that passed through DCLGAN had smaller FID values than the original images. Second, lane segmentation accuracy through ENet-SAD was improved by DCLGAN. Third, on the curved route, the vehicle using DCLGAN drove closer to the center of the lane and had a high success rate. Lastly, on the straight route, DCLGAN improved the lane-restoring ability after deviating from the center of the lane as much as in reality.
Submitted 23 November, 2022;
originally announced November 2022.
-
Towards Green Metaverse Networking Technologies, Advancements and Future Directions
Authors:
Siyue Zhang,
Wei Yang Bryan Lim,
Wei Chong Ng,
Zehui Xiong,
Dusit Niyato,
Xuemin Sherman Shen,
Chunyan Miao
Abstract:
As the Metaverse is iteratively being defined, its potential to unleash the next wave of digital disruption and create real-life value becomes increasingly clear. With distinctive features of immersive experience, simultaneous interactivity, and user agency, the Metaverse has the capability to transform all walks of life. However, the enabling technologies of the Metaverse, i.e., digital twin, artificial intelligence, blockchain, and extended reality, are known to be energy-hungry, therefore raising concerns about the sustainability of its large-scale deployment and development. This article proposes Green Metaverse Networking for the first time to optimize energy efficiencies of all network components for Metaverse sustainable development. We first analyze energy consumption, efficiency, and sustainability of energy-intensive technologies in the Metaverse. Next, focusing on computation and networking, we present major advancements related to energy efficiency and their integration into the Metaverse. A case study of energy conservation by incorporating semantic communication and stochastic resource allocation in the Metaverse is presented. Finally, we outline the critical challenges of Metaverse sustainable development, thereby indicating potential directions of future research towards the green Metaverse.
Submitted 13 April, 2023; v1 submitted 6 November, 2022;
originally announced November 2022.
-
Metric Learning for User-defined Keyword Spotting
Authors:
Jaemin Jung,
Youkyum Kim,
Jihwan Park,
Youshin Lim,
Byeong-Yeol Kim,
Youngjoon Jang,
Joon Son Chung
Abstract:
The goal of this work is to detect new spoken terms defined by users. While most previous works address Keyword Spotting (KWS) as a closed-set classification problem, this limits their transferability to unseen terms. The ability to define custom keywords has advantages in terms of user experience.
In this paper, we propose a metric learning-based training strategy for user-defined keyword spotting. In particular, we make the following contributions: (1) we construct a large-scale keyword dataset from an existing speech corpus and propose a filtering method to remove data that degrade model training; (2) we propose a metric learning-based two-stage training strategy, and demonstrate that the proposed method improves the performance on the user-defined keyword spotting task by enriching the keyword representations; (3) to facilitate fair comparison in the user-defined KWS field, we propose a unified evaluation protocol and metrics.
Our proposed system does not require an incremental training on the user-defined keywords, and outperforms previous works by a significant margin on the Google Speech Commands dataset using the proposed as well as the existing metrics.
Submitted 1 November, 2022;
originally announced November 2022.
-
Visually Improved Erosion Algorithm for the Procedural Generation of Tile-based Terrain
Authors:
Fong Yuan Lim,
Yu Wei Tan,
Anand Bhojan
Abstract:
Procedural terrain generation is the process of generating a digital representation of terrain using a computer program or procedure, with little to no human guidance. This paper proposes a procedural terrain generation algorithm based on a graph representation of fluvial erosion that offers several novel improvements over existing algorithms: the use of a height constraint map with two types of locally defined constraint strengths; the ability to specify a realistic erosion strength via level of rainfall; and the ability to carve realistic gorges. These novelties allow it to generate more varied and realistic terrain by integrating additional parameters and simulation processes, while being faster and offering more flexibility and ease of use to terrain designers due to the nature and intuitiveness of these new parameters and processes. This paper additionally reviews some common metrics used to evaluate terrain generators, and suggests a completely new one that contributes to a more holistic evaluation.
Submitted 26 October, 2022;
originally announced October 2022.
-
Review Learning: Alleviating Catastrophic Forgetting with Generative Replay without Generator
Authors:
Jaesung Yoo,
Sunghyuk Choi,
Ye Seul Yang,
Suhyeon Kim,
Jieun Choi,
Dongkyeong Lim,
Yaeji Lim,
Hyung Joon Joo,
Dae Jung Kim,
Rae Woong Park,
Hyeong-Jin Yoon,
Kwangsoo Kim
Abstract:
When a deep learning model is sequentially trained on different datasets, it forgets the knowledge acquired from previous data, a phenomenon known as catastrophic forgetting. It deteriorates performance of the deep learning model on diverse datasets, which is critical in privacy-preserving deep learning (PPDL) applications based on transfer learning (TL). To overcome this, we propose review learning (RL), a generative-replay-based continual learning technique that does not require a separate generator. Data samples are generated from the memory stored within the synaptic weights of the deep learning model which are used to review knowledge acquired from previous datasets. The performance of RL was validated through PPDL experiments. Simulations and real-world medical multi-institutional experiments were conducted using three types of binary classification electronic health record data. In the real-world experiments, the global area under the receiver operating curve was 0.710 for RL and 0.655 for TL. Thus, RL was highly effective in retaining previously learned knowledge.
Submitted 17 October, 2022;
originally announced October 2022.
-
Cooperative Resource Management in Quantum Key Distribution (QKD) Networks for Semantic Communication
Authors:
Rakpong Kaewpuang,
Minrui Xu,
Wei Yang Bryan Lim,
Dusit Niyato,
Han Yu,
Jiawen Kang,
Xuemin Sherman Shen
Abstract:
Increasing privacy and security concerns in intelligence-native 6G networks require quantum key distribution-secured semantic information communication (QKD-SIC). In QKD-SIC systems, edge devices connected via quantum channels can efficiently encrypt semantic information from the semantic source, and securely transmit the encrypted semantic information to the semantic destination. In this paper, we consider an efficient resource (i.e., QKD and KM wavelengths) sharing problem to support QKD-SIC systems under the uncertainty of semantic information generated by edge devices. In such a system, QKD service providers offer QKD services with different subscription options to the edge devices. As such, to reduce the cost for the edge device users, we propose a QKD resource management framework for the edge devices communicating semantic information. The framework is based on a two-stage stochastic optimization model to achieve optimal QKD deployment. Moreover, to reduce the deployment cost of QKD service providers, QKD resources in the proposed framework can be utilized based on efficient QKD-SIC resource management, including semantic information transmission among edge devices, secret-key provisioning, and cooperation formation among QKD service providers. In detail, the formulated two-stage stochastic optimization model can achieve the optimal QKD-SIC resource deployment while meeting the secret-key requirements for semantic information transmission of edge devices. Moreover, to share the cost of the QKD resource pool among cooperative QKD service providers forming a coalition in a fair and interpretable manner, the proposed framework leverages the concept of Shapley value from cooperative game theory as a solution. Experimental results demonstrate that the proposed framework can reduce the deployment cost by about 40% compared with existing non-cooperative baselines.
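The Shapley-value cost sharing mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the three providers `A`, `B`, `C` and the coalition cost table `COSTS` are hypothetical numbers chosen only to show how each provider's share is its marginal cost averaged over all orders in which the coalition could form.

```python
from itertools import permutations


def shapley_values(players, cost):
    """Average each player's marginal cost over every join order."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            bigger = coalition | {p}
            # Marginal cost of adding p to the coalition formed so far.
            totals[p] += cost(bigger) - cost(coalition)
            coalition = bigger
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical cost of the shared resource pool for each coalition
# (illustrative values, not taken from the paper).
COSTS = {
    frozenset(): 0.0,
    frozenset("A"): 10.0,
    frozenset("B"): 12.0,
    frozenset("C"): 14.0,
    frozenset("AB"): 16.0,
    frozenset("AC"): 18.0,
    frozenset("BC"): 20.0,
    frozenset("ABC"): 24.0,
}

shares = shapley_values(["A", "B", "C"], COSTS.__getitem__)
```

With these assumed costs the shares always sum to the grand-coalition cost of 24, which is the efficiency property that makes the split interpretable. Enumerating all orders is only practical for a handful of providers.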
Submitted 24 September, 2022;
originally announced September 2022.
-
Stochastic Resource Allocation for Semantic Communication-aided Virtual Transportation Networks in the Metaverse
Authors:
Wei Chong Ng,
Hongyang Du,
Wei Yang Bryan Lim,
Zehui Xiong,
Dusit Niyato,
Chunyan Miao
Abstract:
The physical-virtual world synchronization to develop the Metaverse will require a massive transmission and exchange of data. In this paper, we introduce semantic communication for the development of virtual transportation networks in the Metaverse. Leveraging the perception capabilities of edge devices, virtual service providers (VSPs) can subscribe to their preferred edge devices to receive the semantic data of interest. However, the demands of the VSPs are highly dependent on the users that they are serving. To address the resource allocation problem amid stochastic user demand, we propose a stochastic semantic transmission scheme (SSTS) based on two-stage stochastic integer programming. Using real data captured by edge devices we deploy in Singapore, the simulation results show that SSTS can minimize the transmission cost of the VSPs while accounting for the users' demand uncertainties.
Submitted 31 August, 2022;
originally announced August 2022.
-
Constants of motion network
Authors:
Muhammad Firmansyah Kasim,
Yi Heng Lim
Abstract:
The beauty of physics is that there is usually a conserved quantity in an always-changing system, known as the constant of motion. Finding the constant of motion is important in understanding the dynamics of the system, but typically requires mathematical proficiency and manual analytical work. In this paper, we present a neural network that can simultaneously learn the dynamics of the system and the constants of motion from data. By exploiting the discovered constants of motion, it can produce better predictions on dynamics and can work on a wider range of systems than Hamiltonian-based neural networks. In addition, the training progress of our method can be used as an indication of the number of constants of motion in a system, which could be useful in studying a novel physical system.
Submitted 4 October, 2022; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Economics of Semantic Communication System: An Auction Approach
Authors:
Zi Qin Liew,
Hongyang Du,
Wei Yang Bryan Lim,
Zehui Xiong,
Dusit Niyato,
Chunyan Miao,
Dong In Kim
Abstract:
Semantic communication technologies enable wireless edge devices to communicate effectively by transmitting the semantic meaning of data. Edge components, such as vehicles in next-generation intelligent transport systems, use well-trained semantic models to encode and decode semantic information extracted from raw and sensor data. However, limited computing resources make it difficult to support the training of accurate semantic models on edge devices. As such, edge devices can buy pretrained semantic models from semantic model providers, which is called "semantic model trading". Upon collecting semantic information with the semantic models, the edge devices can then sell the extracted semantic information, e.g., information about urban road conditions or traffic signs, to interested buyers for profit, which is called "semantic information trading". To facilitate both types of trade, effective incentive mechanisms should be designed. Thus, in this paper, we propose a hierarchical trading system to support both semantic model trading and semantic information trading jointly. The proposed incentive mechanism helps to maximize the revenue of semantic model providers in semantic model trading, and effectively incentivizes model providers to participate in the development of semantic communication systems. For semantic information trading, our designed auction approach can support trading between multiple semantic information sellers and buyers while ensuring individual rationality, incentive compatibility, and budget balance, and moreover allows them to achieve higher utilities than the baseline method.
Submitted 1 August, 2022;
originally announced August 2022.
-
Unifying physical systems' inductive biases in neural ODE using dynamics constraints
Authors:
Yi Heng Lim,
Muhammad Firmansyah Kasim
Abstract:
Conservation of energy is at the core of many physical phenomena and dynamical systems. There have been a significant number of works in the past few years aimed at predicting the trajectory of motion of dynamical systems using neural networks while adhering to the law of conservation of energy. Most of these works are inspired by classical mechanics, such as Hamiltonian and Lagrangian mechanics, as well as Neural Ordinary Differential Equations. While these works have been shown to work well in their respective domains, there is a lack of a unifying method that is more generally applicable without requiring significant changes to the neural network architectures. In this work, we aim to address this issue by providing a simple method that could be applied not just to energy-conserving systems but also to dissipative systems, by including a different inductive bias in each case in the form of a regularisation term in the loss function. The proposed method does not require changing the neural network architecture and could form the basis to validate a novel idea, therefore showing promise to accelerate research in this direction.
Submitted 3 August, 2022;
originally announced August 2022.
-
Examining the Impact of Source-product Congruence and Sponsorship Disclosure on the Communicative Effectiveness of Instagram Influencers
Authors:
Yi Xin Lim,
Weiyu Zhang
Abstract:
Guided by the Persuasion Knowledge Model and the Attribution Theory, this study investigates the perceived source expertise-product attribute congruence and sponsorship disclosure as pertinent factors affecting the communicative effectiveness of influencers. Instagram, with an immense influencer market value projected at USD2.3 billion in 2020, was chosen as the platform context. The study utilised a 2 (source expertise) x2 (product category) x2 (sponsorship disclosure) experiment to examine the roles of source-product congruence and sponsorship disclosure in affecting consumers' perception of extrinsic and intrinsic source motives, consumer resistance and ultimately, advertising effectiveness. Results revealed that the presence of a sponsorship disclosure generated stronger perceptions of extrinsic source motives but did not impact consumer resistance and advertising effectiveness, indicating that the activation of consumers' conceptual persuasion knowledge may not necessarily affect attitudinal persuasion knowledge. Source-product congruence, on the other hand, had main impacts on intrinsic motives, consumer resistance and ad effectiveness. In addition, hierarchical multiple regressions found that source-product congruence triggers a multi-stage process where consumers' perception of intrinsic source motives mediates consumer resistance which subsequently, mediates the relationship between source-product congruence and ad effectiveness.
Submitted 4 August, 2022;
originally announced August 2022.
-
Comparative Validation of AI and non-AI Methods in MRI Volumetry to Diagnose Parkinsonian Syndromes
Authors:
Joomee Song,
Juyoung Hahm,
Jisoo Lee,
Chae Yeon Lim,
Myung Jin Chung,
Jinyoung Youn,
Jin Whan Cho,
Jong Hyeon Ahn,
Kyung-Su Kim
Abstract:
Automated segmentation and volumetry of brain magnetic resonance imaging (MRI) scans are essential for the diagnosis of Parkinson's disease (PD) and Parkinson's plus syndromes (P-plus). To enhance the diagnostic performance, we adopted deep learning (DL) models for brain segmentation and compared their performance with the gold-standard non-DL method. We collected brain MRI scans of healthy controls (n=105) and patients with PD (n=105), multiple system atrophy (n=132), and progressive supranuclear palsy (n=69) at Samsung Medical Center from January 2017 to December 2020. Using the gold-standard non-DL model, FreeSurfer (FS), we segmented six brain structures (midbrain, pons, caudate, putamen, pallidum, and third ventricle) and used the results as annotation data for the DL models, the representative V-Net and UNETR. The Dice scores and area under the curve (AUC) for differentiating normal, PD, and P-plus cases were calculated. The segmentation times of V-Net and UNETR for the six brain structures per patient were 3.48 ± 0.17 and 48.14 ± 0.97 s, respectively, being at least 300 times faster than FS (15,735 ± 1.07 s). Dice scores of both DL models were sufficiently high (>0.85), and their AUCs for disease classification were superior to that of FS. For classification of normal vs. P-plus and PD vs. multiple system atrophy (cerebellar type), the DL models and FS showed AUCs above 0.8. DL significantly reduces the analysis time without compromising the performance of brain segmentation and differential diagnosis. Our findings may contribute to the adoption of DL brain MRI segmentation in clinical settings and advance brain research.
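The Dice score reported in this abstract measures overlap between a predicted segmentation and a reference one. The sketch below shows the standard definition, 2|P ∩ T| / (|P| + |T|), on flat binary masks. The function name `dice_score` and the toy masks are illustrative assumptions; the study itself works on 3D MRI volumes with FreeSurfer output as the reference.

```python
def dice_score(pred, truth):
    """Dice coefficient between two equal-length binary masks."""
    pred_idx = {i for i, v in enumerate(pred) if v}
    truth_idx = {i for i, v in enumerate(truth) if v}
    if not pred_idx and not truth_idx:
        # Both masks empty: treat as perfect agreement (a common convention).
        return 1.0
    overlap = len(pred_idx & truth_idx)
    return 2 * overlap / (len(pred_idx) + len(truth_idx))


# Toy one-dimensional masks (hypothetical data): 2 of 3 voxels overlap.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
score = dice_score(predicted, reference)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means identical masks and 0.0 means no overlap, so the abstract's threshold of 0.85 indicates close agreement with the reference segmentation.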
Submitted 23 July, 2022;
originally announced July 2022.
-
Enhancing Generative Networks for Chest Anomaly Localization through Automatic Registration-Based Unpaired-to-Pseudo-Paired Training Data Translation
Authors:
Kyungsu Kim,
Seong Je Oh,
Chae Yeon Lim,
Ju Hwan Lee,
Tae Uk Kim,
Myung Jin Chung
Abstract:
Image translation based on a generative adversarial network (GAN-IT) is a promising method for the precise localization of abnormal regions in chest X-ray images (AL-CXR) even without pixel-level annotation. However, heterogeneous unpaired datasets undermine the ability of existing methods to extract key features and distinguish normal from abnormal cases, resulting in inaccurate and unstable AL-CXR. To address this problem, we propose an improved two-stage GAN-IT involving registration and data augmentation. For the first stage, we introduce an advanced deep-learning-based registration technique that virtually and reasonably converts unpaired data into paired data for learning registration maps, by sequentially utilizing linear-based global and uniform coordinate transformation and AI-based non-linear coordinate fine-tuning. This approach enables independent and complex coordinate transformation of each detailed location of the lung while recognizing the entire lung structure, thereby achieving higher registration performance while resolving inherent artifacts caused by unpaired conditions. For the second stage, we apply data augmentation to diversify anomaly locations by swapping the left and right lung regions on the uniform registered frames, further improving the performance by alleviating the imbalance in the distribution of left and right lung lesions. The proposed method is model agnostic and shows consistent AL-CXR performance improvement in representative AI models. Therefore, we believe GAN-IT for AL-CXR can be clinically implemented using our framework as a basis, even if training data are scarce or pixel-level disease annotation is difficult.
Submitted 15 June, 2024; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Tackling Data Scarcity with Transfer Learning: A Case Study of Thickness Characterization from Optical Spectra of Perovskite Thin Films
Authors:
Siyu Isaac Parker Tian,
Zekun Ren,
Selvaraj Venkataraj,
Yuanhang Cheng,
Daniil Bash,
Felipe Oviedo,
J. Senthilnath,
Vijila Chellappan,
Yee-Fun Lim,
Armin G. Aberle,
Benjamin P MacLeod,
Fraser G. L. Parlane,
Curtis P. Berlinguette,
Qianxiao Li,
Tonio Buonassisi,
Zhe Liu
Abstract:
Transfer learning is increasingly becoming an important tool for handling the data scarcity often encountered in machine learning. In the application of high-throughput thickness characterization as a downstream process of the high-throughput optimization of optoelectronic thin films with autonomous workflows, data scarcity occurs especially for new materials. To achieve high-throughput thickness characterization, we propose a machine learning model called thicknessML that predicts thickness from UV-Vis spectrophotometry input, together with an overarching transfer learning workflow. We demonstrate the transfer learning workflow from a generic source domain of generic band-gapped materials to a specific target domain of perovskite materials, where the target domain data come only from a limited number (18) of refractive indices from the literature. The target domain can be easily extended to other material classes with a few literature data. Defining thickness prediction accuracy to be within-10% deviation, thicknessML achieves 92.2% (with a deviation of 3.6%) accuracy with transfer learning compared to 81.8% (with a deviation of 11.7%) without (lower mean and larger standard deviation). Experimental validation on six deposited perovskite films also corroborates the efficacy of the proposed workflow by yielding a 10.5% mean absolute percentage error (MAPE).
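The two metrics named in this abstract can be sketched as follows. This assumes that "within-10% deviation" means a prediction counts as correct when its relative error against the true thickness is at most 10%; the abstract does not spell out the exact definition. The function names and the thickness values are hypothetical.

```python
def within_tolerance_accuracy(predicted, actual, tolerance=0.10):
    """Fraction of predictions whose relative error is within `tolerance`."""
    hits = sum(
        abs(p - a) <= tolerance * abs(a) for p, a in zip(predicted, actual)
    )
    return hits / len(actual)


def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical film thicknesses in nm (not data from the paper).
true_nm = [100.0, 200.0, 400.0, 500.0]
pred_nm = [105.0, 230.0, 380.0, 500.0]

accuracy = within_tolerance_accuracy(pred_nm, true_nm)  # 3 of 4 within 10%
error_pct = mape(pred_nm, true_nm)  # relative errors: 5%, 15%, 5%, 0%
```

The two numbers answer different questions: the tolerance-based accuracy counts how often a prediction is usable, while MAPE reports the average size of the error.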
Submitted 20 December, 2022; v1 submitted 14 June, 2022;
originally announced July 2022.