-
Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting
Authors:
Jinning Li,
Jiachen Li,
Sangjae Bae,
David Isele
Abstract:
Deep learning-based trajectory prediction models for autonomous driving often struggle with generalization to out-of-distribution (OOD) scenarios, sometimes performing worse than simple rule-based models. To address this limitation, we propose a novel framework, Adaptive Prediction Ensemble (APE), which integrates deep learning and rule-based prediction experts. A learned routing function, trained concurrently with the deep learning model, dynamically selects the most reliable prediction based on the input scenario. Our experiments on large-scale datasets, including Waymo Open Motion Dataset (WOMD) and Argoverse, demonstrate improvement in zero-shot generalization across datasets. We show that our method outperforms individual prediction models and other variants, particularly in long-horizon prediction and scenarios with a high proportion of OOD data. This work highlights the potential of hybrid approaches for robust and generalizable motion prediction in autonomous driving.
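As a rough illustration of routing between a learned expert and a rule-based expert, the sketch below pairs a constant-velocity extrapolator with a placeholder for a deep model. A hypothetical router score then picks one prediction per scenario. The constant-velocity rule, the function names, and the 0.5 threshold are assumptions made for illustration. The abstract does not specify how APE's router or its rule-based expert are implemented.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Predictor = Callable[[List[Point], int], List[Point]]

def constant_velocity_predict(history: List[Point], horizon: int,
                              dt: float = 0.1) -> List[Point]:
    """Rule-based expert (assumed): extrapolate the last observed velocity."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return [(x1 + vx * dt * k, y1 + vy * dt * k) for k in range(1, horizon + 1)]

def route(history: List[Point], horizon: int,
          learned_predict: Predictor,
          router_score: Callable[[List[Point]], float],
          threshold: float = 0.5) -> Tuple[str, List[Point]]:
    """Pick one expert per scenario.

    router_score is a hypothetical learned function returning the probability
    that the deep model is the more reliable expert for this input.
    """
    if router_score(history) >= threshold:
        return "learned", learned_predict(history, horizon)
    return "rule", constant_velocity_predict(history, horizon)

# Usage with placeholder components:
history = [(0.0, 0.0), (0.1, 0.0)]
deep_model = lambda h, n: [(0.0, 0.0)] * n  # stand-in for a trained network
router = lambda h: 0.2                      # low trust, e.g. an OOD scene
expert, trajectory = route(history, 3, deep_model, router)
```

Selecting a single expert per input, rather than averaging the experts, matches the abstract's description of a router that "selects the most reliable prediction". A soft mixture would be another plausible design.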
Submitted 12 July, 2024;
originally announced July 2024.
-
A Very Effective and Simple Diffusion Reconstruction for the Diluted Ising Model
Authors:
Stefano Bae,
Enzo Marinari,
Federico Ricci-Tersenghi
Abstract:
Diffusion-based generative models are machine learning models that use diffusion processes to learn the probability distribution of high-dimensional data. In recent years, they have become extremely successful in generating multimedia content. However, it is still unknown whether such models can be used to generate high-quality datasets of physical models. In this work, we use a Landau-Ginzburg-like diffusion model to infer the distribution of a $2D$ bond-diluted Ising model. Our approach is simple and effective, and we show that the generated samples correctly reproduce the statistical and critical properties of the physical model.
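For readers unfamiliar with the physical model, the sketch below computes two observables commonly compared between generated and reference samples: energy per spin and absolute magnetization of a 2D bond-diluted Ising configuration. It assumes ferromagnetic couplings $J=1$ on surviving bonds, periodic boundaries, and a bond-occupation probability $p$. These choices are illustrative and are not taken from the paper.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]

def random_bonds(L: int, p: float, rng: random.Random) -> Tuple[Grid, Grid]:
    """Each site (i, j) owns a right bond and a down bond, each kept (J = 1) with probability p."""
    right = [[1 if rng.random() < p else 0 for _ in range(L)] for _ in range(L)]
    down = [[1 if rng.random() < p else 0 for _ in range(L)] for _ in range(L)]
    return right, down

def energy_per_spin(spins: Grid, right: Grid, down: Grid) -> float:
    """E = -sum over bonds of J_ij s_i s_j with periodic boundaries, divided by L^2."""
    L = len(spins)
    e = 0
    for i in range(L):
        for j in range(L):
            s = spins[i][j]
            e -= right[i][j] * s * spins[i][(j + 1) % L]
            e -= down[i][j] * s * spins[(i + 1) % L][j]
    return e / (L * L)

def abs_magnetization(spins: Grid) -> float:
    """|sum_i s_i| / L^2."""
    L = len(spins)
    return abs(sum(sum(row) for row in spins)) / (L * L)

# Usage: a random +/-1 configuration on a 16 x 16 lattice with about 70% of bonds present.
rng = random.Random(0)
L = 16
right, down = random_bonds(L, 0.7, rng)
spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
e, m = energy_per_spin(spins, right, down), abs_magnetization(spins)
```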
Submitted 9 July, 2024;
originally announced July 2024.
-
Data-driven Nucleus Subclassification on Colon H&E using Style-transferred Digital Pathology
Authors:
Lucas W. Remedios,
Shunxing Bao,
Samuel W. Remedios,
Ho Hin Lee,
Leon Y. Cai,
Thomas Li,
Ruining Deng,
Nancy R. Newlin,
Adam M. Saunders,
Can Cui,
Jia Li,
Qi Liu,
Ken S. Lau,
Joseph T. Roland,
Mary K Washington,
Lori A. Coburn,
Keith T. Wilson,
Yuankai Huo,
Bennett A. Landman
Abstract:
Understanding the way cells communicate, co-locate, and interrelate is essential to furthering our understanding of how the body functions. H&E is widely available; however, cell subtyping often requires expert knowledge and the use of specialized stains. To reduce the annotation burden, AI has been proposed for the classification of cells on H&E. For example, the recent Colon Nucleus Identification and Classification (CoNIC) Challenge focused on labeling 6 cell types on H&E of the colon. However, the CoNIC Challenge was unable to classify epithelial subtypes (progenitor, enteroendocrine, goblet), lymphocyte subtypes (B, helper T, cytotoxic T), and connective subtypes (fibroblasts). We use inter-modality learning to label previously un-labelable cell types on H&E. We take advantage of multiplexed immunofluorescence (MxIF) histology to label 14 cell subclasses. We performed style transfer on the same MxIF tissues to synthesize realistic virtual H&E, which we paired with the MxIF-derived cell subclassification labels. We evaluated the efficacy of using a supervised learning scheme where the input was realistic-quality virtual H&E and the labels were MxIF-derived cell subclasses. We assessed our model on private virtual H&E and public real H&E. On virtual H&E, we were able to classify helper T cells and epithelial progenitors with positive predictive values of $0.34 \pm 0.15$ (prevalence $0.03 \pm 0.01$) and $0.47 \pm 0.1$ (prevalence $0.07 \pm 0.02$), respectively, when using ground truth centroid information. On real H&E, we could classify helper T cells and epithelial progenitors with upper bound positive predictive values of $0.43 \pm 0.03$ (parent class prevalence 0.21) and $0.94 \pm 0.02$ (parent class prevalence 0.49) when using ground truth centroid information. This is the first work to provide cell type classification for helper T and epithelial progenitor nuclei on H&E.
Submitted 15 May, 2024;
originally announced July 2024.
-
EA4RCA: Efficient AIE accelerator design framework for Regular Communication-Avoiding Algorithm
Authors:
W. B. Zhang,
Y. Q. Liu,
T. H. Zang,
Z. S. Bao
Abstract:
With the introduction of the Adaptive Intelligence Engine (AIE), the Versal Adaptive Compute Acceleration Platform (Versal ACAP) has garnered great attention. However, the Vitis Libraries and the limited existing research have mainly focused on how to invoke AIE modules, without a thorough discussion of how to use AIE effectively in its typical use cases. As a result, the widespread adoption of Versal ACAP has been restricted. The Communication Avoidance (CA) algorithm is considered a typical application within the AIE architecture. Nevertheless, the effective utilization of AIE in CA applications remains an area that requires further exploration. We propose a top-down customized design framework, EA4RCA (Efficient AIE accelerator design framework for Regular Communication-Avoiding Algorithm), specifically tailored for CA algorithms with regular communication patterns and equipped with AIE Graph Code Generator software to accelerate the AIE design process. The primary objective of this framework is to maximize the performance of AIE while incorporating high-speed data streaming services. Experiments show that for the RCA algorithms Filter2D and Matrix Multiplication (MM), which have lower communication requirements, and the RCA algorithm FFT, which has higher communication requirements, the accelerators implemented with the EA4RCA framework achieve throughput improvements of up to 22.19x, 1.05x, and 3.88x, respectively, over the current highest-performance acceleration schemes (SOTA), and energy efficiency improvements of up to 6.11x, 1.30x, and 7.00x.
Submitted 8 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Authors:
Sungnyun Kim,
Kangwook Jang,
Sangmin Bae,
Hoirin Kim,
Se-Young Yun
Abstract:
Audio-visual speech recognition (AVSR) aims to transcribe human speech using both audio and video modalities. In practical environments with noise-corrupted audio, the role of video information becomes crucial. However, prior works have primarily focused on enhancing audio features in AVSR, overlooking the importance of video features. In this study, we strengthen the video features by learning three temporal dynamics in video data: context order, playback direction, and the speed of video frames. Cross-modal attention modules are introduced to enrich video features with audio information so that speech variability can be taken into account when training on the video temporal dynamics. With our approach, we achieve state-of-the-art performance on the LRS2 and LRS3 AVSR benchmarks in noise-dominant settings. Our approach excels especially under babble and speech noise, indicating its ability to distinguish the speech signal that should be recognized from lip movements in the video modality. We support the validity of our methodology with ablation experiments on the temporal dynamics losses and the cross-modal attention architecture design.
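The three temporal dynamics can be read as labels derived from the video itself. The sketch below shows one hypothetical way to build such labels from a frame sequence. It subsamples with a stride (speed), optionally reverses the clip (playback direction), and shuffles equal-length chunks (context order). The chunking scheme, the stride set, and the function name are assumptions. The abstract does not describe how the paper constructs its training targets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def make_temporal_views(frames: Sequence[T], rng: random.Random,
                        n_chunks: int = 3,
                        speeds: Tuple[int, ...] = (1, 2)) -> Tuple[List[T], List[int], int, int]:
    """Return (clip, order_label, reversed_label, speed_label) for one training example.

    order_label[i]: index of the original chunk placed at position i (context order)
    reversed_label: 1 if the clip plays backwards (playback direction)
    speed_label: index into `speeds`, i.e. which frame stride was used (speed)
    """
    speed_idx = rng.randrange(len(speeds))
    clip = list(frames[::speeds[speed_idx]])
    reverse = rng.random() < 0.5
    if reverse:
        clip.reverse()
    size = len(clip) // n_chunks  # trailing frames that do not fill a chunk are dropped
    chunks = [clip[k * size:(k + 1) * size] for k in range(n_chunks)]
    order = list(range(n_chunks))
    rng.shuffle(order)
    shuffled = [f for k in order for f in chunks[k]]
    return shuffled, order, int(reverse), speed_idx

# Usage: integer "frames" stand in for image tensors.
clip, order, rev, speed = make_temporal_views(list(range(12)), random.Random(0))
```

A model trained to predict these three labels from the transformed clip must attend to temporal structure in the lip movements.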
Submitted 3 July, 2024;
originally announced July 2024.
-
The evolution and detection of vector superradiant instabilities
Authors:
Yin-Da Guo,
Nayun Jia,
Shou-Shan Bao,
Hong Zhang,
Xin Zhang
Abstract:
Ultralight vectors can extract energy and angular momentum from a Kerr black hole (BH) due to superradiant instability, resulting in the formation of a BH-condensate system. In this work, we carefully investigate the evolution of this system numerically with multiple superradiant modes. Simple formulas are obtained to estimate important timescales, maximum masses of different modes, as well as the BH mass and spin at various times. Due to the coexistence of modes with small frequency differences, the BH-condensate system emits gravitational waves with a unique beat signature, which could be directly observed by current and projected interferometers. Furthermore, current BH spin and mass data from binary BH merger events already exclude vector masses in the range $5\times 10^{-15}\ \mathrm{eV} < \mu < 9\times 10^{-12}\ \mathrm{eV}$.
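The beat signature follows from superposing signals with nearly equal frequencies. As an idealized illustration, assume two components with equal amplitude $h_0$ and frequencies $\omega_1 \approx \omega_2$. Real signals have unequal, time-dependent amplitudes.

```latex
h(t) = h_0\cos(\omega_1 t) + h_0\cos(\omega_2 t)
     = 2h_0\cos\!\left(\frac{\omega_1-\omega_2}{2}\,t\right)\cos\!\left(\frac{\omega_1+\omega_2}{2}\,t\right)
```

The fast factor oscillates at the mean frequency. The envelope $|\cos((\omega_1-\omega_2)t/2)|$ repeats with beat period $T_{\rm beat} = 2\pi/|\omega_1-\omega_2|$, which becomes long when the frequency difference is small.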
Submitted 30 June, 2024;
originally announced July 2024.
-
HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis
Authors:
Ruining Deng,
Quan Liu,
Can Cui,
Tianyuan Yao,
Juming Xiong,
Shunxing Bao,
Hao Li,
Mengmeng Yin,
Yu Wang,
Shilin Zhao,
Yucheng Tang,
Haichun Yang,
Yuankai Huo
Abstract:
Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel Hierarchical Adaptive Taxonomy Segmentation (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights. Our approach entails (1) the innovative HATs technique, which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans across regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, and (3) the adoption of the latest AI foundation model (EfficientSAM) as a feature extraction tool to boost the model's adaptability, while eliminating the need for the manual prompt generation required by the conventional segment anything model (SAM). Experimental findings demonstrate that the HATs method offers an efficient and effective strategy for integrating clinical insights and imaging precedents into a unified segmentation model across more than 15 categories. The official implementation is publicly available at https://github.com/hrlblab/HATs.
Submitted 30 June, 2024;
originally announced July 2024.
-
Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning
Authors:
Muhammad Salman Ali,
Maryam Qamar,
Sung-Ho Bae,
Enzo Tartaglione
Abstract:
In recent times, the utilization of 3D models has gained traction, owing to the capacity for end-to-end training initially offered by Neural Radiance Fields and more recently by 3D Gaussian Splatting (3DGS) models. The latter holds a significant advantage by inherently easing rapid convergence during training and offering extensive editability. However, despite rapid advancements, the literature on the scalability of these models is still in its infancy. In this study, we take some initial steps toward addressing this gap, presenting an approach that enables both the memory and computational scalability of such models. Specifically, we propose "Trimming the fat", a post-hoc gradient-informed iterative pruning technique to eliminate redundant information encoded in the model. Our experimental findings on widely acknowledged benchmarks attest to the effectiveness of our approach, revealing that up to 75% of the Gaussians can be removed while maintaining or even improving upon baseline performance. Our approach achieves around 50$\times$ compression while preserving performance similar to the baseline model, and speeds up computation to as much as 600 FPS.
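A post-hoc iterative pruning loop of this general kind can be sketched as follows. Each round scores every remaining Gaussian with an importance value, drops the lowest-scoring group, and stops once the target fraction of the original model has been removed. Here the importance value is a hypothetical score function supplied by the caller, for example an accumulated gradient magnitude. The per-round fraction and the omission of fine-tuning between rounds are simplifying assumptions, not the paper's exact procedure.

```python
from typing import Callable, List, Sequence

def iterative_prune(ids: Sequence[int],
                    score: Callable[[int], float],
                    target_removed: float = 0.75,
                    per_round: float = 0.25) -> List[int]:
    """Return the ids kept after iterative lowest-score pruning.

    ids: indices of Gaussians in a trained model
    score: hypothetical importance per Gaussian (e.g. accumulated gradient magnitude)
    target_removed: overall fraction of the original Gaussians to remove
    per_round: fraction of the original count removed in each round
    """
    n0 = len(ids)
    n_target = n0 - int(round(target_removed * n0))  # number of Gaussians to keep
    step = max(1, int(round(per_round * n0)))
    kept = list(ids)
    while len(kept) > n_target:
        # A real pipeline would fine-tune here and recompute scores; this sketch keeps them fixed.
        kept.sort(key=score, reverse=True)
        kept = kept[:max(n_target, len(kept) - step)]
    return kept

# Usage: 100 Gaussians whose (made-up) importance equals their index.
kept = iterative_prune(range(100), score=lambda i: float(i))
```

Pruning in several small rounds rather than one large cut gives the model a chance to adapt between rounds, which is the usual motivation for iterative schemes.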
Submitted 26 June, 2024;
originally announced June 2024.
-
FacePsy: An Open-Source Affective Mobile Sensing System -- Analyzing Facial Behavior and Head Gesture for Depression Detection in Naturalistic Settings
Authors:
Rahul Islam,
Sang Won Bae
Abstract:
Depression, a prevalent and complex mental health issue affecting millions worldwide, presents significant challenges for detection and monitoring. While facial expressions have shown promise in laboratory settings for identifying depression, their potential in real-world applications remains largely unexplored due to the difficulties in developing efficient mobile systems. In this study, we aim to introduce FacePsy, an open-source mobile sensing system designed to capture affective inferences by analyzing sophisticated features and generating real-time data on facial behavior landmarks, eye movements, and head gestures -- all within the naturalistic context of smartphone usage with 25 participants. Through rigorous development, testing, and optimization, we identified eye-open states, head gestures, smile expressions, and specific Action Units (2, 6, 7, 12, 15, and 17) as significant indicators of depressive episodes (AUROC=81%). Our regression model predicting PHQ-9 scores achieved moderate accuracy, with a Mean Absolute Error of 3.08. Our findings offer valuable insights and implications for enhancing deployable and usable mobile affective sensing systems, ultimately improving mental health monitoring, prediction, and just-in-time adaptive interventions for researchers and developers in healthcare.
Submitted 24 June, 2024;
originally announced June 2024.
-
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records
Authors:
Yeonsu Kwon,
Jiho Kim,
Gyubok Lee,
Seongsu Bae,
Daeun Kyung,
Wonchul Cha,
Tom Pollard,
Alistair Johnson,
Edward Choi
Abstract:
Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system designs and human errors, posing serious risks to patient safety. To address this, we developed EHRCon, a new dataset and task specifically designed to ensure data consistency between structured tables and unstructured notes in EHRs. EHRCon was crafted in collaboration with healthcare professionals using the MIMIC-III EHR dataset, and includes manual annotations of 3,943 entities across 105 clinical notes checked against database entries for consistency. EHRCon has two versions, one using the original MIMIC-III schema, and another using the OMOP CDM schema, in order to increase its applicability and generalizability. Furthermore, leveraging the capabilities of large language models, we introduce CheckEHR, a novel framework for verifying the consistency between clinical notes and database tables. CheckEHR utilizes an eight-stage process and shows promising results in both few-shot and zero-shot settings. The code is available at https://github.com/dustn1259/EHRCon.
Submitted 24 June, 2024;
originally announced June 2024.
-
Revolutionizing Mental Health Support: An Innovative Affective Mobile Framework for Dynamic, Proactive, and Context-Adaptive Conversational Agents
Authors:
Rahul Islam,
Sang Won Bae
Abstract:
As computing becomes more personalized and context-aware, we are building toward interactive systems that can recognize human emotional states and respond to individual needs more intuitively and empathetically. This is especially important for mental health support, where there is a rising need for immediate, non-intrusive help tailored to each individual. The individual nature of mental health and the complexity of human emotions call for novel approaches beyond conventional proactive and reactive chatbots. In this position paper, we explore how to create chatbots that can sense, interpret, and intervene in emotional signals by combining real-time facial expression analysis, physiological signal interpretation, and language models. This is achieved by incorporating facial affect detection into existing, practical, and ubiquitous passive sensing contexts, exploiting the ubiquity of sensed behavioral primitives to recognize, interpret, and respond to human emotions. In parallel, the system employs cognitive-behavioral therapy tools such as cognitive reframing and mood journals, leveraging the therapeutic intervention potential of chatbots in mental health contexts. Finally, we propose a project to build a system that enhances the emotional understanding of chatbots to engage users in chat-based intervention, thereby helping them manage their mood.
Submitted 22 June, 2024;
originally announced June 2024.
-
Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation
Authors:
Xin Yu,
Qi Yang,
Han Liu,
Ho Hin Lee,
Yucheng Tang,
Lucas W. Remedios,
Michael E. Kim,
Rendong Zhang,
Shunxing Bao,
Yuankai Huo,
Ann Zenobia Moore,
Luigi Ferrucci,
Bennett A. Landman
Abstract:
2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, and these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations to guide the 2D student by learning intra- and inter-class correlation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model. Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods can benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach demonstrates considerable efficacy in low-data regimes, outperforming the model trained with all available training subjects even when utilizing only 200 training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens.
Submitted 12 July, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
Authors:
Seo Hyun Kim,
Kai Tzu-iunn Ong,
Taeyoon Kwon,
Namyoung Kim,
Keummin Ka,
SeongHyeon Bae,
Yohan Jo,
Seung-won Hwang,
Dongha Lee,
Jinyoung Yeo
Abstract:
Large language models (LLMs) are capable of processing lengthy dialogue histories during prolonged interaction with users without additional memory modules; however, their responses tend to overlook or incorrectly recall information from the past. In this paper, we revisit memory-augmented response generation in the era of LLMs. While prior work focuses on getting rid of outdated memories, we argue that such memories can provide contextual cues that help dialogue systems understand the development of past events and, therefore, benefit response generation. We present Theanine, a framework that augments LLMs' response generation with memory timelines, i.e., series of memories that demonstrate the development and causality of relevant past events. Along with Theanine, we introduce TeaFarm, a counterfactual-driven question-answering pipeline addressing the limitation of G-Eval in long-term conversations. Supplementary videos of our methods and the TeaBag dataset for TeaFarm evaluation are available at https://theanine-693b0.web.app/.
Submitted 16 June, 2024;
originally announced June 2024.
-
Spin and lattice dynamics of a van der Waals antiferromagnet MnPSe$_3$
Authors:
Junbo Liao,
Zhentao Huang,
Yanyan Shangguan,
Bo Zhang,
Shufan Cheng,
Hao Xu,
Ryoichi Kajimoto,
Kazuya Kamazawa,
Song Bao,
Jinsheng Wen
Abstract:
The antiferromagnetic van der Waals family $\rm \textit{M}P\textit{X}_{3}\ (M=Fe,\ Mn,\ Co,\text{ and}\ Ni; X=S\text{ and}\ Se)$ has attracted significant research attention due to the possibility of realizing long-range magnetic order down to the monolayer limit. Here, we perform inelastic neutron scattering measurements on single crystal samples of MnPSe$_3$, a member of the $\rm \textit{M}P\textit{X}_{3}$ family, to study the spin dynamics and determine the effective spin model. The excited magnon bands are well characterized by a spin model that includes a Heisenberg term with three intraplane exchange parameters ($J_{1}=-0.73$~meV, $J_{2}=-0.014$~meV, $J_{3}=-0.43$~meV) and one interplane parameter ($J_{c}=-0.054$~meV), and an easy-plane single-ion anisotropy term ($D=-0.035$~meV). Additionally, we observe the intersection of the magnon and phonon bands but no anomalous spectral features induced by the formation of magnon-phonon hybrid excitations in the intersecting region. We discuss possible reasons for the absence of such hybrid excitations in MnPSe$_3$.
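Written out, a spin Hamiltonian with the terms listed above takes the form below. The sign convention is our assumption. With the overall minus signs shown, negative $J$ values correspond to antiferromagnetic coupling and $D<0$ penalizes $S^z$, giving easy-plane anisotropy. This is consistent with the quoted parameters for an antiferromagnet, but the paper's own convention should be checked.

```latex
\mathcal{H} = -J_1\sum_{\langle ij\rangle_1}\mathbf{S}_i\cdot\mathbf{S}_j
              -J_2\sum_{\langle ij\rangle_2}\mathbf{S}_i\cdot\mathbf{S}_j
              -J_3\sum_{\langle ij\rangle_3}\mathbf{S}_i\cdot\mathbf{S}_j
              -J_c\sum_{\langle ij\rangle_c}\mathbf{S}_i\cdot\mathbf{S}_j
              -D\sum_i\left(S_i^z\right)^2
```

Here $\langle ij\rangle_n$ runs over $n$th-nearest-neighbor pairs within a Mn honeycomb layer, and $\langle ij\rangle_c$ runs over interlayer pairs.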
Submitted 7 June, 2024;
originally announced June 2024.
-
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Authors:
Namgyu Ho,
Sangmin Bae,
Taehyeon Kim,
Hyunjik Jo,
Yireun Kim,
Tal Schuster,
Adam Fisch,
James Thorne,
Se-Young Yun
Abstract:
This paper presents the Block Transformer architecture, which adopts hierarchical global-to-local modeling in autoregressive transformers to mitigate the inference bottlenecks of self-attention. To apply self-attention, the key-value (KV) cache of all previous sequences must be retrieved from memory at every decoding step. Consequently, this KV cache IO becomes a significant bottleneck in batch inference. We notice that these costs stem from applying self-attention on the global context; therefore, we isolate the expensive bottlenecks of global modeling to lower layers and apply fast local modeling in upper layers. To mitigate the remaining costs in the lower layers, we aggregate input tokens into fixed-size blocks and then apply self-attention at this coarse level. Context information is aggregated into a single embedding to enable upper layers to decode the next block of tokens without global attention. Free of global attention bottlenecks, the upper layers can fully utilize the compute hardware to maximize inference throughput. By leveraging global and local modules, the Block Transformer architecture demonstrates 10-20x gains in inference throughput compared to vanilla transformers with equivalent perplexity. Our work introduces a new approach to optimize language model inference through novel application of global-to-local modeling. Code is available at https://github.com/itsnamgyu/block-transformer.
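The token-to-block aggregation step can be illustrated with a toy example. Consecutive token embeddings are grouped into fixed-size blocks, and each block is collapsed into one vector. A global-attention layer then works over L/B block positions instead of L token positions. Mean pooling and the function names are assumptions made for brevity. The abstract says only that tokens are aggregated into fixed-size blocks.

```python
from typing import List, Tuple

Vector = List[float]

def aggregate_blocks(tokens: List[Vector], block_size: int) -> List[Vector]:
    """Mean-pool consecutive token embeddings into one embedding per block (assumed pooling).

    Requires len(tokens) to be a multiple of block_size; pad upstream otherwise.
    """
    if len(tokens) % block_size != 0:
        raise ValueError("sequence length must be a multiple of block_size")
    dim = len(tokens[0])
    blocks = []
    for start in range(0, len(tokens), block_size):
        chunk = tokens[start:start + block_size]
        blocks.append([sum(t[d] for t in chunk) / block_size for d in range(dim)])
    return blocks

def kv_entries_per_layer(seq_len: int, block_size: int) -> Tuple[int, int]:
    """Cached positions for one global-attention layer: token-level vs. block-level.

    Illustrative count only; it ignores the upper (local) layers' own short caches.
    """
    return seq_len, seq_len // block_size

# Usage: four 2-d token embeddings pooled into two blocks.
blocks = aggregate_blocks([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]], 2)
```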
Submitted 4 June, 2024;
originally announced June 2024.
-
Constraining Axion-Gluon Coupling in Mono-hadron Processes
Authors:
Shou-shan Bao,
Wenhai Gao,
Hong Zhang,
Jian Zhou
Abstract:
The axion-gluon coupling can be constrained directly through hard exclusive processes at the LHC. Specifically, we study the associated production of a long-lived axion with a $\rho^0$ meson in ultra-peripheral $AA$ collisions and in $pp$ collisions. With the axion escaping the detector, the final state is characterized by a mono-hadron signature. The main background in our analysis originates from the $\rho^0+\pi^0$ process, where the photons from the $\pi^0$ decay are undetected due to limited detector performance. Our analysis yields an exclusion limit on the axion-gluon coupling that is comparable to the limit obtained from the mono-jet process at the LHC.
Submitted 28 May, 2024;
originally announced May 2024.
-
Why In-Context Learning Transformers are Tabular Data Classifiers
Authors:
Felix den Breejen,
Sangmin Bae,
Stephen Cha,
Se-Young Yun
Abstract:
The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. As synthetic data does not share features or labels with real-world data, the underlying mechanism that contributes to the success of this method remains unclear. This study provides an explanation by demonstrating that ICL-transformers acquire the ability to create complex decision boundaries during pretraining. To validate our claim, we develop a novel forest dataset generator which creates datasets that are unrealistic, but have complex decision boundaries. Our experiments confirm the effectiveness of ICL-transformers pretrained on this data. Furthermore, we create TabForestPFN, the ICL-transformer pretrained on both the original TabPFN synthetic dataset generator and our forest dataset generator. By fine-tuning this model, we reach the current state-of-the-art on tabular data classification. Code is available at https://github.com/FelixdenBreejen/TabForestPFN.
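To make the idea of a dataset generator with complex decision boundaries concrete, the sketch below labels random features with a randomly grown decision tree. Each internal node splits on a random feature at a random threshold, and each leaf gets a random class. The tree depth, the uniform feature distribution, and all names are assumptions. The abstract does not describe the internals of the forest dataset generator.

```python
import random
from typing import Callable, List, Sequence, Tuple

def random_tree_labeler(n_features: int, n_classes: int, depth: int,
                        rng: random.Random) -> Callable[[Sequence[float]], int]:
    """Build a random axis-aligned tree and return a function x -> class label."""
    if depth == 0:
        leaf = rng.randrange(n_classes)
        return lambda x: leaf
    feat = rng.randrange(n_features)
    thresh = rng.random()  # features are drawn from U[0, 1), so thresholds are too
    left = random_tree_labeler(n_features, n_classes, depth - 1, rng)
    right = random_tree_labeler(n_features, n_classes, depth - 1, rng)
    return lambda x: left(x) if x[feat] < thresh else right(x)

def tree_labeled_dataset(n_samples: int, n_features: int, n_classes: int,
                         depth: int, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Unrealistic features, but a piecewise-constant, possibly intricate decision boundary."""
    rng = random.Random(seed)
    label = random_tree_labeler(n_features, n_classes, depth, rng)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    y = [label(x) for x in X]
    return X, y

# Usage: one synthetic pretraining task.
X, y = tree_labeled_dataset(n_samples=200, n_features=5, n_classes=3, depth=4, seed=1)
```

Deeper trees produce more axis-aligned regions and therefore more complex decision boundaries, which is the property the abstract credits for the ICL-transformer's effectiveness.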
Submitted 22 May, 2024;
originally announced May 2024.
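The abstract does not specify how the forest dataset generator works. As a hedged sketch only, one way to produce datasets that are "unrealistic, but have complex decision boundaries" is to grow a random axis-aligned decision tree and label uniformly sampled points with it. The names `random_tree` and `forest_style_dataset`, the tree depth, and the class count are hypothetical choices for illustration, not TabForestPFN's implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of a random axis-aligned decision tree (hypothetical structure)."""
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None  # set only on leaves


def random_tree(n_features: int, depth: int, n_classes: int, rng: random.Random) -> Node:
    # Leaves get a random class label; internal nodes split a random feature
    # at a random threshold in [0, 1).
    if depth == 0:
        return Node(label=rng.randrange(n_classes))
    return Node(
        feature=rng.randrange(n_features),
        threshold=rng.random(),
        left=random_tree(n_features, depth - 1, n_classes, rng),
        right=random_tree(n_features, depth - 1, n_classes, rng),
    )


def predict(node: Node, x: List[float]) -> int:
    # Walk down the tree until a leaf is reached.
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label


def forest_style_dataset(n_samples: int, n_features: int, depth: int,
                         n_classes: int, seed: int = 0):
    """Sample features uniformly in [0, 1)^d and label them with one random tree."""
    rng = random.Random(seed)
    tree = random_tree(n_features, depth, n_classes, rng)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    y = [predict(tree, x) for x in X]
    return X, y


X, y = forest_style_dataset(n_samples=200, n_features=5, depth=6, n_classes=3, seed=1)
```

Deeper trees yield more fragmented, axis-aligned decision regions, which is the kind of boundary complexity the abstract argues the ICL-transformer learns to handle during pretraining.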
-
Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection
Authors:
Feiran Li,
Qianqian Xu,
Shilong Bao,
Zhiyong Yang,
Runmin Cong,
Xiaochun Cao,
Qingming Huang
Abstract:
This paper explores the size-invariance of evaluation metrics in Salient Object Detection (SOD), especially when multiple targets of diverse sizes co-exist in the same image. We observe that current metrics are size-sensitive: larger objects dominate the score, while smaller ones tend to be ignored. We argue that evaluation should be size-invariant, because a bias based on size is unjustified without additional semantic information. To this end, we propose a generic approach that evaluates each salient object separately and then combines the results, effectively alleviating the imbalance. We further develop an optimization framework tailored to this goal, achieving considerable improvements in detecting objects of different sizes. Theoretically, we provide evidence supporting the validity of our new metrics and present a generalization analysis of SOD. Extensive experiments demonstrate the effectiveness of our method. The code is available at https://github.com/Ferry-Li/SI-SOD.
Submitted 27 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
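To see why pixel-level metrics are size-sensitive, compare pooled recall with a per-object average. The sketch below assumes each salient object is summarized by its pixel count and the number of its pixels the detector recovered (a hypothetical summary format). It illustrates the "evaluate each object separately, then combine" idea, not the exact SI-SOD metrics.

```python
from typing import List, Tuple

# Each object: (pixels_detected, object_size_in_pixels). Hypothetical summary format.
Objects = List[Tuple[int, int]]


def pooled_recall(objects: Objects) -> float:
    """Pixel-level recall: large objects dominate both numerator and denominator."""
    detected = sum(d for d, _ in objects)
    total = sum(s for _, s in objects)
    return detected / total


def size_invariant_recall(objects: Objects) -> float:
    """Per-object recall averaged with equal weight, regardless of object size."""
    return sum(d / s for d, s in objects) / len(objects)


# One large object found perfectly, two small objects missed entirely.
scene = [(10_000, 10_000), (0, 100), (0, 100)]
print(f"pooled:         {pooled_recall(scene):.3f}")          # 0.980
print(f"size-invariant: {size_invariant_recall(scene):.3f}")  # 0.333
```

The pooled score hides that two of the three objects were missed, while the per-object average reflects it.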
-
ReconBoost: Boosting Can Achieve Modality Reconcilement
Authors:
Cong Hua,
Qianqian Xu,
Shilong Bao,
Zhiyong Yang,
Qingming Huang
Abstract:
This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current paradigms of multi-modal learning tend to explore multi-modal features simultaneously. The resulting gradient prohibits further exploitation of the features in the weak modality, leading to modality competition, where the dominant modality overpowers the learning process. To address this issue, we study the modality-alternating learning paradigm to achieve reconcilement. Specifically, we propose a new method called ReconBoost that updates one fixed modality at a time. Here, the learning objective is dynamically adjusted with a reconcilement regularization against competition with the historical models. By choosing a KL-based reconcilement, we show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by the others and help enhance overall performance. The major difference from classic GB is that we preserve only the newest model for each modality, to avoid the overfitting caused by ensembling strong learners. Furthermore, we propose a memory consolidation scheme and a global rectification scheme to make this strategy more effective. Experiments on six multi-modal benchmarks demonstrate the efficacy of the method. We release the code at https://github.com/huacong/ReconBoost.
Submitted 15 May, 2024;
originally announced May 2024.
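The alternating idea (update one modality at a time against the current predictions of the others, and keep only the newest learner per modality) can be illustrated with squared loss and a single linear weight per modality. This is a simplified, backfitting-style sketch under that assumption; ReconBoost itself uses a KL-based reconcilement term plus memory consolidation and global rectification, none of which are modeled here. The modality names and data are hypothetical.

```python
from typing import Dict, List


def fit_scalar(x: List[float], r: List[float]) -> float:
    """Least-squares weight w minimizing sum((r - w * x)^2)."""
    return sum(a * b for a, b in zip(x, r)) / sum(a * a for a in x)


def alternating_fit(features: Dict[str, List[float]], y: List[float],
                    rounds: int = 50) -> Dict[str, float]:
    weights = {m: 0.0 for m in features}
    modalities = list(features)
    for t in range(rounds):
        m = modalities[t % len(modalities)]  # update one modality per round
        # Residual with respect to the current predictions of all *other* modalities.
        others = [sum(weights[o] * features[o][i] for o in modalities if o != m)
                  for i in range(len(y))]
        residual = [yi - pi for yi, pi in zip(y, others)]
        # Replace (not append) this modality's learner: only the newest model is kept.
        weights[m] = fit_scalar(features[m], residual)
    return weights


audio = [1.0, 2.0, 3.0, 4.0]
video = [1.0, -1.0, 2.0, 0.5]
target = [2 * a + 3 * v for a, v in zip(audio, video)]
w = alternating_fit({"audio": audio, "video": video}, target)
# w is close to {"audio": 2.0, "video": 3.0}
```

Because each round refits one modality to the residual left by the others, the updated learner corrects the errors of the rest, which is the boosting analogy the abstract draws.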
-
Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition
Authors:
Zhiyong Yang,
Qianqian Xu,
Zitai Wang,
Sicong Li,
Boyu Han,
Shilong Bao,
Xiaochun Cao,
Qingming Huang
Abstract:
This paper explores test-agnostic long-tail recognition, a challenging long-tail task where the test label distributions are unknown and arbitrarily imbalanced. We argue that the variation in these distributions can be broken down hierarchically into global and local levels. The global ones reflect a broad range of diversity, while the local ones typically arise from milder changes, often focused on a particular neighbor. Traditional methods predominantly use a Mixture-of-Experts (MoE) approach, targeting a few fixed test label distributions that exhibit substantial global variations. However, the local variations are left unconsidered. To address this issue, we propose a new MoE strategy, $\mathsf{DirMixE}$, which assigns experts to different Dirichlet meta-distributions of the label distribution, each targeting a specific aspect of local variations. Additionally, the diversity among these Dirichlet meta-distributions inherently captures global variations. This dual-level approach also leads to a more stable objective function, allowing us to better sample different test distributions to quantify the mean and variance of performance outcomes. Theoretically, we show that our proposed objective benefits from enhanced generalization by virtue of the variance-based regularization. Comprehensive experiments across multiple benchmarks confirm the effectiveness of $\mathsf{DirMixE}$. The code is available at https://github.com/scongl/DirMixE.
Submitted 13 May, 2024;
originally announced May 2024.
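The abstract mentions sampling test label distributions from Dirichlet meta-distributions to quantify the mean and variance of performance. A minimal sketch, assuming a fixed classifier summarized by per-class accuracies: draw label distributions $p \sim \mathrm{Dir}(\alpha)$ by normalizing Gamma draws, compute the expected accuracy $\sum_c p_c\,\mathrm{acc}_c$ under each draw, and report the mean and variance. The concentration vectors and accuracies are illustrative, and this is not DirMixE's training objective.

```python
import random
import statistics
from typing import List, Tuple


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one probability vector from Dir(alpha) by normalizing Gamma(alpha_c, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def accuracy_under_shift(per_class_acc: List[float], alpha: List[float],
                         n_draws: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """Mean and variance of accuracy over label distributions sampled from Dir(alpha)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_draws):
        p = sample_dirichlet(alpha, rng)
        scores.append(sum(pc * acc for pc, acc in zip(p, per_class_acc)))
    return statistics.fmean(scores), statistics.pvariance(scores)


# Head classes are accurate, tail classes are not (hypothetical numbers).
acc = [0.95, 0.80, 0.40]
head_heavy = [8.0, 3.0, 1.0]   # meta-distribution favoring head classes
tail_heavy = [1.0, 3.0, 8.0]   # meta-distribution favoring tail classes
print(accuracy_under_shift(acc, head_heavy))
print(accuracy_under_shift(acc, tail_heavy))
```

Each concentration vector plays the role of one meta-distribution; the spread of sampled accuracies within it corresponds to local variation, and the gap between the two means to global variation.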
-
Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records
Authors:
Gyubok Lee,
Sunjun Kweon,
Seongsu Bae,
Edward Choi
Abstract:
Electronic Health Records (EHRs) are relational databases that store the entire medical histories of patients within hospitals. They record numerous aspects of patients' medical care, from hospital admission and diagnosis to treatment and discharge. While EHRs are vital sources of clinical data, exploring them beyond a predefined set of queries requires skills in query languages like SQL. To make information retrieval more accessible, one strategy is to build a question-answering system, possibly leveraging text-to-SQL models that can automatically translate natural language questions into corresponding SQL queries and use these queries to retrieve the answers. The EHRSQL 2024 shared task aims to advance and promote research in developing a question-answering system for EHRs using text-to-SQL modeling, capable of reliably providing requested answers to various healthcare professionals to improve their clinical work processes and satisfy their needs. Of the more than 100 participants who applied to the shared task, eight teams completed all of its requirements and demonstrated a wide range of methods for solving this task effectively. In this paper, we describe the task of reliable text-to-SQL modeling, the dataset, and the methods and results of the participants. We hope this shared task will spur further research and insights into developing reliable question-answering systems for EHRs.
Submitted 23 May, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
ATLS: Automated Trailer Loading for Surface Vessels
Authors:
Amer Abughaida,
Meet Gandhi,
Jun Heo,
Vaishnav Tadiparthi,
Yosuke Sakamoto,
Joohyun Woo,
Sangjae Bae
Abstract:
Automated docking of marine boats has received growing attention in the literature. This paper contributes to that literature by proposing a mathematical framework that automates "trailer loading" in the presence of wind disturbances, a problem that remains unexplored despite its importance to boat owners. We structure a comprehensive pipeline of localization, system identification, and trajectory optimization, followed by several techniques to improve performance reliability. The performance of the proposed method was demonstrated with a commercial pontoon boat in Michigan in 2023, achieving a success rate of 80% in the presence of perception errors and wind disturbances. This result indicates the strong potential of the proposed pipeline to accommodate the wind effect.
Submitted 8 May, 2024;
originally announced May 2024.
-
Field-of-View Extension for Diffusion MRI via Deep Generative Models
Authors:
Chenyu Gao,
Shunxing Bao,
Michael Kim,
Nancy Newlin,
Praitayini Kanakaraj,
Tianyuan Yao,
Gaurav Rudravaram,
Yuankai Huo,
Daniel Moyer,
Kurt Schilling,
Walter Kukull,
Arthur Toga,
Derek Archer,
Timothy Hohman,
Bennett Landman,
Zhiyuan Li
Abstract:
Purpose: In diffusion MRI (dMRI), the volumetric and bundle analyses of whole-brain tissue microstructure and connectivity can be severely impeded by an incomplete field-of-view (FOV). This work aims to develop a method for imputing the missing slices directly from existing dMRI scans with an incomplete FOV. We hypothesize that the imputed image with complete FOV can improve the whole-brain tractography for corrupted data with incomplete FOV. Therefore, our approach provides a desirable alternative to discarding the valuable dMRI data, enabling subsequent tractography analyses that would otherwise be challenging or unattainable with corrupted data. Approach: We propose a framework based on a deep generative model that estimates the absent brain regions in dMRI scans with incomplete FOV. The model is capable of learning both the diffusion characteristics in diffusion-weighted images (DWI) and the anatomical features evident in the corresponding structural images to efficiently impute the missing slices of DWI outside the incomplete FOV. Results: For evaluating the imputed slices, on the WRAP dataset the proposed framework achieved PSNRb0=22.397, SSIMb0=0.905, PSNRb1300=22.479, SSIMb1300=0.893; on the NACC dataset it achieved PSNRb0=21.304, SSIMb0=0.892, PSNRb1300=21.599, SSIMb1300=0.877. The proposed framework improved the tractography accuracy, as demonstrated by an increased average Dice score for 72 tracts (p < 0.001) on both the WRAP and NACC datasets. Conclusions: Results suggest that the proposed framework achieved sufficient imputation performance in dMRI data with incomplete FOV for improving whole-brain tractography, thereby repairing the corrupted data. Our approach achieved more accurate whole-brain tractography results with extended and complete FOV and reduced the uncertainty when analyzing bundles associated with Alzheimer's Disease.
Submitted 6 May, 2024;
originally announced May 2024.
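The results are reported as PSNR, SSIM, and Dice scores. For reference, here is a minimal sketch of two of these standard metrics, $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$ and $\mathrm{Dice} = 2|A \cap B|/(|A|+|B|)$, on flattened images and voxel-index sets. The data range `max_val` is an assumption: the abstract does not state how intensities were normalized, and that choice shifts absolute PSNR values.

```python
import math
from typing import Sequence, Set


def psnr(reference: Sequence[float], estimate: Sequence[float], max_val: float) -> float:
    """Peak signal-to-noise ratio in dB for two equally sized, flattened images."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice overlap between two voxel sets (e.g. a reference and a reconstructed tract)."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


print(psnr([0.0, 0.5, 1.0, 0.25], [0.1, 0.5, 0.9, 0.25], max_val=1.0))
print(dice({1, 2, 3, 4}, {3, 4, 5, 6}))  # 0.5
```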
-
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
Authors:
June-Woo Kim,
Miika Toikkanen,
Sangmin Bae,
Minseok Kim,
Ho-Young Jung
Abstract:
Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and that data augmentation is essential to bridge this gap. However, the most widely used augmentation technique for audio and speech, SpecAugment, requires a 2-dimensional spectrogram format and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that, unlike SpecAugment, is suitable for respiratory sound classification with waveform-pretrained models. Experimental results show that our approach outperforms SpecAugment, with a substantial improvement of up to 7.14% in the accuracy of minority disease classes.
Submitted 5 May, 2024;
originally announced May 2024.
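The abstract does not detail RepAugment's operations. Purely as a hypothetical illustration of what "representation-level" means, the sketch below applies a SpecAugment-like contiguous mask along the embedding dimension of a frames-by-features representation produced by an encoder, so it works whether the encoder consumed a spectrogram or a raw waveform. The masking scheme, the function name `mask_embedding_dims`, and the `max_width` parameter are assumptions, not the paper's method.

```python
import random
from typing import List

Representation = List[List[float]]  # frames x embedding dimensions


def mask_embedding_dims(rep: Representation, max_width: int,
                        rng: random.Random) -> Representation:
    """Zero one random contiguous block of embedding dimensions in every frame.

    Returns a new representation; the input is left unchanged.
    """
    dim = len(rep[0])
    width = rng.randint(0, max_width)          # inclusive bounds
    start = rng.randint(0, dim - width)
    return [[0.0 if start <= j < start + width else v for j, v in enumerate(frame)]
            for frame in rep]


rng = random.Random(0)
features = [[1.0] * 8 for _ in range(3)]       # stand-in for encoder outputs
augmented = mask_embedding_dims(features, max_width=3, rng=rng)
```

Because the mask acts on the encoder's output rather than on its input, the same augmentation applies to any pretrained backbone; as with SpecAugment, it would typically be used only during training.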
-
Parameter optimization of Josephson parametric amplifiers using a heuristic search algorithm for axion haloscope search
Authors:
Younggeun Kim,
Junu Jeong,
SungWoo Youn,
Sungjae Bae,
Arjan F. van Loo,
Yasunobu Nakamura,
Sergey Uchaikin,
Yannis K. Semertzidis
Abstract:
The cavity haloscope is among the most widely adopted experimental platforms for detecting dark matter axions; its principle relies on the conversion of axions into microwave photons in the presence of a strong magnetic field. The Josephson parametric amplifier (JPA), known for its quantum-limited noise characteristics, has been incorporated in the detection system to capture the weakly interacting axion signals. However, the performance of the JPA can be influenced by its environment, so a predefined parameter set obtained in a specific laboratory setting may be unreliable. Furthermore, a broadband search requires repeated characterization of the amplifier at different tuning frequencies. To ensure more reliable measurements, we use the Nelder-Mead technique as a numerical search method to dynamically determine the optimal operating conditions. This heuristic search algorithm explores the multidimensional parameter space of the JPA, optimizing critical characteristics such as gain and noise temperature to maximize signal-to-noise ratios for a given experimental setup. Our study presents a comprehensive analysis of the properties of a flux-driven JPA to demonstrate the effectiveness of the algorithm. This approach contributes to ongoing efforts in axion dark matter research by offering an efficient method to enhance axion detection sensitivity through the optimized utilization of JPAs.
Submitted 28 April, 2024;
originally announced April 2024.
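Nelder-Mead is a derivative-free simplex method, which suits hardware tuning where only measurements of the figure of merit are available. A compact sketch with the standard textbook coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5) follows. The objective `negative_snr_proxy` and its two parameters (flux bias and pump power) are hypothetical stand-ins; in the experiment each evaluation would be a measurement on the JPA, not a closed-form function.

```python
from typing import Callable, List

Vector = List[float]


def _along(centroid: Vector, worst: Vector, t: float) -> Vector:
    # centroid + t * (centroid - worst): t=1 reflect, t=2 expand,
    # t=0.5 outside contraction, t=-0.5 inside contraction.
    return [c + t * (c - w) for c, w in zip(centroid, worst)]


def nelder_mead(f: Callable[[Vector], float], x0: Vector, step: float = 0.1,
                ftol: float = 1e-12, xtol: float = 1e-8, max_iter: int = 1000) -> Vector:
    """Minimize f with the Nelder-Mead simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset by `step` along each axis.
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        size = max(abs(a - b) for v in simplex[1:] for a, b in zip(v, simplex[0]))
        if values[-1] - values[0] < ftol and size < xtol:
            break
        # Centroid of every vertex except the worst.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        xr = _along(centroid, worst, 1.0)
        fr = f(xr)
        if fr < values[0]:
            xe = _along(centroid, worst, 2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            t = 0.5 if fr < values[-1] else -0.5
            xc = _along(centroid, worst, t)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink all vertices halfway toward the best one.
                best_v = simplex[0]
                simplex = [best_v] + [[b + 0.5 * (x - b) for b, x in zip(best_v, v)]
                                      for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    return simplex[min(range(n + 1), key=lambda k: values[k])]


def negative_snr_proxy(p: Vector) -> float:
    # Hypothetical smooth stand-in with its optimum at flux=0.3, pump=-1.2.
    flux, pump = p
    return (flux - 0.3) ** 2 + 2.0 * (pump + 1.2) ** 2 - 5.0


best = nelder_mead(negative_snr_proxy, [0.0, 0.0])
```

Minimizing the negative of the figure of merit is equivalent to maximizing it; real measurements are noisy, so in practice the tolerances would be set to the measurement resolution rather than to machine-level values.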
-
Tailoring coercive fields and the Curie temperature via proximity coupling in WSe$_2$/Fe$_3$GeTe$_2$ van der Waals heterostructures
Authors:
Guodong Ma,
Renjun Du,
Fuzhuo Lian,
Song Bao,
Zijing Guo,
Xiaofan Cai,
Jingkuan Xiao,
Yaqing Han,
Di Zhang,
Siqi Jiang,
Jiabei Huang,
Xinglong Wu,
Alexander S. Mayorov,
Jinsheng Wen,
Lei Wang,
Geliang Yu
Abstract:
Hybrid structures consisting of two-dimensional (2D) magnets and semiconductors have exhibited extensive functionalities in spintronics and opto-spintronics. In this work, we have fabricated WSe$_2$/Fe$_3$GeTe$_2$ van der Waals (vdW) heterostructures and investigated the proximity effects on 2D magnetism. Through reflective magnetic circular dichroism (RMCD), we have observed a temperature-dependent modulation of magnetic order in the heterostructure. For temperatures above $40$ K, WSe$_2$-covered Fe$_3$GeTe$_2$ exhibits a larger coercive field than that observed in bare Fe$_3$GeTe$_2$, accompanied by a noticeable enhancement of the Curie temperature by $21$ K. This strengthening suggests an increase in magnetic anisotropy in the interfacial Fe$_3$GeTe$_2$ layer, which can be attributed to the spin-orbit coupling (SOC) proximity effect induced by the adjacent WSe$_2$ layers. However, at much lower temperatures ($T<20$ K), a non-monotonic modification of the coercive field is observed, showing both reduction and enhancement, which depends on the thickness of the WSe$_2$ and Fe$_3$GeTe$_2$ layers. Moreover, an unconventional two-step magnetization process emerges in the heterostructure, indicating the short-range nature of SOC proximity effects. Our findings revealing proximity effects on 2D magnetism may shed light on the design of future spintronic and memory devices based on 2D magnetic heterostructures.
Submitted 28 April, 2024;
originally announced April 2024.
-
Gate control of 2D magnetism in tri- and four-layers $\rm CrI_3$/graphene heterostructures
Authors:
Ping Wang,
Fuzhuo Lian,
Renjun Du,
Xiaofan Cai,
Song Bao,
Yaqing Han,
Jingkuan Xiao,
Kenji Watanabe,
Takashi Taniguchi,
Jinsheng Wen,
Hongxin Yang,
Alexander S. Mayorov,
Lei Wang,
Geliang Yu
Abstract:
We conduct experimental studies on the electrical transport properties of monolayer graphene directly covered by a few layers of $\rm CrI_3$. We do not observe the expected magnetic exchange coupling in the graphene but instead discover proximity effects featuring gate and magnetic field tunability. The gate tunability is manifested in the alignment between the lowest conduction band of $\rm CrI_3$ and the Fermi level of graphene, which can be controlled by the gate voltage. The coexistence of the normal and atypical quantum Hall effects in our device also corresponds to gate-controlled modulation doping. The lowest conduction band depends on the magnetic states of the $\rm CrI_3$ and can be altered by the magnetic field, which corresponds to the resistance loops during back-and-forth sweeps of the magnetic field. Our results serve as a reference for exploiting the magnetic proximity effects in graphene.
Submitted 27 April, 2024;
originally announced April 2024.
-
USmorph: An Updated Framework of Automatic Classification of Galaxy Morphologies and Its Application to Galaxies in the COSMOS Field
Authors:
Jie Song,
GuanWen Fang,
Shuo Ba,
Zesen Lin,
Yizhou Gu,
Chichun Zhou,
Tao Wang,
Cai-Na Hao,
Guilin Liu,
Hongxin Zhang,
Yao Yao,
Xu Kong
Abstract:
Morphological classification conveys abundant information on the formation, evolution, and environment of galaxies. In this work, we refine the two-step galaxy morphological classification framework (USmorph), which employs a combination of unsupervised machine learning (UML) and supervised machine learning (SML) techniques, along with a self-consistent and robust data preprocessing step. The updated method is applied to the galaxies with $I_{\rm mag}<25$ at $0.2<z<1.2$ in the COSMOS field. Based on their HST/ACS I-band images, we classify them into five distinct morphological types: spherical (SPH, 15,200), early-type disk (ETD, 17,369), late-type disk (LTD, 21,143), irregular disk (IRR, 28,965), and unclassified (UNC, 17,129). In addition, we have conducted both parametric and nonparametric morphological measurements. For galaxies with stellar masses exceeding $10^{9}\,M_{\odot}$, a gradual increase in effective radius from SPHs to IRRs is observed, accompanied by a decrease in the Sérsic index. Nonparametric morphologies reveal distinct distributions of galaxies across the Gini-$M_{20}$ and $C$-$A$ parameter spaces for different categories. Moreover, different categories exhibit significant dissimilarity in their $G_2$ and $Ψ$ distributions. We find morphology to be strongly correlated with redshift and stellar mass. The consistency of these classification results with expected correlations among multiple parameters underscores the validity and reliability of our classification method, rendering it a valuable tool for future studies.
Submitted 24 April, 2024;
originally announced April 2024.
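The Gini coefficient used in the Gini-$M_{20}$ plane measures how unequally a galaxy's light is distributed among its pixels. A sketch of the widely used pixel-flux definition, $G = \frac{1}{\bar{X}\, n(n-1)} \sum_{i=1}^{n} (2i - n - 1)\, X_i$ with $X_i$ the absolute pixel fluxes sorted in increasing order. The abstract does not say which variant or pixel segmentation USmorph uses, so treat this as illustrative.

```python
from typing import Sequence


def gini(fluxes: Sequence[float]) -> float:
    """Gini coefficient of pixel fluxes: 0 = perfectly uniform, 1 = all light in one pixel."""
    x = sorted(abs(v) for v in fluxes)  # increasing |flux|
    n = len(x)
    mean = sum(x) / n
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (mean * n * (n - 1))


print(gini([1.0, 1.0, 1.0, 1.0]))   # 0.0 (uniform light)
print(gini([0.0, 0.0, 0.0, 1.0]))   # 1.0 (single bright pixel)
```

Compact, centrally concentrated systems such as SPHs tend toward higher $G$, while light spread over many comparable pixels lowers it.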
-
PupilSense: Detection of Depressive Episodes Through Pupillary Response in the Wild
Authors:
Rahul Islam,
Sang Won Bae
Abstract:
Early detection of depressive episodes is crucial in managing mental health disorders such as Major Depressive Disorder (MDD) and Bipolar Disorder. However, existing methods often necessitate active participation or are confined to clinical settings. Addressing this gap, we introduce PupilSense, a novel, deep learning-driven mobile system designed to discreetly track pupillary responses as users interact with their smartphones in their daily lives. This study presents a proof-of-concept exploration of PupilSense's capabilities, where we captured real-time pupillary data from users in naturalistic settings. Our findings indicate that PupilSense can effectively and passively monitor indicators of depressive episodes, offering a promising tool for continuous mental health assessment outside laboratory environments. This advancement heralds a significant step in leveraging ubiquitous mobile technology for proactive mental health care, potentially transforming how depressive episodes are detected and managed in everyday contexts.
Submitted 22 April, 2024;
originally announced April 2024.
-
Exploring Algorithmic Explainability: Generating Explainable AI Insights for Personalized Clinical Decision Support Focused on Cannabis Intoxication in Young Adults
Authors:
Tongze Zhang,
Tammy Chung,
Anind Dey,
Sang Won Bae
Abstract:
This study explores the possibility of facilitating algorithmic decision-making by combining explainable artificial intelligence (XAI) techniques with sensor data, with the aim of providing researchers and clinicians with personalized analyses of cannabis intoxication behavior. SHAP analyzes the importance and quantifies the impact of specific factors such as environmental noise or heart rate, enabling clinicians to pinpoint influential behaviors and environmental conditions. SkopeRules simplify the understanding of cannabis use in specific activities or environments. Decision trees provide a clear visualization of how factors interact to influence cannabis consumption. Counterfactual models help identify key changes in behaviors or conditions that may alter cannabis use outcomes, to guide effective individualized intervention strategies. This multidimensional analytical approach not only unveils changes in behavioral and physiological states after cannabis use, such as frequent fluctuations in activity states, nontraditional sleep patterns, and specific use habits at different times and places, but also highlights the significance of individual differences in responses to cannabis use. These insights carry profound implications for clinicians seeking to gain a deeper understanding of the diverse needs of their patients and to tailor precisely targeted intervention strategies. Furthermore, our findings highlight the pivotal role that XAI technologies could play in enhancing the transparency and interpretability of Clinical Decision Support Systems (CDSS), with a particular focus on substance misuse treatment. This research contributes to ongoing initiatives to advance clinical practices that prevent and reduce cannabis-related harms to health, positioning XAI as a supportive tool for clinicians and researchers alike.
Submitted 29 April, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
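SHAP attributes a prediction to input features using Shapley values. For a handful of features they can be computed exactly by enumerating feature subsets, which makes the idea concrete. The sketch below is a brute-force illustration, not the `shap` library; the model `risk`, the feature names (`noise_db`, `heart_rate`, `hour`), and the baseline used to represent "absent" features are all hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float], x: Features,
                   baseline: Features) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names: List[str] = list(x)
    n = len(names)

    def value(coalition: set) -> float:
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def risk(f: Features) -> float:
    # Hypothetical score with an interaction between loud environments and late hours.
    late_and_loud = (f["hour"] >= 22) and (f["noise_db"] > 60)
    return 0.02 * f["noise_db"] + 0.01 * f["heart_rate"] + 0.5 * late_and_loud


x = {"noise_db": 70.0, "heart_rate": 95.0, "hour": 23.0}
base = {"noise_db": 40.0, "heart_rate": 70.0, "hour": 14.0}
phi = shapley_values(risk, x, base)
```

The attributions sum to `risk(x) - risk(base)`, and the interaction bonus is split equally between `noise_db` and `hour` because both are needed to trigger it. Exhaustive enumeration grows exponentially with the number of features, which is why practical SHAP implementations use approximations.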
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Towards Scalable & Efficient Interaction-Aware Planning in Autonomous Vehicles using Knowledge Distillation
Authors:
Piyush Gupta,
David Isele,
Sangjae Bae
Abstract:
Real-world driving involves intricate interactions among vehicles navigating through dense traffic scenarios. Recent research focuses on enhancing the interaction awareness of autonomous vehicles to leverage these interactions in decision-making. These interaction-aware planners rely on neural-network-based prediction models to capture inter-vehicle interactions, aiming to integrate these predictions with traditional control techniques such as Model Predictive Control. However, this integration of deep learning-based models with traditional control paradigms often results in computationally demanding optimization problems, relying on heuristic methods. This study introduces a principled and efficient method for combining deep learning with constrained optimization, employing knowledge distillation to train smaller and more efficient networks, thereby mitigating complexity. We demonstrate that these refined networks maintain the problem-solving efficacy of larger models while significantly accelerating optimization. Specifically, in the domain of interaction-aware trajectory planning for autonomous vehicles, we illustrate that training a smaller prediction network using knowledge distillation speeds up optimization without sacrificing accuracy.
Submitted 2 April, 2024;
originally announced April 2024.
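A common way to distill a large prediction network into a smaller one is to train the student on a mix of ground-truth targets and the teacher's outputs. The abstract does not give the loss used, so the sketch below assumes a regression-style objective, $L = \alpha\,\mathrm{MSE}(s, y) + (1-\alpha)\,\mathrm{MSE}(s, t)$, with a two-parameter linear student fitted by gradient descent. The `teacher` function, `alpha`, and the learning rate are illustrative assumptions; a real student would be a small neural network predicting trajectories.

```python
from typing import Callable, List, Tuple


def distill_linear_student(xs: List[float], ys: List[float],
                           teacher: Callable[[float], float],
                           alpha: float = 0.5, lr: float = 0.05,
                           steps: int = 2000) -> Tuple[float, float]:
    """Fit student(x) = w * x + b to a blend of ground truth and teacher outputs."""
    soft = [teacher(x) for x in xs]            # teacher predictions, computed once
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y, t in zip(xs, ys, soft):
            pred = w * x + b
            # Derivative of alpha*(pred - y)^2 + (1 - alpha)*(pred - t)^2 w.r.t. pred.
            g = 2 * alpha * (pred - y) + 2 * (1 - alpha) * (pred - t)
            grad_w += g * x / n
            grad_b += g / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def teacher(x: float) -> float:
    # Hypothetical large model whose outputs carry a small bias.
    return 2.0 * x + 1.2


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]               # ground-truth targets
w, b = distill_linear_student(xs, ys, teacher)  # converges toward w = 2.0, b = 1.1
```

With `alpha = 0.5` the student settles between the ground truth and the teacher; the speed-up claimed in the abstract comes from evaluating the smaller network inside the optimization loop, which this toy does not model.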
-
Competition-Aware Decision-Making Approach for Mobile Robots in Racing Scenarios
Authors:
Kyoungtae Ji,
Sangjae Bae,
Nan Li,
Kyoungseok Han
Abstract:
This paper presents a game-theoretic strategy for racing, where the autonomous ego agent seeks to block a racing opponent that aims to overtake it. After a library of trajectory candidates and an associated reward matrix are constructed, the optimal trajectory in terms of maximizing the cumulative reward over the planning horizon is determined based on the level-K reasoning framework. In particular, the level of the opponent is estimated online from its behavior over a past window and is then used to determine the trajectory for the ego agent. Because the opponent may change its level and strategy during the ego agent's decision process, we introduce a trajectory mixing strategy that blends the level-K optimal trajectory with a fail-safe trajectory. The overall algorithm was tested and evaluated in various simulated racing scenarios, including human-in-the-loop experiments. Comparative analysis against the conventional level-K framework demonstrates the superiority of our proposed approach in terms of overtake-blocking success rates.
Submitted 30 March, 2024;
originally announced April 2024.
-
Enhancing Empathy in Virtual Reality: An Embodied Approach to Mindset Modulation
Authors:
Seoyeon Bae,
Yoon Kyung Lee,
Jungcheol Lee,
Jaeheon Kim,
Haeseong Jeon,
Seung-Hwan Lim,
Byung-Cheol Kim,
Sowon Hahn
Abstract:
A growth mindset has shown promising outcomes for increasing empathy ability. However, stimulating a growth mindset in VR-based empathy interventions is under-explored. In the present study, we implemented prosocial VR content, Our Neighbor Hero, focusing on embodying a virtual character to modulate players' mindsets. The virtual body served as a stepping stone, enabling players to identify with the character and cultivate a growth mindset as they followed mission instructions. We considered several implementation factors to assist players in positioning within the VR experience, including positive feedback, content difficulty, background lighting, and multimodal feedback. We conducted an experiment to investigate the intervention's effectiveness in increasing empathy. Our findings revealed that the VR content and mindset training encouraged participants to improve their growth mindsets and empathic motives. This VR content was developed for college students to enhance their empathy and teamwork skills. It has the potential to improve collaboration in organizational and community environments.
Submitted 30 March, 2024;
originally announced April 2024.
-
Lane-Change in Dense Traffic with Model Predictive Control and Neural Networks
Authors:
Sangjae Bae,
David Isele,
Alireza Nakhaei,
Peng Xu,
Alexandre Miranda Anon,
Chiho Choi,
Kikuo Fujimura,
Scott Moura
Abstract:
This paper presents an online smooth-path lane-change control framework. We focus on dense traffic where inter-vehicle space gaps are narrow, and cooperation with surrounding drivers is essential to achieve the lane-change maneuver. We propose a two-stage control framework that harmonizes Model Predictive Control (MPC) with Generative Adversarial Networks (GAN) by utilizing driving intentions to generate smooth lane-change maneuvers. To improve performance in practice, the system is augmented with an adaptive safety boundary and a Kalman Filter to mitigate sensor noise. Simulation studies are conducted at different levels of traffic density and cooperativeness of other drivers. The simulation results support the effectiveness, driving comfort, and safety of the proposed method.
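The Kalman filter mentioned above can be sketched in its simplest scalar form, assuming a single measured quantity (for example, a hypothetical gap distance to a neighboring vehicle) modeled as a random walk. The noise variances `q` and `r` are placeholder values, and the synthetic readings are generated for demonstration only; the abstract does not describe the paper's filter design.

```python
import random

def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    q: process-noise variance, r: measurement-noise variance (placeholders).
    Returns one filtered estimate per measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                # predict: state carries over, uncertainty grows
        k = p / (p + r)          # Kalman gain: how much to trust the new reading
        x = x + k * (z - x)      # update the estimate toward the measurement
        p = (1.0 - k) * p        # posterior variance shrinks after the update
        estimates.append(x)
    return estimates

def noisy_readings(true_value, sigma, n, seed=0):
    """Synthetic Gaussian sensor readings for demonstration (not real data)."""
    rng = random.Random(seed)
    return [rng.gauss(true_value, sigma) for _ in range(n)]
```

Feeding `noisy_readings(10.0, 0.5, 200)` through `kalman_1d` yields estimates whose error settles well below the raw sensor noise, at the cost of a short lag when the true value changes.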
Submitted 28 March, 2024;
originally announced March 2024.
-
Multiple-Input Auto-Encoder Guided Feature Selection for IoT Intrusion Detection Systems
Authors:
Phai Vu Dinh,
Diep N. Nguyen,
Dinh Thai Hoang,
Quang Uy Nguyen,
Eryk Dutkiewicz,
Son Pham Bao
Abstract:
While intrusion detection systems (IDSs) benefit from the diversity and generalization of IoT data features, the data diversity (e.g., the heterogeneity and high dimensions of data) also makes it difficult to train effective machine learning models in IoT IDSs. It can also introduce redundant or noisy features that may decrease the accuracy of the detection engine in IDSs. This paper first introduces a novel neural network architecture called Multiple-Input Auto-Encoder (MIAE). MIAE consists of multiple sub-encoders that can process inputs from different sources with different characteristics. The MIAE model is trained in an unsupervised learning mode to transform the heterogeneous inputs into a lower-dimensional representation, which helps classifiers distinguish between normal behaviour and different types of attacks. To distil and retain more relevant features but remove less important/redundant ones during the training process, we further design and embed a feature selection layer right after the representation layer of MIAE, resulting in a new model called MIAEFS. This layer learns the importance of features in the representation vector, facilitating the selection of informative features from the representation vector. The results on three IDS datasets, i.e., NSLKDD, UNSW-NB15, and IDS2017, show the superior performance of MIAE and MIAEFS compared to other methods, e.g., conventional classifiers, dimensionality reduction models, unsupervised representation learning methods with different input dimensions, and unsupervised feature selection models. Moreover, MIAE and MIAEFS combined with the Random Forest (RF) classifier achieve an accuracy of 96.5% in detecting sophisticated attacks, e.g., Slowloris. The average running time for detecting an attack sample using RF with the representation of MIAE and MIAEFS is approximately 1.7E-6 seconds, whilst the model size is smaller than 1 MB.
Submitted 21 March, 2024;
originally announced March 2024.
-
Enabling Physical Localization of Uncooperative Cellular Devices
Authors:
Taekkyung Oh,
Sangwook Bae,
Junho Ahn,
Yonghwa Lee,
Dinh-Tuan Hoang,
Min Suk Kang,
Nils Ole Tippenhauer,
Yongdae Kim
Abstract:
In cellular networks, it can become necessary for authorities to physically locate user devices for tracking criminals or illegal devices. While cellular operators can provide authorities with information about the cell the device is camping on, fine-grained localization is still required. Therefore, authorized agents trace the device by monitoring its uplink signals. However, tracking the uplink signal source without its cooperation is challenging even for operators and authorities. In particular, three challenges remain for fine-grained localization: i) localization works only if devices reliably generate enough uplink traffic over time, ii) the target device might generate its uplink traffic with significantly low power, and iii) cellular repeaters may add too much noise to true uplink signals. While these challenges present practical hurdles for localization, they have been overlooked in prior works.
In this work, we investigate the impact of these real-world challenges on cellular localization and propose an Uncooperative Multiangulation Attack (UMA) that addresses these challenges. UMA can 1) force a target device to transmit traffic continuously, 2) boost the target's signal strength to the maximum, and 3) uniquely distinguish traffic from the target and the repeaters. Notably, the UMA technique works without privileged access to cellular operators or user devices, which allows it to operate on any LTE network. Our evaluations show that UMA effectively resolves the challenges in real-world environments when devices do not cooperate with localization. Our approach exploits vulnerabilities in the current cellular design, which we have responsibly disclosed to GSMA.
Submitted 25 March, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Search for Dark Matter Axions with Tunable TM_020 mode
Authors:
Sungjae Bae,
Junu Jeong,
Younggeun Kim,
SungWoo Youn,
Heejun Park,
Taehyeon Seong,
Seongjeong Oh,
Yannis K. Semertzidis
Abstract:
Axions are hypothesized particles believed to potentially resolve two major puzzles in modern physics: the strong CP problem and the nature of dark matter. Cavity-based axion haloscopes represent the most sensitive tools for probing their theoretically favored couplings to photons in the microelectronvolt range. However, as the search mass (or frequency) increases, the detection efficiency decreases, largely due to a decrease in cavity volume. Despite the potential of higher-order resonant modes to preserve experimental volume, their practical application in searches has been limited by the challenge of maintaining a high form factor over a reasonably wide search bandwidth. We introduce an innovative tuning method that uses the unique properties of auxetic materials, designed to effectively tune higher modes. This approach was applied to the TM_020 mode for a dark matter axion search exploring a mass range from 21.38 to 21.79 μeV, resulting in the establishment of new exclusion limits for axion-photon coupling greater than approximately 10^-13 GeV^-1. These findings signify a breakthrough, demonstrating that our tuning mechanism facilitates the practical utilization of higher-order modes for cavity haloscope searches.
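Because a haloscope detects photons whose energy approximately equals the axion rest energy, the quoted mass range maps onto a cavity frequency through $f = m_a c^2 / h$. A small conversion sketch, using the Planck constant expressed in eV·s (rounded), shows that 21.38 to 21.79 μeV corresponds to roughly 5.17 to 5.27 GHz; the function name is illustrative.

```python
# Planck constant in eV*s (SI value converted to eV, rounded).
H_EV_S = 4.135667696e-15

def axion_frequency_ghz(mass_uev):
    """Photon frequency f = m c^2 / h, in GHz, for an axion mass given in ueV."""
    mass_ev = mass_uev * 1e-6
    return mass_ev / H_EV_S / 1e9
```

The same relation gives the familiar rule of thumb that 1 GHz corresponds to about 4.1 μeV.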
Submitted 20 March, 2024;
originally announced March 2024.
-
Intersection of a Moran type Sierpinski carpet and a line with rational slope
Authors:
Simin Bao
Abstract:
In 2005, Liu et al. calculated the dimension of the intersection of the Sierpinski carpet with a straight line of rational slope in the sense of Lebesgue measure. The Sierpinski carpet is a self-similar set in the plane generated by an iterated function system, so each layer has the same structure. By contrast, the Sierpinski carpet with Moran structure, which we denote $F_σ$, is the limit set obtained by the action of two iterated function systems in the plane; the structure of each layer may differ and is controlled by a 0-1 sequence $σ$. In this paper, we calculate the upper and lower box dimensions of the intersection of $F_σ$ with the straight line $L_{a}$ of rational slope, where $a$ is the intercept of the line. We also consider some related problems. The main difficulty is that the layers of $F_σ$ may have different structures, so the structure of each layer must be tracked through the sequence $σ$ during the calculation.
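For reference, the upper and lower box dimensions computed in this paper are the standard ones. For a bounded set $E$ in the plane, let $N_\delta(E)$ be the smallest number of sets of diameter at most $\delta$ needed to cover $E$; here $E = F_σ \cap L_a$. The textbook definitions, not a result from the paper, are:

```latex
\overline{\dim}_B\,E = \limsup_{\delta \to 0} \frac{\log N_\delta(E)}{-\log \delta},
\qquad
\underline{\dim}_B\,E = \liminf_{\delta \to 0} \frac{\log N_\delta(E)}{-\log \delta},
\qquad
E = F_\sigma \cap L_a .
```

When the two values coincide, their common value is the box dimension; for the classical (non-Moran) Sierpinski carpet, built from 8 maps of ratio 1/3, it equals $\log 8 / \log 3$.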
Submitted 13 March, 2024;
originally announced March 2024.
-
Large language models surpass human experts in predicting neuroscience results
Authors:
Xiaoliang Luo,
Akilles Rechardt,
Guangzhi Sun,
Kevin K. Nejad,
Felipe Yáñez,
Bati Yilmaz,
Kangjoo Lee,
Alexandra O. Cohen,
Valentina Borghesani,
Anton Pashkov,
Daniele Marinazzo,
Jonathan Nicholas,
Alessandro Salatiello,
Ilia Sucholutsky,
Pasquale Minervini,
Sepehr Razavi,
Roberta Rocca,
Elkhan Yusifov,
Tereza Okalova,
Nianlong Gu,
Martin Ferianc,
Mikail Khona,
Kaustubh R. Patil,
Pui-Shee Lee,
Rui Mata
, et al. (14 additional authors not shown)
Abstract:
Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.
Submitted 21 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Ultralight vector dark matter search using data from the KAGRA O3GK run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
H. Abe,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi
, et al. (1778 additional authors not shown)
Abstract:
Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.
Submitted 5 March, 2024;
originally announced March 2024.
-
Revisiting Learning-based Video Motion Magnification for Real-time Processing
Authors:
Hyunwoo Ha,
Oh Hyun-Bin,
Kim Jun-Seong,
Kwon Byung-Ki,
Kim Sung-Bin,
Linh-Tam Tran,
Ji-Yun Kim,
Sung-Ho Bae,
Tae-Hyun Oh
Abstract:
Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared with conventional signal-processing-based methods. However, it still falls short of real-time performance, which prevents it from being extended to various online applications. In this paper, we investigate an efficient deep learning-based motion magnification model that runs in real time for full-HD resolution videos. Due to the specific network design of the prior art, i.e., its inhomogeneous architecture, the direct application of existing neural architecture search methods is complicated. Instead of automatic search, we carefully investigate the architecture module by module for its role and importance in the motion magnification task. Two key findings are: 1) reducing the spatial resolution of the latent motion representation in the decoder provides a good trade-off between computational efficiency and task quality, and 2) surprisingly, only a single linear layer and a single branch in the encoder are sufficient for the motion magnification task. Based on these findings, we introduce a real-time deep learning-based motion magnification model with 4.2× fewer FLOPs that is 2.7× faster than the prior art while maintaining comparable quality.
Submitted 4 March, 2024;
originally announced March 2024.
-
Extensive search for axion dark matter over 1\,GHz with CAPP's Main Axion eXperiment
Authors:
Saebyeok Ahn,
JinMyeong Kim,
Boris I. Ivanov,
Ohjoon Kwon,
HeeSu Byun,
Arjan F. van Loo,
SeongTae Par,
Junu Jeong,
Soohyung Lee,
Jinsu Kim,
Çağlar Kutlu,
Andrew K. Yi,
Yasunobu Nakamura,
Seonjeong Oh,
Danho Ahn,
SungJae Bae,
Hyoungsoon Choi,
Jihoon Choi,
Yonuk Chong,
Woohyun Chung,
Violeta Gkika,
Jihn E. Kim,
Younggeun Kim,
Byeong Rok Ko,
Lino Miceli
, et al. (11 additional authors not shown)
Abstract:
We report an extensive high-sensitivity search for axion dark matter above 1\,GHz at the Center for Axion and Precision Physics Research (CAPP). The cavity resonant search, exploiting the coupling between axions and photons, explored the frequency (mass) range of 1.025\,GHz (4.24\,$μ$eV) to 1.185\,GHz (4.91\,$μ$eV). We have introduced a number of innovations in this field, demonstrating the practical approach of optimizing all the relevant parameters of axion haloscopes, extending presently available technology. The CAPP 12\,T magnet with an aperture of 320\,mm made of Nb$_3$Sn and NbTi superconductors surrounding a 37-liter ultralight-weight copper cavity is expected to convert DFSZ axions into approximately $10^2$ microwave photons per second. A powerful dilution refrigerator, capable of keeping the core system below 40\,mK, combined with quantum-noise limited readout electronics, achieved a total system noise of about 200\,mK or below, which corresponds to a background of roughly $4\times 10^3$ photons per second within the axion bandwidth. The combination of all those improvements provides unprecedented search performance, imposing the most stringent exclusion limits on axion--photon coupling in this frequency range to date. These results also suggest an experimental capability suitable for highly-sensitive searches for axion dark matter above 1\,GHz.
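The quoted background of roughly $4\times 10^3$ photons per second can be reproduced with a back-of-envelope estimate, assuming a classical noise power $k_B T_{sys}\,\Delta f$ and an axion linewidth $\Delta f = f/Q_a$ with $Q_a \approx 10^6$ (a commonly used value that the abstract does not state). This is a plausibility check, not CAPP's analysis. The classical approximation is reasonable here because $hf/k_BT \approx 0.26$ at 1.1 GHz and 200 mK.

```python
# Physical constants (exact SI values).
K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s

def noise_photon_rate(t_sys_k, freq_hz, q_axion=1e6):
    """Approximate noise photons per second inside the axion linewidth.

    Uses the classical (Rayleigh-Jeans) noise power k_B * T_sys * df and
    divides by the photon energy h * f. The linewidth df = f / Q_a with
    Q_a ~ 1e6 is an assumed, commonly quoted value.
    """
    bandwidth = freq_hz / q_axion
    noise_power = K_B * t_sys_k * bandwidth  # watts
    return noise_power / (H * freq_hz)
```

With $T_{sys} = 0.2$ K this gives about $4.2\times 10^3$ photons per second. The frequency cancels, so under these assumptions the rate depends only on $T_{sys}$ and $Q_a$ across the 1.025 to 1.185 GHz scan.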
Submitted 20 February, 2024;
originally announced February 2024.
-
Hysteresis Compensation of Flexible Continuum Manipulator using RGBD Sensing and Temporal Convolutional Network
Authors:
Junhyun Park,
Seonghyeok Jang,
Hyojae Park,
Seongjun Bae,
Minho Hwang
Abstract:
Flexible continuum manipulators are valued for minimally invasive surgery, offering access to confined spaces through nonlinear paths. However, cable-driven manipulators face control difficulties due to hysteresis from cabling effects such as friction, elongation, and coupling. These effects are difficult to model due to nonlinearity, and the difficulties become even more evident when dealing with long, coupled, multi-segmented manipulators. This paper proposes a data-driven approach based on Deep Neural Networks (DNN) to capture these nonlinear and history-dependent characteristics of cable actuation. We collect the physical joint configurations corresponding to commanded joint configurations using RGBD sensing and 7 fiducial markers to model the hysteresis of the proposed manipulator. Results of a study comparing the estimation performance of four DNN models show that the Temporal Convolutional Network (TCN) demonstrates the highest predictive capability. Leveraging trained TCNs, we build a control algorithm to compensate for hysteresis. Tracking tests in task space using unseen trajectories show that the proposed control algorithm reduces the average position and orientation error by 61.39% (from 13.7 mm to 5.29 mm) and 64.04% (from 31.17° to 11.21°), respectively. This result implies that the proposed calibrated controller effectively reaches the desired configurations by estimating the hysteresis of the manipulator. Applying this method in real surgical scenarios has the potential to enhance control precision and improve surgical performance.
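A TCN is built from causal, dilated 1-D convolutions, which is what lets it model history-dependent effects such as hysteresis. A minimal pure-Python sketch of that generic building block follows, assuming a single input channel and one convolution per layer, and omitting the residual connections, nonlinearities, and training used in practice; the paper's layer sizes are not given in the abstract.

```python
def causal_dilated_conv(x, kernel, dilation=1):
    """1-D causal convolution: y[t] = sum_k kernel[k] * x[t - k*dilation].

    Indices before the start of the sequence are treated as zero (left
    padding), so y[t] never depends on inputs after time t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations):
    """Past time steps visible to a stack of causal dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With kernel size 3 and dilations 1, 2, 4, 8, the stack sees 31 time steps, which is why TCNs typically grow the dilation geometrically to cover long command histories cheaply.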
Submitted 3 May, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model
Authors:
Junghun Cha,
Ali Haider,
Seoyun Yang,
Hoeyeong Jin,
Subin Yang,
A. F. M. Shahab Uddin,
Jaehyoung Kim,
Soo Ye Kim,
Sung-Ho Bae
Abstract:
A significant volume of analog information, e.g., documents and images, has been digitized in the form of scanned copies for storing, sharing, and/or analyzing in the digital world. However, the quality of such content is severely degraded by various distortions caused by printing, storing, and scanning processes in the physical world. Although restoring high-quality content from scanned copies has become an indispensable task for many products, it has not been systematically explored, and to the best of our knowledge, no public datasets are available. In this paper, we define this problem as Descanning and introduce a new high-quality and large-scale dataset named DESCAN-18K. It contains 18K pairs of original and scanned images collected in the wild containing multiple complex degradations. In order to eliminate such complex degradations, we propose a new image restoration model called DescanDiffusion consisting of a color encoder that corrects the global color degradation and a conditional denoising diffusion probabilistic model (DDPM) that removes local degradations. To further improve the generalization ability of DescanDiffusion, we also design a synthetic data generation scheme by reproducing prominent degradations in scanned images. We demonstrate that our DescanDiffusion outperforms other baselines including commercial restoration products, objectively and subjectively, via comprehensive experiments and analyses.
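The conditional DDPM mentioned above builds on the standard denoising diffusion formulation. For reference, with a noise schedule $\beta_t$, the forward (noising) process and the usual noise-prediction objective are given below. The conditioning input $c$ (for example, the scanned or color-corrected image) is an assumption, since the abstract does not specify how DescanDiffusion conditions its denoiser.

```latex
\begin{aligned}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right),
\qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s), \\
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I\right), \\
\mathcal{L} &= \mathbb{E}_{t,\,x_0,\,\epsilon \sim \mathcal{N}(0,I)}
\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t,\ c\right) \right\|^2 .
\end{aligned}
```

At inference, the trained $\epsilon_\theta$ is applied step by step from pure noise back to a restored image, guided at every step by $c$.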
Submitted 7 February, 2024;
originally announced February 2024.
-
XiHe: A Data-Driven Model for Global Ocean Eddy-Resolving Forecasting
Authors:
Xiang Wang,
Renzhi Wang,
Ningzi Hu,
Pinqiang Wang,
Peng Huo,
Guihua Wang,
Huizan Wang,
Senzhang Wang,
Junxing Zhu,
Jianbo Xu,
Jun Yin,
Senliang Bao,
Ciqiang Luo,
Ziqing Zu,
Yi Han,
Weimin Zhang,
Kaijun Ren,
Kefeng Deng,
Junqiang Song
Abstract:
The leading operational Global Ocean Forecasting Systems (GOFSs) use physics-driven numerical forecasting models that solve partial differential equations at high computational cost. Recently, particularly in atmospheric weather forecasting, data-driven models have demonstrated significant potential for speeding up environmental forecasting by orders of magnitude, but there is still no data-driven GOFS that matches the forecasting accuracy of the numerical GOFSs. In this paper, we propose the first data-driven 1/12° resolution global ocean eddy-resolving forecasting model named XiHe, which is established from 25 years of daily GLORYS12 reanalysis data from Mercator Ocean International (France). XiHe is a hierarchical transformer-based framework coupled with two special designs. One is the land-ocean mask mechanism for focusing exclusively on the global ocean circulation. The other is the ocean-specific block for effectively capturing both local ocean information and global teleconnection. Extensive experiments are conducted using satellite observations, in situ observations, and the IV-TT Class 4 evaluation framework of the world's leading operational GOFSs from January 2019 to December 2020. The results demonstrate that XiHe achieves stronger forecast performance on all tested variables than existing leading operational numerical GOFSs including Mercator Ocean Physical SYstem (PSY4), Global Ice Ocean Prediction System (GIOPS), BLUElinK OceanMAPS (BLK), and Forecast Ocean Assimilation Model (FOAM). In particular, the accuracy of XiHe's ocean current forecasts out to 60 days is even better than that of PSY4 at just 10 days. Additionally, XiHe is able to forecast the large-scale circulation and the mesoscale eddies. Furthermore, it can make a 10-day forecast in only 0.36 seconds, which accelerates the forecast speed by thousands of times compared to the traditional numerical GOFSs.
Submitted 8 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Efficient and Interaction-Aware Trajectory Planning for Autonomous Vehicles with Particle Swarm Optimization
Authors:
Lin Song,
David Isele,
Naira Hovakimyan,
Sangjae Bae
Abstract:
This paper introduces a novel numerical approach to achieving smooth lane-change trajectories in autonomous driving scenarios. Our trajectory generation approach leverages particle swarm optimization (PSO) techniques, incorporating Neural Network (NN) predictions for trajectory refinement. The generation of smooth and dynamically feasible trajectories for the lane change maneuver is facilitated by combining polynomial curve fitting with particle propagation, which can account for vehicle dynamics. The proposed planning algorithm is capable of determining feasible trajectories with real-time computation capability. We conduct comparative analyses with two baseline methods for lane changing, involving analytic solutions and heuristic techniques in numerical simulations. The simulation results validate the efficacy and effectiveness of our proposed approach.
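The particle swarm optimization step can be sketched with a basic global-best PSO, assuming each particle encodes a small vector of trajectory parameters (for example, hypothetical polynomial coefficients of a lane-change path) and `cost` scores a candidate. The inertia and acceleration coefficients are common textbook defaults, not values from the paper, and the toy quadratic cost in the usage example stands in for the paper's NN-informed, dynamics-aware objective.

```python
import random

def pso(cost, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize `cost` over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # each particle's personal best
    best_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = cost(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

For example, `pso(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, dim=2)` converges to a point near (1, -2). Because each cost evaluation is independent, the particle loop parallelizes naturally, which is one reason PSO suits real-time planning.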
Submitted 2 February, 2024;
originally announced February 2024.
-
Quantum Privacy Aggregation of Teacher Ensembles (QPATE) for Privacy-preserving Quantum Machine Learning
Authors:
William Watkins,
Heehwan Wang,
Sangyoon Bae,
Huan-Hsin Tseng,
Jiook Cha,
Samuel Yen-Chi Chen,
Shinjae Yoo
Abstract:
The utility of machine learning has rapidly expanded in the last two decades and presents an ethical challenge. Papernot et al. developed a technique, known as Private Aggregation of Teacher Ensembles (PATE), to enable federated learning in which multiple teacher models are trained on disjoint datasets. This study is the first to apply PATE to an ensemble of quantum neural networks (QNN) to pave a new way of ensuring privacy in quantum machine learning (QML) models.
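The aggregation step of PATE can be sketched on its own: each teacher votes for a label, Laplace noise is added to the per-class vote counts, and the noisy argmax becomes the label released to the student. In the original PATE formulation this step only sees teacher votes, so it is the same whether the teachers are classical networks or QNNs. The noise scale below is a placeholder, and calibrating it to a privacy budget is omitted.

```python
import random
from collections import Counter

def laplace(rng, scale):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def pate_aggregate(teacher_votes, num_classes, noise_scale, seed=0):
    """Noisy-max aggregation: add Laplace noise to vote counts, take argmax."""
    rng = random.Random(seed)
    counts = Counter(teacher_votes)
    noisy = [counts.get(c, 0) + laplace(rng, noise_scale) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])
```

When teachers agree strongly, the noise rarely changes the winning label; when votes are close, the noise hides which individual teacher (and hence which disjoint data partition) tipped the decision.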
Submitted 14 January, 2024;
originally announced January 2024.
-
Multi-Profile Quadratic Programming (MPQP) for Optimal Gap Selection and Speed Planning of Autonomous Driving
Authors:
Alexandre Miranda Anon,
Sangjae Bae,
Manish Saroya,
David Isele
Abstract:
Smooth and safe speed planning is imperative for the successful deployment of autonomous vehicles. This paper presents a mathematical formulation for the optimal speed planning of autonomous driving, which has been validated in high-fidelity simulations and real-road demonstrations with practical constraints. The algorithm explores the inter-traffic gaps in the time and space domain using a breadth-first search. For each gap, quadratic programming finds an optimal speed profile, synchronizing the time and space pair along with dynamic obstacles. Qualitative and quantitative analyses in Carla are reported to assess the smoothness and robustness of the proposed algorithm. Finally, we present a road demonstration result for urban city driving.
Submitted 11 January, 2024;
originally announced January 2024.
-
Nucleus subtype classification using inter-modality learning
Authors:
Lucas W. Remedios,
Shunxing Bao,
Samuel W. Remedios,
Ho Hin Lee,
Leon Y. Cai,
Thomas Li,
Ruining Deng,
Can Cui,
Jia Li,
Qi Liu,
Ken S. Lau,
Joseph T. Roland,
Mary K. Washington,
Lori A. Coburn,
Keith T. Wilson,
Yuankai Huo,
Bennett A. Landman
Abstract:
Understanding the way cells communicate, co-locate, and interrelate is essential to understanding human physiology. Hematoxylin and eosin (H&E) staining is ubiquitously available both for clinical studies and research. The Colon Nucleus Identification and Classification (CoNIC) Challenge has recently advanced robust artificial intelligence labeling of six cell types on H&E stains of the colon. However, this is a very small fraction of the number of potential cell classification types. Specifically, the CoNIC Challenge is unable to classify epithelial subtypes (progenitor, endocrine, goblet), lymphocyte subtypes (B, helper T, cytotoxic T), or connective subtypes (fibroblasts, stromal). In this paper, we propose to use inter-modality learning to label previously unlabelable cell types on virtual H&E. We leveraged multiplexed immunofluorescence (MxIF) histology imaging to identify 14 subclasses of cell types. We performed style transfer to synthesize virtual H&E from MxIF and transferred the higher-density labels from MxIF to these virtual H&E images. We then evaluated the efficacy of learning in this approach. We identified helper T and progenitor nuclei with positive predictive values of $0.34 \pm 0.15$ (prevalence $0.03 \pm 0.01$) and $0.47 \pm 0.1$ (prevalence $0.07 \pm 0.02$), respectively, on virtual H&E. This approach represents a promising step towards automating annotation in digital pathology.
Submitted 28 January, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.