-
Sub-Nyquist Sampling OFDM Radar With a Time-Frequency Phase-Coded Waveform
Authors:
Seonghyeon Kang,
Kawon Han,
Songcheol Hong
Abstract:
This paper presents a time-frequency phase-coded sub-Nyquist sampling orthogonal frequency division multiplexing (PC-SNS-OFDM) radar system that reduces the analog-to-digital converter (ADC) sampling rate without any additional hardware or signal processing. The proposed radar divides the transmitted OFDM signal into multiple sub-bands along the frequency axis and makes these sub-bands orthogonal by multiplying them with phase codes in both the time and frequency domains. Although the sampling rate is reduced by a factor equal to the number of sub-bands, aliasing folds the sub-bands above the sampling rate into the lowest sub-band. When the signals in the folded sub-band are restored to the full signal band, the proposed PC-SNS-OFDM radar effectively eliminates symbol-mismatch noise while introducing trade-offs in range and Doppler ambiguity. Using phase codes in both the frequency and time domains provides flexible control of the range and Doppler ambiguities. It also improves the signal-to-noise ratio (SNR) of detected targets compared with an earlier sub-Nyquist sampling OFDM radar system. These results are validated with simulations and experiments at various sub-Nyquist sampling rates.
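The folding and separation described above can be illustrated with a toy NumPy sketch. It assumes an ideal channel with no target delay or Doppler, M sub-bands of equal width, and a hypothetical DFT-based time-domain code in which sub-band b is multiplied by exp(j2πbs/M) on OFDM symbol s. The abstract does not specify the paper's actual time-frequency codes. Keeping every M-th sample folds all sub-bands onto the same L bins, and correlating across M symbols with the conjugate codes recovers each sub-band.

```python
import numpy as np

def fold_and_unfold(num_subcarriers=64, num_subbands=4, seed=0):
    """Toy sketch: phase-code M sub-bands over M OFDM symbols, sample at 1/M of
    the Nyquist rate, then separate the aliased sub-bands (ideal channel assumed)."""
    N, M = num_subcarriers, num_subbands
    if N % M:
        raise ValueError("number of subcarriers must be divisible by number of sub-bands")
    L = N // M  # subcarriers per sub-band
    rng = np.random.default_rng(seed)
    # Random QPSK symbols, one per subcarrier, reused on every OFDM symbol here.
    data = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
    # Hypothetical time-domain code: sub-band b gets exp(j*2*pi*b*s/M) on symbol s.
    b_idx, s_idx = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
    codes = np.exp(2j * np.pi * b_idx * s_idx / M)  # shape (M sub-bands, M symbols)

    folded = np.empty((M, L), dtype=complex)
    for s in range(M):
        spectrum = data * np.repeat(codes[:, s], L)  # apply each sub-band's code
        x = np.fft.ifft(spectrum)  # Nyquist-rate OFDM symbol
        y = x[::M]  # sub-Nyquist sampling: keep every M-th sample
        folded[s] = np.fft.fft(y)  # L bins; all M sub-bands alias onto them

    # Folded bin l on symbol s equals (1/M) * sum_b data[b*L + l] * codes[b, s];
    # correlating with the conjugate code of sub-band b isolates that sub-band.
    recovered = codes.conj() @ folded  # shape (M, L)
    return data.reshape(M, L), recovered
```

For example, `expected, recovered = fold_and_unfold()` should give `np.allclose(expected, recovered) == True`. This works because the codes are orthogonal over the M symbols. Without the codes, the M sub-bands would simply sum in each folded bin.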
Submitted 21 March, 2024;
originally announced March 2024.
-
Tidal effects based on GUP-induced effective metric
Authors:
Soon-Tae Hong,
Yong-Wan Kim,
Young-Jai Park
Abstract:
In this paper, we study tidal forces in a Schwarzschild black hole whose metric explicitly includes a generalized uncertainty principle (GUP) effect. We also investigate notable features of the geodesic equations and of the tidal effects that depend on the GUP parameter $\alpha$, which is related to a minimum length. Then, by solving the geodesic deviation equations explicitly with appropriate boundary conditions, we show that $\alpha$ in the effective metric affects both the radial and angular components of the geodesic equation, particularly near the singularities.
Submitted 2 June, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
When Do "More Contexts" Help with Sarcasm Recognition?
Authors:
Ojas Nimase,
Sanghyun Hong
Abstract:
Sarcasm recognition is challenging because it requires understanding the true intention, which is opposite to or different from the literal meaning of the words. Prior work has addressed this challenge by developing a series of methods that provide richer contexts, e.g., sentiment or cultural nuances, to models. While each has been shown to be effective individually, no study has systematically evaluated their collective effectiveness. As a result, it remains unclear to what extent additional contexts can improve sarcasm recognition. In this work, we explore the improvements that existing methods bring by incorporating more contexts into a model. To this end, we develop a framework in which we can integrate multiple contextual cues and test different approaches. In an evaluation of four approaches on three sarcasm recognition benchmarks, we achieve performance on par with the existing state of the art and also demonstrate the benefits of sequentially adding more contexts. We also identify inherent drawbacks of using more contexts, highlighting that in the pursuit of even better results, the model may need to adopt societal biases.
Submitted 19 March, 2024;
originally announced March 2024.
-
Diffusion Denoising as a Certified Defense against Clean-label Poisoning
Authors:
Sanghyun Hong,
Nicholas Carlini,
Alexey Kurakin
Abstract:
We present a certified defense against clean-label poisoning attacks. These attacks inject a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by denoised smoothing, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in test-time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that it reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and on using our certified yet practical defense as a strong baseline to evaluate these attacks.
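The sketch below illustrates the denoised-smoothing step that the defense builds on, and it should be read as a hedged illustration only. It maps a Gaussian noise level σ to a diffusion timestep and produces the noised input that a pretrained denoiser would then clean. Several details are assumptions: the linear DDPM schedule (β from 1e-4 to 0.02 over 1000 steps) and the function names. The abstract does not specify the schedule, the noise level, or the diffusion model. The one-shot denoising call itself depends on the model and is left out.

```python
import numpy as np

def ddpm_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear DDPM beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_for_sanitization(x, sigma, alpha_bar, rng):
    """Noise x at level sigma and rescale it into a diffusion timestep's input.

    A DDPM sample sqrt(ab_t) * x + sqrt(1 - ab_t) * eps equals
    sqrt(ab_t) * (x + sigma_t * eps) with sigma_t**2 = (1 - ab_t) / ab_t,
    so we pick the timestep whose sigma_t is closest to the requested sigma.
    """
    sigma_t = np.sqrt((1.0 - alpha_bar) / alpha_bar)
    t = int(np.argmin(np.abs(sigma_t - sigma)))
    x_t = np.sqrt(alpha_bar[t]) * (x + sigma * rng.standard_normal(np.shape(x)))
    # A pretrained model would now denoise x_t at step t in one shot, e.g. a
    # hypothetical denoise(x_t, t); that call is model-specific and omitted here.
    return t, x_t
```

Each training sample would pass through this noising step and a denoiser before training. In this sketch, σ is the knob that trades sanitization strength against the quality of the cleaned data.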
Submitted 18 March, 2024;
originally announced March 2024.
-
A Deep Learning Method for Beat-Level Risk Analysis and Interpretation of Atrial Fibrillation Patients during Sinus Rhythm
Authors:
Jun Lei,
Yuxi Zhou,
Xue Tian,
Qinghao Zhao,
Qi Zhang,
Shijia Geng,
Qingbo Wu,
Shenda Hong
Abstract:
Atrial Fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues, so early detection of AF is crucial. Existing algorithms can only distinguish "AF rhythm in AF patients" from "sinus rhythm in normal individuals". However, AF patients do not always exhibit AF rhythm, which makes diagnosis challenging when the AF rhythm is absent. To address this, this paper proposes a novel artificial intelligence (AI) algorithm that distinguishes "sinus rhythm in AF patients" from "sinus rhythm in normal individuals" at the beat level. We introduce beat-level risk interpreters and trend risk interpreters, which address the limited interpretability of deep learning models and the difficulty of explaining AF risk trends. Additionally, a beat-level information fusion decision is presented to enhance model accuracy. The experimental results demonstrate that the average AUC for single beats used as testing data from the CPSC 2021 dataset is 0.7314. By employing 150 beats in the information fusion decision algorithm, the average AUC reaches 0.7591. Compared with previous segment-level algorithms, we use beats as input, which reduces data dimensionality and makes the model more lightweight, facilitating deployment on portable medical devices. Furthermore, we report new and interesting findings through average beat analysis and subgroup analysis, considering varying risk levels.
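The abstract does not state the fusion rule. One simple option is to average the per-beat AF-risk probabilities from a beat-level model over a fixed number of beats (150 in the reported experiment); the sketch below shows this, either directly or in log-odds space. `fuse_beat_scores` and both fusion modes are hypothetical illustrations, not the authors' algorithm.

```python
import numpy as np

def fuse_beat_scores(beat_probs, num_beats=150, method="mean"):
    """Fuse per-beat AF-risk probabilities into one recording-level score.

    method="mean": arithmetic mean of the probabilities.
    method="logit": mean in log-odds space, mapped back to a probability.
    Only the first `num_beats` beats are used.
    """
    probs = np.asarray(beat_probs, dtype=float)[:num_beats]
    if probs.size == 0:
        raise ValueError("at least one beat is required")
    if method == "mean":
        return float(probs.mean())
    if method == "logit":
        p = np.clip(probs, 1e-6, 1 - 1e-6)  # avoid infinite log-odds at 0 or 1
        mean_logit = np.log(p / (1 - p)).mean()
        return float(1.0 / (1.0 + np.exp(-mean_logit)))
    raise ValueError(f"unknown method: {method}")
```

Averaging in log-odds space lets a few very confident beats pull the score further than a plain mean would. Which behavior is preferable would need to be checked against held-out AUC.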
Submitted 17 March, 2024;
originally announced March 2024.
-
Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence
Authors:
Sunghwan Hong,
Seokju Cho,
Seungryong Kim,
Stephen Lin
Abstract:
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks. In the context of dense matching, many works benefit from one of two forms of aggregation: feature aggregation, which pertains to the alignment of similar features, or cost aggregation, a procedure aimed at instilling coherence in the flow estimates across neighboring pixels. In this work, we first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits from the judicious use of both aggregation processes. We then introduce a simple yet effective architecture that harnesses self- and cross-attention mechanisms to unify feature aggregation and cost aggregation and effectively exploit the strengths of both techniques. Within the proposed attention layers, the features and the cost volume complement each other, and the attention layers are interleaved through a coarse-to-fine design to further promote accurate correspondence estimation. Finally, at inference, our network produces multi-scale predictions, computes their confidence scores, and selects the most confident flow for the final prediction. Our framework is evaluated on standard benchmarks for semantic matching and also applied to geometric matching, where we show that our approach achieves significant improvements over existing methods.
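The sketch below makes two of the ingredients named above concrete. First, it builds a plain correlation (cosine-similarity) cost volume between two feature maps. Second, given flows and confidence maps at several scales, it keeps the most confident flow at each pixel. This is a generic illustration, not the paper's network. Per-pixel selection, flows already upsampled to a common resolution, and the function names are all assumptions. The attention-based aggregation and the paper's confidence scores are not reproduced.

```python
import numpy as np

def cost_volume(feat_src, feat_tgt, eps=1e-8):
    """Cosine-similarity cost volume between feature maps of shape (H, W, C).

    Entry [i, j] compares source position i with target position j
    (positions flattened in row-major order), giving an (H*W, H*W) matrix.
    """
    c = feat_src.shape[-1]
    src = feat_src.reshape(-1, c)
    tgt = feat_tgt.reshape(-1, c)
    src = src / (np.linalg.norm(src, axis=1, keepdims=True) + eps)
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + eps)
    return src @ tgt.T

def select_most_confident(flows, confidences):
    """Per pixel, keep the flow from the scale whose confidence is highest.

    flows: list of (H, W, 2) arrays at a common resolution (assumed);
    confidences: list of matching (H, W) arrays.
    """
    flows = np.stack(flows)                       # (S, H, W, 2)
    best = np.stack(confidences).argmax(axis=0)   # (H, W) index of best scale
    rows, cols = np.indices(best.shape)
    return flows[best, rows, cols]                # (H, W, 2)
```

Feature aggregation would refine `feat_src` and `feat_tgt` before `cost_volume` is called. Cost aggregation would instead smooth the resulting matrix across neighboring positions.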
Submitted 22 April, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Asymptotically Near-Optimal Hybrid Beamforming for mmWave IRS-Aided MIMO Systems
Authors:
Jeongjae Lee,
Songnam Hong
Abstract:
Hybrid beamforming is an emerging technology for massive multiple-input multiple-output (MIMO) systems owing to its lower complexity, cost, and power consumption. Recently, the intelligent reflection surface (IRS) has been proposed as a cost-effective technique for robust millimeter-wave (mmWave) MIMO systems. It is therefore necessary to jointly optimize the reflection vector and the hybrid beamforming matrices of IRS-aided mmWave MIMO systems. Because the IRS has no RF chain, the TX-IRS and IRS-RX channels cannot be acquired separately. Instead, the literature offers efficient methods to estimate the so-called effective (or cascaded) channel. We derive, for the first time, a near-optimal solution to this joint optimization using only the effective channel. Based on our theoretical analysis, we construct a practical reflection vector and hybrid beamforming matrices by projecting the asymptotic solution onto the modulus constraint. Simulations demonstrate that the proposed construction can outperform the state-of-the-art (SOTA) method, even though the latter requires separate knowledge of the TX-IRS and IRS-RX channels. Furthermore, our construction is robust to channel estimation errors, which are inevitable in practical massive MIMO systems.
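Projecting onto a unit-modulus constraint has a simple closed form. For each entry, the nearest unit-modulus complex number keeps the entry's phase and discards its magnitude. The sketch below shows that projection. It also shows a hypothetical use on the dominant eigenvector of a Hermitian matrix `R`, for instance one built from an estimated effective channel. The abstract does not say how the paper forms its asymptotic solution, so `R` and its construction are assumptions.

```python
import numpy as np

def project_unit_modulus(v):
    """Elementwise projection onto {x : |x_i| = 1}: keep each phase, drop magnitude.

    The phase of a zero entry is undefined; it is mapped to 1 (phase 0) here.
    """
    v = np.asarray(v, dtype=complex)
    mag = np.abs(v)
    out = np.ones_like(v)
    nz = mag > 0
    out[nz] = v[nz] / mag[nz]
    return out

def phase_only_dominant_direction(R):
    """Hypothetical use: project the dominant eigenvector of Hermitian R."""
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues returned in ascending order
    return project_unit_modulus(eigvecs[:, -1])
```

For example, `project_unit_modulus([3 + 4j, 0, -2])` gives `[0.6+0.8j, 1, -1]`. The projection is exact entry by entry, but applying it to an unconstrained optimum only approximates the constrained optimum.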
Submitted 24 April, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Electrically Tunable Spin Exchange Splitting in Graphene Hybrid Heterostructure
Authors:
Dongwon Shin,
Hyeonbeom Kim,
Sung Ju Hong,
Sehwan Song,
Yeongju Choi,
Youngkuk Kim,
Sungkyun Park,
Dongseok Suh,
Woo Seok Choi
Abstract:
Graphene, with its spin and valley degrees of freedom, fosters unexpected physical and chemical properties for the realization of next-generation quantum devices. However, the spin symmetry of graphene is robustly protected, which hampers manipulation of the spin degrees of freedom for spintronic devices such as electric-gate-tunable spin filters. We demonstrate that a hybrid heterostructure composed of graphene and an epitaxial LaCoO3 thin film exhibits an electrically tunable spin exchange splitting. A large and adjustable spin exchange splitting of 155.9 to 306.5 meV was obtained from the characteristic shifts of both the spin-symmetry-broken quantum Hall states and the Shubnikov-de Haas oscillations. Charge transfer across the hybrid heterointerface, induced by strong hybridization, was identified as the origin of the observed spin exchange splitting. The substantial and facile controllability of the spin exchange splitting provides an opportunity for spintronics applications with electrically tunable spin polarization in hybrid heterostructures.
Submitted 13 March, 2024;
originally announced March 2024.
-
Test for high-dimensional linear hypothesis of mean vectors via random integration
Authors:
Jianghao Li,
Shizhe Hong,
Zhenzhen Niu,
Zhidong Bai
Abstract:
In this paper, we investigate hypothesis testing for a linear combination of mean vectors across multiple populations through the method of random integration. We establish the asymptotic distributions of the test statistics under both the null and alternative hypotheses. Additionally, we provide a theoretical explanation of why our test statistics are particularly suited to situations in which the nonzero signal in the linear combination of the true mean vectors is weakly dense. Moreover, Monte Carlo simulations are presented to evaluate the proposed test against existing high-dimensional tests. These simulations show that our test matches the other tests in terms of size and also exhibits superior power.
Submitted 12 March, 2024;
originally announced March 2024.
-
ArgMed-Agents: Explainable Clinical Decision Reasoning with LLM Discussion via Argumentation Schemes
Authors:
Shengxin Hong,
Liang Xiao,
Xin Zhang,
Jianxia Chen
Abstract:
There are two main barriers to using large language models (LLMs) in clinical reasoning. First, while LLMs show significant promise in Natural Language Processing (NLP) tasks, their performance in complex reasoning and planning falls short of expectations. Second, LLMs make clinical decisions through uninterpretable methods that are fundamentally different from clinicians' cognitive processes, which leads to user distrust. In this paper, we present a multi-agent framework called ArgMed-Agents, which aims to enable LLM-based agents to perform explainable clinical decision reasoning through interaction. ArgMed-Agents performs self-argumentation iterations via the Argumentation Scheme for Clinical Discussion (a reasoning mechanism for modeling cognitive processes in clinical reasoning), and then represents the argumentation process as a directed graph of conflicting relationships. Finally, a symbolic solver identifies a series of rational and coherent arguments that support the decision. We construct a formal model of ArgMed-Agents and present conjectures for theoretical guarantees. ArgMed-Agents enables LLMs to mimic the process of clinical argumentative reasoning by generating explanations of their reasoning in a self-directed manner. Experiments show that ArgMed-Agents not only improves accuracy on complex clinical decision reasoning problems compared with other prompting methods but, more importantly, provides users with decision explanations that increase their confidence.
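The abstract does not specify which semantics the symbolic solver uses. The sketch below assumes one standard choice: it computes the grounded extension of a Dung-style abstract argumentation framework. This selects a conflict-free, self-defending set of accepted arguments from a directed attack graph. Argument names in the example are illustrative only.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of an abstract argumentation framework.

    arguments: iterable of hashable argument ids.
    attacks: iterable of (attacker, target) pairs, i.e. directed conflict edges.
    Repeatedly accept every argument all of whose attackers are rejected, then
    reject every argument attacked by an accepted one, until a fixed point.
    """
    args = set(arguments)
    attacks = set(attacks)
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    accepted, rejected = set(), set()
    changed = True
    while changed:
        newly_accepted = {a for a in args - accepted - rejected if attackers[a] <= rejected}
        accepted |= newly_accepted
        newly_rejected = {y for (x, y) in attacks if x in accepted} - rejected
        rejected |= newly_rejected
        changed = bool(newly_accepted or newly_rejected)
    return accepted
```

For a chain in which "A" attacks "B" and "B" attacks "C", the grounded extension is `{"A", "C"}`. Two arguments that attack each other are both left undecided. Grounded semantics is the most skeptical option; other semantics (preferred, stable) would accept more.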
Submitted 20 June, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Authors:
Joseph Cho,
Fachrina Dewi Puspitasari,
Sheng Zheng,
Jingyao Zheng,
Lik-Hang Lee,
Tae-Ho Kim,
Choong Seon Hong,
Chaoning Zhang
Abstract:
The evolution of video generation from text, from animating MNIST digits to simulating the physical world with Sora, has progressed at breakneck speed over the past seven years. While often seen as a superficial extension of their predecessors, text-to-image generation models, text-to-video generation models are built on carefully engineered constituents. Here, we systematically discuss these elements, including but not limited to core building blocks (vision, language, and temporal) and supporting features, from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases that primarily study video synthesis using text conditions. Upon close examination of these manuscripts, we observe that text-to-video generation involves more intricate technologies than a plain extension of text-to-image generation. Our additional review of the shortcomings of Sora-generated videos points to the need for more in-depth studies of various enabling aspects of video generation, such as datasets, evaluation metrics, efficient architectures, and human-controlled generation. Finally, we conclude that the study of text-to-video generation may still be in its infancy, requiring contributions from the cross-disciplinary research community to advance it as the first step toward realizing artificial general intelligence (AGI).
Submitted 7 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Local limit theorem of Brownian motion on metric trees
Authors:
Soonki Hong
Abstract:
Let $\mathcal{T}$ be a locally finite tree whose geometric boundary has infinitely many points. Suppose that a non-amenable group $\Gamma$ acts isometrically and geometrically on the tree $\mathcal{T}$.
In this paper, we show that if the length spectrum is Diophantine, then there exists a continuous function $C$ on $\mathcal{T}^2$ such that the heat kernel $p(t,x,y)$ of $\mathcal{T}$ satisfies
$$\lim_{t\rightarrow \infty}t^{3/2}e^{\lambda_0 t}p(t,x,y)=C(x,y)$$
for any $x,y\in \mathcal{T}$. Here, $\lambda_0$ is the bottom of the spectrum of the Laplacian on $\mathcal{T}$.
Submitted 8 March, 2024;
originally announced March 2024.
-
A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image Generation with Patch Similarity-based Sparsity Augmentation and Text-based Mixed-Precision
Authors:
Jiwon Choi,
Wooyoung Jo,
Seongyon Hong,
Beomseok Kwon,
Wonhoon Park,
Hoi-Jun Yoo
Abstract:
This paper presents an energy-efficient stable diffusion processor for text-to-image generation. While stable diffusion has attracted attention for its high-quality image synthesis, its inherent characteristics hinder its deployment on mobile platforms. The proposed processor achieves high throughput and energy efficiency with three key features: 1) patch similarity-based sparsity augmentation (PSSA), which reduces the external memory access (EMA) energy of the self-attention score by 60.3%, leading to a 37.8% reduction in total EMA energy; 2) text-based important pixel spotting (TIPS), which allows 44.8% of the FFN layer workload to be processed with low-precision activation; and 3) a dual-mode bit-slice core (DBSC) architecture, which enhances energy efficiency in FFN layers by 43.0%. The proposed processor is implemented in 28 nm CMOS technology and achieves 3.84 TOPS peak throughput with 225.6 mW average power consumption. Overall, the processor achieves highly energy-efficient text-to-image generation at 28.6 mJ/iteration on the MS-COCO dataset.
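As a rough consistency check, assume the 225.6 mW average power holds over a whole iteration; the abstract does not report per-iteration latency. Under that assumption, the reported energy per iteration implies an iteration time of about
$$t_{\text{iter}} \approx \frac{E_{\text{iter}}}{P_{\text{avg}}} = \frac{28.6\ \text{mJ}}{225.6\ \text{mW}} \approx 0.127\ \text{s}$$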
Submitted 14 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Towards Robust Federated Learning via Logits Calibration on Non-IID Data
Authors:
Yu Qiao,
Apurba Adhikary,
Chaoning Zhang,
Choong Seon Hong
Abstract:
Federated learning (FL) is a privacy-preserving distributed management framework based on collaborative model training across distributed devices in edge networks. However, recent studies have shown that FL is vulnerable to adversarial examples (AEs), which cause a significant drop in its performance. Meanwhile, the non-independent and identically distributed (non-IID) data distribution across edge devices can further degrade model performance. Consequently, both AEs and non-IID data pose challenges to deploying robust learning models at the edge. In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks, an approach we term federated adversarial training (FAT). Moreover, we address the non-IID challenge by implementing a simple yet effective logits calibration strategy under the FAT framework, which can enhance the robustness of models under adversarial attacks. Specifically, we employ a direct strategy that adjusts the output logits by assigning higher weights to classes with few samples during training. This approach effectively tackles the class imbalance in the training data, with the goal of mitigating biases between local and global models. Experimental results on three benchmark datasets, MNIST, Fashion-MNIST, and CIFAR-10, show that our strategy achieves competitive results in natural and robust accuracy compared with several baselines.
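The abstract does not give the exact calibration rule. As a stand-in, the sketch below uses the well-known logit-adjusted cross-entropy, which shifts each logit by τ·log(local class prior) during training. This shift penalizes rare classes more, so the model must raise their raw logits to fit them; unshifted logits are used at test time. This is an assumed technique chosen to illustrate the idea, not necessarily the paper's formula. The +1 count smoothing and the function name are also assumptions.

```python
import numpy as np

def logit_adjusted_ce(logits, labels, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(local class prior).

    logits: (batch, num_classes); labels: (batch,) integer class ids;
    class_counts: per-class sample counts on this client.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    counts = np.asarray(class_counts, dtype=float) + 1.0  # +1 avoids log(0) for absent classes
    prior = counts / counts.sum()
    z = logits + tau * np.log(prior)                # calibrated logits
    z = z - z.max(axis=1, keepdims=True)            # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

With uniform class counts, the shift is the same constant for every class, so the loss reduces to ordinary cross-entropy. Under FAT, the logits would come from adversarially perturbed inputs.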
Submitted 5 March, 2024;
originally announced March 2024.
-
Target Localization and Performance Trade-Offs in Cooperative ISAC Systems: A Scheme Based on 5G NR OFDM Signals
Authors:
Zhenkun Zhang,
Hong Ren,
Cunhua Pan,
Sheng Hong,
Dongming Wang,
Jiangzhou Wang,
Xiaohu You
Abstract:
The integration of sensing capabilities into communication systems, by sharing physical resources, has significant potential for reducing spectrum, hardware, and energy costs while inspiring innovative applications. Cooperative networks, in particular, are expected to enhance sensing services by enlarging the coverage area and enriching sensing measurements, thus improving service availability and accuracy. This paper proposes a cooperative integrated sensing and communication (ISAC) framework that leverages information-carrying orthogonal frequency division multiplexing (OFDM) signals transmitted by access points (APs). Specifically, we propose a two-stage scheme for target localization, in which communication signals are reused as sensing reference signals based on the system information shared at the central processing unit (CPU). In Stage I, we measure the ranges of scattered paths induced by targets by extracting time-delay information from the signals received at the APs. Then, in Stage II, the target locations are estimated from these range measurements. Because the scattered paths corresponding to some targets may not be detectable by all APs, we propose an effective algorithm that matches the range measurements with the targets and achieves target location estimation. Notably, by analyzing the OFDM numerologies defined in fifth generation (5G) standards, we elucidate the flexibility and consistency of performance trade-offs in both communication and sensing. Finally, numerical results confirm the effectiveness of our sensing scheme and the cooperative gain of the ISAC framework.
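The sketch below makes the numerology trade-off concrete for 5G NR numerology μ, where the subcarrier spacing is Δf = 15·2^μ kHz. It computes three quantities:
- the useful OFDM symbol duration 1/Δf;
- the maximum unambiguous propagation-path length c/Δf, implied by the delay ambiguity 1/Δf;
- the path-length resolution c/B for bandwidth B = NΔf.

These are standard OFDM-radar relations, not the paper's own analysis. The cyclic prefix is ignored, and for a monostatic round trip the corresponding range values would be halved. The 3300-subcarrier example (275 resource blocks of 12 subcarriers) is only an illustrative choice.

```python
C = 299_792_458.0  # speed of light in m/s

def nr_sensing_limits(mu, num_subcarriers):
    """Delay-domain sensing limits of a 5G NR OFDM numerology (cyclic prefix ignored)."""
    scs_hz = 15e3 * 2 ** mu                  # subcarrier spacing: 15 * 2^mu kHz
    bandwidth_hz = num_subcarriers * scs_hz  # occupied bandwidth B = N * scs
    return {
        "scs_khz": scs_hz / 1e3,
        "symbol_us": 1e6 / scs_hz,              # useful symbol duration 1/scs
        "max_unambiguous_path_m": C / scs_hz,   # delay ambiguity 1/scs
        "path_resolution_m": C / bandwidth_hz,  # delay resolution 1/B
    }

if __name__ == "__main__":
    for mu in range(4):
        lim = nr_sensing_limits(mu, num_subcarriers=3300)
        print(mu, {k: round(v, 2) for k, v in lim.items()})
```

For a fixed number of subcarriers, raising μ shortens the symbol, which helps Doppler tolerance and latency, and widens the bandwidth, which gives finer resolution. The cost is a shorter unambiguous path length.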
Submitted 4 March, 2024;
originally announced March 2024.
-
Closing the Knowledge Gap in Designing Data Annotation Interfaces for AI-powered Disaster Management Analytic Systems
Authors:
Zinat Ara,
Hossein Salemi,
Sungsoo Ray Hong,
Yasas Senarath,
Steve Peterson,
Amanda Lee Hughes,
Hemant Purohit
Abstract:
Data annotation interfaces predominantly leverage ground truth labels to guide annotators toward accurate responses. With the growing adoption of Artificial Intelligence (AI) in domain-specific professional tasks, it has become increasingly important to help beginner annotators identify how their early-stage knowledge can lead to inaccurate answers, which, in turn, helps ensure quality annotations at scale. To investigate this issue, we conducted a formative study involving eight individuals from the field of disaster management with varying levels of expertise. The goal was to understand the prevalent factors contributing to disagreements among annotators when classifying Twitter messages related to disasters and to analyze their respective responses. Our analysis identified two primary causes of disagreement between expert and beginner annotators: 1) a lack of contextual knowledge or uncertainty about the situation, and 2) the absence of visual or supplementary cues. Based on these findings, we designed a Context interface, which generates aids that help beginners identify potential mistakes and reveal the hidden context of the presented tweet. The summative study compares the Context design with two designs widely used in data annotation UIs, the Highlight and Reasoning-based interfaces. We found significant differences between these designs in terms of attitudinal and behavioral data. We conclude with implications for designing future interfaces aimed at closing the knowledge gap among annotators.
Submitted 3 March, 2024;
originally announced March 2024.
-
Collaborative Job Seeking for People with Autism: Challenges and Design Opportunities
Authors:
Zinat Ara,
Amrita Ganguly,
Donna Peppard,
Dongjun Chung,
Slobodan Vucetic,
Vivian Genaro Motti,
Sungsoo Ray Hong
Abstract:
Successful job searches depend on job seekers' well-shaped social communication. While well-known differences in communication exist between people with autism and neurotypical people, little is known about how people with autism collaborate with their social surroundings to thrive in the job market. To better understand the practices and challenges of collaborative job seeking for people with autism, we interviewed 20 participants, including applicants with autism, members of their social surroundings, and career experts. Through the interviews, we identified the social challenges that people with autism face during their job search, the social support they leverage to succeed, and the technological limitations that hinder their collaboration. We designed four probes that represent major collaborative features identified in the interviews (executive planning, communication, stage-wise preparation, and neurodivergent community formation) and discussed their potential usefulness and impact in three focus groups. We provide implications for how our findings can enhance collaborative job-seeking experiences for people with autism through new designs.
Submitted 3 March, 2024;
originally announced March 2024.
-
Data Interpreter: An LLM Agent For Data Science
Authors:
Sirui Hong,
Yizhang Lin,
Bang Liu,
Bangbang Liu,
Binhao Wu,
Danyang Li,
Jiaqi Chen,
Jiayi Zhang,
Jinlin Wang,
Li Zhang,
Lingyao Zhang,
Min Yang,
Mingchen Zhuge,
Taicheng Guo,
Tuo Zhou,
Wei Tao,
Wenyi Wang,
Xiangru Tang,
Xiangtao Lu,
Xiawu Zheng,
Xinbing Liang,
Yaying Fei,
Yuheng Cheng,
Zongze Xu,
Chenglin Wu
Abstract:
Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution designed to solve problems with code that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability; 2) dynamic tool integration to enhance code proficiency during execution, enriching the requisite expertise; 3) identification of logical inconsistencies in feedback, and efficiency enhancement through experience recording. We evaluate the Data Interpreter on various data science and real-world tasks. Compared to open-source baselines, it demonstrated superior performance, with significant improvements in machine learning tasks (from 0.86 to 0.95). Additionally, it showed a 26% increase on the MATH dataset and a remarkable 112% improvement in open-ended tasks. The solution will be released at https://github.com/geekan/MetaGPT.
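The dynamic-planning idea can be pictured as a task graph that is executed in dependency order and patched when a step fails. The sketch below is a simplification under stated assumptions: it uses a flat dependency graph rather than a hierarchical one, and the task names and the `replan` helper are hypothetical, not the Data Interpreter's actual API.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
PLAN = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "engineer_features": {"clean_data"},
    "train_model": {"engineer_features"},
    "evaluate": {"train_model"},
}


def execution_order(graph):
    """Return an order in which every task runs after all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def replan(graph, failed, replacement):
    """Replace a failed task with a new one that keeps its dependencies and dependents."""
    new_graph = {}
    for task, deps in graph.items():
        name = replacement if task == failed else task
        new_graph[name] = {replacement if d == failed else d for d in deps}
    return new_graph


print(execution_order(PLAN))
print(execution_order(replan(PLAN, "train_model", "train_model_v2")))
```

Because only the failed node is swapped, the rest of the plan and its ordering constraints are reused instead of being regenerated from scratch.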
Submitted 12 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
PAC-FNO: Parallel-Structured All-Component Fourier Neural Operators for Recognizing Low-Quality Images
Authors:
Jinsung Jeon,
Hyundong Jin,
Jonghyun Choi,
Sanghyun Hong,
Dongeun Lee,
Kookjin Lee,
Noseong Park
Abstract:
A standard practice in developing image recognition models is to train a model on a specific image resolution and then deploy it. However, in real-world inference, models often encounter images that differ from the training set in resolution and/or are subject to natural variations such as weather changes, noise types and compression artifacts. While traditional solutions involve training multiple models for different resolutions or input variations, these methods are computationally expensive and thus do not scale in practice. To this end, we propose a novel neural network model, the parallel-structured and all-component Fourier neural operator (PAC-FNO), that addresses the problem. Unlike conventional feed-forward neural networks, PAC-FNO operates in the frequency domain, allowing it to handle images of varying resolutions within a single model. We also propose a two-stage algorithm for training PAC-FNO with minimal modification to the original, downstream model. Moreover, the proposed PAC-FNO is ready to work with existing image recognition models. Through extensive evaluation on seven image recognition benchmarks, we show that the proposed PAC-FNO improves the performance of existing baseline models on images with various resolutions, by up to 77.1%, and on images with various types of natural variations at inference.
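As a rough illustration of why a frequency-domain layer is resolution-agnostic, the 1-D sketch below scales a fixed number of low-frequency Fourier modes and discards the rest, so the same weights apply to signals of any length. It is a simplified, real-weighted spectral filter using a naive O(n²) DFT for clarity, not PAC-FNO itself; `spectral_filter` and its weights are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_filter(x, weights):
    """Scale the lowest len(weights) Fourier modes of a real signal and zero the rest.

    The weights live in frequency space, so the same weights apply to inputs of any length.
    """
    n = len(x)
    spectrum = dft(x)
    kept = [0j] * n
    for k, w in enumerate(weights):
        if k > n // 2:
            break
        kept[k] = spectrum[k] * w
        if 0 < k < n - k:
            # Mirror onto the matching negative frequency so the output stays real.
            kept[n - k] = spectrum[n - k] * w
    return [v.real for v in idft(kept)]


weights = [2.0, 0.5]                        # hypothetical "learned" weights for modes 0 and 1
print(spectral_filter([1.0] * 4, weights))  # constant input of length 4
print(spectral_filter([1.0] * 8, weights))  # same weights applied at length 8
```

A real Fourier neural operator learns complex, channel-mixing weights per mode; the point here is only that the parameter count is tied to the number of retained modes, not to the input resolution.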
Submitted 14 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Non-Invertible Peccei-Quinn Symmetry and the Massless Quark Solution to the Strong CP Problem
Authors:
Clay Cordova,
Sungwoo Hong,
Seth Koren
Abstract:
We consider theories of gauged quark flavor and identify non-invertible Peccei-Quinn symmetries arising from fractional instantons when the resulting gauge group has non-trivial global structure. Such symmetries exist solely because the Standard Model has the same number of generations as colors, $N_g = N_c$. This leads us to a massless down-type quark solution to the strong CP problem in an ultraviolet $SU(9)$ theory of quark color-flavor unification. We show how the CKM flavor structure and weak CP violation can be generated without upsetting our solution.
Submitted 19 February, 2024;
originally announced February 2024.
-
On the importance of assessing topological convergence in Bayesian phylogenetic inference
Authors:
Marius Brusselmans,
Luiz Max Carvalho,
Samuel L. Hong,
Jiansi Gao,
Frederick A. Matsen IV,
Andrew Rambaut,
Philippe Lemey,
Marc A. Suchard,
Gytis Dudas,
Guy Baele
Abstract:
Modern phylogenetics research is often performed within a Bayesian framework, using sampling algorithms such as Markov chain Monte Carlo (MCMC) to approximate the posterior distribution. These algorithms require careful evaluation of the quality of the generated samples. Within the field of phylogenetics, one frequently adopted diagnostic approach is to evaluate the effective sample size (ESS) and to investigate trace graphs of the sampled parameters. A major limitation of these approaches is that they are developed for continuous parameters and therefore incompatible with a crucial parameter in these inferences: the tree topology. Several recent advancements have aimed at extending these diagnostics to topological space. In this short reflection paper, we present a case study illustrating how these topological diagnostics can contain information not found in standard diagnostics, and how decisions regarding which of these diagnostics to compute can impact inferences regarding MCMC convergence and mixing. Given the major importance of detecting convergence and mixing issues in Bayesian phylogenetic analyses, the lack of a unified approach to this problem warrants further action, especially now that additional tools are becoming available to researchers.
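Topological diagnostics generally need a distance between tree topologies. One widely used choice is the unweighted Robinson-Foulds distance, sketched below under the assumption that each unrooted tree is given as the taxon sets below its internal edges. The helper names are hypothetical, and this is not the tooling evaluated in the paper.

```python
def canonical_split(side, taxa):
    """Write a bipartition as the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def tree_splits(clades, taxa):
    """Nontrivial bipartitions (both sides with at least two taxa) induced by a tree."""
    splits = set()
    for clade in clades:
        split = canonical_split(clade, taxa)
        if 2 <= len(split) <= len(taxa) - 2:
            splits.add(split)
    return splits


def robinson_foulds(clades_a, clades_b, taxa):
    """Unweighted Robinson-Foulds distance: splits present in exactly one of the two trees."""
    return len(tree_splits(clades_a, taxa) ^ tree_splits(clades_b, taxa))


TAXA = {"A", "B", "C", "D", "E"}
tree_1 = [{"A", "B"}, {"D", "E"}]  # ((A,B),C,(D,E))
tree_2 = [{"A", "C"}, {"D", "E"}]  # ((A,C),B,(D,E))
print(robinson_foulds(tree_1, tree_2, TAXA))
```

Canonicalizing each split by a reference taxon means the same edge is recognized whichever side of it is listed, so a trace of such distances to a reference tree can be treated like a continuous parameter trace.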
Submitted 18 February, 2024;
originally announced February 2024.
-
Gradients of brain organization: Smooth sailing from methods development to user community
Authors:
Jessica Royer,
Casey Paquola,
Sofie L. Valk,
Matthias Kirschner,
Seok-Jun Hong,
Bo-yong Park,
Richard A. I. Bethlehem,
Robert Leech,
B. T. Thomas Yeo,
Elizabeth Jefferies,
Jonathan Smallwood,
Daniel Margulies,
Boris C. Bernhardt
Abstract:
Multimodal neuroimaging grants a powerful in vivo window into the structure and function of the human brain. Recent methodological and conceptual advances have enabled investigations of the interplay between large-scale spatial trends, or gradients, in brain structure and function, offering a framework to unify principles of brain organization across multiple scales. Strong community enthusiasm for these techniques has been instrumental in their widespread adoption and implementation to answer key questions in neuroscience. Following a brief review of current literature on this framework, this perspective paper will highlight how pragmatic steps aiming to make gradient methods more accessible to the community propelled these techniques to the forefront of neuroscientific inquiry. More specifically, we will emphasize how interest in gradient methods was catalyzed by data sharing, open-source software development, and the organization of dedicated workshops led by a diverse team of early career researchers. To this end, we argue that the growing excitement for brain gradients is the result of coordinated and consistent efforts to build an inclusive community and can serve as a case in point for future innovations and conceptual advances in neuroinformatics. We close this perspective paper by discussing challenges for the continuous refinement of neuroscientific theory, methodological innovation, and real-world translation to maintain our collective progress towards integrated models of brain organization.
Submitted 16 February, 2024;
originally announced February 2024.
-
Transformers with Attentive Federated Aggregation for Time Series Stock Forecasting
Authors:
Chu Myaet Thwal,
Ye Lin Tun,
Kitae Kim,
Seong-Bae Park,
Choong Seon Hong
Abstract:
Recent innovations in transformers have shown their superior performance in natural language processing (NLP) and computer vision (CV). The ability to capture long-range dependencies and interactions in sequential data has also triggered great interest in time series modeling, leading to the widespread use of transformers in many time series applications. However, although forecasting is the most common and crucial time series application, the adaptation of transformers to it has remained limited, with both promising and inconsistent results. In contrast to the challenges in NLP and CV, time series problems not only add the complexity of order or temporal dependence among input sequences but also involve trend, level, and seasonality information, much of which is valuable for decision making. The conventional training scheme has shown deficiencies regarding model overfitting, data scarcity, and privacy issues when working with transformers for a forecasting task. In this work, we propose attentive federated transformers for time series stock forecasting with better performance while preserving the privacy of participating enterprises. Empirical results on various stock data from the Yahoo! Finance website indicate the superiority of our proposed scheme in dealing with the above challenges and data heterogeneity in federated learning.
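The abstract does not spell out the aggregation rule, so the sketch below assumes a FedAtt-style attentive aggregation: the server takes a softmax over the distances between its parameters and each client's, then moves toward the clients by an attention-weighted step. Parameters are flattened into plain lists (rather than handled layer by layer), and `attentive_aggregate` and `step` are hypothetical names.

```python
import math


def attentive_aggregate(server, clients, step=1.0):
    """Attention-weighted server update over flat parameter vectors (FedAtt-style sketch).

    alpha_k = softmax_k(||server - client_k||), so clients farther from the server
    model receive larger attention weights.
    """
    dists = [math.sqrt(sum((s - c) ** 2 for s, c in zip(server, client))) for client in clients]
    m = max(dists)
    exps = [math.exp(d - m) for d in dists]  # subtract the max for numerical stability
    total = sum(exps)
    alphas = [e / total for e in exps]
    # w_new = w - step * sum_k alpha_k * (w - w_k)
    return [
        s - step * sum(a * (s - client[i]) for a, client in zip(alphas, clients))
        for i, s in enumerate(server)
    ]


print(attentive_aggregate([2.0], [[1.0], [3.0]]))     # symmetric clients cancel out
print(attentive_aggregate([0.0, 0.0], [[1.0, 2.0]]))  # one client, step 1: copy it
```

Compared with plain size-weighted averaging, the attention weights let the server decide how much each client's update should count; the step size controls how far the server model moves per round.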
Submitted 22 January, 2024;
originally announced February 2024.
-
Retrieval-Augmented Score Distillation for Text-to-3D Generation
Authors:
Junyoung Seo,
Susung Hong,
Wooseok Jang,
Inès Hyeonsu Kim,
Minseop Kwak,
Doyup Lee,
Seungryong Kim
Abstract:
Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but insufficient 3D prior knowledge also leads to inconsistent 3D geometry. Recently, with the release of large-scale multi-view datasets, fine-tuning the diffusion model on these datasets has become a mainstream approach to solving the 3D inconsistency problem. However, this approach faces fundamental difficulties regarding the limited quality and diversity of 3D data compared with 2D data. To sidestep these trade-offs, we explore a retrieval-augmented approach tailored for score distillation, dubbed ReDream. We postulate that both the expressiveness of 2D diffusion models and the geometric consistency of 3D assets can be fully leveraged by employing semantically relevant assets directly within the optimization process. To this end, we introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation. We leverage the retrieved asset to incorporate its geometric prior in the variational objective and adapt the diffusion model's 2D prior toward view consistency, achieving drastic improvements in both the geometry and fidelity of generated scenes. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency. Project page is available at https://ku-cvlab.github.io/ReDream/.
Submitted 2 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Spin: An Efficient Secure Computation Framework with GPU Acceleration
Authors:
Wuxuan Jiang,
Xiangjun Song,
Shenbai Hong,
Haijun Zhang,
Wenxin Liu,
Bo Zhao,
Wei Xu,
Yi Li
Abstract:
Accuracy and efficiency remain challenges for multi-party computation (MPC) frameworks. Spin is a GPU-accelerated MPC framework that supports multiple computation parties and a dishonest-majority adversarial setup. We propose optimized protocols for non-linear functions that are critical for machine learning, as well as several novel optimizations specific to attention, which is the fundamental unit of Transformer models, allowing Spin to perform non-trivial CNN training and Transformer inference without sacrificing security. At the backend level, Spin leverages GPUs, CPUs, and RDMA-enabled smart network cards for acceleration. Comprehensive evaluations demonstrate that Spin can be up to $2\times$ faster than the state of the art for deep neural network training. For inference on a Transformer model with 18.9 million parameters, our attention-specific optimizations enable Spin to achieve better efficiency, less communication, and better accuracy.
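Spin's protocols are not detailed in the abstract; as background only, the sketch below shows plain additive secret sharing over the ring Z_{2^64}, a building block that many MPC frameworks start from. It omits the MACs that a dishonest-majority setting requires, and the function names and modulus are assumptions for illustration.

```python
import secrets

MODULUS = 2 ** 64  # ring Z_{2^64}; an assumed choice for this sketch


def share(secret, parties):
    """Split an integer into `parties` additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares; any subset missing even one share reveals nothing about the secret."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own shares locally; addition needs no communication."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


a, b = share(10, 3), share(32, 3)
print(reconstruct(add_shared(a, b)))
```

Linear operations stay local under this scheme, which is why the cost of MPC machine learning concentrates in non-linear functions and multiplications, the parts the abstract says Spin optimizes.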
Submitted 23 February, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Distributional Off-policy Evaluation with Bellman Residual Minimization
Authors:
Sungee Hong,
Zhengling Qi,
Raymond K. W. Wong
Abstract:
We consider the problem of distributional off-policy evaluation which serves as the foundation of many distributional reinforcement learning (DRL) algorithms. In contrast to most existing works (that rely on supremum-extended statistical distances such as supremum-Wasserstein distance), we study the expectation-extended statistical distance for quantifying the distributional Bellman residuals and show that it can upper bound the expected error of estimating the return distribution. Based on this appealing property, by extending the framework of Bellman residual minimization to DRL, we propose a method called Energy Bellman Residual Minimizer (EBRM) to estimate the return distribution. We establish a finite-sample error bound for the EBRM estimator under the realizability assumption. Furthermore, we introduce a variant of our method based on a multi-step bootstrapping procedure to enable multi-step extension. By selecting an appropriate step level, we obtain a better error bound for this variant of EBRM compared to a single-step EBRM, under some non-realizability settings. Finally, we demonstrate the superior performance of our method through simulation studies, comparing with several existing methods.
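Assuming, as the method's name suggests, that the underlying distance is the energy distance, its sample estimate is short enough to write out. The sketch below computes the V-statistic version for 1-D samples. In a Bellman-residual setting one sample could be draws from the estimated return distribution and the other draws from its Bellman target; that usage is an assumption, not the paper's estimator.

```python
def energy_distance(xs, ys):
    """V-statistic estimate of the squared energy distance between two 1-D samples:
    2 E|X - Y| - E|X - X'| - E|Y - Y'|.
    """
    def mean_abs_diff(a, b):
        return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))

    return 2 * mean_abs_diff(xs, ys) - mean_abs_diff(xs, xs) - mean_abs_diff(ys, ys)


print(energy_distance([0.0, 1.0], [1.0, 2.0]))  # small shift
print(energy_distance([0.0, 1.0], [5.0, 6.0]))  # large shift
```

The estimate is zero for identical samples and grows as the distributions separate, which is what makes it usable as a residual to minimize.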
Submitted 2 February, 2024;
originally announced February 2024.
-
Test for high-dimensional mean vectors via the weighted $L_2$-norm
Authors:
Jianghao Li,
Zhenzhen Niu,
Shizhe Hong,
Zhidong Bai
Abstract:
In this paper, we propose a novel approach to test the equality of high-dimensional mean vectors of several populations via the weighted $L_2$-norm. We establish the asymptotic normality of the test statistics under the null hypothesis. We also explain theoretically why our test statistics can be highly useful in weakly dense cases when the nonzero signal in mean vectors is present. Furthermore, we compare the proposed test with existing tests using simulation results, demonstrating that the weighted $L_2$-norm-based test statistic exhibits favorable properties in terms of both size and power.
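To make the idea of a weighted $L_2$-norm concrete, the sketch below computes a two-sample statistic of the form $\sum_j w_j (\bar{x}_j - \bar{y}_j)^2$. The weights are hypothetical inputs; the paper's statistic, which covers several populations and includes its own weighting and centering, is not reproduced here.

```python
def weighted_l2_stat(x, y, weights):
    """sum_j w_j * (mean_j(x) - mean_j(y))^2 for two samples given as lists of p-dim rows.

    The weights are hypothetical; the paper's weighting and centering terms are not reproduced.
    """
    p = len(weights)
    mean_x = [sum(row[j] for row in x) / len(x) for j in range(p)]
    mean_y = [sum(row[j] for row in y) / len(y) for j in range(p)]
    return sum(w * (mx - my) ** 2 for w, mx, my in zip(weights, mean_x, mean_y))


print(weighted_l2_stat([[1, 2], [3, 4]], [[0, 0], [2, 2]], [1.0, 0.5]))
```

Here the mean difference is (1, 2), so the statistic is 1·1 + 0.5·4 = 3.0; down-weighting a coordinate reduces its influence, which is how weighting can favor signals spread over many coordinates.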
Submitted 31 January, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Synchronization Behavior of Newton's Cradle
Authors:
Minseok Lee,
Seokchan Hong
Abstract:
A Newton's cradle is a device that demonstrates conservation of momentum using a series of identical colliding pendula. Despite being a famous example of momentum conservation, the system has rarely been analyzed extensively in the literature. Here, we model the system as a collection of identical nonlinear spring pendulums performing viscoelastic collisions, which shows excellent agreement with experiments performed under various conditions. The dependence of its synchronization rate on four key system parameters is studied in detail. Interestingly, the resonance between radial and angular motion was found to modulate the synchronization rate. The proposed theory, with full consideration of two-dimensional motion and string hysteresis, provides an excellent long-term prediction of the synchronized cradle motion.
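As a minimal illustration of a viscoelastic collision, the sketch below integrates a head-on impact between two equal balls with a linear spring-dashpot (Kelvin-Voigt) contact force. All parameters are hypothetical, and the model ignores the pendulum strings, string hysteresis, and two-dimensional motion that the paper's model includes; it only shows that such a contact conserves momentum while dissipating energy.

```python
def kelvin_voigt_collision(m=1.0, k=1.0e4, c=5.0, v0=1.0, dt=1.0e-5, steps=20000):
    """Head-on collision of two equal balls with a spring-dashpot (Kelvin-Voigt) contact.

    Ball 1 starts touching ball 2 and moving at v0; ball 2 starts at rest.
    Returns the velocities (v1, v2) after the contact has ended.
    """
    x1, x2 = 0.0, 0.0
    v1, v2 = v0, 0.0
    for _ in range(steps):
        overlap = x1 - x2          # positive while the balls are compressed
        overlap_rate = v1 - v2
        force = 0.0
        if overlap > 0.0:
            # Viscoelastic contact force, clamped at zero so the contact never pulls.
            force = max(0.0, k * overlap + c * overlap_rate)
        # Equal and opposite impulses, so total momentum is conserved.
        v1 -= force / m * dt
        v2 += force / m * dt
        # Semi-implicit Euler: positions use the updated velocities.
        x1 += v1 * dt
        x2 += v2 * dt
    return v1, v2


v1, v2 = kelvin_voigt_collision()
print(v1, v2, "restitution:", v2 - v1)
```

Ball 2 leaves with most of the momentum while a small residual velocity stays with ball 1; that imperfect transfer, accumulated over many collisions, is the kind of effect that lets the balls in a real cradle gradually fall into synchronized motion.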
Submitted 12 February, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
3DPFIX: Improving Remote Novices' 3D Printing Troubleshooting through Human-AI Collaboration
Authors:
Nahyun Kwon,
Tong Sun,
Yuyang Gao,
Liang Zhao,
Xu Wang,
Jeeeun Kim,
Sungsoo Ray Hong
Abstract:
The wide availability of consumer-grade 3D printers and online learning resources enables novices to self-train in remote settings. While troubleshooting plays an essential part in 3D printing, the process remains challenging for many remote novices even with the help of well-developed online resources, such as online troubleshooting archives and online community help. We conducted a formative study with 76 active 3D printing users to learn how remote novices leverage online resources in troubleshooting and what challenges they face. We found that remote novices cannot fully utilize online resources. For example, the online archives statically provide general information, making it hard to search for and relate their unique cases to existing descriptions. Online communities can potentially ease their struggles by providing more targeted suggestions, but helpers who can provide custom help are scarce, making it hard to obtain timely assistance. We propose 3DPFIX, an interactive 3D troubleshooting system powered by a pipeline that facilitates Human-AI Collaboration, designed to improve novices' 3D printing experiences and thus help them easily accumulate domain knowledge. 3DPFIX supports automated diagnosis and solution-seeking, and it is built upon shared dialogues about failure cases from Q&A discourses accumulated in online communities. We leverage social annotations (i.e., comments) to build an annotated failure image dataset for AI classifiers and extract a solution pool. Our summative study revealed that using 3DPFIX helped participants spend significantly less effort in diagnosing failures and find more accurate solutions than relying on their common practice. We also found that 3DPFIX users learned 3D printing domain-specific knowledge. We discuss the implications of leveraging community-driven data in developing future Human-AI Collaboration designs.
Submitted 1 February, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Deep Learning with Information Fusion and Model Interpretation for Health Monitoring of Fetus based on Long-term Prenatal Electronic Fetal Heart Rate Monitoring Data
Authors:
Zenghui Lin,
Xintong Liu,
Nan Wang,
Ruichen Li,
Qingao Liu,
Jingying Ma,
Liwei Wang,
Yan Wang,
Shenda Hong
Abstract:
Long-term fetal heart rate (FHR) monitoring during the antepartum period, increasingly popularized by electronic FHR monitoring, represents a growing approach in FHR monitoring. In contrast to short-term monitoring, this kind of continuous monitoring collects fetal heart data over an extended period, offering a more comprehensive understanding of the fetus's condition. However, the interpretation of long-term antenatal fetal heart monitoring is still in its early stages and lacks corresponding clinical standards. Furthermore, the substantial amount of data generated by continuous monitoring imposes a significant burden on clinical work when analyzed manually. To address the above challenges, this study develops an automatic analysis system named LARA (Long-term Antepartum Risk Analysis system) for continuous FHR monitoring, combining deep learning and information fusion methods. LARA's core is a well-established convolutional neural network (CNN) model. It takes long-term FHR data as input and generates a Risk Distribution Map (RDM) and a Risk Index (RI) as the analysis results. We evaluated LARA on an internal test dataset, with the following performance metrics: AUC 0.872, accuracy 0.816, specificity 0.811, sensitivity 0.806, precision 0.271, and F1 score 0.415. We also observe that long-term FHR monitoring data with a higher RI is more likely to result in adverse outcomes (p=0.0021). In conclusion, this study introduces LARA, the first automated analysis system for long-term FHR monitoring, and opens the way for further exploration of its clinical value.
Submitted 27 January, 2024;
originally announced January 2024.
-
LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering
Authors:
Sheng Hong,
Junjie He,
Xinhu Zheng,
Chunran Zheng,
Shaojie Shen
Abstract:
We introduce an integrated precise LiDAR, Inertial, and Visual (LIV) multimodal sensor fused mapping system that builds on differentiable Gaussians to improve the mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion.
This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The scene's surface Gaussians and the sensor pose of each frame are initialized using a LiDAR-inertial system with size-adaptive voxels. Then, we refine the Gaussians using vision-derived photometric gradients to optimize their quality and density.
Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes. It bolsters structure construction through LiDAR and facilitates real-time generation of photorealistic renderings across diverse LIV datasets. It shows notable resilience and versatility in generating real-time photorealistic scenes, potentially for digital twins and virtual reality, while also holding potential applicability in real-time SLAM and robotics.
We release our software, hardware, and self-collected datasets to benefit the community.
Submitted 16 May, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality
Authors:
Huy Q. Le,
Chu Myaet Thwal,
Yu Qiao,
Ye Lin Tun,
Minh N. H. Nguyen,
Choong Seon Hong
Abstract:
Multimodal federated learning (MFL) has emerged as a decentralized machine learning paradigm, allowing multiple clients with different modalities to collaborate on training a machine learning model across diverse data sources without sharing their private data. However, challenges such as data heterogeneity and severely missing modalities pose crucial hindrances to the robustness of MFL, significantly impacting the performance of the global model. The absence of a modality introduces misalignment during the local training phase, stemming from zero-filling in the case of clients with missing modalities. Consequently, achieving robust generalization in the global model becomes imperative, especially when dealing with clients that have incomplete data. In this paper, we propose Multimodal Federated Cross Prototype Learning (MFCPL), a novel approach for MFL under severely missing modalities that constructs complete prototypes to provide diverse modality knowledge at the modality-shared level, with cross-modal regularization, and at the modality-specific level, with a cross-modal contrastive mechanism. Additionally, our approach introduces cross-modal alignment to provide regularization for modality-specific features, thereby enhancing overall performance, particularly in scenarios involving severely missing modalities. Through extensive experiments on three multimodal datasets, we demonstrate the effectiveness of MFCPL in mitigating these challenges and improving the overall performance.
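Prototype-based methods typically summarize each class by the mean of its feature vectors. The sketch below, with hypothetical helper names, computes per-class prototypes on a client, aggregates them on the server weighted by sample counts, and classifies by nearest prototype. It does not implement MFCPL's cross-modal regularization or contrastive terms.

```python
def class_prototypes(features, labels):
    """Client side: average the feature vectors of each class into one prototype."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def aggregate_prototypes(client_prototypes, client_counts):
    """Server side: count-weighted average of each class prototype across clients."""
    totals, weights = {}, {}
    for protos, counts in zip(client_prototypes, client_counts):
        for y, p in protos.items():
            n = counts[y]
            if y not in totals:
                totals[y] = [0.0] * len(p)
                weights[y] = 0
            totals[y] = [t + n * v for t, v in zip(totals[y], p)]
            weights[y] += n
    return {y: [t / weights[y] for t in totals[y]] for y in totals}


def nearest_prototype(feature, prototypes):
    """Assign a feature vector to the class whose prototype is closest in squared L2 distance."""
    return min(prototypes, key=lambda y: sum((a - b) ** 2 for a, b in zip(feature, prototypes[y])))


protos = class_prototypes([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]], ["a", "a", "b"])
print(protos, nearest_prototype([9.0, 9.0], protos))
```

Because only class means leave each client, a client missing a modality could still receive global prototypes for it; whether and how MFCPL does this is not specified here.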
Submitted 24 January, 2024;
originally announced January 2024.
-
A Review of Deep Learning Methods for Photoplethysmography Data
Authors:
Guangkun Nie,
Jiabao Zhu,
Gongzheng Tang,
Deyun Zhang,
Shijia Geng,
Qinghao Zhao,
Shenda Hong
Abstract:
Photoplethysmography (PPG) is a highly promising technique owing to its portability, user-friendly operation, and non-invasive capability to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this review, we systematically examined papers that applied deep learning models to process PPG data, published between January 1st, 2017 and July 31st, 2023 and retrieved from Google Scholar, PubMed and Dimensions. Each paper is analyzed from three key perspectives: tasks, models, and data. We ultimately identified 193 papers in which different deep learning frameworks were used to process PPG signals. Based on the tasks addressed in these papers, we categorized them into two major groups: medical-related and non-medical-related. The medical-related tasks were further divided into seven subgroups: blood pressure analysis, cardiovascular monitoring and diagnosis, sleep health, mental health, respiratory monitoring and analysis, blood glucose analysis, and others. The non-medical-related tasks were divided into four subgroups: signal processing, biometric identification, electrocardiogram reconstruction, and human activity recognition. In conclusion, significant progress has recently been made in using deep learning methods to process PPG data. This allows for a more thorough exploration and utilization of the information contained in PPG signals. However, challenges remain, such as the limited quantity and quality of publicly available databases, a lack of effective validation in real-world scenarios, and concerns about the interpretability, scalability, and complexity of deep learning models. Moreover, there are still emerging research areas that require further investigation.
Submitted 23 January, 2024;
originally announced January 2024.
-
Attention on Personalized Clinical Decision Support System: Federated Learning Approach
Authors:
Chu Myaet Thwal,
Kyi Thar,
Ye Lin Tun,
Choong Seon Hong
Abstract:
Health management has become a primary problem as new kinds of diseases and complex symptoms are introduced to a rapidly growing modern society. Building a better and smarter healthcare infrastructure is one of the ultimate goals of a smart city. To the best of our knowledge, neural network models are already employed to assist healthcare professionals in achieving this goal. Typically, training a neural network requires a large amount of data, but the heterogeneous and vulnerable properties of clinical data pose a challenge for the traditional centralized approach. Moreover, adding new inputs to a medical database requires re-training an existing model from scratch. To tackle these challenges, we propose a deep learning-based clinical decision support system trained and managed under a federated learning paradigm. We focus on a novel strategy to guarantee patient privacy and overcome the risk of cyberattacks while enabling large-scale clinical data mining. As a result, we can leverage rich clinical data for training each local neural network without exchanging patients' confidential data. Moreover, we implement the proposed scheme as a sequence-to-sequence model architecture integrating the attention mechanism. Thus, our objective is to provide a personalized clinical decision support system with evolvable characteristics that can deliver accurate solutions and assist healthcare professionals in medical diagnosis.
Submitted 22 January, 2024;
originally announced January 2024.
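The abstract does not say how the locally trained networks are combined into a shared model. A common choice in federated learning is federated averaging (FedAvg), in which the server averages client parameters weighted by each client's local sample count. The minimal Python sketch below assumes that rule; the function name `fedavg`, the parameter names, and the two hospital sites are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its local sample count (FedAvg-style)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical hospitals: only parameters are shared, never patient records.
site_a = {"layer.weight": [1.0, 2.0]}
site_b = {"layer.weight": [3.0, 4.0]}
global_model = fedavg([site_a, site_b], client_sizes=[100, 300])
print(global_model)  # {'layer.weight': [2.5, 3.5]}
```

Weighting by sample count lets a site with more records pull the global model further toward its local solution; unweighted averaging is a simpler alternative when sites hold similar amounts of data.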
-
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
Authors:
Chu Myaet Thwal,
Minh N. H. Nguyen,
Ye Lin Tun,
Seong Tae Kim,
My T. Thai,
Choong Seon Hong
Abstract:
Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices while preserving privacy. The success of FL hinges on the efficiency of participating models and their ability to handle the unique challenges of distributed learning. While several variants of Vision Transformer (ViT) have shown great potential as alternatives to modern convolutional neural networks (CNNs) for centralized training, the unprecedented size and higher computational demands hinder their deployment on resource-constrained edge devices, challenging their widespread application in FL. Since client devices in FL typically have limited computing resources and communication bandwidth, models intended for such devices must strike a balance between model size, computational efficiency, and the ability to adapt to the diverse and non-IID data distributions encountered in FL. To address these challenges, we propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources. Our models incorporate image-specific inductive biases through the LCT tokenizer by leveraging efficient depthwise separable convolutions in residual linear bottleneck blocks to extract local features, while the multi-head self-attention (MHSA) mechanism in the LCT encoder implicitly facilitates capturing global representations of images. Extensive experiments on benchmark image datasets indicate that our models outperform existing lightweight vision models while having fewer parameters and lower computational demands, making them suitable for FL scenarios with data heterogeneity and communication bottlenecks.
Submitted 21 January, 2024;
originally announced January 2024.
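The abstract attributes part of OnDev-LCT's efficiency to depthwise separable convolutions. The parameter saving can be checked with simple arithmetic: a standard k x k convolution needs one kernel per (input, output) channel pair, whereas a depthwise separable one uses a single k x k filter per input channel followed by a 1 x 1 pointwise convolution. The layer sizes below (3 x 3 kernel, 32 to 64 channels) are hypothetical and are not taken from the OnDev-LCT architecture; biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k kernel for every (input channel, output channel) pair; bias ignored.
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in  # one k x k spatial filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


std = standard_conv_params(3, 32, 64)
sep = depthwise_separable_params(3, 32, 64)
print(std, sep, round(std / sep, 2))  # 18432 2336 7.89
```

The ratio approaches roughly k^2 as the number of output channels grows, which is why such blocks suit resource-constrained FL clients.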
-
LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning
Authors:
Ye Lin Tun,
Chu Myaet Thwal,
Le Quang Huy,
Minh N. H. Nguyen,
Choong Seon Hong
Abstract:
Many studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of raw training data distributed across edge devices. However, edge devices often struggle with high computation and communication costs imposed by SSL and FL algorithms. To tackle this hindrance, we propose LW-FedSSL, a layer-wise federated self-supervised learning approach that allows edge devices to incrementally train a single layer of the model at a time. Our LW-FedSSL comprises server-side calibration and representation alignment mechanisms to maintain comparable performance with end-to-end federated self-supervised learning (FedSSL) while significantly lowering clients' resource requirements. In a pure layer-wise training scheme, training one layer at a time may limit effective interaction between different layers of the model. The server-side calibration mechanism takes advantage of the resource-rich server in an FL environment to ensure smooth collaboration between different layers of the global model. During the local training process, the representation alignment mechanism encourages closeness between representations of FL local models and those of the global model, thereby preserving the layer cohesion established by server-side calibration. Our experiments show that LW-FedSSL has a $3.3 \times$ lower memory requirement and a $3.2 \times$ cheaper communication cost than its end-to-end counterpart. We also explore a progressive training strategy called Prog-FedSSL that outperforms end-to-end training with a similar memory requirement and a $1.8 \times$ cheaper communication cost.
Submitted 29 April, 2024; v1 submitted 21 January, 2024;
originally announced January 2024.
-
Joint UAV Deployment and Resource Allocation in THz-Assisted MEC-Enabled Integrated Space-Air-Ground Networks
Authors:
Yan Kyaw Tun,
György Dán,
Yu Min Park,
Choong Seon Hong
Abstract:
Multi-access edge computing (MEC)-enabled integrated space-air-ground (SAG) networks have drawn much attention recently, as they can provide communication and computing services to wireless devices in areas that lack terrestrial base stations (TBSs). Leveraging the ample bandwidth in the terahertz (THz) spectrum, in this paper, we propose MEC-enabled integrated SAG networks with collaboration among unmanned aerial vehicles (UAVs). We then formulate the problem of minimizing the energy consumption of devices and UAVs in the proposed MEC-enabled integrated SAG networks by optimizing task offloading decisions, THz sub-band assignment, transmit power control, and UAV deployment. The formulated problem is a mixed-integer nonlinear programming (MINLP) problem with a non-convex structure, which is challenging to solve. We thus propose a block coordinate descent (BCD) approach to decompose the problem into four sub-problems: 1) device task offloading decision problem, 2) THz sub-band assignment and power control problem, 3) UAV deployment problem, and 4) UAV task offloading decision problem. We then propose to use a matching game, the concave-convex procedure (CCP), successive convex approximation (SCA), and block successive upper-bound minimization (BSUM) approaches for solving the individual sub-problems. Finally, extensive simulations are performed to demonstrate the effectiveness of our proposed algorithm.
Submitted 21 January, 2024;
originally announced January 2024.
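Block coordinate descent (BCD) splits the variables into blocks and repeatedly minimizes the objective over one block while the others are held fixed. In the paper each of the four blocks is handled by a different method (matching, CCP, SCA, BSUM); the toy Python sketch below only illustrates the alternation itself, on a hypothetical two-block convex quadratic whose per-block minimizers have closed forms. It is not the paper's energy-minimization problem.

```python
def f(x: float, y: float) -> float:
    # Toy convex objective coupling two blocks; its minimum is at (1, 1) with value -3.
    return x * x + x * y + y * y - 3 * x - 3 * y


def block_coordinate_descent(x: float, y: float, iters: int = 50):
    for _ in range(iters):
        x = (3 - y) / 2  # exact argmin over x with y fixed: 2x + y - 3 = 0
        y = (3 - x) / 2  # exact argmin over y with x fixed: 2y + x - 3 = 0
    return x, y


x_opt, y_opt = block_coordinate_descent(0.0, 0.0)
print(round(x_opt, 6), round(y_opt, 6), round(f(x_opt, y_opt), 6))  # 1.0 1.0 -3.0
```

Each block update can only lower the objective, so the sequence of objective values is non-increasing; for non-convex problems such as the MINLP above, BCD generally guarantees at best a stationary point rather than a global optimum.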
-
Curriculum Design Helps Spiking Neural Networks to Classify Time Series
Authors:
Chenxi Sun,
Hongyan Li,
Moxian Song,
Derun Can,
Shenda Hong
Abstract:
Spiking Neural Networks (SNNs) have a greater potential for modeling time series data than Artificial Neural Networks (ANNs), due to their inherent neuron dynamics and low energy consumption. However, it is difficult to demonstrate their superiority in classification accuracy, because current efforts mainly focus on designing better network structures. In this work, enlightened by brain-inspired science, we find that not only the structure but also the learning process should be human-like. To achieve this, we investigate the power of Curriculum Learning (CL) on SNNs by designing a novel method named CSNN with two theoretically guaranteed mechanisms: the active-to-dormant training order makes the curriculum similar to that of human learning and suitable for spiking neurons; the value-based regional encoding makes neuron activity mimic brain memory when learning sequential data. Experiments on multiple time series sources, including simulated, sensor, motion, and healthcare data, demonstrate that CL has a more positive effect on SNNs than on ANNs, with about twice the accuracy change, and that CSNN can increase SNNs' accuracy by about 3% by improving network sparsity, neuron firing status, anti-noise ability, and convergence speed.
Submitted 25 December, 2023;
originally announced January 2024.
-
Optimal multiple-phase estimation with multi-mode NOON states against photon loss
Authors:
Min Namkung,
Dong-Hyun Kim,
Seongjin Hong,
Yong-Su Kim,
Changhyoup Lee,
Hyang-Tag Lim
Abstract:
Multi-mode NOON states can quantum-enhance multiple-phase estimation in the absence of photon loss. However, a multi-mode NOON state is known to be vulnerable to photon loss, and its quantum enhancement can be dissipated by a lossy environment. In this work, we demonstrate that a quantum advantage in estimation precision can still be achieved in the presence of photon loss. This is accomplished by optimizing the weights of the multi-mode NOON states according to photon loss rates in the multiple modes, including the reference mode which defines the other phases. For practical relevance, we also show that photon-number counting via a multi-mode beam-splitter achieves a useful, albeit sub-optimal, quantum advantage. We expect this work to provide valuable guidance for developing quantum-enhanced multiple-phase estimation techniques in lossy environments.
Submitted 18 January, 2024;
originally announced January 2024.
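For orientation, the standard textbook expressions are sketched below; they are not taken from the paper. The first line is the ideal, lossless two-mode NOON state after a phase on the second mode, whose single-shot phase uncertainty reaches 1/N compared with 1/sqrt(N) for N independent photons. The second line is a generic weighted multi-mode form with a reference mode 0 and d phase modes; the weights c_j are the quantities the abstract says are optimized against the loss rates, and the notation is an assumption.

```latex
% Ideal two-mode NOON state after a relative phase \phi on the second mode
|\psi_\phi\rangle = \frac{1}{\sqrt{2}}\left(|N,0\rangle + e^{iN\phi}\,|0,N\rangle\right),
\qquad \Delta\phi_{\mathrm{NOON}} = \frac{1}{N}
\quad\text{vs.}\quad \Delta\phi_{\mathrm{SQL}} = \frac{1}{\sqrt{N}}

% Weighted multi-mode NOON state over a reference mode 0 and d phase modes
|\Psi\rangle = \sum_{j=0}^{d} c_j\, |0,\dots,0,\underbrace{N}_{\text{mode } j},0,\dots,0\rangle,
\qquad \sum_{j=0}^{d} |c_j|^2 = 1
```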
-
Maunakea Spectroscopic Explorer exposure time calculator for end-to-end simulator: to optimizing spectrograph design and observing simulation
Authors:
Tae-Geun Ji,
Jennifer Sobeck,
Changgon Kim,
Hojae Ahn,
Mingyeong Yang,
Taeeun Kim,
Sungwook E. Hong,
Kei Szeto,
Jennifer L. Marshall,
Christian Surace,
Soojong Pak
Abstract:
The Maunakea Spectroscopic Explorer (MSE) project will provide multi-object spectroscopy in the optical and near-infrared bands using an 11.25-m aperture telescope, repurposing the original Canada-France-Hawaii Telescope (CFHT) site. MSE will observe 4,332 objects per single exposure with a field of view of 1.5 square degrees, utilizing two spectrographs with low-moderate (R$\sim$3,000, 6,000) and high (R$\approx$30,000) spectral resolution. In general, an exposure time calculator (ETC) is used to estimate the performance of an observing system by calculating a signal-to-noise ratio (S/N) and exposure time. We present the design of the MSE ETC, which has four calculation modes (S/N, exposure time, S/N trend with wavelength, and S/N trend with magnitude) and incorporates the MSE system requirements as specified in the Conceptual Design. The MSE ETC currently allows for user-defined inputs of target AB magnitude, water vapor, airmass, and sky brightness AB magnitude (additional user inputs can be provided depending on the calculation mode). The ETC is built using Python 3.7 and features a graphical user interface that allows for cross-platform use. The development process of the ETC software follows an Agile methodology and utilizes Unified Modeling Language (UML) diagrams to visualize the software architecture. We also describe the testing and verification of the MSE ETC.
Submitted 16 January, 2024;
originally announced January 2024.
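The abstract lists an S/N mode and an exposure-time mode but does not give the MSE ETC's noise model. As an assumption, the Python sketch below uses the standard CCD equation (source, sky, dark, and read noise), which makes the two modes exact inverses of each other: the exposure time for a requested S/N is the positive root of a quadratic in t. All function names and numeric inputs are hypothetical.

```python
import math


def snr(t: float, s: float, b: float, d: float, r: float, n_pix: int) -> float:
    """CCD-equation S/N after t seconds (assumed noise model, not the MSE ETC's own).

    s: source count rate (e-/s), b: sky rate (e-/s/pixel), d: dark current (e-/s/pixel),
    r: read noise (e- rms per pixel), n_pix: number of pixels in the extraction aperture.
    """
    signal = s * t
    noise = math.sqrt(signal + n_pix * (b + d) * t + n_pix * r * r)
    return signal / noise


def exposure_time(q: float, s: float, b: float, d: float, r: float, n_pix: int) -> float:
    """Exposure time reaching S/N = q.

    Squaring the CCD equation gives s^2 t^2 - q^2 (s + n(b + d)) t - q^2 n r^2 = 0;
    the positive root is returned.
    """
    lin = q * q * (s + n_pix * (b + d))
    const = q * q * n_pix * r * r
    return (lin + math.sqrt(lin * lin + 4 * s * s * const)) / (2 * s * s)


# Hypothetical inputs: faint source on a moderately bright sky.
t_req = exposure_time(10.0, s=5.0, b=2.0, d=0.01, r=3.0, n_pix=8)
print(round(t_req, 1), round(snr(t_req, 5.0, 2.0, 0.01, 3.0, 8), 6))
```

Solving the quadratic in closed form avoids an iterative search; a real ETC would first convert AB magnitude, airmass, and sky brightness into the count rates s and b through the telescope and spectrograph throughput.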
-
Near-Field Channel Estimation for XL-RIS Assisted Multi-User XL-MIMO Systems: Hybrid Beamforming Architectures
Authors:
Jeongjae Lee,
Hyeongjin Chung,
Yunseong Cho,
Sunwoo Kim,
Songnam Hong
Abstract:
Channel estimation is one of the key challenges for the deployment of extremely large-scale reconfigurable intelligent surface (XL-RIS) assisted multiple-input multiple-output (MIMO) systems. In this paper, we study the channel estimation problem for XL-RIS assisted multi-user XL-MIMO systems with hybrid beamforming structures. For this system, we propose a unified channel estimation method that yields a notable estimation accuracy in the near-field BS-RIS and near-field RIS-User channels (in short, near-near field channels), far-near field channels, and far-far field channels. Our key idea is that the effective (or cascaded) channels to be estimated can be each factorized as the product of low-rank matrices (i.e., the product of the common (or user-independent) matrix and the user-specific coefficient matrix). The common matrix whose columns are the basis of the column space of the BS-RIS channel matrix is efficiently estimated via a collaborative low-rank approximation (CLRA). Leveraging the hybrid beamforming structures, we develop an efficient iterative algorithm that jointly optimizes the user-specific coefficient matrices. Via experiments and complexity analysis, we verify the effectiveness of the proposed channel estimation method (named CLRA-JO) in the aforementioned three classes of wireless channels.
Submitted 25 April, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
Recent developments of selective laser processes for wearable devices
Authors:
Youngchan Kim,
Eunseung Hwang,
Chang Kai,
Kaichen Xu,
Heng Pan,
Sukjoon Hong
Abstract:
Recently, the growing interest in wearable technology for personal healthcare and smart VR/AR applications has created a need for facile fabrication methods. In this regard, lasers have long offered original answers to such challenging technological demands with their remote, sterile, rapid, and site-selective processing characteristics for arbitrary materials. In this review, recent developments in relevant laser processes are summarized in two separate categories. First, transformative approaches represented by laser-induced graphene (LIG) are introduced. Apart from design optimization and alteration of the native substrate, the latest advancements in the transformative approach now enable not only more complex material compositions but also multilayer device configurations, through simultaneous transformation of heterogeneous precursors or sequential addition of functional layers coupled with other electronic elements. Second, more conventional laser techniques such as ablation, sintering, and synthesis remain useful for enhancing the functionality of the entire system through expansion of applicable materials and adoption of new mechanisms. Various wearable device components developed through the corresponding laser processes are then organized with emphasis on chemical/physical sensors and energy devices. At the same time, special attention is given to applications utilizing multiple laser sources or multiple laser processes, which pave the way towards all-laser fabrication of wearable devices.
Submitted 28 November, 2023;
originally announced January 2024.
-
Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects
Authors:
Yuheng Cheng,
Ceyao Zhang,
Zhengwen Zhang,
Xiangrui Meng,
Sirui Hong,
Wenhao Li,
Zihao Wang,
Zekai Wang,
Feng Yin,
Junhua Zhao,
Xiuqiang He
Abstract:
Intelligent agents stand out as a potential path toward artificial general intelligence (AGI). Thus, researchers have dedicated significant effort to diverse implementations for them. Benefiting from recent progress in large language models (LLMs), LLM-based agents that use universal natural language as an interface exhibit robust generalization capabilities across various applications. From serving as autonomous general-purpose task assistants to applications in coding, social, and economic domains, LLM-based agents offer extensive exploration opportunities. This paper surveys current research to provide an in-depth overview of LLM-based intelligent agents within single-agent and multi-agent systems. It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback. We also delve into the mechanisms of deploying LLM-based agents in multi-agent systems, including multi-role collaboration, message passing, and strategies to alleviate communication issues between agents. The discussions also shed light on popular datasets and application scenarios. We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.
Submitted 7 January, 2024;
originally announced January 2024.
-
Synergistic Formulaic Alpha Generation for Quantitative Trading based on Reinforcement Learning
Authors:
Hong-Gi Shin,
Sukhyun Jeong,
Eui-Yeon Kim,
Sungho Hong,
Young-Jin Cho,
Yong-Hoon Choi
Abstract:
Mining of formulaic alpha factors refers to the process of discovering and developing specific factors or indicators (referred to as alpha factors) for quantitative trading in the stock market. To efficiently discover alpha factors in a vast search space, reinforcement learning (RL) is commonly employed. This paper proposes a method to enhance existing alpha factor mining approaches by expanding the search space and utilizing a pretrained formulaic alpha set as initial seed values to generate synergistic formulaic alphas. We employ the information coefficient (IC) and rank information coefficient (Rank IC) as performance evaluation metrics for the model. Using CSI300 market data, we conducted real investment simulations and observed significant performance improvement compared to existing techniques.
Submitted 7 July, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
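The two evaluation metrics named in the abstract have conventional definitions: IC is the Pearson correlation between a factor's values and the subsequent returns across stocks, and Rank IC is the same correlation computed on ranks (Spearman correlation). The stdlib-only Python sketch below follows those conventions; the factor values and returns are hypothetical, and in practice both metrics are usually computed per trading day and then averaged, a step omitted here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def ic(factor: Sequence[float], fwd_returns: Sequence[float]) -> float:
    # Linear association between factor values and next-period returns.
    return pearson(factor, fwd_returns)


def rank_ic(factor: Sequence[float], fwd_returns: Sequence[float]) -> float:
    # Monotone association: Pearson correlation of the ranks (Spearman).
    return pearson(ranks(factor), ranks(fwd_returns))


# Hypothetical cross-section of five stocks on one day.
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
returns = [0.01, 0.04, 0.09, 0.16, 0.25]
print(round(ic(factor, returns), 3), rank_ic(factor, returns))  # 0.981 1.0
```

The example shows why both metrics are reported: the factor orders the stocks perfectly (Rank IC of 1.0) even though the relationship is not linear (IC below 1).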
-
Elastic p-12C scattering by using a cluster effective field theory
Authors:
Eun Jin In,
Tae-Sun Park,
Young-Ho Song,
Seung-Woo Hong
Abstract:
The elastic p-12C scattering at low energies is studied by using a cluster effective field theory (EFT), where the low-lying resonance states (s1/2, p3/2, d5/2) of 13N are treated as pertinent degrees of freedom. The low-energy constants of the Lagrangian are expressed in terms of the Coulomb-modified effective range parameters, which are determined to reproduce the experimental data for the differential cross-sections. The resulting theoretical predictions agree very well with the experimental data. The theory is also shown to give phase shifts almost identical to those obtained from the R-matrix approach. The role of the ground state of 13N below the threshold and the next-to-leading order in the EFT power counting are also discussed.
Submitted 4 January, 2024;
originally announced January 2024.
-
Improving Diffusion-Based Image Synthesis with Context Prediction
Authors:
Ling Yang,
Jingwei Liu,
Shenda Hong,
Zhilong Zhang,
Zhilin Huang,
Zheming Cai,
Wentao Zhang,
Bin Cui
Abstract:
Diffusion models are a new class of generative models, and have dramatically promoted image generation with unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct the input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes. However, such point-based reconstruction may fail to make each predicted pixel/feature fully preserve its neighborhood context, impairing diffusion-based image synthesis. As a powerful source of automatic supervisory signal, context has been well studied for learning representations. Inspired by this, we for the first time propose ConPreDiff to improve diffusion-based image synthesis with context prediction. We explicitly reinforce each point to predict its neighborhood context (i.e., multi-stride features/tokens/pixels) with a context decoder at the end of diffusion denoising blocks in the training stage, and remove the decoder for inference. In this way, each point can better reconstruct itself by preserving its semantic connections with neighborhood context. This new paradigm of ConPreDiff can generalize to arbitrary discrete and continuous diffusion backbones without introducing extra parameters in the sampling procedure. Extensive experiments are conducted on unconditional image generation, text-to-image generation and image inpainting tasks. Our ConPreDiff consistently outperforms previous methods and achieves new state-of-the-art (SOTA) text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.
Submitted 3 January, 2024;
originally announced January 2024.
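The reported result is a Fréchet Inception Distance (FID), where lower is better. For reference, the standard definition is given below: the real and generated image sets are embedded with an Inception-v3 network, each set of features is summarized by its mean and covariance, and FID is the Fréchet distance between the two resulting Gaussians. The subscripts r and g (real, generated) are notation chosen here.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
```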
-
Engineering the strain and interlayer excitons of 2D materials via lithographically engraved hexagonal boron nitride
Authors:
Yu-Chiang Hsieh,
Zhen-You Lin,
Shin-Ji Fung,
Wen-Shin Lu,
Sheng-Chin Ho,
Siang-Ping Hong,
Sheng-Zhu Ho,
Chiu-Hua Huang,
Kenji Watanabe,
Takashi Taniguchi,
Yang-Hao Chan,
Yi-Chun Chen,
Chung-Lin Wu,
Tse-Ming Chen
Abstract:
Strain engineering has quickly emerged as a viable option to modify the electronic, optical and magnetic properties of 2D materials. However, it remains challenging to arbitrarily control the strain. Here we show that by creating atomically-flat surface nanostructures in hexagonal boron nitride, we achieve an arbitrary on-chip control of both the strain distribution and magnitude on high-quality molybdenum disulfide. The phonon and exciton emissions are shown to vary in accordance with our strain field designs, enabling us to write and draw any photoluminescence color image in a single chip. Moreover, our strain engineering offers a powerful means to significantly and controllably alter the strengths and energies of interlayer excitons at room temperature. This method can be easily extended to other material systems and offers promise for functional excitonic devices.
Submitted 2 January, 2024;
originally announced January 2024.
-
Curricular and Cyclical Loss for Time Series Learning Strategy
Authors:
Chenxi Sun,
Hongyan Li,
Moxian Song,
Derun Cai,
Shenda Hong
Abstract:
Time series widely exists in real-world applications and many deep learning models have performed well on it. Current research has shown the importance of learning strategy for models, suggesting that the benefit comes from the order and size of learning samples. However, no effective strategy has been proposed for time series due to its abstract and dynamic construction. Meanwhile, the existing one-shot tasks and continuous tasks for time series necessitate distinct learning processes and mechanisms. No all-purpose approach has been suggested. In this work, we propose a novel Curricular and CyclicaL loss (CRUCIAL) to learn time series for the first time. It is model- and task-agnostic and can be plugged on top of the original loss with no extra procedure. CRUCIAL has two characteristics: it can arrange an easy-to-hard learning order by dynamically determining the sample contribution and modulating the loss amplitude; and it can manage a cyclically changing dataset and achieve an adaptive cycle by correlating the loss distribution and the selection probability. We prove that, compared with a monotonous size, a cyclical size can reduce the expected error. Experiments on 3 kinds of tasks and 5 real-world datasets show the benefits of CRUCIAL for most deep learning models when learning time series.
Submitted 25 December, 2023;
originally announced December 2023.
-
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
Authors:
Seunghoo Hong,
Juhun Lee,
Simon S. Woo
Abstract:
Text-to-Image models such as Stable Diffusion have shown impressive image synthesis, thanks to the utilization of large-scale datasets. However, these datasets may contain sexually explicit, copyrighted, or undesirable content, which allows the model to directly generate them. Given that retraining these large models on individual concept deletion requests is infeasible, fine-tuning algorithms have been developed to tackle concept erasing in diffusion models. While these algorithms yield good concept erasure, they all present one of the following issues: 1) the corrupted feature space yields synthesis of disintegrated objects, 2) the initially synthesized content undergoes a divergence in both spatial structure and semantics in the generated images, and 3) sub-optimal training updates heighten the model's susceptibility to utility harm. These issues severely degrade the original utility of generative models. In this work, we present a new approach that solves all of these challenges. We take inspiration from the concept of classifier guidance and propose a surgical update on the classifier guidance term while constraining the drift of the unconditional score term. Furthermore, our algorithm empowers the user to select an alternative to the erasing concept, allowing for more controllability. Our experimental results show that our algorithm not only erases the target concept effectively but also preserves the model's generation capability.
Submitted 20 December, 2023;
originally announced December 2023.
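The two terms the abstract refers to come from the standard Bayes'-rule decomposition of the conditional score used in classifier guidance, sketched below with x_t the noisy sample and c the concept or prompt. Classifier guidance rescales the second term by a guidance scale s; the abstract describes updating that term while limiting drift in the first. The exact training objective of the paper is not reproduced here.

```latex
% Conditional score split by Bayes' rule
\nabla_{x_t}\log p(x_t \mid c)
  = \underbrace{\nabla_{x_t}\log p(x_t)}_{\text{unconditional score}}
  + \underbrace{\nabla_{x_t}\log p(c \mid x_t)}_{\text{classifier guidance term}}

% Classifier guidance with scale s > 0
\tilde{s}(x_t, c) = \nabla_{x_t}\log p(x_t) + s\,\nabla_{x_t}\log p(c \mid x_t)
```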
-
Effect of Resonant Acoustic Powder Mixing on Delay Time of W-KClO4-BaCrO4 Mixtures
Authors:
Kyungmin Kwon,
Seunghwan Ryu,
Soyun Joo,
Youngjoon Han,
Donghyeon Baek,
Moonsoo Park,
Dongwon Kim,
Seungbum Hong
Abstract:
This study investigates the impact of resonant acoustic powder mixing on the delay time of the W-KClO4-BaCrO4 (WKB) mixture and its potential implications for powder and material synthesis. Through thermal analysis, an inverse linear relationship was found between thermal conductivity and delay time, allowing us to use thermal conductivity as a reliable proxy for the delay time. By comparing the thermal conductivity of WKB mixtures mixed manually and using an acoustic powder mixer, we found that acoustic powder mixing resulted in minimal deviations in thermal conductivity, indicating more uniform mixing. Furthermore, DSC analysis and Sestak-Berggren modeling demonstrated consistent reaction dynamics with a constant activation energy as the reaction progressed in samples mixed using acoustic waves. These findings underscore the critical role of uniform powder mixing in enhancing the thermodynamic quality of the WKB mixture and emphasize the importance of developing novel methods for powder and material synthesis.
Submitted 20 December, 2023;
originally announced December 2023.
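For reference, the Sestak-Berggren kinetic model mentioned in the abstract is usually written in the general form below, where α is the conversion, A the pre-exponential factor, E_a the activation energy, R the gas constant, T the temperature, and m, n, p empirical exponents. A "constant activation energy as the reaction progressed" corresponds to E_a not varying with α; which exponents the authors fitted is not stated in the abstract.

```latex
\frac{d\alpha}{dt} = A \exp\!\left(-\frac{E_a}{RT}\right)
  \alpha^{m} (1-\alpha)^{n} \left[-\ln(1-\alpha)\right]^{p}
```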