-
Information-Theoretic Foundations for Machine Learning
Authors:
Hong Jun Jeon,
Benjamin Van Roy
Abstract:
The staggering progress of machine learning in the past decade has been a sight to behold. In retrospect, it is both remarkable and unsettling that these milestones were achievable with little to no rigorous theory to guide experimentation. Despite this fact, practitioners have been able to guide their future experimentation via observations from previous large-scale empirical investigations. However, alluding to Plato's Allegory of the cave, it is likely that the observations which form the field's notion of reality are but shadows representing fragments of that reality. In this work, we propose a theoretical framework which attempts to answer what exists outside of the cave. To the theorist, we provide a framework which is mathematically rigorous and leaves open many interesting ideas for future exploration. To the practitioner, we provide a framework whose results are very intuitive, general, and which will help form principles to guide future investigations. Concretely, we provide a theoretical framework rooted in Bayesian statistics and Shannon's information theory which is general enough to unify the analysis of many phenomena in machine learning. Our framework characterizes the performance of an optimal Bayesian learner, which considers the fundamental limits of information. Throughout this work, we derive very general theoretical results and apply them to derive insights specific to settings ranging from data which is independently and identically distributed under an unknown distribution, to data which is sequential, to data which exhibits hierarchical structure amenable to meta-learning. We conclude with a section dedicated to characterizing the performance of misspecified algorithms. These results are exciting and particularly relevant as we strive to overcome increasingly difficult machine learning challenges in this endlessly complex world.
Submitted 18 July, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Kinetic Typography Diffusion Model
Authors:
Seonmi Park,
Inhwan Bae,
Seunghyun Shin,
Hae-Gon Jeon
Abstract:
This paper introduces a method for realistic kinetic typography that generates user-preferred animatable 'text content'. We draw on recent advances in guided video diffusion models to achieve visually-pleasing text appearances. To do this, we first construct a kinetic typography dataset, comprising about 600K videos. Our dataset is made from a variety of combinations of 584 templates designed by professional motion graphics designers and involves changing each letter's position, glyph, and size (e.g., flying, glitches, chromatic aberration, reflecting effects). Next, we propose a video diffusion model for kinetic typography, which must satisfy three requirements: aesthetic appearances, motion effects, and readable letters. To meet them, we present static and dynamic captions used as spatial and temporal guidance of a video diffusion model, respectively. The static caption describes the overall appearance of the video, such as colors, texture, and glyph, which represents the shape of each letter. The dynamic caption accounts for the movements of letters and backgrounds. We add one more guidance with zero convolution to determine which text content should be visible in the video. We apply the zero convolution to the text content, and impose it on the diffusion model. Lastly, our glyph loss, which only minimizes the difference between the predicted word and its ground truth, is proposed to make the predicted letters readable. Experiments show that our model generates kinetic typography videos with legible and artistic letter motions based on text prompts.
Submitted 15 July, 2024;
originally announced July 2024.
-
CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images
Authors:
Jisu Shin,
Junmyeong Lee,
Seongmin Lee,
Min-Gyu Park,
Ju-Mi Kang,
Ju Hong Yoon,
Hae-Gon Jeon
Abstract:
We present a novel framework for reconstructing animatable human avatars from multiple images, termed CanonicalFusion. Our central concept involves integrating individual reconstruction results into the canonical space. To be specific, we first predict Linear Blend Skinning (LBS) weight maps and depth maps using a shared-encoder-dual-decoder network, enabling direct canonicalization of the 3D mesh from the predicted depth maps. Here, instead of predicting high-dimensional skinning weights, we infer compressed skinning weights, i.e., a 3-dimensional vector, with the aid of pre-trained MLP networks. We also introduce a forward skinning-based differentiable rendering scheme to merge the reconstructed results from multiple images. This scheme refines the initial mesh by reposing the canonical mesh via forward skinning and by minimizing photometric and geometric errors between the rendered and the predicted results. Our optimization scheme considers the position and color of vertices as well as the joint angles for each image, thereby mitigating the negative effects of pose errors. We conduct extensive experiments to demonstrate the effectiveness of our method and compare our CanonicalFusion with state-of-the-art methods. Our source code is available at https://github.com/jsshin98/CanonicalFusion.
Submitted 15 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Information-Theoretic Foundations for Neural Scaling Laws
Authors:
Hong Jun Jeon,
Benjamin Van Roy
Abstract:
Neural scaling laws aim to characterize how out-of-sample error behaves as a function of model and training dataset size. Such scaling laws guide allocation of computational resources between model and data processing to minimize error. However, existing theoretical support for neural scaling laws lacks rigor and clarity, entangling the roles of information and optimization. In this work, we develop rigorous information-theoretic foundations for neural scaling laws. This allows us to characterize scaling laws for data generated by a two-layer neural network of infinite width. We observe that the optimal relation between data and model size is linear, up to logarithmic factors, corroborating large-scale empirical investigations. Concise yet general results of the kind we establish may bring clarity to this topic and inform future investigations.
Submitted 27 June, 2024;
originally announced July 2024.
-
On-off switchable nonreciprocal negative refraction in non-Hermitian photon-magnon hybrid systems
Authors:
Junyoung Kim,
Bosung Kim,
Bo-Jong Kim,
Haechan Jeon,
Sang-Koog Kim
Abstract:
Photon-magnon coupling, where electromagnetic waves interact with spin waves, and negative refraction, which bends the direction of electromagnetic waves unnaturally, constitute critical foundations and advancements in the realms of optics, spintronics, and quantum information technology. Here, we explore a magnetic-field-controlled, on-off switchable, nonreciprocal negative refraction within a non-Hermitian photon-magnon hybrid system. By integrating an yttrium iron garnet film with an inverted split-ring resonator, we discover pronounced negative refraction driven by the system's non-Hermitian properties. This phenomenon exhibits unique nonreciprocal behavior dependent on the signal's propagation direction. Our analytical model sheds light on the crucial interplay between coherent and dissipative coupling, significantly altering permittivity and permeability's imaginary components, crucial for negative refraction's emergence. This work pioneers new avenues for employing negative refraction in photon-magnon hybrid systems, signaling substantial advancements in quantum hybrid systems.
Submitted 26 June, 2024;
originally announced June 2024.
-
On a cofinal Reinhardt embedding without Powerset
Authors:
Hanul Jeon
Abstract:
In this paper, we provide a positive answer to a question by Matthews whether $\mathsf{ZF}^-$ is consistent with a non-trivial cofinal Reinhardt elementary embedding $j\colon V\to V$. The consistency follows from $\mathsf{ZFC} + I_0$, and more precisely, Schlutzenberg's model of $\mathsf{ZF}$ with an elementary embedding $k\colon V_{λ+2}\to V_{λ+2}$.
Submitted 15 June, 2024;
originally announced June 2024.
-
Optimal Qubit Mapping Search for Encoding Classical Data into Matrix Product State Representation with Minimal Loss
Authors:
Hyeongjun Jeon,
Kyungmin Lee,
Dongkyu Lee,
Bongsang Kim,
Taehyun Kim
Abstract:
Matrix product state (MPS) offers a framework for encoding classical data into quantum states, enabling the efficient utilization of quantum resources for data representation and processing. This research paper investigates techniques to enhance the efficiency and accuracy of MPS representations specifically designed for encoding classical data. Based on the observations that MPS truncation error depends on the pattern of the classical data, we devised an algorithm that finds optimal qubit mapping for given classical data, thereby improving the efficiency and fidelity of the MPS representation. Furthermore, we evaluate the impact of the optimized MPS in the context of quantum classifiers, demonstrating their enhanced performance compared to the conventional mapping. This improvement confirms the efficacy of the proposed techniques for encoding classical data into quantum states. MPS representation combined with optimal qubit mapping can pave a new way for more efficient and accurate quantum data representation and processing.
Submitted 12 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
The behavior of higher proof theory I: Case $Σ^1_2$
Authors:
Hanul Jeon
Abstract:
The current ordinal analysis provides the proof-theoretic ordinal of a theory, which calibrates the robustness of the $Π^1_1$-consequences of the theory. We can ask whether there is an ordinal characteristic capturing more complex consequences, and it turns out that we can define the $Σ^1_2$-proof-theoretic ordinal capturing the robustness of the $Σ^1_2$-consequences of a theory. In this paper, we study the behavior of the $Σ^1_2$-proof-theoretic ordinal, and it turns out that the $Σ^1_2$-proof-theoretic ordinal also follows an analogue of Walsh's characterization of the proof-theoretic ordinal.
Submitted 6 June, 2024;
originally announced June 2024.
-
ReDistill: Residual Encoded Distillation for Peak Memory Reduction
Authors:
Fang Chen,
Gourav Datta,
Mujahid Al Rafi,
Hyeran Jeon,
Meng Tang
Abstract:
The expansion of neural network sizes and the enhancement of image resolution through modern camera sensors result in heightened memory and power demands for neural networks. Reducing peak memory, which is the maximum memory consumed during the execution of a neural network, is critical for deploying neural networks on edge devices with a limited memory budget. A naive approach to reducing peak memory is aggressive down-sampling of feature maps via pooling with large stride, which often results in unacceptable degradation in network performance. To mitigate this problem, we propose residual encoded distillation (ReDistill) for peak memory reduction in a teacher-student framework, in which a student network with less memory is derived from the teacher network using aggressive pooling. We apply our distillation method to multiple problems in computer vision, including image classification and diffusion-based image generation. For image classification, our method yields 2x-3.2x lower measured peak memory on an edge GPU with negligible degradation in accuracy for most CNN-based architectures. Additionally, our method yields improved test accuracy for tiny vision transformer (ViT) based models distilled from large CNN-based teacher architectures. For diffusion-based image generation, our proposed distillation method yields a denoising network with 4x lower theoretical peak memory while maintaining decent diversity and fidelity for image generation. Experiments demonstrate our method's superior performance compared to other feature-based and response-based distillation methods.
Submitted 6 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Learning to Continually Learn with the Bayesian Principle
Authors:
Soochan Lee,
Hyeonseong Jeon,
Jaehyeon Son,
Gunhee Kim
Abstract:
In the present era of deep learning, continual learning research is mainly focused on mitigating forgetting when training a neural network with stochastic gradient descent on a non-stationary stream of data. On the other hand, in the more classical literature of statistical machine learning, many models have sequential Bayesian update rules that yield the same learning outcome as batch training, i.e., they are completely immune to catastrophic forgetting. However, they are often too simple to model complex real-world data. In this work, we adopt the meta-learning paradigm to combine the strong representational power of neural networks and simple statistical models' robustness to forgetting. In our novel meta-continual learning framework, continual learning takes place only in statistical models via ideal sequential Bayesian update rules, while neural networks are meta-learned to bridge the raw data and the statistical models. Since the neural networks remain fixed during continual learning, they are protected from catastrophic forgetting. This approach not only achieves significantly improved performance but also exhibits excellent scalability. Since our approach is domain-agnostic and model-agnostic, it can be applied to a wide range of problems and easily integrated with existing model architectures.
Submitted 29 May, 2024;
originally announced May 2024.
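The abstract's claim that sequential Bayesian updates reach the same outcome as batch training can be illustrated with a conjugate model. The following is a minimal sketch, assuming a Beta-Bernoulli model chosen purely for illustration; it is not the authors' framework, and the function names are hypothetical.

```python
# Illustrative assumption: a Beta(alpha, beta) prior over a coin's bias,
# updated with Bernoulli observations (1 = success, 0 = failure).

def batch_posterior(alpha, beta, data):
    """Update the Beta parameters with all observations at once."""
    successes = sum(data)
    return alpha + successes, beta + len(data) - successes

def sequential_posterior(alpha, beta, data):
    """Update the Beta parameters one observation at a time."""
    for x in data:
        alpha, beta = alpha + x, beta + (1 - x)
    return alpha, beta

stream = [1, 0, 1, 1, 0, 1]
batch = batch_posterior(1, 1, stream)
sequential = sequential_posterior(1, 1, stream)
print(batch, sequential)
```

Because the posterior depends on the data only through counts, the two results coincide and do not depend on the order in which the observations arrive.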
-
Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation
Authors:
Se-eun Yoon,
Hyunsik Jeon,
Julian McAuley
Abstract:
We introduce a multimodal dataset where users express preferences through images. These images encompass a broad spectrum of visual expressions ranging from landscapes to artistic depictions. Users request recommendations for books or music that evoke similar feelings to those captured in the images, and recommendations are endorsed by the community through upvotes. This dataset supports two recommendation tasks: title generation and multiple-choice selection. Our experiments with large foundation models reveal their limitations in these tasks. Particularly, vision-language models show no significant advantage over language-only counterparts that use descriptions, which we hypothesize is due to underutilized visual capabilities. To better harness these abilities, we propose the chain-of-imagery prompting, which results in notable improvements. We release our code and datasets.
Submitted 22 May, 2024;
originally announced May 2024.
-
Depth Prompting for Sensor-Agnostic Depth Estimation
Authors:
Jin-Hwi Park,
Chanhwi Jeong,
Junoh Lee,
Hae-Gon Jeon
Abstract:
Dense depth maps have been used as a key element of visual perception tasks. There have been tremendous efforts to enhance depth quality, ranging from optimization-based to learning-based methods. Despite sustained and remarkable progress, their applicability in the real world is limited by systematic measurement biases such as density, sensing pattern, and scan range. It is well known that these biases make it difficult for these methods to generalize. We observe that learning a joint representation for input modalities (e.g., images and depth), which most recent methods adopt, is sensitive to the biases. In this work, we disentangle those modalities to mitigate the biases with prompt engineering. For this, we design a novel depth prompt module to allow the desirable feature representation according to new depth distributions from either sensor types or scene configurations. Our depth prompt can be embedded into foundation models for monocular depth estimation. Through this embedding process, our method helps the pretrained model to be free from the constraint of depth scan range and to provide absolute scale depth maps. We demonstrate the effectiveness of our method through extensive evaluations. Source code is publicly available at https://github.com/JinhwiPark/DepthPrompting .
Submitted 20 May, 2024;
originally announced May 2024.
-
Combinational Nonuniform Timeslicing of Dynamic Networks
Authors:
Seokweon Jung,
DongHwa Shin,
Hyeon Jeon,
Jinwook Seo
Abstract:
Dynamic networks represent the complex and evolving interrelationships between real-world entities. Given the scale and variability of these networks, finding an optimal slicing interval is essential for meaningful analysis. Nonuniform timeslicing, which adapts to density changes within the network, is drawing attention as a solution to this problem. In this research, we categorize existing algorithms into two domains -- data mining and visualization -- according to their approach to the problem. The data mining approach focuses on capturing temporal patterns of dynamic networks, while the visualization approach emphasizes lessening the burden of analysis. We then introduce a novel nonuniform timeslicing method that synthesizes the strengths of both approaches, demonstrating its efficacy with real-world data. The findings suggest that combining the two approaches offers the potential for more effective network analysis.
Submitted 9 April, 2024;
originally announced April 2024.
-
Simplifying MBA Expression Using E-Graphs
Authors:
Seoksu Lee,
Hyeongchang Jeon,
Eun-Sun Cho
Abstract:
Code obfuscation involves the addition of meaningless code or the complication of existing code in order to make a program difficult to reverse engineer. In recent years, Mixed Boolean Arithmetic (MBA) obfuscation has been applied to virus and malware code to impede expert analysis. Among the various obfuscation techniques, MBA obfuscation is considered the most challenging to decipher using existing code deobfuscation techniques. In this paper, we attempt to simplify MBA expressions. We use an e-graph data structure to efficiently hold multiple expressions with the same semantics, systematically rewrite terms, and find simpler expressions. Preliminary experimental results show that our e-graph-based MBA deobfuscation approach runs faster than other approaches while delivering reasonable performance.
Submitted 8 April, 2024;
originally announced April 2024.
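To make the notion of an MBA expression concrete, here is a minimal sketch that checks one well-known MBA identity, x + y = (x XOR y) + 2 * (x AND y), by brute force over 8-bit inputs. It only illustrates what "same semantics" means for such expressions; it does not use e-graphs and is not the authors' method. The function names and the 8-bit width are assumptions for the example.

```python
MASK = 0xFF  # assumed 8-bit machine word for the illustration

def plain_add(x, y):
    """The simple expression: wrap-around addition."""
    return (x + y) & MASK

def obfuscated_add(x, y):
    """An MBA rewriting of addition that mixes Boolean and arithmetic operators."""
    return ((x ^ y) + 2 * (x & y)) & MASK

def equivalent(f, g, bits=8):
    """Exhaustively compare two binary functions on all inputs of the given width."""
    limit = 1 << bits
    return all(f(x, y) == g(x, y) for x in range(limit) for y in range(limit))

print(equivalent(plain_add, obfuscated_add))
```

Exhaustive checking only scales to small bit widths, which is one reason systematic term rewriting is attractive for simplification.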
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Enhancing Empathy in Virtual Reality: An Embodied Approach to Mindset Modulation
Authors:
Seoyeon Bae,
Yoon Kyung Lee,
Jungcheol Lee,
Jaeheon Kim,
Haeseong Jeon,
Seung-Hwan Lim,
Byung-Cheol Kim,
Sowon Hahn
Abstract:
A growth mindset has shown promising outcomes for increasing empathy ability. However, stimulating a growth mindset in VR-based empathy interventions is under-explored. In the present study, we implemented prosocial VR content, Our Neighbor Hero, focusing on embodying a virtual character to modulate players' mindsets. The virtual body served as a stepping stone, enabling players to identify with the character and cultivate a growth mindset as they followed mission instructions. We considered several implementation factors to assist players in positioning within the VR experience, including positive feedback, content difficulty, background lighting, and multimodal feedback. We conducted an experiment to investigate the intervention's effectiveness in increasing empathy. Our findings revealed that the VR content and mindset training encouraged participants to improve their growth mindsets and empathic motives. This VR content was developed for college students to enhance their empathy and teamwork skills. It has the potential to improve collaboration in organizational and community environments.
Submitted 30 March, 2024;
originally announced April 2024.
-
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Authors:
Inhwan Bae,
Young-Jae Park,
Hae-Gon Jeon
Abstract:
There are five types of trajectory prediction tasks: deterministic, stochastic, domain adaptation, momentary observation, and few-shot. These associated tasks are defined by various factors, such as the length of input paths, data split, and pre-processing methods. Interestingly, even though they commonly take sequential coordinates of observations as input and infer future paths in the same coordinates as output, designing specialized architectures for each task is still necessary. When an architecture is applied to a task other than its own, generality issues can lead to sub-optimal performance. In this paper, we propose SingularTrajectory, a diffusion-based universal trajectory prediction framework to reduce the performance gap across the five tasks. The core of SingularTrajectory is to unify a variety of human dynamics representations on the associated tasks. To do this, we first build a Singular space to project all types of motion patterns from each task into one embedding space. We next propose an adaptive anchor working in the Singular space. Unlike traditional fixed anchor methods that sometimes yield unacceptable paths, our adaptive anchor corrects anchors that are placed in a wrong location, based on a traversability map. Finally, we adopt a diffusion-based predictor to further enhance the prototype paths using a cascaded denoising process. Our unified framework ensures generality across various benchmark settings, such as input modality and trajectory length. Extensive experiments on five public benchmarks demonstrate that SingularTrajectory substantially outperforms existing models, highlighting its effectiveness in estimating general dynamics of human movements. Code is publicly available at https://github.com/inhwanbae/SingularTrajectory .
Submitted 27 March, 2024;
originally announced March 2024.
-
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Authors:
Inhwan Bae,
Junoh Lee,
Hae-Gon Jeon
Abstract:
Language models have demonstrated impressive ability in context understanding and generative performance. Inspired by the recent success of language foundation models, in this paper, we propose LMTraj (Language-based Multimodal Trajectory predictor), which recasts the trajectory prediction task into a sort of question-answering problem. Departing from traditional numerical regression models, which treat the trajectory coordinate sequence as continuous signals, we consider them as discrete signals like text prompts. Specifically, we first transform the input space for the trajectory coordinates into the natural language space. Here, the entire time-series trajectories of pedestrians are converted into a text prompt, and scene images are described as text information through image captioning. The transformed numerical and image data are then wrapped into the question-answering template for use in a language model. Next, to guide the language model in understanding and reasoning about high-level knowledge, such as scene context and social relationships between pedestrians, we introduce auxiliary multi-task question answering. We then train a numerical tokenizer with the prompt data. We encourage the tokenizer to separate the integer and decimal parts well, and leverage it to capture correlations between the consecutive numbers in the language model. Lastly, we train the language model using the numerical tokenizer and all of the question-answer prompts. Here, we propose a beam-search-based most-likely prediction and a temperature-based multimodal prediction to implement both deterministic and stochastic inferences. Applying our LMTraj, we show that the language-based model can be a powerful pedestrian trajectory predictor, and outperforms existing numerical-based predictor methods. Code is publicly available at https://github.com/inhwanbae/LMTrajectory .
Submitted 27 March, 2024;
originally announced March 2024.
-
Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation
Authors:
Ba Hung Ngo,
Nhat-Tuong Do-Tran,
Tuan-Ngoc Nguyen,
Hae-Gon Jeon,
Tae Jong Choi
Abstract:
Most domain adaptation (DA) methods are based on either convolutional neural networks (CNNs) or vision transformers (ViTs). They align the distribution differences between domains as encoders without considering their unique characteristics. For instance, ViT excels in accuracy due to its superior ability to capture global representations, while CNN has an advantage in capturing local representations. This fact has led us to design a hybrid method to fully take advantage of both ViT and CNN, called Explicitly Class-specific Boundaries (ECB). ECB learns CNN on ViT to combine their distinct strengths. In particular, we leverage ViT's properties to explicitly find class-specific decision boundaries by maximizing the discrepancy between the outputs of the two classifiers to detect target samples far from the source support. In contrast, the CNN encoder clusters target features based on the previously defined class-specific boundaries by minimizing the discrepancy between the probabilities of the two classifiers. Finally, ViT and CNN mutually exchange knowledge to improve the quality of pseudo labels and reduce the knowledge discrepancies of these models. Compared to conventional DA methods, our ECB achieves superior performance, which verifies the effectiveness of this hybrid model. The project website can be found at https://dotrannhattuong.github.io/ECB/website.
Submitted 26 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Explainable Graph Neural Networks for Observation Impact Analysis in Atmospheric State Estimation
Authors:
Hyeon-Ju Jeon,
Jeon-Ho Kang,
In-Hyuk Kwon,
O-Joun Lee
Abstract:
This paper investigates the impact of observations on atmospheric state estimation in weather forecasting systems using graph neural networks (GNNs) and explainability methods. We integrate observation and Numerical Weather Prediction (NWP) points into a meteorological graph, extracting $k$-hop subgraphs centered on NWP points. Self-supervised GNNs are employed to estimate the atmospheric state by aggregating data within these $k$-hop radii. The study applies gradient-based explainability methods to quantify the significance of different observations in the estimation process. Evaluated with data from 11 satellite and land-based observations, the results highlight the effectiveness of visualizing the importance of observation types, enhancing the understanding and optimization of observational data in weather forecasting.
Submitted 26 March, 2024;
originally announced March 2024.
-
Extracting Human Attention through Crowdsourced Patch Labeling
Authors:
Minsuk Chang,
Seokhyeon Park,
Hyeon Jeon,
Aeri Cho,
Soohyun Lee,
Jinwook Seo
Abstract:
In image classification, a significant problem arises from bias in the datasets. When a dataset contains only specific types of images, the classifier begins to rely on shortcuts - simplistic and erroneous rules for decision-making. This leads to high performance on the training dataset but inferior results on new, varied images, as the classifier's generalization capability is reduced. For example, if the images labeled as mustache consist solely of male figures, the model may inadvertently learn to classify images by gender rather than the presence of a mustache. One approach to mitigate such biases is to direct the model's attention toward the target object's location, usually marked using bounding boxes or polygons for annotation. However, collecting such annotations requires substantial time and human effort. Therefore, we propose a novel patch-labeling method that integrates AI assistance with crowdsourcing to capture human attention from images, which can be a viable solution for mitigating bias. Our method consists of two steps. First, we extract the approximate location of a target using a pre-trained saliency detection model supplemented by human verification for accuracy. Then, we determine the human-attentive area in the image by iteratively dividing the image into smaller patches and employing crowdsourcing to ascertain whether each patch can be classified as the target object. We demonstrate the effectiveness of our method in mitigating bias through improved classification accuracy and the refined focus of the model. Also, crowdsourced experiments validate that our method collects human annotation up to 3.4 times faster than annotating object locations with polygons, significantly reducing the need for human resources. We conclude the paper by discussing the advantages of our method in a crowdsourcing context, mainly focusing on aspects of human errors and accessibility.
Submitted 22 March, 2024;
originally announced March 2024.
-
ICLN: Input Convex Loss Network for Decision Focused Learning
Authors:
Haeun Jeon,
Hyunglip Bae,
Minsu Park,
Chanyeong Kim,
Woo Chang Kim
Abstract:
In decision-making problems under uncertainty, predicting unknown parameters is often treated as independent of the optimization step. Decision-focused Learning (DFL) is a task-oriented framework that integrates prediction and optimization by adapting the predictive model to give better decisions for the corresponding task. Here, an inevitable challenge arises when computing gradients of the optimal decision with respect to the parameters. Existing research copes with this issue by smoothly reforming the surrogate optimization or by constructing surrogate loss functions that mimic the task loss. However, these approaches apply only to restricted optimization domains or build functions in a local manner, leading to a large computational time. In this paper, we propose Input Convex Loss Network (ICLN), a novel global surrogate loss which can be implemented in a general DFL paradigm. ICLN learns the task loss via Input Convex Neural Networks, which are guaranteed to be convex for some inputs while keeping the global structure for the other inputs. This enables ICLN to admit general DFL through only a single surrogate loss, without any need to choose an appropriate parametric form. We confirm the effectiveness and flexibility of ICLN by evaluating our proposed model on three stochastic decision-making problems.
Submitted 4 March, 2024;
originally announced March 2024.
-
CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models
Authors:
Juhye Ha,
Hyeon Jeon,
DaEun Han,
Jinwook Seo,
Changhoon Oh
Abstract:
Large language models (LLMs) have facilitated significant strides in generating conversational agents, enabling seamless, contextually relevant dialogues across diverse topics. However, the existing LLM-driven conversational agents have fixed personalities and functionalities, limiting their adaptability to individual user needs. Creating personalized agent personas with distinct expertise or traits can address this issue. Nonetheless, we lack knowledge of how people customize and interact with agent personas. In this research, we investigated how users customize agent personas and their impact on interaction quality, diversity, and dynamics. To this end, we developed CloChat, an interface supporting easy and accurate customization of agent personas in LLMs. We conducted a study comparing how participants interact with CloChat and ChatGPT. The results indicate that participants formed emotional bonds with the customized agents, engaged in more dynamic dialogues, and showed interest in sustaining interactions. These findings contribute to design implications for future systems with conversational agents using LLMs.
Submitted 23 February, 2024;
originally announced February 2024.
-
CloudNine: Analyzing Meteorological Observation Impact on Weather Prediction Using Explainable Graph Neural Networks
Authors:
Hyeon-Ju Jeon,
Jeon-Ho Kang,
In-Hyuk Kwon,
O-Joun Lee
Abstract:
The impact of meteorological observations on weather forecasting varies with sensor type, location, time, and other environmental factors. Thus, quantitative analysis of observation impacts is crucial for effective and efficient development of weather forecasting systems. However, the existing impact analysis methods are difficult to apply widely due to their high dependence on specific forecasting systems. Also, they cannot provide observation impacts at multiple spatio-temporal scales, only global impacts of observation types. To address these issues, we present a novel system called "CloudNine," which allows analysis of individual observations' impacts on specific predictions based on explainable graph neural networks (XGNNs). Combining an XGNN-based atmospheric state estimation model with a numerical weather prediction model, we provide a web application to search for observations in the 3D space of the Earth system and to visualize the impact of individual observations on predictions in specific spatial regions and time periods.
Submitted 20 February, 2024;
originally announced February 2024.
-
A Dual-Prompting for Interpretable Mental Health Language Models
Authors:
Hyolim Jeon,
Dongje Yoo,
Daeun Lee,
Sejung Son,
Seungbae Kim,
Jinyoung Han
Abstract:
Despite the increasing demand for AI-based mental health monitoring tools, their practical utility for clinicians is limited by the lack of interpretability. The CLPsych 2024 Shared Task (Chim et al., 2024) aims to enhance the interpretability of Large Language Models (LLMs), particularly in mental health analysis, by providing evidence of suicidality through linguistic content. We propose a dual-prompting approach: (i) Knowledge-aware evidence extraction by leveraging the expert identity and a suicide dictionary with a mental health-specific LLM; and (ii) Evidence summarization by employing an LLM-based consistency evaluator. Comprehensive experiments demonstrate the effectiveness of combining domain-specific information, revealing performance improvements and the approach's potential to aid clinicians in assessing mental state progression.
Submitted 20 February, 2024;
originally announced February 2024.
-
Learning Emergent Gaits with Decentralized Phase Oscillators: on the role of Observations, Rewards, and Feedback
Authors:
Jenny Zhang,
Steve Heim,
Se Hwan Jeon,
Sangbae Kim
Abstract:
We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state-estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards, and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.
Submitted 17 February, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models
Authors:
Hyesung Jeon,
Yulhwa Kim,
Jae-joon Kim
Abstract:
Due to the high memory and computational costs associated with Large Language Models, model compression via quantization and parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA), are gaining popularity. This has led to active research on quantization-aware PEFT techniques, which aim to create models with high accuracy and low memory overhead. Among quantization methods, post-training quantization (PTQ) is more commonly used in previous works than quantization-aware training (QAT), despite QAT's potential for higher accuracy. This preference is due to PTQ's low training overhead. However, PTQ-based PEFT methods often utilize high-precision parameters, making it difficult to fully exploit the efficiency of quantization. Additionally, they have limited adaptation ability due to a reduced and constrained LoRA parameter structure. To overcome these challenges, we propose L4Q, which leverages joint quantization and fine-tuning to reduce QAT's memory overhead and produce models that consist entirely of quantized weights while achieving effective adaptation to downstream tasks. By design, L4Q allows quantization parameters to reflect weight updates, while weight updates reduce quantization errors. Our experiments demonstrate that this coupled quantization and fine-tuning approach yields superior accuracy compared to decoupled fine-tuning schemes in sub-4-bit quantization. Using the LLaMA model families and instructional datasets, we showcase L4Q's capabilities in language tasks and few-shot in-context learning.
Submitted 22 May, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Long-Term Typhoon Trajectory Prediction: A Physics-Conditioned Approach Without Reanalysis Data
Authors:
Young-Jae Park,
Minseok Seo,
Doyi Kim,
Hyeri Kim,
Sanghoon Choi,
Beomkyu Choi,
Jeongwon Ryu,
Sohee Son,
Hae-Gon Jeon,
Yeji Choi
Abstract:
In the face of escalating climate change, typhoon intensities and their ensuing damage have surged. Accurate trajectory prediction is crucial for effective damage control. Traditional physics-based models, while comprehensive, are computationally intensive and rely heavily on the expertise of forecasters. Contemporary data-driven methods often rely on reanalysis data, which can be considered to be the closest to the true representation of weather conditions. However, reanalysis data is not produced in real-time and requires time for adjustment because prediction models are calibrated with observational data. This reanalysis data, such as ERA5, falls short in challenging real-world situations. Optimal preparedness necessitates predictions at least 72 hours in advance, beyond the capabilities of standard physics models. In response to these constraints, we present an approach that harnesses real-time Unified Model (UM) data, sidestepping the limitations of reanalysis data. Our model provides predictions at 6-hour intervals for up to 72 hours in advance and outperforms both state-of-the-art data-driven methods and numerical weather prediction models. In line with our efforts to mitigate adversities inflicted by typhoons, we release our preprocessed PHYSICS TRACK dataset, which includes ERA5 reanalysis data, typhoon best-track, and UM forecast data.
Submitted 28 January, 2024;
originally announced January 2024.
-
An Information-Theoretic Analysis of In-Context Learning
Authors:
Hong Jun Jeon,
Jason D. Lee,
Qi Lei,
Benjamin Van Roy
Abstract:
Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted. We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into three components: irreducible error, meta-learning error, and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers. Our theoretical results characterize how error decays in both the number of training sequences and sequence lengths. Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.
Submitted 27 January, 2024;
originally announced January 2024.
-
Adaptive Crowdsourcing Via Self-Supervised Learning
Authors:
Anmol Kagrecha,
Henrik Marklund,
Benjamin Van Roy,
Hong Jun Jeon,
Richard Zeckhauser
Abstract:
Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate. We develop a new approach -- predict-each-worker -- that leverages self-supervised learning and a novel aggregation scheme. This approach adapts weights assigned to crowdworkers based on estimates they provided for previous quantities. When skills vary across crowdworkers or their estimates correlate, the weighted sum offers a more accurate group estimate than the average. Existing algorithms such as expectation maximization can, at least in principle, produce similarly accurate group estimates. However, their computational requirements become onerous when complex models, such as neural networks, are required to express relationships among crowdworkers. Predict-each-worker accommodates such complexity as well as many other practical challenges. We analyze the efficacy of predict-each-worker through theoretical and computational studies. Among other things, we establish asymptotic optimality as the number of engagements per crowdworker grows.
Submitted 1 February, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
RainSD: Rain Style Diversification Module for Image Synthesis Enhancement using Feature-Level Style Distribution
Authors:
Hyeonjae Jeon,
Junghyun Seo,
Taesoo Kim,
Sungho Son,
Jungki Lee,
Gyeungho Choi,
Yongseob Lim
Abstract:
Autonomous driving technology now targets level 4 or beyond, but researchers face limitations in developing reliable driving algorithms for diverse challenges. For autonomous vehicles to spread widely, it is important to address the safety issues of this technology. Among various safety concerns, sensor blockage caused by severe weather conditions can be one of the most frequent threats to multi-task-learning-based perception algorithms during autonomous driving. To handle this problem, the generation of proper datasets is becoming more important. In this paper, a synthetic road dataset with sensor blockage, generated from the real road dataset BDD100K, is proposed in the BDD100K annotation format. Rain streaks for each frame were made by an experimentally established equation and translated using an image-to-image translation network based on style transfer. Using this dataset, the degradation of diverse multi-task networks for autonomous driving, such as lane detection, driving area segmentation, and traffic object detection, has been thoroughly evaluated and analyzed. The tendency of the performance degradation of deep neural network-based perception systems for autonomous vehicles has been analyzed in depth. Finally, we discuss the limitations and future directions of deep neural network-based perception algorithms and autonomous driving dataset generation based on image-to-image translation.
Submitted 31 December, 2023;
originally announced January 2024.
-
Generalized ordinal analysis and reflection principles in set theory
Authors:
Hanul Jeon,
James Walsh
Abstract:
It is widely claimed that the natural axiom systems–including the large cardinal axioms–form a well-ordered hierarchy. Yet, as is well-known, it is possible to exhibit non-linearity and ill-foundedness by means of ad hoc constructions. In this paper we formulate notions of proof-theoretic strength based on set-theoretic reflection principles. We prove that they coincide with orderings on theories given by the generalized ordinal analysis of Pohlers. Accordingly, these notions of proof-theoretic strength engender genuinely well-ordered hierarchies. The reflection principles considered in this paper are formulated relative to Gödel's constructible universe; we conclude with generalizations to other inner models.
Submitted 20 December, 2023;
originally announced December 2023.
-
The proof-theoretic strength of Constructive Second-order set theories
Authors:
Hanul Jeon
Abstract:
In this paper, we define constructive analogues of second-order set theories, which we will call $\mathsf{IGB}$, $\mathsf{CGB}$, $\mathsf{IKM}$, and $\mathsf{CKM}$. Each of them can be viewed as $\mathsf{IZF}$- and $\mathsf{CZF}$-analogues of Gödel-Bernays set theory $\mathsf{GB}$ and Kelley-Morse set theory $\mathsf{KM}$. We also provide their proof-theoretic strengths in terms of classical theories, and we especially prove that $\mathsf{CKM}$ and full Second-Order Arithmetic have the same proof-theoretic strength.
Submitted 6 May, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Challenges of YOLO Series for Object Detection in Extremely Heavy Rain: CALRA Simulator based Synthetic Evaluation Dataset
Authors:
T. Kim,
H. Jeon,
Y. Lim
Abstract:
Recently, as many studies of autonomous vehicles have been conducted toward levels 4 and 5, there has also been increasing interest in the advancement of perception, decision, and control technologies, which are the three major aspects of autonomous vehicles. As for the perception technologies that enable reliable maneuvering of autonomous vehicles, object detection using diverse sensors (e.g., LiDAR, radar, and camera) should be prioritized. These sensors are required to detect objects accurately and quickly in diverse weather conditions, but they tend to struggle to detect objects consistently in bad weather conditions with rain, snow, or fog. Thus, in this study, based on experimentally obtained raindrop data from precipitation conditions, we constructed a novel dataset that can test diverse network models in various precipitation conditions through the CARLA simulator. Consequently, based on our novel dataset, the YOLO series, a one-stage detector, was used to quantitatively verify how much object detection performance decreases under various precipitation conditions, from normal to extremely heavy rain.
Submitted 14 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
The High Energy Light Isotope eXperiment program of direct cosmic-ray studies
Authors:
HELIX Collaboration,
S. Coutu,
P. S. Allison,
M. Baiocchi,
J. J. Beatty,
L. Beaufore,
D. H. Calderon,
A. G. Castano,
Y. Chen,
N. Green,
D. Hanna,
H. B. Jeon,
S. B. Klein,
B. Kunkler,
M. Lang,
R. Mbarek,
K. McBride,
S. I. Mognet,
J. Musser,
S. Nutter,
S. OBrien,
N. Park,
K. M. Powledge,
K. Sakai,
M. Tabata
, et al. (5 additional authors not shown)
Abstract:
HELIX is a new NASA-sponsored instrument aimed at measuring the spectra and composition of light cosmic-ray isotopes from hydrogen to neon nuclei, in particular the clock isotopes 10Be (radioactive, with 1.4 Myr lifetime) and 9Be (stable). The latter are unique markers of the production and Galactic propagation of secondary cosmic-ray nuclei, and are needed to resolve such important mysteries as the proportion of secondary positrons in the excess of antimatter observed by the AMS-02 experiment. By using a combination of a 1 T superconducting magnet spectrometer (with drift-chamber tracker) with a high-resolution time-of-flight detector system and ring-imaging Cherenkov detector, mass-resolved isotope measurements of light cosmic-ray nuclei will be possible up to 3 GeV/n in a first stratospheric balloon flight from Kiruna, Sweden to northern Canada, anticipated to take place in early summer 2024. An eventual longer Antarctic balloon flight of HELIX will yield measurements up to 10 GeV/n, sampling production from a larger volume of the Galaxy extending into the halo. We review the instrument design, testing, status and scientific prospects.
Submitted 11 December, 2023;
originally announced December 2023.
-
Photo-induced charge carrier dynamics in a semiconductor-based ion trap investigated via motion-sensitive qubit transitions
Authors:
Woojun Lee,
Daun Chung,
Honggi Jeon,
Beomgeun Cho,
KwangYeul Choi,
SeungWoo Yoo,
Changhyun Jung,
Junho Jeong,
Changsoon Kim,
Dong-Il "Dan" Cho,
Taehyun Kim
Abstract:
Ion trap systems built upon microfabricated chips have emerged as a promising platform for quantum computing to achieve reproducible and scalable structures. However, photo-induced charging of materials in such chips can generate undesired stray electric fields that disrupt the quantum state of the ion, limiting high-fidelity quantum control essential for practical quantum computing. While a crude understanding of the phenomena has been gained heuristically over the past years, explanations for the microscopic mechanism of photo-generated charge carrier dynamics remain largely elusive. Here, we present a photo-induced charging model for semiconductors, whose verification is enabled by a systematic interaction between trapped ions and photo-induced stray fields from exposed silicon surfaces in our chip. We use motion-sensitive qubit transitions to directly characterize the stray field and analyze its effect on the quantum dynamics of the trapped ion. In contrast to incoherent errors arising from the thermal motion of the ion, coherent errors are induced by the stray field, whose effect is significantly imprinted during the quantum control of the ion. These errors are investigated in depth and methods to mitigate them are discussed. Finally, we extend the implications of our study to other photo-induced charging mechanisms prevalent in ion traps.
Submitted 29 November, 2023;
originally announced December 2023.
-
Comparison of CMS measurements with predictions at NLO applying the Parton Branching Method and PYTHIA
Authors:
Fernando Guzman,
Si Hyun Jeon,
Hannes Jung,
Danyer Perez Adan,
Sara Taheri Monfared,
Fateme Almaksusi,
Daniel Belmonte Perez,
Dorukhan Boncukcu,
Aleksandr Boger,
Emmanuel Botero Osorio,
Isadora Bozza Galvão,
Juan Esteban Ospina Holguin,
Behnam Falahi,
Faeze Gagonani,
Omar Gonzalez,
Acelya Deniz Güngördü,
Abdelhamid Haddad,
Mahtab Jalalvandi,
Josue Daniel Jaramillo,
Jesus Jimenez Zepeda,
Gleb Kutyrev,
Nazanin Zahra Noroozi,
Nestor Raul Mancilla Xinto,
Haritz Mentaste,
Lucas Johnny Monte Tamayo
, et al. (9 additional authors not shown)
Abstract:
In August 2023, more than 30 students joined the Special Remote DESY summer-school to work on projects of importance for LHC experiments. In a dedicated initiative, analyses that had not been incorporated into the RIVET package were implemented and verified. Here, a brief description of the accomplished work is given, and a comparison of the measurements with predictions obtained from matched standard parton shower Monte Carlo event generators as well as with those obtained from Parton-Branching TMDs with corresponding parton showers is presented.
Submitted 18 November, 2023;
originally announced November 2023.
-
SpeakEasy: A Conversational Intelligence Chatbot for Enhancing College Students' Communication Skills
Authors:
Hyunbae Jeon,
Rhea Ramachandran,
Victoria Ploerer,
Yella Diekmann,
Max Bagga
Abstract:
Social interactions and conversation skills separate the successful from the rest and the confident from the shy. For college students in particular, the ability to converse can be an outlet for the stress and anxiety experienced on a daily basis along with a foundation for all-important career skills. In light of this, we designed SpeakEasy: a chatbot with some degree of intelligence that provides feedback to the user on their ability to engage in free-form conversations with the chatbot. SpeakEasy attempts to help college students improve their communication skills by engaging in a seven-minute spoken conversation with the user, analyzing the user's responses with metrics designed based on previous psychology and linguistics research, and providing feedback to the user on how they can improve their conversational ability. To simulate natural conversation, SpeakEasy converses with the user on a wide assortment of topics that two people meeting for the first time might discuss: travel, sports, and entertainment. Unlike most other chatbots with the goal of improving conversation skills, SpeakEasy actually records the user speaking, transcribes the audio into tokens, and uses macros (e.g., sequences that calculate the pace of speech, determine whether the user over-relies on certain words, and identify awkward transitions) to evaluate the quality of the conversation. Based on the evaluation, SpeakEasy provides elaborate feedback on how the user can improve their conversations. In turn, SpeakEasy updates its algorithms based on a series of questions that the user responds to regarding SpeakEasy's performance.
Submitted 23 September, 2023;
originally announced October 2023.
-
Learning Co-Speech Gesture for Multimodal Aphasia Type Detection
Authors:
Daeun Lee,
Sejung Son,
Hyolim Jeon,
Seungbae Kim,
Jinyoung Han
Abstract:
Aphasia, a language disorder resulting from brain damage, requires accurate identification of specific aphasia types, such as Broca's and Wernicke's aphasia, for effective treatment. However, little attention has been paid to developing methods to detect different types of aphasia. Recognizing the importance of analyzing co-speech gestures for distinguishing aphasia types, we propose a multimodal graph neural network for aphasia type detection using speech and corresponding gesture patterns. By learning the correlation between the speech and gesture modalities for each aphasia type, our model can generate textual representations sensitive to gesture information, leading to accurate aphasia type detection. Extensive experiments demonstrate the superiority of our approach over existing methods, achieving state-of-the-art results (F1 84.2%). We also show that gesture features outperform acoustic features, highlighting the significance of gesture expression in detecting aphasia types. We provide the code for reproducibility purposes.
Submitted 20 October, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Cold-start Bundle Recommendation via Popularity-based Coalescence and Curriculum Heating
Authors:
Hyunsik Jeon,
Jong-eun Lee,
Jeongin Yun,
U Kang
Abstract:
How can we recommend cold-start bundles to users? The cold-start problem in bundle recommendation is crucial because new bundles are continuously created on the Web for various marketing purposes. Despite its importance, existing methods for cold-start item recommendation are not readily applicable to bundles. They depend overly on historical information, even for less popular bundles, failing to address the primary challenge of the highly skewed distribution of bundle interactions. In this work, we propose CoHeat (Popularity-based Coalescence and Curriculum Heating), an accurate approach for cold-start bundle recommendation. CoHeat first represents users and bundles through graph-based views, capturing collaborative information effectively. To estimate the user-bundle relationship more accurately, CoHeat addresses the highly skewed distribution of bundle interactions through a popularity-based coalescence approach, which incorporates historical and affiliation information based on the bundle's popularity. Furthermore, it effectively learns latent representations by exploiting curriculum learning and contrastive learning. CoHeat demonstrates superior performance in cold-start bundle recommendation, achieving up to 193% higher nDCG@20 compared to the best competitor.
Submitted 10 March, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
ECGNet: A generative adversarial network (GAN) approach to the synthesis of 12-lead ECG signals from single lead inputs
Authors:
Max Bagga,
Hyunbae Jeon,
Alex Issokson
Abstract:
Electrocardiography (ECG) signal generation has been heavily explored using generative adversarial networks (GAN) because the implementation of 12-lead ECGs is not always feasible. The GAN models have achieved remarkable results in reproducing ECG signals but are only designed for multiple lead inputs, and the features the GAN model preserves have not been identified, limiting the generated signals' use in cardiovascular disease (CVD)-predictive models. This paper presents ECGNet, a procedure that generates a complete set of 12-lead ECG signals from any single lead input using a GAN framework with a bidirectional long short-term memory (LSTM) generator and a convolutional neural network (CNN) discriminator. Cross and auto-correlation analysis performed on the generated signals identifies features conserved during the signal generation, i.e., features that can characterize the unique nature of each signal and are thus likely indicators of CVD. Finally, by using ECG signals annotated with the CVD-indicative features detailed by the correlation analysis as inputs for a CVD-onset-predictive CNN model, we overcome challenges preventing the prediction of multiple-CVD targets. Our models are evaluated on a dataset of 15 s 12-lead ECGs recorded using MyoVista's wavECG. Functional outcome data for each patient is recorded and used in the CVD-predictive model. Our best GAN model achieves state-of-the-art accuracy with Frechet Distance (FD) scores of 4.73, 4.89, 5.18, 4.77, 4.71, and 5.55 on the V1-V6 pre-cordial leads respectively and shows strength in preserving the P-Q segments and R-peaks in the generated signals. To the best of our knowledge, ECGNet is the first to predict all of the remaining eleven leads from the input of any single lead.
Submitted 23 September, 2023;
originally announced October 2023.
-
HPCClusterScape: Increasing Transparency and Efficiency of Shared High-Performance Computing Clusters for Large-scale AI Models
Authors:
Heungseok Park,
Aeree Cho,
Hyojun Jeon,
Hayoung Lee,
Youngil Yang,
Sungjae Lee,
Heungsub Lee,
Jaegul Choo
Abstract:
The emergence of large-scale AI models, like GPT-4, has significantly impacted academia and industry, driving the demand for high-performance computing (HPC) to accelerate workloads. To address this, we present HPCClusterScape, a visualization system that enhances the efficiency and transparency of shared HPC clusters for large-scale AI models. HPCClusterScape provides a comprehensive overview of system-level (e.g., partitions, hosts, and workload status) and application-level (e.g., identification of experiments and researchers) information, allowing HPC operators and machine learning researchers to monitor resource utilization and identify issues through customizable violation rules. The system includes diagnostic tools to investigate workload imbalances and synchronization bottlenecks in large-scale distributed deep learning experiments. Deployed in industrial-scale HPC clusters, HPCClusterScape incorporates user feedback and meets specific requirements. This paper outlines the challenges and prerequisites for efficient HPC operation, introduces the interactive visualization system, and highlights its contributions in addressing pain points and optimizing resource utilization in shared HPC clusters.
Submitted 20 December, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Modeling Student Performance in Game-Based Learning Environments
Authors:
Hyunbae Jeon,
Harry He,
Anthony Wang,
Susanna Spooner
Abstract:
This study investigates game-based learning in the context of the educational game "Jo Wilder and the Capitol Case," focusing on predicting student performance using various machine learning models, including K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and Random Forest. The research aims to identify the features most predictive of student performance and correct question answering. By leveraging gameplay data, we establish complete benchmarks for these models and explore the importance of applying proper data aggregation methods. By compressing all numeric data to min/max/mean/sum and categorical data to first, last, count, and nunique, we reduced the size of the original training data from 4.6 GB to 48 MB of preprocessed training data, maintaining high F1 scores and accuracy.
Our findings suggest that proper preprocessing techniques can be vital in enhancing the performance of non-deep-learning-based models. The MLP model outperformed the current state-of-the-art French Touch model, achieving an F1 score of 0.83 and an accuracy of 0.74, suggesting its suitability for this dataset. Future research should explore using larger datasets, other preprocessing techniques, more advanced deep learning techniques, and real-world applications to provide personalized learning recommendations to students based on their predicted performance. This paper contributes to the understanding of game-based learning and provides insights into optimizing educational game experiences for improved student outcomes and skill development.
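The aggregation step described in this abstract (numeric columns reduced to min/max/mean/sum, categorical columns to first/last/count/nunique) can be illustrated with a minimal sketch. This is not the authors' code: the grouping key `session_id`, the column names, and the function `aggregate_sessions` are hypothetical, and the abstract does not state which key the data were grouped by.

```python
from collections import defaultdict


def aggregate_sessions(rows, numeric_cols, categorical_cols, key="session_id"):
    """Collapse per-event rows into one summary record per group.

    `key` and the column names are illustrative assumptions, not taken
    from the paper.
    """
    # Bucket the raw event rows by the grouping key, preserving order.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    summary = {}
    for group_id, events in groups.items():
        features = {}
        # Numeric columns: min / max / mean / sum.
        for col in numeric_cols:
            values = [event[col] for event in events]
            features[f"{col}_min"] = min(values)
            features[f"{col}_max"] = max(values)
            features[f"{col}_mean"] = sum(values) / len(values)
            features[f"{col}_sum"] = sum(values)
        # Categorical columns: first / last / count / number of unique values.
        for col in categorical_cols:
            values = [event[col] for event in events]
            features[f"{col}_first"] = values[0]
            features[f"{col}_last"] = values[-1]
            features[f"{col}_count"] = len(values)
            features[f"{col}_nunique"] = len(set(values))
        summary[group_id] = features
    return summary
```

Each group collapses to a fixed-width record regardless of how many events it contains, which is the kind of reduction that would shrink a large event log to a much smaller training table.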
Submitted 23 September, 2023;
originally announced September 2023.
-
Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models
Authors:
Hyung-Kwon Ko,
Hyeon Jeon,
Gwanmo Park,
Dae Hyun Kim,
Nam Wook Kim,
Juho Kim,
Jinwook Seo
Abstract:
We introduce VL2NL, a Large Language Model (LLM) framework that generates rich and diverse NL datasets using only Vega-Lite specifications as input, thereby streamlining the development of Natural Language Interfaces (NLIs) for data visualization. To synthesize relevant chart semantics accurately and enhance syntactic diversity in each NL dataset, we leverage 1) a guided discovery incorporated into prompting so that LLMs can steer themselves to create faithful NL datasets in a self-directed manner; 2) a score-based paraphrasing to augment NL syntax along four language axes. We also present a new collection of 1,981 real-world Vega-Lite specifications with greater diversity and complexity than existing chart collections. When tested on our chart collection, VL2NL extracted chart semantics and generated L1/L2 captions with 89.4% and 76.0% accuracy, respectively. It also generated and paraphrased utterances and questions with greater diversity than the benchmarks. Last, we discuss how our NL datasets and framework can be utilized in real-world scenarios. The code and chart collection are available at https://github.com/hyungkwonko/chart-llm.
Submitted 21 January, 2024; v1 submitted 18 September, 2023;
originally announced September 2023.
-
Non-reciprocal absorption and zero reflection in physically separated dual photonic resonators by traveling-wave-induced indirect coupling
Authors:
Bojong Kim,
Junyoung Kim,
Hae-Chan Jeon,
Sang-Koog Kim
Abstract:
We experimentally explored novel behaviors of non-reciprocal absorption and almost zero reflection in a dual photon resonator system, which is physically separated and composed of two inverted split ring resonators (ISRRs) with varying inter-distances. We also found an electromagnetically-induced-transparency (EIT)-like peak at a specific inter-distance of d = 18 mm, arising through traveling waves flowing along a shared microstrip line to which the dual ISRRs are dissipatively coupled. With the aid of CST-simulations and analytical modeling, we found that destructive and/or constructive interferences in traveling waves, indirectly coupled to each ISRR, result in a traveling-wave-induced transparency peak within a narrow window. Furthermore, we observed not only strong non-reciprocal responses of reflectivity and absorptivity at individual inter-distances exactly at the corresponding EIT-like peak positions, but also nearly zero reflection and almost perfect absorption for a specific case of d = 20 mm. Finally, the unidirectional absorptions with zero reflection at d = 20 mm are ascribed to a non-Hermitian origin. This work not only provides a better understanding of traveling-wave-induced indirect coupling between two photonic resonators without magnetic coupling, but also suggests potential implications for the resulting non-reciprocal behaviors of absorption and reflection in microwave circuits and quantum information devices.
Submitted 12 September, 2023;
originally announced September 2023.
-
The acrylic vessel for JSNS$^{2}$-II neutrino target
Authors:
C. D. Shin,
S. Ajimura,
M. K. Cheoun,
J. H. Choi,
J. Y. Choi,
T. Dodo,
J. Goh,
K. Haga,
M. Harada,
S. Hasegawa,
T. Hiraiwa,
W. Hwang,
T. Iida,
H. I. Jang,
J. S. Jang,
H. Jeon,
S. Jeon,
K. K. Joo,
D. E. Jung,
S. K. Kang,
Y. Kasugai,
T. Kawasaki,
E. J. Kim,
J. Y. Kim,
S. B. Kim
, et al. (35 additional authors not shown)
Abstract:
JSNS$^{2}$ (J-PARC Sterile Neutrino Search at J-PARC Spallation Neutron Source) is an experiment designed to search for sterile neutrinos. The experiment is currently in its second phase, named JSNS$^{2}$-II, with two detectors at near and far locations from the neutrino source. One of the key components of the experiment is an acrylic vessel that is used for the target volume for the detection of the anti-neutrinos. The specifications, design, and measured properties of the acrylic vessel are described.
Submitted 11 December, 2023; v1 submitted 4 September, 2023;
originally announced September 2023.
-
On Separating Wholeness Axioms
Authors:
Hanul Jeon
Abstract:
In this paper, we prove that $\mathsf{WA}_1$ implies the consistency of $\mathsf{WA}_0$, and $\mathsf{WA}_{n+2}$ implies the consistency of $\mathsf{WA}_n$ for $n\ge 0$. We also prove that $\mathsf{ZFC+WA}_n$ is finitely axiomatizable, and $\mathsf{ZFC+WA}$ is not finitely axiomatizable.
Submitted 7 August, 2023;
originally announced August 2023.
-
Study on the accidental background of the JSNS$^2$ experiment
Authors:
D. H. Lee,
S. Ajimura,
M. K. Cheoun,
J. H. Choi,
J. Y. Choi,
T. Dodo,
J. Goh,
K. Haga,
M. Harada,
S. Hasegawa,
T. Hiraiwa,
W. Hwang,
H. I. Jang,
J. S. Jang,
H. Jeon,
S. Jeon,
K. K. Joo,
D. E. Jung,
S. K. Kang,
Y. Kasugai,
T. Kawasaki,
E. J. Kim,
J. Y. Kim,
S. B. Kim,
W. Kim
, et al. (33 additional authors not shown)
Abstract:
JSNS$^2$ (J-PARC Sterile Neutrino Search at J-PARC Spallation Neutron Source) is an experiment which searches for sterile neutrinos via the observation of $\bar{\nu}_{\mu} \to \bar{\nu}_{e}$ appearance oscillations using muon decay-at-rest neutrinos. JSNS$^2$ has been taking data since 2021. In this manuscript, a study of the accidental background is presented. The rate of the accidental background is $(9.29 \pm 0.39) \times 10^{-8}$ / spill with 0.75 MW beam power and is comparable to the number of signal events searched for.
Submitted 22 April, 2024; v1 submitted 4 August, 2023;
originally announced August 2023.
-
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering
Authors:
Hyeon Jeon,
Ghulam Jilani Quadri,
Hyunwook Lee,
Paul Rosen,
Danielle Albers Szafir,
Jinwook Seo
Abstract:
Visual clustering is a common perceptual task in scatterplots that supports diverse analytics tasks (e.g., cluster identification). However, even with the same scatterplot, the ways of perceiving clusters (i.e., conducting visual clustering) can differ due to the differences among individuals and ambiguous cluster boundaries. Although such perceptual variability casts doubt on the reliability of data analysis based on visual clustering, we lack a systematic way to efficiently assess this variability. In this research, we study perceptual variability in conducting visual clustering, which we call Cluster Ambiguity. To this end, we introduce CLAMS, a data-driven visual quality measure for automatically predicting cluster ambiguity in monochrome scatterplots. We first conduct a qualitative study to identify key factors that affect the visual separation of clusters (e.g., proximity or size difference between clusters). Based on study findings, we deploy a regression module that estimates the human-judged separability of two clusters. Then, CLAMS predicts cluster ambiguity by analyzing the aggregated results of all pairwise separability between clusters that are generated by the module. CLAMS outperforms widely-used clustering techniques in predicting ground truth cluster ambiguity. Meanwhile, CLAMS exhibits performance on par with human annotators. We conclude our work by presenting two applications for optimizing and benchmarking data mining techniques using CLAMS. The interactive demo of CLAMS is available at clusterambiguity.dev.
Submitted 11 August, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.
-
ZADU: A Python Library for Evaluating the Reliability of Dimensionality Reduction Embeddings
Authors:
Hyeon Jeon,
Aeri Cho,
Jinhwa Jang,
Soohyun Lee,
Jake Hyun,
Hyung-Kwon Ko,
Jaemin Jo,
Jinwook Seo
Abstract:
Dimensionality reduction (DR) techniques inherently distort the original structure of input high-dimensional data, producing imperfect low-dimensional embeddings. Diverse distortion measures have thus been proposed to evaluate the reliability of DR embeddings. However, implementing and executing distortion measures in practice has so far been time-consuming and tedious. To address this issue, we present ZADU, a Python library that provides distortion measures. ZADU is not only easy to install and execute but also enables comprehensive evaluation of DR embeddings through three key features. First, the library covers a wide range of distortion measures. Second, it automatically optimizes the execution of distortion measures, substantially reducing the running time required to execute multiple measures. Last, the library informs how individual points contribute to the overall distortions, facilitating the detailed analysis of DR embeddings. By simulating a real-world scenario of optimizing DR embeddings, we verify that our optimization scheme substantially reduces the time required to execute distortion measures. Finally, as an application of ZADU, we present another library called ZADUVis that allows users to easily create distortion visualizations that depict the extent to which each region of an embedding suffers from distortions.
Submitted 11 August, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.