-
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
Authors:
Jiwook Kim,
Seonho Lee,
Jaeyo Shin,
Jiho Choi,
Hyunjung Shim
Abstract:
Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks due to its inherent 3D consistency. However, existing SDS-based 3D editing methods suffer from extensive training time and lead to low-quality results, primarily because these methods deviate from the sampling dynamics of diffusion models. In this paper, we propose DreamCatalyst, a novel framework that interprets SDS-based editing as a diffusion reverse process. Our objective function considers the sampling dynamics, thereby making the optimization process of DreamCatalyst an approximation of the diffusion reverse process in editing tasks. DreamCatalyst aims to reduce training time and improve editing quality. DreamCatalyst presents two modes: (1) a faster mode, which edits the NeRF scene in only about 25 minutes, and (2) a high-quality mode, which produces superior results in less than 70 minutes. Specifically, our high-quality mode outperforms current state-of-the-art NeRF editing methods both in terms of speed and quality. See more extensive results on our project page: https://dream-catalyst.github.io.
Submitted 16 July, 2024;
originally announced July 2024.
-
Team HYU ASML ROBOVOX SP Cup 2024 System Description
Authors:
Jeong-Hwan Choi,
Gaeun Kim,
Hee-Jae Lee,
Seyun Ahn,
Hyun-Soo Kim,
Joon-Hyuk Chang
Abstract:
This report describes the submission of the HYU ASML team to the IEEE Signal Processing Cup 2024 (SP Cup 2024). This challenge, titled "ROBOVOX: Far-Field Speaker Recognition by a Mobile Robot," focuses on speaker recognition using a mobile robot in noisy and reverberant conditions. Our solution combines the results of deep residual neural network and time-delay neural network-based speaker embedding models. These models were trained on a diverse dataset that includes French speech. To account for the challenging evaluation environment characterized by high noise, reverberation, and short speech conditions, we focused on data augmentation and training speech duration for the speaker embedding model. Our submission achieved second place on the SP Cup 2024 public leaderboard, with a detection cost function of 0.5245 and an equal error rate of 6.46%.
Submitted 16 July, 2024;
originally announced July 2024.
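The equal error rate quoted in the entry above is the operating point at which the false-acceptance and false-rejection rates coincide. The following is a minimal sketch of how such a figure can be computed from verification scores. The function name `equal_error_rate` and the toy scores are illustrative assumptions and are not taken from the team's system or the challenge data.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) by sweeping thresholds over all observed scores.

    Hypothetical helper for illustration; higher scores mean "same speaker".
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer, best_thr = float("inf"), None, None
    for thr in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Toy scores (made up for this sketch).
targets = [0.9, 0.8, 0.75, 0.4]
nontargets = [0.1, 0.3, 0.5, 0.2]
eer, thr = equal_error_rate(targets, nontargets)
print(f"EER = {eer:.2%} at threshold {thr}")
```

With few trials the two rates rarely cross exactly, so the sketch reports the midpoint at the closest threshold; evaluation toolkits typically interpolate instead.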
-
Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights
Authors:
Seyong Kim,
Jinseok Choi,
Wonjae Shin,
Namyoon Lee,
Jeonghun Park
Abstract:
To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed beams without relying on CSI, then selects a suitable user set for each beam. Building on this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by which inter-beam interference is efficiently mitigated by narrowing the corresponding beam width. By modeling the ground users' locations via a Poisson point process, we rigorously analyze the achievable performance of the presented multibeam satellite system. In particular, we investigate the asymptotic scaling laws that reveal the interplay between the user density, the number of beams, and the number of antennas. Our analysis offers critical design insights for the multibeam satellite with massive MIMO: i) If the user density scales in power with the number of antennas, the considered precoding can achieve a linear fraction of the optimal rate in the asymptotic regime. ii) A certain additional scaling factor for the user density is needed as the number of beams increases to maintain the asymptotic optimality.
Submitted 15 July, 2024;
originally announced July 2024.
-
Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
H. Al-Ta'ani,
J. Alexander,
A. Angerami,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. C. Aschenauer,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
B. Bannier,
K. N. Barish,
B. Bassalleck,
S. Bathe
, et al. (377 additional authors not shown)
Abstract:
The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter.
Submitted 11 July, 2024;
originally announced July 2024.
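For readers unfamiliar with the three source parameters named in the entry above, the sketch below evaluates the commonly used Lévy-stable form of the two-particle correlation function, C(q) = 1 + λ·exp(−(qR/ħc)^α). It assumes this plain form without any Coulomb or other final-state corrections, so it is not the collaboration's full fit function. The function name and the parameter values are illustrative assumptions, not measured results.

```python
import math

HBARC_GEV_FM = 0.1973269804  # hbar*c in GeV*fm, converts q*R to a dimensionless number


def levy_correlation(q, lam, R, alpha):
    """Levy-stable two-particle correlation, neglecting final-state interactions.

    q     : relative momentum in GeV/c
    lam   : correlation-strength parameter (lambda)
    R     : Levy scale parameter in fm
    alpha : Levy index of stability (2 = Gaussian source, 1 = Cauchy source)
    """
    x = abs(q) * R / HBARC_GEV_FM
    return 1.0 + lam * math.exp(-(x ** alpha))


# Illustrative values only (not taken from the measurement).
for q in (0.0, 0.02, 0.05, 0.10):
    print(q, round(levy_correlation(q, lam=0.8, R=5.0, alpha=1.2), 4))
```

At q = 0 the function reaches its intercept 1 + λ, and it falls towards 1 at large q; a smaller α gives a sharper peak at small q and a slower approach to 1.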
-
Unveiling the Potential of BERTopic for Multilingual Fake News Analysis -- Use Case: Covid-19
Authors:
Karla Schäfer,
Jeong-Eun Choi,
Inna Vogel,
Martin Steinebach
Abstract:
Topic modeling is frequently used for analysing large text corpora such as news articles or social media data. BERTopic, consisting of sentence embedding, dimension reduction, clustering, and topic extraction, is the newest and currently the SOTA topic modeling method. However, current topic modeling methods have room for improvement because, as unsupervised methods, they require careful tuning and selection of hyperparameters, e.g., for dimension reduction and clustering. This paper aims to analyse the technical application of BERTopic in practice. For this purpose, it compares and selects different methods and hyperparameters for each stage of BERTopic through density based clustering validation and six different topic coherence measures. Moreover, it also aims to analyse the results of topic modeling on real world data as a use case. For this purpose, we created the German fake news dataset (GermanFakeNCovid) on Covid-19 and combined it with the FakeCovid dataset in order to experiment with topic modeling in a multilingual (English and German) setting. With the final results, we were able to determine thematic similarities between the United States and Germany. In contrast, distinguishing the topics of fake news from India proved to be more challenging.
Submitted 11 July, 2024;
originally announced July 2024.
-
ESM+: Modern Insights into Perspective on Text-to-SQL Evaluation in the Age of Large Language Models
Authors:
Benjamin Ascoli,
Ram Kandikonda,
Jinho D. Choi
Abstract:
The task of Text-to-SQL enables anyone to retrieve information from SQL databases using natural language. Despite several challenges, recent models have made remarkable advancements in this task using large language models (LLMs). Interestingly, we find that LLM-based models without fine-tuning exhibit distinct natures compared to their fine-tuned counterparts, leading to inadequacies in current evaluation metrics to accurately convey their performance. Thus, we analyze the two primary metrics, Test Suite Execution Accuracy (EXE) and Exact Set Matching Accuracy (ESM), to examine their robustness for this task and address shortcomings. We compare the performance of 9 LLM-based models using EXE, the original ESM, and our improved ESM (called ESM+). Our results show that EXE and ESM have high false positive and negative rates of 11.3% and 13.9%, while ESM+ gives those of 0.1% and 2.6% respectively, providing a significantly more stable evaluation. We release the ESM+ script as open-source for the community to contribute, while enjoying a more reliable assessment of Text-to-SQL.
Submitted 9 July, 2024;
originally announced July 2024.
-
Purcell enhancement and spin spectroscopy of silicon vacancy centers in silicon carbide using an ultra-small mode-volume plasmonic cavity
Authors:
Jae-Pil So,
Jialun Luo,
Jaehong Choi,
Brendan McCullian,
Gregory D. Fuchs
Abstract:
Silicon vacancy (V$_{Si}$) centers in 4H-silicon carbide have emerged as a strong candidate for quantum networking applications due to their robust electronic and optical properties including a long spin coherence lifetime and bright, stable emission. Here, we report the integration of V$_{Si}$ centers with a plasmonic nanocavity to Purcell enhance the emission, which is critical for scalable quantum networking. Employing a simple fabrication process, we demonstrate plasmonic cavities that support a nanoscale mode volume and exhibit an increase in the spontaneous emission rate with a measured Purcell factor of up to 48. In addition to investigating the optical resonance modes, we demonstrate an improvement in the optical stability of the spin-preserving resonant optical transitions relative to the radiation-limited value. The results highlight the potential of nanophotonic structures for advancing quantum networking technologies and emphasize the importance of optimizing emitter-cavity interactions for efficient quantum photonic applications.
Submitted 8 July, 2024;
originally announced July 2024.
-
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
Authors:
Jin Woo Lee,
Jaehyun Park,
Min Jun Choi,
Kyogu Lee
Abstract:
While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling within a neural network framework. Our model leverages physical properties and fundamental frequencies as inputs, outputting string states across time and space that solve the partial differential equation characterizing the nonlinear string. Empirical evaluations demonstrate that the proposed architecture achieves superior accuracy in string motion simulation compared to existing baseline architectures. The code and demo are available online.
Submitted 7 July, 2024;
originally announced July 2024.
-
Quantal phase of extreme nonstatic light waves: Step-phase evolution and its effects
Authors:
Jeong Ryeol Choi
Abstract:
The phases are the main factor that affects the outcome of various optical phenomena, such as quantum superposition, wave interference, and light-matter interaction. As a light wave becomes nonstatic, an additional phase, the so-called geometric phase, takes place in its evolution. Then, due to this phase, the overall phase of the quantum wave function varies in a nonlinear way with time. Interestingly, the phase exhibits a step-like evolution if the measure of nonstaticity is extremely high. Such an abnormal phase variation is analyzed in detail for better understanding of wave nonstaticity in this work. As the wave becomes highly nonstatic, the phase factor of the electromagnetic wave evolves in a rectangular manner. However, the shape of the electromagnetic field is still a sinusoidal form on account of the compensational variation of the wave amplitude. The electromagnetic field in this case very much resembles that of a standing wave. The effects accompanying the step-phase evolution, such as modification of the probability distribution and alteration of the wave-interference profile, are analyzed and their implications are illustrated.
Submitted 5 July, 2024;
originally announced July 2024.
-
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Authors:
Janghwan Lee,
Seongmin Park,
Sukjin Hong,
Minsoo Kim,
Du-Seong Chang,
Jungwook Choi
Abstract:
The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved through techniques like post-training quantization (PTQ), presents challenges such as token-flipping that can impair chatbot performance. In response, we propose a novel preference alignment approach, quantization-aware direct preference optimization (QDPO), that aligns quantized LLMs with their full-precision counterparts, improving conversational abilities. Evaluated on two instruction-tuned LLMs in various languages, QDPO demonstrated superior performance in improving conversational abilities compared to established PTQ and knowledge-distillation fine-tuning techniques, marking a significant step forward in the development of efficient and effective conversational LLMs.
Submitted 3 July, 2024;
originally announced July 2024.
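The entry above does not spell out the QDPO objective, so the sketch below shows only the standard direct preference optimization (DPO) loss for a single preference pair, on which such methods build. Treating the quantized model as the policy and its full-precision counterpart as the reference is an assumption made here for illustration; the function name and the log-probability values are hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy model and under the frozen reference model.
    """
    # How much more the policy prefers the chosen response than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) written as a numerically stable softplus(-x).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Hypothetical log-probabilities: the policy favors the chosen response
# slightly more than the reference does, so the loss drops below log(2).
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The loss equals log 2 when policy and reference agree, and decreases as the policy widens its preference for the chosen response relative to the reference.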
-
CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services
Authors:
DongKi Noh,
Hyungtae Lim,
Gyuho Eoh,
Duckyu Choi,
Jeongsik Choi,
Hyunjun Lim,
SeungMin Baek,
Hyun Myung
Abstract:
In commercial autonomous service robots with several form factors, simultaneous localization and mapping (SLAM) is an essential technology for providing proper services such as cleaning and guidance. Such robots require SLAM algorithms suitable for specific applications and environments. Hence, several SLAM frameworks have been proposed to address various requirements in the past decade. However, we have encountered challenges in implementing recent innovative frameworks when handling service robots with low-end processors and insufficient sensor data, such as low-resolution 2D LiDAR sensors. Specifically, regarding commercial robots, consistent performance in different hardware configurations and environments is more crucial than the performance dedicated to specific sensors or environments. Therefore, we propose a) a multi-stage approach for global pose estimation in embedded systems; b) a graph generation method with zero constraints for synchronized sensors; and c) a robust and memory-efficient method for long-term pose-graph optimization. As verified in in-home and large-scale indoor environments, the proposed method yields consistent global pose estimation for services in commercial fields. Furthermore, the proposed method exhibits potential commercial viability considering the consistent performance verified via mass production and long-term (> 5 years) operation.
Submitted 27 June, 2024;
originally announced June 2024.
-
Disentangled Motion Modeling for Video Frame Interpolation
Authors:
Jaihyun Lew,
Jooyoung Choi,
Chaehun Shin,
Dahuin Jung,
Sungroh Yoon
Abstract:
Video frame interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works employ high-quality generative models for perceptual quality. However, they require complex training and large computational cost for modeling in the pixel space. In this paper, we introduce disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling. We propose a disentangled two-stage training process, initially training a frame synthesis model to generate frames from input pairs and their optical flows. Subsequently, we propose a motion diffusion model, equipped with our novel diffusion U-Net architecture designed for optical flow, to produce bi-directional flows between frames. This method, by leveraging the simpler low-frequency representation of motions, achieves superior perceptual quality with reduced computational demands compared to generative modeling methods in the pixel space. Our method surpasses state-of-the-art methods in perceptual metrics across various benchmarks, demonstrating its efficacy and efficiency in VFI. Our code is available at: https://github.com/JHLew/MoMo
Submitted 24 June, 2024;
originally announced June 2024.
-
Memorizing Documents with Guidance in Large Language Models
Authors:
Bumjin Park,
Jaesik Choi
Abstract:
Training data plays a pivotal role in AI models. Large language models (LLMs) are trained with massive amounts of documents, and their parameters hold document-related contents. Recently, several studies identified content-specific locations in LLMs by examining the parameters. Instead of post hoc interpretation, we propose another approach. We propose a document-wise memory architecture to track document memories in training. The proposed architecture maps document representations to memory entries, which softly mask memories in the forward process of LLMs. Additionally, we propose a document guidance loss, which increases the likelihood of text with document memories and reduces the likelihood of the text with the memories of other documents. Experimental results on Wikitext-103-v1 with Pythia-1B show that the proposed methods provide different memory entries for documents and high recall of document-related content in generation with trained document-wise memories.
Submitted 22 June, 2024;
originally announced June 2024.
-
DataFreeShield: Defending Adversarial Attacks without Training Data
Authors:
Hyeyoon Lee,
Kanghyun Choi,
Dain Kwon,
Sunjong Park,
Mayoore Selvarasa Jaiswal,
Noseong Park,
Jonghyun Choi,
Jinho Lee
Abstract:
Recent advances in adversarial robustness rely on an abundant set of training data, where using external or additional datasets has become a common setting. However, in real life, the training data is often kept private for security and privacy reasons, while only the pretrained weights are available to the public. In such scenarios, existing methods that assume accessibility to the original data become inapplicable. Thus we investigate the pivotal problem of data-free adversarial robustness, where we try to achieve adversarial robustness without accessing any real data. Through a preliminary study, we highlight the severity of the problem by showing that robustness without the original dataset is difficult to achieve, even with similar domain datasets. To address this issue, we propose DataFreeShield, which tackles the problem from two perspectives: surrogate dataset generation and adversarial training using the generated data. Through extensive validation, we show that DataFreeShield outperforms baselines, demonstrating that the proposed method provides the first entirely data-free solution for the adversarial robustness problem.
Submitted 21 June, 2024;
originally announced June 2024.
-
Decoupled static and dynamical charge correlations in La$_{2-x}$Sr$_x$CuO$_4$
Authors:
L. Martinelli,
I. Biało,
X. Hong,
J. Oppliger,
C. Lin,
T. Schaller,
J. Küspert,
M. H. Fischer,
T. Kurosawa,
N. Momono,
M. Oda,
J. Choi,
S. Agrestini,
M. Garcia-Fernandez,
Ke-Jin Zhou,
Q. Wang,
J. Chang
Abstract:
The relation between charge order, its quantum fluctuations and optical phonon modes in cuprate superconductors remains an unsolved problem. The exploration of these excitations is however complicated by the presence of twinned domains. Here, we use uniaxial strain in combination with ultra-high-resolution Resonant Inelastic X-ray Scattering (RIXS) at the oxygen K- and copper L$_3$-edges to study the excitations stemming from the charge ordering wave vector in La$_{1.875}$Sr$_{0.125}$CuO$_4$. By detwinning stripe ordering, we demonstrate that the optical phonon anomalies do not show any stripe anisotropy. The low-energy charge excitations also retain an in-plane four-fold symmetry. As such, we find that both phonon and charge excitations are decoupled entirely from the strength of static charge ordering. The almost isotropic character of charge excitations remains a possible source for the strange metal properties found in the normal state of cuprate superconductors.
Submitted 15 July, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN
Authors:
Massimiliano Lupo Pasini,
Jong Youl Choi,
Kshitij Mehta,
Pei Zhang,
David Rogers,
Jonghyun Bae,
Khaled Z. Ibrahim,
Ashwin M. Aji,
Karl W. Schulz,
Jorda Polo,
Prasanna Balaprakash
Abstract:
We present our work on developing and training scalable graph foundation models (GFM) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that define convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up the GFM training to tens of thousands of GPUs on datasets that consist of hundreds of millions of graphs. Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures, such as the total energy and atomic forces. Using over 150 million atomistic structures for training, we illustrate the performance of our approach along with the lessons learned on two United States Department of Energy (US-DOE) supercomputers, namely the Perlmutter petascale system at the National Energy Research Scientific Computing Center and the Frontier exascale system at Oak Ridge National Laboratory. The HydraGNN architecture enables the GFM to achieve near-linear strong scaling performance using more than 2,000 GPUs on Perlmutter and 16,000 GPUs on Frontier. Hyperparameter optimization (HPO) was performed on over 64,000 GPUs on Frontier to select GFM architectures with high accuracy. Early stopping was applied on each GFM architecture for energy awareness in performing such an extreme-scale task. The training of an ensemble of highest-ranked GFM architectures continued until convergence to establish uncertainty quantification (UQ) capabilities with ensemble learning. Our contribution opens the door for rapidly developing, training, and deploying GFMs using large-scale computational resources to enable AI-accelerated materials discovery and design.
Submitted 28 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling
Authors:
Irfan Robbani,
Paul Reisert,
Naoya Inoue,
Surawat Pothong,
Camélia Guerraoui,
Wenzhi Wang,
Shoichi Naito,
Jungmin Choi,
Kentaro Inui
Abstract:
Prior research in computational argumentation has mainly focused on scoring the quality of arguments, with less attention on explicating logical errors. In this work, we introduce four sets of explainable templates for common informal logical fallacies designed to explicate a fallacy's implicit logic. Using our templates, we conduct an annotation study on top of 400 fallacious arguments taken from the LOGIC dataset and achieve a high agreement score (Krippendorff's alpha of 0.54) and reasonable coverage (0.83). Finally, we conduct an experiment for detecting the structure of fallacies and discover that state-of-the-art language models struggle with detecting fallacy templates (0.47 accuracy). To facilitate research on fallacies, we make our dataset and guidelines publicly available.
Submitted 18 June, 2024;
originally announced June 2024.
-
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Authors:
Young Jin Ahn,
Jungwoo Park,
Sangha Park,
Jonghyun Choi,
Kee-Eung Kim
Abstract:
Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes, visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fell short of full synchronization. To address this, we present SyncVSR, an end-to-end learning framework that leverages quantized audio for frame-level crossmodal supervision. By integrating a projection layer that synchronizes visual representation with acoustic data, our encoder learns to generate discrete audio tokens from a video sequence in a non-autoregressive manner. SyncVSR shows versatility across tasks, languages, and modalities at the cost of a forward pass. Our empirical evaluations show that it not only achieves state-of-the-art results but also reduces data usage by up to ninefold.
Submitted 17 June, 2024;
originally announced June 2024.
-
Understanding Multi-Granularity for Open-Vocabulary Part Segmentation
Authors:
Jiho Choi,
Seonho Lee,
Seungho Lee,
Minhyun Lee,
Hyunjung Shim
Abstract:
Open-vocabulary part segmentation (OVPS) is an emerging research area focused on segmenting fine-grained entities based on diverse and previously unseen vocabularies. Our study highlights the inherent complexities of part segmentation due to intricate boundaries and diverse granularity, reflecting the knowledge-based nature of part identification. To address these challenges, we propose PartCLIPSeg, a novel framework utilizing generalized parts and object-level contexts to mitigate the lack of generalization in fine-grained parts. PartCLIPSeg integrates competitive part relationships and attention control techniques, alleviating ambiguous boundaries and underrepresented parts. Experimental results demonstrate that PartCLIPSeg outperforms existing state-of-the-art OVPS methods, offering refined segmentation and an advanced understanding of part relationships in images. Through extensive experiments, our model demonstrated an improvement over the state-of-the-art models on the Pascal-Part-116, ADE20K-Part-234, and PartImageNet datasets.
Submitted 17 June, 2024;
originally announced June 2024.
-
Semi-Supervised Domain Adaptation Using Target-Oriented Domain Augmentation for 3D Object Detection
Authors:
Yecheol Kim,
Junho Lee,
Changsoo Park,
Hyoung won Kim,
Inho Lim,
Christopher Chang,
Jun Won Choi
Abstract:
3D object detection is crucial for applications like autonomous driving and robotics. However, in real-world environments, variations in sensor data distribution due to sensor upgrades, weather changes, and geographic differences can adversely affect detection performance. Semi-Supervised Domain Adaptation (SSDA) aims to mitigate these challenges by transferring knowledge from a source domain, abundant in labeled data, to a target domain where labels are scarce. This paper presents a new SSDA method referred to as Target-Oriented Domain Augmentation (TODA) specifically tailored for LiDAR-based 3D object detection. TODA efficiently utilizes all available data, including labeled data in the source domain, and both labeled data and unlabeled data in the target domain to enhance domain adaptation performance. TODA consists of two stages: TargetMix and AdvMix. TargetMix employs mixing augmentation accounting for LiDAR sensor characteristics to facilitate feature alignment between the source and target domains. AdvMix applies point-wise adversarial augmentation with mixing augmentation, which perturbs the unlabeled data to align the features within both labeled and unlabeled data in the target domain. Our experiments conducted on challenging domain adaptation tasks demonstrate that TODA outperforms existing domain adaptation techniques designed for 3D object detection by significant margins. The code is available at: https://github.com/rasd3/TODA.
Submitted 17 June, 2024;
originally announced June 2024.
-
i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment
Authors:
Daechul Ahn,
Yura Choi,
San Kim,
Youngjae Yu,
Dongyeop Kang,
Jonghyun Choi
Abstract:
Aligning Video Large Multimodal Models (VLMMs) faces challenges such as modality misalignment and verbose responses. Although iterative approaches such as self-rewarding or iterative direct preference optimization (DPO) recently showed a significant improvement in language model alignment, particularly on reasoning tasks, self-aligned models applied to large video-language models often result in lengthy and irrelevant responses. To address these challenges, we propose a novel method that employs self-retrospection to enhance both response generation and preference modeling, which we call iterative self-retrospective judgment (i-SRT). By revisiting and evaluating already generated content and preference in a loop, i-SRT improves the alignment between textual and visual modalities, reduces verbosity, and enhances content relevance. Our empirical evaluations across diverse video question answering benchmarks demonstrate that i-SRT significantly outperforms prior art. We are committed to open-sourcing our code, models, and datasets to encourage further investigation.
Submitted 17 June, 2024;
originally announced June 2024.
-
SpoT-Mamba: Learning Long-Range Dependency on Spatio-Temporal Graphs with Selective State Spaces
Authors:
Jinhyeok Choi,
Heehyeon Kim,
Minhyeong An,
Joyce Jiyoung Whang
Abstract:
Spatio-temporal graph (STG) forecasting is a critical task with extensive applications in the real world, including traffic and weather forecasting. Although several recent methods have been proposed to model complex dynamics in STGs, addressing long-range spatio-temporal dependencies remains a significant challenge, leading to limited performance gains. Inspired by a recently proposed state space model named Mamba, which has shown remarkable capability of capturing long-range dependency, we propose a new STG forecasting framework named SpoT-Mamba. SpoT-Mamba generates node embeddings by scanning various node-specific walk sequences. Based on the node embeddings, it conducts temporal scans to capture long-range spatio-temporal dependencies. Experimental results on the real-world traffic forecasting dataset demonstrate the effectiveness of SpoT-Mamba.
Submitted 17 June, 2024;
originally announced June 2024.
-
An experimental search for an explanation of the difference between beam and bottle neutron lifetime measurements
Authors:
M. F. Blatnik,
L. S. Blokland,
N. Callahan,
J. H. Choi,
S. Clayton,
C. B Cude-Woods,
B. W. Filippone,
W. R. Fox,
E. Fries,
P. Geltenbort,
F. M. Gonzalez,
L. Hayen,
K. P. Hickerson,
A. T. Holley,
T. M. Ito,
A. Komives,
S Lin,
Chen-Yu Liu,
M. F. Makela,
C. L. Morris,
R. Musedinovic,
C. M. O'Shaughnessy,
R. W. Pattie Jr.,
J. C. Ramsey,
D. J. Salvat,
et al. (10 additional authors not shown)
Abstract:
The past two decades have yielded several new measurements and reanalyses of older measurements of the neutron lifetime. These have led to a 4.4 standard deviation discrepancy between the most precise measurements of the neutron decay rate producing protons in cold neutron beams and the most precise lifetime measured in neutron storage experiments. Here we publish an analysis of the recently published UCN aimed at searching for an explanation of this difference using the model proposed by Koch and Hummel.
Submitted 14 June, 2024;
originally announced June 2024.
-
Leveraging Explicit Reasoning for Inference Integration in Commonsense-Augmented Dialogue Models
Authors:
Sarah E. Finch,
Jinho D. Choi
Abstract:
Open-domain dialogue systems need to grasp social commonsense to understand and respond effectively to human users. Commonsense-augmented dialogue models have been proposed that aim to infer commonsense knowledge from dialogue contexts in order to improve response quality. However, existing approaches to commonsense-augmented dialogue rely on implicit reasoning to integrate commonsense inferences during response generation. In this study, we explore the impact of explicit reasoning against implicit reasoning over commonsense for dialogue response generation. Our findings demonstrate that separating commonsense reasoning into explicit steps for generating, selecting, and integrating commonsense into responses leads to better dialogue interactions, improving naturalness, engagement, specificity, and overall quality. Subsequent analyses of these findings unveil insights into the effectiveness of various types of commonsense in generating responses and the particular response traits enhanced through explicit reasoning for commonsense integration. Our work advances research in open-domain dialogue by achieving a new state-of-the-art in commonsense-augmented response generation.
Submitted 13 June, 2024;
originally announced June 2024.
-
Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning
Authors:
Janghoon Han,
Changho Lee,
Joongbo Shin,
Stanley Jungkyu Choi,
Honglak Lee,
Kynghoon Bae
Abstract:
Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in instruction tuning, we perform instruction tuning individually for two distinct language meta-datasets. Subsequently, we assess the performance on unseen tasks in a language different from the one used for training. To facilitate this investigation, we introduce a novel non-English meta-dataset named "KORANI" (Korean Natural Instruction), comprising 51 Korean benchmarks. Moreover, we design cross-lingual templates to mitigate discrepancies in the language and instruction format of the template between training and inference within the cross-lingual setting. Our experiments reveal consistent improvements through cross-lingual generalization in both English and Korean, outperforming the baseline by average scores of 20.7% and 13.6%, respectively. Remarkably, these enhancements are comparable to those achieved by monolingual instruction tuning and even surpass them in some tasks. The result underscores the significance of relevant data acquisition across languages over linguistic congruence with unseen tasks during instruction tuning.
Submitted 13 June, 2024;
originally announced June 2024.
-
Stability of a Two-Phase Stokes Problem with Surface Tension
Authors:
Jae Ho Choi
Abstract:
In this work, we study the well-posedness of a system of partial differential equations that model the dynamics of a two-dimensional Stokes bubble immersed in a two-dimensional ambient Stokes fluid of the same viscosity that extends to infinity under the effect of surface tension. We assume that the two fluids are immiscible and incompressible and that there is no interfacial jump in the fluid velocity. For this PDE system, a circular fluid bubble is a steady-state solution. Given an initial contour for the fluid bubble which is sufficiently close to a circle, we show that there exists a unique, global-in-time solution. This unique solution decays to a circle exponentially fast, which means that circular fluid bubbles are stable steady-state solutions. We also obtain a result concerning the regularity of the unique solution: although the initial perturbation around a circular contour is assumed to be of low regularity, any later perturbation becomes real analytic, hence smooth.
Submitted 12 June, 2024;
originally announced June 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri,
et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
Submitted 12 June, 2024;
originally announced June 2024.
-
Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA
Authors:
Dongsuk Jang,
Hyeryun Park,
Jiye Son,
Hyeonuk Hwang,
Sujin Kim,
Jinwook Choi
Abstract:
In the rapidly evolving field of healthcare, the integration of artificial intelligence (AI) has become a pivotal component in the automation of clinical workflows, ushering in a new era of efficiency and accuracy. This study focuses on the transformative capabilities of the fine-tuned KoELECTRA model in comparison to the GPT-4 model, aiming to facilitate automated information extraction from thyroid operation narratives. The current research landscape is dominated by traditional methods heavily reliant on regular expressions, which often face challenges in processing free-style text formats containing critical details of operation records, including frozen biopsy reports. Addressing this, the study leverages advanced natural language processing (NLP) techniques to foster a paradigm shift towards more sophisticated data processing systems. Through this comparative study, we aspire to unveil a more streamlined, precise, and efficient approach to document processing in the healthcare domain, potentially revolutionizing the way medical data is handled and analyzed.
Submitted 12 June, 2024;
originally announced June 2024.
-
Exclusion of the Cosmological Triangle in Reactor-Based Search for Axion-Like Particles
Authors:
Byung Ju Park,
Jae Jin Choi,
Eunju Jeon,
Jinyu Kim,
Kyungwon Kim,
Sung Hyun Kim,
Sun Kee Kim,
Yeongduk Kim,
Young Ju Ko,
Byoung-Cheol Koh,
Chang Hyon Ha,
Seo Hyun Lee,
In Soo Lee,
Hyunseok Lee,
Hyun Su Lee,
Jaison Lee,
Yoomin Oh,
Doojin Kim
Abstract:
We report new constraints on axion-like particles (ALPs) using data corresponding to a sodium iodide target exposure of 3063 kg$\cdot$days from the neutrino elastic scattering observation with NaI (NEON) experiment. A 16.7 kg thallium-doped sodium iodide target was located 23.7 meters from a 2.8 GW thermal power nuclear reactor. We searched for ALPs produced by high-flux photons by comparing the energy spectra of data collected during reactor-on (1596 kg$\cdot$days exposure) and reactor-off (1467 kg$\cdot$days exposure) periods. No signal consistent with ALP interaction was identified, allowing us to set exclusion limits at the 95% confidence level. Our limits cover previously unexplored regions for both photon couplings (${g_{aγ}}$) and electron couplings (${g_{ae}}$) for axion masses around 1 MeV/c$^2$. Notably, the NEON data excludes the unconstrained region identified by laboratory-based searches for photon couplings within the "cosmological triangle" for the first time. The observed 95% confidence level limits reach as low as ${g_{aγ}}$ of 4.33$\times$ 10$^{-8}$ GeV$^{-1}$ and ${g_{ae}}$ of 1.10$\times$ 10$^{-9}$ for axion masses of 1.7 MeV/c$^2$ and 1.0 MeV/c$^2$, respectively.
Submitted 11 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance
Authors:
Semin Kim,
Myeonghun Jeong,
Hyeonseung Lee,
Minchan Kim,
Byoung Jin Choi,
Nam Soo Kim
Abstract:
In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancing the quality of generated voices with a large amount of unlabeled data. At inference, our novel dual guiding mechanism gives text and pitch guidance on the reverse diffusion step by estimating the score of masked input. Experimental results show that the model trained in a semi-supervised manner outperforms other baselines trained only on the labeled data in terms of pronunciation, pitch accuracy and overall quality. Furthermore, we demonstrate that by adding Text-to-Speech (TTS) data in training, the model can synthesize the singing voices of TTS speakers even without their singing voices.
Submitted 9 June, 2024;
originally announced June 2024.
-
Design of reliable technology valuation model with calibrated machine learning of patent indicators
Authors:
Seunghyun Lee,
Janghyeok Yoon,
Jaewoong Choi
Abstract:
Machine learning (ML) has revolutionized the digital transformation of technology valuation by predicting the value of patents with high accuracy. However, the lack of validation regarding the reliability of these models hinders experts from fully trusting the confidence of model predictions. To address this issue, we propose an analytical framework for reliable technology valuation using calibrated ML models, which provide robust confidence levels in model predictions. We extract quantitative patent indicators that represent various technology characteristics as input data, using the patent maintenance period as a proxy for technology values. Multiple ML models are developed to capture the nonlinear relationship between patent indicators and technology value. The reliability and accuracy of these models are evaluated, presenting a Pareto-front map where the expected calibration error, Matthews correlation coefficient and F1-scores are compared. After identifying the best-performing model, we apply SHapley Additive exPlanation (SHAP) analysis to pinpoint the most significant input features by confidence bin. Through a case study, we confirmed that the proposed approach offers a practical guideline for developing reliable and accurate ML-based technology valuation models, with significant implications for both academia and industry.
Submitted 8 June, 2024;
originally announced June 2024.
-
MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature
Authors:
Gyeong Hoon Yi,
Jiwoo Choi,
Hyeongyun Song,
Olivia Miano,
Jaewoong Choi,
Kihoon Bang,
Byungju Lee,
Seok Su Sohn,
David Buttler,
Anna Hiszpanski,
Sang Soo Han,
Donghun Kim
Abstract:
Efficiently extracting data from tables in the scientific literature is pivotal for building large-scale databases. However, the tables reported in materials science papers exist in highly diverse forms; thus, rule-based extractions are an ineffective approach. To overcome this challenge, we present MaTableGPT, which is a GPT-based table data extractor from the materials science literature. MaTableGPT features key strategies of table data representation and table splitting for better GPT comprehension and filtering hallucinated information through follow-up questions. When applied to a vast volume of water splitting catalysis literature, MaTableGPT achieved an extraction accuracy (total F1 score) of up to 96.8%. Through comprehensive evaluations of the GPT usage cost, labeling cost, and extraction accuracy for the learning methods of zero-shot, few-shot and fine-tuning, we present a Pareto-front mapping where the few-shot learning method was found to be the most balanced solution owing to both its high extraction accuracy (total F1 score>95%) and low cost (GPT usage cost of 5.97 US dollars and labeling cost of 10 I/O paired examples). The statistical analyses conducted on the database generated by MaTableGPT revealed valuable insights into the distribution of the overpotential and elemental utilization across the reported catalysts in the water splitting literature.
Submitted 8 June, 2024;
originally announced June 2024.
-
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers
Authors:
Lütfi Kerem Senel,
Besnik Fetahu,
Davis Yoshida,
Zhiyu Chen,
Giuseppe Castellucci,
Nikhita Vedula,
Jason Choi,
Shervin Malmasi
Abstract:
Recommender systems are widely used to suggest engaging content, and Large Language Models (LLMs) have given rise to generative recommenders. Such systems can directly generate items, including for open-set tasks like question suggestion. While the world knowledge of LLMs enables good recommendations, improving the generated content through user feedback is challenging as continuously fine-tuning LLMs is prohibitively expensive. We present a training-free approach for optimizing generative recommenders by connecting user feedback loops to LLM-based optimizers. We propose a generative explore-exploit method that can not only exploit generated items with known high engagement, but also actively explore and discover hidden population preferences to improve recommendation quality. We evaluate our approach on question generation in two domains (e-commerce and general knowledge), and model user feedback with Click Through Rate (CTR). Experiments show our LLM-based explore-exploit approach can iteratively improve recommendations, and consistently increase CTR. Ablation analysis shows that generative exploration is key to learning user preferences, avoiding the pitfalls of greedy exploit-only approaches. A human evaluation strongly supports our quantitative findings.
Submitted 7 June, 2024;
originally announced June 2024.
-
Iterative Sparse Identification of Nonlinear Dynamics
Authors:
Jinho Choi
Abstract:
In order to extract governing equations from time-series data, various approaches have been proposed. Among those, sparse identification of nonlinear dynamics (SINDy) stands out as a successful method capable of modeling governing equations with a minimal number of terms, utilizing the principles of compressive sensing. This feature, which relies on a small number of terms, is crucial for interpretability. The effectiveness of SINDy hinges on the choice of candidate functions within its dictionary to extract governing equations of dynamical systems. A larger dictionary allows for more terms, enhancing the quality of approximations. However, the computational complexity scales with dictionary size, rendering SINDy less suitable for high-dimensional datasets, even though it has been successfully applied to low-dimensional datasets. To address this challenge, we introduce iterative SINDy in this paper, where the dictionary undergoes expansion and compression through iterations. We also conduct an analysis of the convergence properties of iterative SINDy. Simulation results validate that iterative SINDy can achieve nearly identical performance to SINDy, while significantly reducing computational complexity. Notably, iterative SINDy demonstrates effectiveness with high-dimensional time-series data without incurring the prohibitively high computational cost associated with SINDy.
Submitted 6 June, 2024;
originally announced June 2024.
-
PANDA: Expanded Width-Aware Message Passing Beyond Rewiring
Authors:
Jeongwhan Choi,
Sumin Park,
Hyowon Wi,
Sung-Bae Cho,
Noseong Park
Abstract:
Recent research in the field of graph neural networks (GNNs) has identified a critical issue known as "over-squashing," resulting from the bottleneck phenomenon in graph structures, which impedes the propagation of long-range information. Prior works have proposed a variety of graph rewiring concepts that aim at optimizing the spatial or spectral properties of graphs to promote signal propagation. However, such approaches inevitably deteriorate the original graph topology, which may lead to a distortion of information flow. To address this, we introduce an expanded width-aware (PANDA) message passing, a new message passing paradigm where nodes with high centrality, a potential source of over-squashing, are selectively expanded in width to encapsulate the growing influx of signals from distant nodes. Experimental results show that our method outperforms existing rewiring methods, suggesting that selectively expanding the hidden state of nodes can be a compelling alternative to graph rewiring for addressing over-squashing.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Towards Dynamic Trend Filtering through Trend Point Detection with Reinforcement Learning
Authors:
Jihyeon Seong,
Sekwang Oh,
Jaesik Choi
Abstract:
Trend filtering simplifies complex time series data by applying smoothness to filter out noise while emphasizing proximity to the original data. However, existing trend filtering methods fail to reflect abrupt changes in the trend due to `approximateness,' resulting in constant smoothness. This approximateness uniformly filters out the tail distribution of time series data, characterized by extreme values, including both abrupt changes and noise. In this paper, we propose Trend Point Detection formulated as a Markov Decision Process (MDP), a novel approach to identifying essential points that should be reflected in the trend, departing from approximations. We term these essential points as Dynamic Trend Points (DTPs) and extract trends by interpolating them. To identify DTPs, we utilize Reinforcement Learning (RL) within a discrete action space and a forecasting sum-of-squares loss function as a reward, referred to as the Dynamic Trend Filtering network (DTF-net). DTF-net integrates flexible noise filtering, preserving critical original subsequences while removing noise as required for other subsequences. We demonstrate that DTF-net excels at capturing abrupt changes compared to other trend filtering algorithms and enhances forecasting performance, as abrupt changes are predicted rather than smoothed out.
Submitted 5 June, 2024;
originally announced June 2024.
-
Anomalous 4$f$ fine structure in TmSe$_{1-x}$Te$_x$ across the metal-insulator transition
Authors:
C. -H. Min,
S. Müller,
W. J. Choi,
L. Dudy,
V. Zabolotny,
M. Heber,
J. D. Denlinger,
C. -J. Kang,
M. Kalläne,
N. Wind,
M. Scholz,
T. L. Lee,
C. Schlueter,
A. Gloskovskii,
E. D. L. Rienks,
V. Hinkov,
H. Bentmann,
Y. S. Kwon,
F. Reinert,
K. Rossnagel
Abstract:
Hybridization between localized 4$f$ and itinerant 5$d$6$s$ states in heavy fermion compounds is a well-studied phenomenon and commonly captured by the paradigmatic Anderson model. However, the investigation of additional electronic interactions, beyond the standard Anderson model, has been limited, despite their predicted important role in the exotic quasiparticle formation in mixed-valence systems. We investigate the 4$f$ states in TmSe$_{1-x}$Te$_x$ throughout a semimetal-insulator phase transition, which drastically varies the interactions related to the 4$f$ states. Using synchrotron-based hard x-ray and extreme ultraviolet photoemission spectroscopy, we resolve subtle peak splitting in the 4$f$ peaks near the Fermi level in the mixed-valent semimetal phase. The separation is enhanced by several tens of meV by increasing the lattice parameter by a few percent. Our results elucidate the evolving nature of the 4$f$ state across the phase transition, and provide direct experimental evidence for electronic interactions beyond the standard Anderson model in mixed-valence systems.
Submitted 4 June, 2024;
originally announced June 2024.
-
Stochastic Optimal Control for Diffusion Bridges in Function Spaces
Authors:
Byoungwoo Park,
Jungwon Choi,
Sungbin Lim,
Juho Lee
Abstract:
Recent advancements in diffusion models and diffusion bridges primarily focus on finite-dimensional spaces, yet many real-world problems necessitate operations in infinite-dimensional function spaces for more natural and interpretable formulations. In this paper, we present a theory of stochastic optimal control (SOC) tailored to infinite-dimensional spaces, aiming to extend diffusion-based algorithms to function spaces. Specifically, we demonstrate how Doob's $h$-transform, the fundamental tool for constructing diffusion bridges, can be derived from the SOC perspective and expanded to infinite dimensions. This expansion presents a challenge, as infinite-dimensional spaces typically lack closed-form densities. Leveraging our theory, we establish that solving the optimal control problem with a specific objective function choice is equivalent to learning diffusion-based generative models. We propose two applications: (1) learning bridges between two infinite-dimensional distributions and (2) generative models for sampling from an infinite-dimensional distribution. Our approach proves effective for diverse problems involving continuous function space representations, such as resolution-free images, time-series data, and probability density functions.
Submitted 2 June, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Open-Set Domain Adaptation for Semantic Segmentation
Authors:
Seun-An Choe,
Ah-Hyung Shin,
Keon-Hee Park,
Jinwoo Choi,
Gyeong-Moon Park
Abstract:
Unsupervised domain adaptation (UDA) for semantic segmentation aims to transfer the pixel-wise knowledge from the labeled source domain to the unlabeled target domain. However, current UDA methods typically assume a shared label space between source and target, limiting their applicability in real-world scenarios where novel categories may emerge in the target domain. In this paper, we introduce Open-Set Domain Adaptation for Semantic Segmentation (OSDA-SS) for the first time, where the target domain includes unknown classes. We identify two major problems in the OSDA-SS scenario as follows: 1) the existing UDA methods struggle to predict the exact boundary of the unknown classes, and 2) they fail to accurately predict the shape of the unknown classes. To address these issues, we propose Boundary and Unknown Shape-Aware open-set domain adaptation, coined BUS. Our BUS can accurately discern the boundaries between known and unknown classes in a contrastive manner using a novel dilation-erosion-based contrastive loss. In addition, we propose OpenReMix, a new domain mixing augmentation method that guides our model to effectively learn domain and size-invariant features for improving the shape detection of the known and unknown classes. Through extensive experiments, we demonstrate that our proposed BUS effectively detects unknown classes in the challenging OSDA-SS scenario compared to the previous methods by a large margin. The code is available at https://github.com/KHU-AGI/BUS.
Submitted 30 May, 2024;
originally announced May 2024.
-
An Information Theoretic Metric for Evaluating Unlearning Models
Authors:
Dongjae Jeon,
Wonje Jeung,
Taeheon Kim,
Albert No,
Jonghyun Choi
Abstract:
Machine unlearning (MU) addresses privacy concerns by removing information about "forgetting data" samples from trained models. Typically, evaluating MU methods involves comparing unlearned models to those retrained from scratch without the forgetting data, using metrics such as membership inference attacks (MIA) and accuracy measurements. These evaluations implicitly assume that if the output logits of the unlearned and retrained models are similar, the unlearned model has successfully forgotten the data. Here, we question whether this assumption is valid. In particular, we conduct a simple experiment in which only the last layer of a given original model is trained using a novel masked-distillation technique while the rest is kept fixed. Surprisingly, simply altering the last layer yields favorable outcomes on the existing evaluation metrics, even though the model does not successfully unlearn the samples or classes. To better evaluate MU methods, we propose a metric, called the information difference index (IDI), that quantifies the residual information about forgetting data samples in intermediate features using mutual information. The IDI provides a comprehensive evaluation of MU methods by efficiently analyzing the internal structure of DNNs. Our metric is scalable to large datasets and adaptable to various model architectures. Additionally, we present COLapse-and-Align (COLA), a simple contrastive-based method that effectively unlearns intermediate features.
Submitted 28 May, 2024;
originally announced May 2024.
-
Differential Voltage Analysis and Patterns in Parallel-Connected Pairs of Imbalanced Cells
Authors:
Clement Wong,
Andrew Weng,
Sravan Pannala,
Jeesoon Choi,
Jason B. Siegel,
Anna Stefanopoulou
Abstract:
Diagnosing imbalances in capacity and resistance within parallel-connected cells in battery packs is critical for battery management and fault detection, but it is challenging given that individual currents flowing into each cell are often unmeasured. This work introduces a novel method useful for identifying imbalances in capacity and resistance within a pair of parallel-connected cells using only voltage and current measurements from the pair. Our method utilizes differential voltage analysis (DVA) when the pair is under constant current discharge and demonstrates that features of the pair's differential voltage curve (dV/dQ), namely its mid-to-high SOC dV/dQ peak's height and skewness, are sensitive to imbalances in capacity and resistance. We analyze and explain how and why these dV/dQ peak shape features change in response to these imbalances, highlighting that the underlying current imbalance dynamics resulting from these imbalances contribute to these changes. Ultimately, we demonstrate that dV/dQ peak shape features can identify the product of capacity imbalance and resistance imbalance, but cannot uniquely identify the imbalances. This work lays the groundwork for identifying imbalances in capacity and resistance in parallel-connected cell groups in battery packs, where commonly only a single current sensor is placed for each parallel cell group.
Submitted 27 May, 2024;
originally announced May 2024.
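The differential voltage curve (dV/dQ) referred to in the abstract above can be illustrated with a minimal sketch. It assumes a constant-current discharge sampled as time, voltage, and a single pack current, and uses plain finite differences; the function name `differential_voltage` and the choice of units (Ah, V/Ah) are illustrative assumptions, not taken from the paper, and the peak height and skewness features the authors analyze are not computed here.

```python
def differential_voltage(time_s, voltage_v, current_a):
    """Return (Q in Ah, dV/dQ in V/Ah) for a constant-current discharge.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(time_s) != len(voltage_v) or len(time_s) < 3:
        raise ValueError("need at least three aligned samples")
    # Under constant current, discharged capacity grows linearly with time.
    q_ah = [current_a * (t - time_s[0]) / 3600.0 for t in time_s]
    dvdq = []
    for i in range(len(q_ah)):
        # Central difference in the interior, one-sided at the two ends.
        lo = max(i - 1, 0)
        hi = min(i + 1, len(q_ah) - 1)
        dvdq.append((voltage_v[hi] - voltage_v[lo]) / (q_ah[hi] - q_ah[lo]))
    return q_ah, dvdq
```

With measured data, the resulting curve would typically be smoothed before any peak-shape features are extracted, since finite differences amplify sensor noise.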
-
Enhancing Reliability in LEO Satellite Networks via High-Speed Inter-Satellite Links
Authors:
Jinho Choi
Abstract:
Low Earth orbit (LEO) satellites play a crucial role in providing global connectivity for non-terrestrial networks (NTNs) and supporting various Internet-of-Remote-Things (IoRT) applications. Each LEO satellite functions as a relay node in the sky, employing store-and-forward transmission strategies that necessitate the use of buffers. However, due to the finite size of these buffers, occurrences of buffer overflow leading to packet loss are inevitable. In this paper, we demonstrate how inter-satellite links (ISLs) can mitigate the probability of buffer overflow. Specifically, we propose an approach to reallocate packets among LEO satellites via ISLs to minimize the occurrence of buffer overflow events. Consequently, the implementation of ISLs can lead to a more reliable satellite network, enabling efficient packet reallocation to reduce the probability of buffer overflow.
Submitted 26 May, 2024;
originally announced May 2024.
-
Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples
Authors:
Dae Ung Jo,
Kyuewang Lee,
JaeHo Chung,
Jin Young Choi
Abstract:
Securing a sufficient amount of paired data is important for training an image-text retrieval (ITR) model, but collecting paired data is very expensive. To address this issue, we propose an active learning (AL) algorithm for ITR that collects paired data cost-efficiently. Previous studies assume that image-text pairs are given and ask the annotator for their category labels. However, in recent ITR studies, category labels have become less important because a retrieval model can be trained with image-text pairs alone. For this reason, we set up an active learning scenario in which unpaired images (or texts) are given and the annotator provides the corresponding texts (or images) to form paired data. The key idea of the proposed AL algorithm is to select unpaired images (or texts) that can serve as hard negative samples for existing texts (or images). To this end, we introduce a novel scoring function for choosing hard negative samples. We validate the effectiveness of the proposed method on the Flickr30K and MS-COCO datasets.
Submitted 25 May, 2024;
originally announced May 2024.
-
Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier
Authors:
Aristeidis Tsaris,
Chengming Zhang,
Xiao Wang,
Junqi Yin,
Siyan Liu,
Moetasim Ashfaq,
Ming Fan,
Jong Youl Choi,
Mohamed Wahib,
Dan Lu,
Prasanna Balaprakash,
Feiyi Wang
Abstract:
Vision Transformers (ViTs) are pivotal for foundational models in scientific imagery, including Earth science applications, due to their capability to process large sequence lengths. While transformers for text have inspired scaling sequence lengths in ViTs, adapting these techniques to ViTs introduces unique challenges. We develop distributed sequence parallelism for ViTs, enabling them to handle up to 1M tokens. Our approach, leveraging DeepSpeed-Ulysses and Long-Sequence-Segmentation with model sharding, is the first to apply sequence parallelism in ViT training, achieving a 94% batch scaling efficiency on 2,048 AMD-MI250X GPUs. Evaluating sequence parallelism in ViTs, particularly in models up to 10B parameters, highlighted substantial bottlenecks. We countered these with hybrid sequence, pipeline, and tensor parallelism and flash attention strategies to scale beyond single GPU memory limits. Our method significantly enhances climate modeling accuracy by 20% in temperature predictions, marking the first training of a transformer model on a full-attention matrix over 188K sequence length.
Submitted 17 April, 2024;
originally announced May 2024.
-
Diverse and Effective Synthetic Data Generation for Adaptable Zero-Shot Dialogue State Tracking
Authors:
James D. Finch,
Jinho D. Choi
Abstract:
We demonstrate substantial performance gains in zero-shot dialogue state tracking (DST) by enhancing training data diversity through synthetic data generation. Existing DST datasets are severely limited in the number of application domains and slot types they cover due to the high costs of data collection, restricting their adaptability to new domains. This work addresses this challenge with a novel, fully automatic data generation approach that creates synthetic zero-shot DST datasets. Distinguished from previous methods, our approach can generate dialogues across a massive range of application domains, complete with silver-standard dialogue state annotations and slot descriptions. This technique is used to create the D0T dataset for training zero-shot DST models, encompassing an unprecedented 1,000+ domains. Experiments on the MultiWOZ benchmark show that training models on diverse synthetic data improves Joint Goal Accuracy by 6.7%, achieving results competitive with models 13.5 times larger than ours.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Sobolev regularity theory for stochastic reaction-diffusion-advection equations with spatially homogeneous colored noises and variable-order nonlocal operators
Authors:
Jae-Hwan Choi,
Beom-Seok Han,
Daehan Park
Abstract:
This article investigates the existence, uniqueness, and regularity of solutions to nonlinear stochastic reaction-diffusion-advection equations (SRDAEs) with spatially homogeneous colored noises and variable-order nonlocal operators in mixed norm $L_q(L_p)$-spaces. We introduce a new condition (strongly reinforced Dalang's condition) on colored noise, which facilitates a deeper understanding of the complicated relation between nonlinearities and stochastic forces. Additionally, we establish the space-time Hölder type regularity of solutions.
Submitted 20 May, 2024;
originally announced May 2024.
-
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Authors:
Jihwan Kim,
Junoh Kang,
Jinyoung Choi,
Bohyung Han
Abstract:
We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional training. This is achieved by iteratively performing diagonal denoising, which concurrently processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail. However, diagonal denoising is a double-edged sword as the frames near the tail can take advantage of cleaner ones by forward reference but such a strategy induces the discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. Practically, FIFO-Diffusion consumes a constant amount of memory regardless of the target video length given a baseline model, while well-suited for parallel inference on multiple GPUs. We have demonstrated the promising results and effectiveness of the proposed methods on existing text-to-video generation baselines. Generated video samples and source codes are available at our project page.
Submitted 12 June, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
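The queue mechanics of diagonal denoising described in the FIFO-Diffusion abstract above can be sketched as a toy loop. This is only an illustration under stated assumptions: frames are modeled as `(value, noise_level)` tuples, `denoise_step` is a placeholder standing in for one step of a pretrained video diffusion model, and `NUM_LEVELS`, the initial queue contents, and all function names are hypothetical. Latent partitioning and lookahead denoising are not modeled.

```python
from collections import deque
import random

NUM_LEVELS = 4  # hypothetical number of noise levels, equal to the queue length


def denoise_step(frame):
    """Placeholder for one denoising step of a pretrained model (assumption)."""
    value, level = frame
    return (value * 0.5, level - 1)


def new_noise_frame(rng):
    """Create a fresh random-noise frame at the highest noise level."""
    return (rng.gauss(0.0, 1.0), NUM_LEVELS)


def fifo_generate(num_frames, seed=0):
    rng = random.Random(seed)
    # Head holds the cleanest frame (level 1), tail the noisiest (level NUM_LEVELS).
    queue = deque((rng.gauss(0.0, 1.0), level) for level in range(1, NUM_LEVELS + 1))
    outputs = []
    while len(outputs) < num_frames:
        # Diagonal denoising: every queued frame takes one step at its own level.
        queue = deque(denoise_step(frame) for frame in queue)
        # The head has reached level 0 (fully denoised), so dequeue it.
        outputs.append(queue.popleft())
        # Enqueue fresh noise at the tail; the queue length stays constant.
        queue.append(new_noise_frame(rng))
    return outputs, queue
```

Because the queue never grows, memory use in this toy loop is independent of `num_frames`, which mirrors the constant-memory property the abstract claims for the real method.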
-
Automating PTSD Diagnostics in Clinical Interviews: Leveraging Large Language Models for Trauma Assessments
Authors:
Sichang Tu,
Abigail Powers,
Natalie Merrill,
Negar Fani,
Sierra Carter,
Stephen Doogan,
Jinho D. Choi
Abstract:
The shortage of clinical workforce presents significant challenges in mental healthcare, limiting access to formal diagnostics and services. We aim to tackle this shortage by integrating a customized large language model (LLM) into the workflow, thus promoting equity in mental healthcare for the general population. Although LLMs have showcased their capability in clinical decision-making, their adaptation to severe conditions like Post-traumatic Stress Disorder (PTSD) remains largely unexplored. Therefore, we collect 411 clinician-administered diagnostic interviews and devise a novel approach to obtain high-quality data. Moreover, we build a comprehensive framework to automate PTSD diagnostic assessments based on interview contents by leveraging two state-of-the-art LLMs, GPT-4 and Llama-2, with potential for broader clinical diagnoses. Our results illustrate strong promise for LLMs, tested on our dataset, to aid clinicians in diagnostic validation. To the best of our knowledge, this is the first AI system that fully automates assessments for mental illness based on clinician-administered interviews.
Submitted 18 May, 2024;
originally announced May 2024.
-
The daily modulations and broadband strategy in axion searches. An application with CAST-CAPP detector
Authors:
C. M. Adair,
K. Altenmüller,
V. Anastassopoulos,
S. Arguedas Cuendis,
J. Baier,
K. Barth,
A. Belov,
D. Bozicevic,
H. Bräuninger,
G. Cantatore,
F. Caspers,
J. F. Castel,
S. A. Çetin,
W. Chung,
H. Choi,
J. Choi,
T. Dafni,
M. Davenport,
A. Dermenev,
K. Desch,
B. Döbrich,
H. Fischer,
W. Funk,
J. Galan,
A. Gardikiotis
, et al. (38 additional authors not shown)
Abstract:
It has been previously advocated that the presence of the daily and annual modulations of the axion flux on the Earth's surface may dramatically change the strategy of the axion searches. The arguments were based on the so-called Axion Quark Nugget (AQN) dark matter model which was originally put forward to explain the similarity of the dark and visible cosmological matter densities $Ω_{\rm dark}\sim Ω_{\rm visible}$. In this framework, the population of galactic axions with mass $ 10^{-6} {\rm eV}\lesssim m_a\lesssim 10^{-3}{\rm eV}$ and velocity $\langle v_a\rangle\sim 10^{-3} c$ will be accompanied by axions with typical velocities $\langle v_a\rangle\sim 0.6 c$ emitted by AQNs. Furthermore, in this framework, it has also been argued that the AQN-induced axion daily modulation (in contrast with the conventional WIMP paradigm) could be as large as $(10-20)\%$, which represents the main motivation for the present investigation. We argue that the daily modulations along with the broadband detection strategy can be very useful tools for the discovery of such relativistic axions. The data from the CAST-CAPP detector have been used following such arguments. Unfortunately, due to the dependence of the amplifier chain on temperature-dependent gain drifts and other factors, we could not conclusively show the presence or absence of a dark sector-originated daily modulation. However, this proof of principle analysis procedure can serve as a reference for future studies.
Submitted 9 May, 2024;
originally announced May 2024.
-
Language-Oriented Semantic Latent Representation for Image Transmission
Authors:
Giordano Cicchetti,
Eleonora Grassucci,
Jihong Park,
Jinho Choi,
Sergio Barbarossa,
Danilo Comminiello
Abstract:
In the new paradigm of semantic communication (SC), the focus is on delivering meanings behind bits by extracting semantic information from raw data. Recent advances in data-to-text models facilitate language-oriented SC, particularly for text-transformed image communication via image-to-text (I2T) encoding and text-to-image (T2I) decoding. However, although semantically aligned, the text is too coarse to precisely capture sophisticated visual features such as spatial locations, color, and texture, incurring a significant perceptual difference between intended and reconstructed images. To address this limitation, in this paper, we propose a novel language-oriented SC framework that communicates both text and a compressed image embedding and combines them using a latent diffusion model to reconstruct the intended image. Experimental results validate the potential of our approach, which transmits only 2.09% of the original image size while achieving higher perceptual similarities in noisy communication channels compared to a baseline SC method that communicates only through text. The code is available at https://github.com/ispamm/Img2Img-SC/.
Submitted 16 May, 2024;
originally announced May 2024.