-
RIS-Assisted High Resolution Radar Sensing
Authors:
Martin Voigt Vejling,
Hyowon Kim,
Christophe A. N. Biscio,
Henk Wymeersch,
Petar Popovski
Abstract:
This paper analyzes monostatic sensing by a user equipment (UE) for a setting in which the UE is unable to resolve multiple targets due to their interference within a single resolution bin. It is shown how sensing accuracy, in terms of both detection rate and localization accuracy, can be boosted by a reconfigurable intelligent surface (RIS), which can be advantageously used to provide signal diversity and aid in resolving the targets. Specifically, assuming prior information on the presence of a cluster of targets, a RIS beam sweep procedure is used to facilitate the high resolution sensing. We derive the Cramér-Rao lower bounds (CRLBs) for channel parameter estimation and sensing and an upper bound on the detection probability. The concept of coherence is defined and analyzed theoretically. Then, we propose an orthogonal matching pursuit (OMP) channel estimation algorithm combined with data association to fuse the information of the non-RIS signal and the RIS signal and perform sensing. Finally, we provide numerical results to verify the potential of RIS for improving sensor resolution, and to demonstrate that the proposed methods can realize this potential for RIS-assisted high resolution sensing.
Submitted 16 July, 2024;
originally announced July 2024.
-
Neural Compression of Atmospheric States
Authors:
Piotr Mirowski,
David Warde-Farley,
Mihaela Rosca,
Matthew Koichi Grimes,
Yana Hasson,
Hyunjik Kim,
Mélanie Rey,
Simon Osindero,
Suman Ravuri,
Shakir Mohamed
Abstract:
Atmospheric states derived from reanalysis comprise a substantial portion of weather and climate simulation outputs. Many stakeholders -- such as researchers, policy makers, and insurers -- use this data to better understand the earth system and guide policy decisions. Atmospheric states have also received increased interest as machine learning approaches to weather prediction have shown promising results. A key issue for all audiences is that dense time series of these high-dimensional states comprise an enormous amount of data, precluding all but the most well resourced groups from accessing and using historical data and future projections. To address this problem, we propose a method for compressing atmospheric states using methods from the neural network literature, adapting spherical data to processing by conventional neural architectures through the use of the area-preserving HEALPix projection. We investigate two model classes for building neural compressors: the hyperprior model from the neural image compression literature and recent vector-quantised models. We show that both families of models satisfy the desiderata of small average error, a small number of high-error reconstructed pixels, faithful reproduction of extreme events such as hurricanes and heatwaves, and preservation of the spectral power distribution across spatial scales. We demonstrate compression ratios in excess of 1000x, with compression and decompression at a rate of approximately one second per global atmospheric state.
Submitted 16 July, 2024;
originally announced July 2024.
-
Batch SLAM with PMBM Data Association Sampling and Graph-Based Optimization
Authors:
Yu Ge,
Ossi Kaltiokallio,
Yuxuan Xia,
Ángel F. García-Fernández,
Hyowon Kim,
Jukka Talvitie,
Mikko Valkama,
Henk Wymeersch,
Lennart Svensson
Abstract:
Simultaneous localization and mapping (SLAM) methods need to solve both the data association (DA) problem and the joint estimation of the sensor trajectory and the map, conditioned on a DA. In this paper, we propose a novel integrated approach to solve both the DA problem and the batch SLAM problem simultaneously, combining random finite set (RFS) theory and the graph-based SLAM approach. A sampling method based on the Poisson multi-Bernoulli mixture (PMBM) density is designed for dealing with the DA uncertainty, and a graph-based SLAM solver is applied for the conditional SLAM problem. In the end, a post-processing approach is applied to merge SLAM results from different iterations. Using synthetic data, it is demonstrated that the proposed SLAM approach achieves performance close to the posterior Cramér-Rao bound, and outperforms state-of-the-art RFS-based SLAM filters in high clutter and high process noise scenarios.
Submitted 16 July, 2024;
originally announced July 2024.
-
Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain
Authors:
Hyeon Bae Kim,
Yong Hyun Ahn,
Seong Tae Kim
Abstract:
Recent advancements in deep neural networks have shown promise in aiding disease diagnosis and medical decision-making. However, ensuring transparent decision-making processes of AI models in compliance with regulations requires a comprehensive understanding of the model's internal workings. Moreover, previous methods rely heavily on expensive pixel-wise annotated datasets for interpreting the model, which is a significant drawback in medical domains. In this paper, we propose a novel medical neuron concept annotation method, named Mask-free Medical Model Interpretation (MAMMI), that addresses these challenges. By using a vision-language model, our method relaxes the need for pixel-level masks for neuron concept annotation. MAMMI achieves superior performance compared to other interpretation methods, demonstrating its efficacy in providing rich representations for neurons in medical image analysis. Our experiments on a model trained on NIH chest X-rays validate the effectiveness of MAMMI, showcasing its potential for transparent clinical decision-making in the medical domain. The code is available at https://github.com/ailab-kyunghee/MAMMI.
Submitted 16 July, 2024;
originally announced July 2024.
-
Team HYU ASML ROBOVOX SP Cup 2024 System Description
Authors:
Jeong-Hwan Choi,
Gaeun Kim,
Hee-Jae Lee,
Seyun Ahn,
Hyun-Soo Kim,
Joon-Hyuk Chang
Abstract:
This report describes the submission of the HYU ASML team to the IEEE Signal Processing Cup 2024 (SP Cup 2024). This challenge, titled "ROBOVOX: Far-Field Speaker Recognition by a Mobile Robot," focuses on speaker recognition using a mobile robot in noisy and reverberant conditions. Our solution combines the results of deep residual neural network-based and time-delay neural network-based speaker embedding models. These models were trained on a diverse dataset that includes French speech. To account for the challenging evaluation environment characterized by high noise, reverberation, and short speech conditions, we focused on data augmentation and training speech duration for the speaker embedding model. Our submission achieved second place on the SP Cup 2024 public leaderboard, with a detection cost function of 0.5245 and an equal error rate of 6.46%.
Submitted 16 July, 2024;
originally announced July 2024.
-
Flatfish Disease Detection Based on Part Segmentation Approach and Disease Image Generation
Authors:
Seo-Bin Hwang,
Han-Young Kim,
Chae-Yeon Heo,
Hie-Yong Jung,
Sung-Ju Jung,
Yeong-Jun Cho
Abstract:
The flatfish is a major farmed species consumed globally in large quantities. However, due to the densely populated farming environment, flatfish are susceptible to injuries and diseases, making early disease detection crucial. Traditionally, diseases were detected through visual inspection, but observing large numbers of fish is challenging. Automated approaches based on deep learning technologies have been widely used to address this problem, but accurate detection remains difficult due to the diversity of the fish and the lack of fish disease datasets. This study augments fish disease images using generative adversarial networks and image harmonization methods. Next, disease detectors are trained separately for three body parts (head, fins, and body) to address individual diseases properly. In addition, a flatfish disease image dataset called FlatIMG is created, and the proposed methods are verified on it. A flash salmon disease dataset is also tested to validate the generalizability of the proposed methods. The proposed methods achieved 12% higher performance than the baseline framework. This study is the first attempt to create a large-scale flatfish disease image dataset and propose an effective disease detection framework. Automatic disease monitoring could be achieved in farming environments based on the proposed methods and dataset.
Submitted 15 July, 2024;
originally announced July 2024.
-
Navigating the swarm: Deep neural networks command emergent behaviours
Authors:
Dongjo Kim,
Jeongsu Lee,
Ho-Young Kim
Abstract:
Interacting individuals in complex systems often give rise to coherent motion exhibiting coordinated global structures. Such phenomena are ubiquitously observed in nature, from cell migration and bacterial swarms to animal and insect groups and even human societies. Primary mechanisms responsible for the emergence of collective behavior have been extensively identified, including local alignments based on average or relative velocity, non-local pairwise repulsive-attractive interactions such as distance-based potentials, interplay between local and non-local interactions, and cognitive-based inhomogeneous interactions. However, discovering how to adapt these mechanisms to modulate emergent behaviours remains elusive. Here, we demonstrate that it is possible to generate coordinated structures in collective behavior at desired moments with intended global patterns by fine-tuning an inter-agent interaction rule. Our strategy employs deep neural networks, obeying the laws of dynamics, to find interaction rules that command desired collective structures. The decomposition of interaction rules into distancing and aligning forces, expressed by polynomial series, facilitates the training of neural networks to propose desired interaction models. Presented examples include altering the mean radius and size of clusters in vortical swarms, timing of transitions from random to ordered states, and continuously shifting between typical modes of collective motions. This strategy can even be leveraged to superimpose collective modes, resulting in hitherto unexplored but highly practical hybrid collective patterns, such as protective security formations. Our findings reveal innovative strategies for creating and controlling collective motion, paving the way for new applications in robotic swarm operations, active matter organisation, and for the uncovering of obscure interaction rules in biological systems.
Submitted 15 July, 2024;
originally announced July 2024.
-
Geometric additivity of modular commutator for multipartite entanglement
Authors:
Sung-Min Park,
Isaac H. Kim,
Eun-Gook Moon
Abstract:
A recent surge of research in many-body quantum entanglement has uncovered intriguing properties of quantum many-body systems. A prime example is the modular commutator, which can extract a topological invariant from a single wave function. Here, we unveil novel geometric properties of many-body entanglement via a modular commutator of two-dimensional gapped quantum many-body systems. We obtain the geometric additivity of a modular commutator, indicating that the modular commutator for a multipartite system may be an integer multiple of the one for tripartite systems. Using our additivity formula, we also derive a curious identity for the modular commutators involving disconnected intervals in a certain class of conformal field theories. We further illustrate this geometric additivity for both bulk and edge subsystems using numerical calculations of the Haldane and $π$-flux models.
Submitted 15 July, 2024;
originally announced July 2024.
-
Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture
Authors:
Dong-Hee Kim,
Sungduk Cho,
Hyeonwoo Cho,
Chanmin Park,
Jinyoung Kim,
Won Hwa Kim
Abstract:
In this work, we introduce Mask-JEPA, a self-supervised learning framework tailored for mask classification architectures (MCA), to overcome the traditional constraints associated with training segmentation models. Mask-JEPA combines a Joint Embedding Predictive Architecture with MCA to adeptly capture intricate semantics and precise object boundaries. Our approach addresses two critical challenges in self-supervised learning: 1) extracting comprehensive representations for universal image segmentation from a pixel decoder, and 2) effectively training the transformer decoder. The use of the transformer decoder as a predictor within the JEPA framework allows proficient training in universal image segmentation tasks. Through rigorous evaluations on datasets such as ADE20K, Cityscapes and COCO, Mask-JEPA demonstrates not only competitive results but also exceptional adaptability and robustness across various training scenarios. The architecture-agnostic nature of Mask-JEPA further underscores its versatility, allowing seamless adaptation to various members of the mask classification family.
Submitted 15 July, 2024;
originally announced July 2024.
-
Properties of neutron stars and strangeness-mixed stars from a pion mean-field approach
Authors:
Nam-Yong Ghim,
Hyun-Chul Kim,
Ulugbek Yakhshiev,
Ghil-Seok Yang
Abstract:
We investigate the properties of static neutron stars and strangeness-mixed stars, based on the equations of state derived from a pion mean-field approach. Using the empirical data on pion-nucleus scattering and bulk properties of nuclear matter, we have already fixed all the parameters in a previous work, where the nucleons and hyperons were shown to be modified in various nuclear media. In the current work, we first examine the energy and pressure inside a neutron star. We show that the central densities in various neutron stars vary within the range of $(3-6)ρ_0$, where $ρ_0$ is the normal nuclear matter density. The mass-radius relations are obtained and discussed. As the slope parameter for neutron matter increases, the radii of the neutron stars increase with their masses fixed. We also study the strangeness-mixed stars or the hyperon stars using the same sets of the parameters. As the strangeness content of strange matter increases, the binding energy per nucleon is saturated and the corresponding equation of state becomes softened. Consequently, the central densities of the strangeness-mixed stars increase. Assuming that recently observed neutron stars are the strangeness-mixed ones, we find that the central densities increase. In the case of the pure strange stars, the central densities reach almost $(5-6)ρ_0$.
Submitted 13 July, 2024;
originally announced July 2024.
-
Multistate ferroelectric diodes with high electroresistance based on van der Waals heterostructures
Authors:
Soumya Sarkar,
Zirun Han,
Maheera Abdul Ghani,
Nives Strkalj,
Jung Ho Kim,
Yan Wang,
Deep Jariwala,
Manish Chhowalla
Abstract:
Some van der Waals (vdW) materials exhibit ferroelectricity, making them promising for novel non-volatile memories (NVMs) such as ferroelectric diodes (FeDs). CuInP2S6 (CIPS) is a well-known vdW ferroelectric that has been integrated with graphene for memory devices. Here we demonstrate FeDs with self-rectifying, hysteretic current-voltage characteristics based on vertical heterostructures of 10-nm-thick CIPS and graphene. By using vdW indium-cobalt top electrodes and graphene bottom electrodes, we achieve high electroresistance (on- and off-state resistance ratios) of ~10^6, on-state rectification ratios of ~2500 for read/write voltages of 2 V/0.5 V and maximum output current densities of 100 A/cm^2. These metrics compare favourably with state-of-the-art FeDs. Piezoresponse force microscopy measurements show that stabilization of intermediate net polarization states in CIPS leads to stable multi-bit data retention at room temperature. The combination of two-terminal design, multi-bit memory, and low-power operation in CIPS-based FeDs is potentially interesting for compute-in-memory and neuromorphic computing applications.
Submitted 12 July, 2024;
originally announced July 2024.
-
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Authors:
Byeonghyun Pak,
Byeongju Woo,
Sunghwan Kim,
Dae-hwan Kim,
Hoseong Kim
Abstract:
In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. To leverage the power of textual object queries, we introduce a novel framework named the textual query-driven mask transformer (tqdm). Our tqdm aims to (1) generate textual object queries that maximally encode domain-invariant semantics and (2) enhance the semantic clarity of dense visual features. Additionally, we suggest three regularization losses to improve the efficacy of tqdm by aligning visual and textual features. By utilizing our method, the model can comprehend inherent semantic information for classes of interest, enabling it to generalize to extreme domains (e.g., sketch style). Our tqdm achieves 68.9 mIoU on GTA5$\rightarrow$Cityscapes, outperforming the prior state-of-the-art method by 2.5 mIoU. The project page is available at https://byeonghyunpak.github.io/tqdm.
Submitted 12 July, 2024;
originally announced July 2024.
-
Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset
Authors:
Yongjin Kim,
Jinbum Park,
Sanha Kang,
Hanguen Kim
Abstract:
The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI). Consequently, research on AI-based object recognition models for maritime transportation is steadily growing, leveraging advancements in sensor technology and computing performance. However, object recognition in maritime environments faces challenges such as light reflection, interference, intense lighting, and various weather conditions. To address these challenges, high-performance deep learning algorithms tailored to maritime imagery and high-quality datasets specialized for maritime scenes are essential. Existing AI recognition models and datasets have limited suitability for composing autonomous navigation systems. Therefore, in this paper, we propose a Vertical and Detail Attention (VaDA) model for maritime object segmentation and a new model evaluation method, the Integrated Figure of Calculation Performance (IFCP), to verify its suitability for the system in real-time. Additionally, we introduce a benchmark maritime dataset, OASIs (Ocean AI Segmentation Initiatives), to standardize model performance evaluation across diverse maritime environments. The OASIs dataset and details are available at our website: https://www.navlue.com/dataset
Submitted 12 July, 2024;
originally announced July 2024.
-
On Ruijsenaars-Schneider spectrum from superconformal indices and ramified instantons
Authors:
Hee-Cheol Kim,
Anton Nedelin,
Shlomo S. Razamat
Abstract:
We discuss two physics-inspired approaches to the derivation of the eigenfunctions and eigenvalues of the $A_N$ Ruijsenaars-Schneider model. The first approach, which was recently proposed by the authors, relies on the computations of superconformal indices of class $\mathcal{S}$ $4d$ ${\mathcal N}=2$ theories with the insertion of surface defects. The second approach uses computations of the Nekrasov-Shatashvili limit of $5d$ ${\mathcal N} = 1^*$ instanton partition functions in the presence of a co-dimension two defect. We compare the results of these two approaches for the low-lying levels of the Ruijsenaars-Schneider model. We also discuss different previously proposed exact quantization conditions for the Coulomb branch parameters of the instanton partition functions and their interpretations in terms of index calculations.
Submitted 11 July, 2024;
originally announced July 2024.
-
Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
H. Al-Ta'ani,
J. Alexander,
A. Angerami,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. C. Aschenauer,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
B. Bannier,
K. N. Barish,
B. Bassalleck,
S. Bathe
, et al. (377 additional authors not shown)
Abstract:
The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter.
Submitted 11 July, 2024;
originally announced July 2024.
-
SelfIE: Self-Initiated Explorable Instructions Towards Enhanced User Experience
Authors:
Hyeongcheol Kim,
Katherine Fennedy,
Georgia Zhang,
Can Liu,
Shengdong Zhao
Abstract:
Given the widespread use of procedural instructions with non-linear access (situational information retrieval), there has been a proposal to accommodate both linear and non-linear usage in instructional design. However, it has received inadequate scholarly attention, leading to limited exploration. This paper introduces Self-Initiated Explorable (SelfIE) instructions, a new design concept aimed at enabling users to navigate instructions flexibly by blending linear and non-linear access according to individual needs and situations during tasks. Using a Wizard-of-Oz (WoZ) protocol, we initially embodied SelfIE instructions within a toy-block assembly context and compared them with baseline instructions offering linear-only access (N=21). Results show a 71% increase in user preferences due to its ease of reflecting individual differences, empirically supporting the prior proposal. In addition, our observations identify three strategies for flexible access and suggest the potential of enhancing the user experience by considering cognitive processes and implementing flexible access in a wearable configuration. Following the design phase, we translated the WoZ-based design embodiment into working prototypes on the tablet and OHMD to assess usability and compare user experience between the two configurations (N=8). Our data yields valuable insights into managing the trade-offs between the two configurations, thereby facilitating more effective flexible access development.
Submitted 10 July, 2024;
originally announced July 2024.
-
VideoMamba: Spatio-Temporal Selective State Space Model
Authors:
Jinyoung Park,
Hee-Seon Kim,
Kangwook Ko,
Minbeom Kim,
Changick Kim
Abstract:
We introduce VideoMamba, a novel adaptation of the pure Mamba architecture, specifically designed for video recognition. Unlike transformers, whose self-attention mechanisms incur high computational costs due to quadratic complexity, VideoMamba leverages Mamba's linear complexity and selective SSM mechanism for more efficient processing. The proposed Spatio-Temporal Forward and Backward SSM allows the model to effectively capture the complex relationship between non-sequential spatial and sequential temporal information in video. Consequently, VideoMamba is not only resource-efficient but also effective in capturing long-range dependency in videos, demonstrated by competitive performance and outstanding efficiency on a variety of video understanding benchmarks. Our work highlights the potential of VideoMamba as a powerful tool for video understanding, offering a simple yet effective baseline for future research in video analysis.
Submitted 11 July, 2024;
originally announced July 2024.
-
Telescope control software and proto-model siderostat for the SDSS-V Local Volume Mapper
Authors:
Hojae Ahn,
Florian Briegel,
Jimin Han,
Mingyu Jeon,
Thomas M. Herbst,
Sumin Lee,
Woojin Park,
Sunwoo Lee,
Inhwan Jung,
Tae-Geun Ji,
Changgon Kim,
Geon Hee Kim,
Wolfgang Gaessler,
Markus Kuhlberg,
Hyun Chul Park,
Soojong Pak,
Nicholas P. Konidaris,
Niv Drory,
José R. Sánchez-Gallego,
Cynthia S. Froning,
Solange Ramirez,
Juna A. Kollmeier
Abstract:
The fifth Sloan Digital Sky Survey (SDSS-V) Local Volume Mapper (LVM) is a wide-field integral field unit (IFU) survey that uses an array of four 160 mm fixed telescopes with siderostats to minimize the number of moving parts. Each telescope observes the science field or calibration field independently and is synchronized with the science exposure. We developed the LVM Acquisition and Guiding Package (LVMAGP), a telescope control software program optimized for LVM observations, which can simultaneously control four focusers, three K-mirrors, one fiber selector, four mounts (siderostats), and seven guide cameras. This software is built on a hierarchical architecture and the SDSS framework and provides three key sequences: autofocus, field acquisition, and autoguide. We designed and fabricated a proto-model siderostat to test the telescope pointing model and the LVMAGP software. The mirrors of the proto-model were designed as an isogrid open-back type, which reduced the weight by 46% and enabled reaching thermal equilibrium quickly. Additionally, deflection due to bolting torque, self-gravity, and thermal deformation was simulated, and the maximum scatter of the pointing model induced by the tilt of optomechanics was predicted to be $4'.4$, which can be compensated for by the field acquisition sequence. We performed a real sky test of LVMAGP with the proto-model siderostat and obtained field acquisition and autoguide accuracies of $0''.38$ and $1''.5$, respectively. The system met all requirements except for the autoguide specification, which will be resolved by more precise alignment among the hardware components at Las Campanas Observatory.
Submitted 11 July, 2024;
originally announced July 2024.
-
A Study of Digital Appliances Accessibility for People with Visual Disabilities
Authors:
Hyunjin An,
Hyundoug Kim,
Seungwoo Hong,
Youngsun Shin
Abstract:
This research aims to identify where visually impaired users find appliances hard to use and to suggest guidelines for addressing these issues. We surveyed 181 visually impaired users and selected 12 visually impaired users based on disability cause and classification. In a home-like environment, participants performed tasks, organized using hierarchical task analysis, on six major home appliances. We found that home appliances sometimes provide only visual information, which causes difficulty in sensory processing. Also, the tactile and auditory feedback of interfaces is the same across features, making it hard for people to recognize which feature is being processed. Blind users cannot see the provided information, so they rely on long-term memory to use products. This research provides guidelines for button, knob, and remote control interfaces for visually impaired users. This information will help project planners, designers, and developers create products that are accessible to visually impaired people. Some of the features will be applied to upcoming home appliance products.
Submitted 8 July, 2024;
originally announced July 2024.
-
Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models
Authors:
Chani Jung,
Dongkwan Kim,
Jiho Jin,
Jiseon Kim,
Yeon Seonwoo,
Yejin Choi,
Alice Oh,
Hyunwoo Kim
Abstract:
While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors -- perception inference and perception-to-belief inference -- in LLMs. We introduce two datasets, Percept-ToMi and Percept-FANToM, to evaluate these precursory inferences for ToM in LLMs by annotating characters' perceptions on ToMi and FANToM, respectively. Our evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., lack of inhibitory control). Based on these results, we present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference. Experimental results demonstrate that PercepToM significantly enhances LLMs' performance, especially in false belief scenarios.
Submitted 9 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (83 additional authors not shown)
Abstract:
AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0νββ$ decay and report a new lower limit of the half-life of $^{100}$Mo $0νββ$ decay as $ T^{0ν}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90\% confidence level. The effective Majorana mass limit range is $m_{ββ}<$(210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.
Submitted 8 July, 2024;
originally announced July 2024.
-
A Supersymmetric Extension of $w_{1+\infty}$ Algebra in the Celestial Holography
Authors:
Changhyun Ahn,
Man Hea Kim
Abstract:
We determine the ${\cal N}=1$ supersymmetric topological $W_{\infty} $ algebra by using the $λ$ deformed bosons $(β,γ)$ and fermions $(b,c)$ ghost system. By considering the real bosons and the real fermions at $λ=0$ (or $λ=\frac{1}{2}$), the ${\cal N}=1$ supersymmetric $W_{\frac{\infty}{2}}$ algebra is obtained. At $λ=\frac{1}{4}$, other ${\cal N}=1$ supersymmetric $W_{1+\infty}[λ=\frac{1}{4}]$ algebra is determined. We also obtain the extension of Lie superalgebra $PSU(2,2|{\cal N}=4)$ appearing in the worldsheet theory by using the symplectic bosons and the fermions. We identify the soft current algebra between the graviton, the gravitino, the photon (the gluon), the photino (the gluino) or the scalars, equivalent to ${\cal N}=1$ supersymmetric $W_{1+\infty}[λ]$ algebra, in two dimensions with the ${\cal N}=1$ supergravity theory in four dimensions discovered by Freedman, van Nieuwenhuizen and Ferrara in 1976 and its matter coupled theories, via celestial holography.
Submitted 8 July, 2024;
originally announced July 2024.
-
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
Authors:
Zekun Li,
Xianjun Yang,
Kyuri Choi,
Wanrong Zhu,
Ryan Hsieh,
HyeonJung Kim,
Jin Hyuk Lim,
Sungyoung Ji,
Byungju Lee,
Xifeng Yan,
Linda Ruth Petzold,
Stephen D. Wilson,
Woosang Lim,
William Yang Wang
Abstract:
The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks primarily focus on relatively simple scientific tasks and figures, lacking comprehensive assessments across diverse advanced scientific disciplines. To bridge this gap, we collected a multimodal, multidisciplinary dataset from open-access scientific articles published in Nature Communications journals. This dataset spans 72 scientific disciplines, ensuring both diversity and quality. We created benchmarks with various tasks and settings to comprehensively evaluate LMMs' capabilities in understanding scientific figures and content. Our evaluation revealed that these tasks are highly challenging: many open-source models struggled significantly, and even GPT-4V and GPT-4o faced difficulties. We also explored using our dataset as training resources by constructing visual instruction-following data, enabling the 7B LLaVA model to achieve performance comparable to GPT-4V/o on our benchmark. Additionally, we investigated the use of our interleaved article texts and figure images for pre-training LMMs, resulting in improvements on the material generation task. The source dataset, including articles, figures, constructed benchmarks, and visual instruction-following data, is open-sourced.
Submitted 5 July, 2024;
originally announced July 2024.
-
Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection
Authors:
YeongHyeon Park,
Sungho Kang,
Myung Jin Kim,
Hyeong Seok Kim,
Juneho Yi
Abstract:
In unsupervised anomaly detection (UAD) research, state-of-the-art models have reached a saturation point through extensive studies on public benchmark datasets, yet they adopt large-scale tailor-made neural networks (NNs) for detection performance or pursue unified models for various tasks. For edge computing, it is necessary to develop a computationally efficient and scalable solution that avoids large-scale complex NNs. Motivated by this, we aim to optimize UAD performance with minimal changes to NN settings. Thus, we revisit the reconstruction-by-inpainting approach and seek to improve it by analyzing its strengths and weaknesses. The strength of the SOTA methods is a single deterministic masking approach that addresses the challenges of random multiple masking, namely inference latency and output inconsistency. Nevertheless, the failure to provide a mask that completely covers anomalous regions remains a weakness. To mitigate this issue, we propose Feature Attenuation of Defective Representation (FADeR), which employs only two MLP layers that attenuate feature information of anomaly reconstruction during decoding. By leveraging FADeR, features of unseen anomaly patterns are reconstructed into seen normal patterns, reducing false alarms. Experimental results demonstrate that FADeR achieves enhanced performance compared to similar-scale NNs. Furthermore, our approach exhibits scalability in performance enhancement when integrated with other single deterministic masking methods in a plug-and-play manner.
Submitted 5 July, 2024;
originally announced July 2024.
-
LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech
Authors:
Haechan Kim,
Junho Myung,
Seoyoung Kim,
Sungpah Lee,
Dongyeop Kang,
Juho Kim
Abstract:
Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis reveals that transcriptions in our dataset contain L2S (L2 learner's Spontaneous speech) features, consisting of ungrammatical expressions and disfluencies (e.g., filler words, word repetitions, self-repairs, false starts), significantly more than native speech datasets. Fine-tuning whisper-small.en with LearnerVoice achieves a WER of 10.26%, 44.2% lower than vanilla whisper-small.en. Furthermore, our qualitative analysis indicates that 54.2% of errors from the vanilla model on LearnerVoice are attributable to L2S features, with 48.1% of them being reduced in the fine-tuned model.
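For readers unfamiliar with the metric, the word error rate (WER) quoted above is conventionally the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words. The Python sketch below illustrates that definition only. The function names `word_error_rate` and `implied_baseline` and the example sentences are hypothetical and are not taken from LearnerVoice or from the authors' evaluation code. The last line also works through the arithmetic of the reported reduction under the assumption that "44.2% lower" is a relative reduction, in which case the implied baseline WER would be roughly 18.4%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def implied_baseline(fine_tuned_wer: float, relative_reduction: float) -> float:
    """Baseline WER implied by a relative reduction (an assumed reading)."""
    return fine_tuned_wer / (1.0 - relative_reduction)


# Hypothetical learner utterance: one substitution and one deletion.
example_wer = word_error_rate(
    "i went to the store yesterday",
    "i go to store yesterday",
)
# Assumes the 44.2% figure in the abstract is relative, not absolute.
baseline = implied_baseline(10.26, 0.442)
```

In the toy example, two word-level edits against six reference words give a WER of one third. Production evaluations usually normalize text (case, punctuation, numerals) before scoring, which this sketch omits.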
Submitted 5 July, 2024;
originally announced July 2024.
-
Variational Partial Group Convolutions for Input-Aware Partial Equivariance of Rotations and Color-Shifts
Authors:
Hyunsu Kim,
Yegon Kim,
Hongseok Yang,
Juho Lee
Abstract:
Group Equivariant CNNs (G-CNNs) have shown promising efficacy in various tasks, owing to their ability to capture hierarchical features in an equivariant manner. However, their equivariance is fixed to the symmetry of the whole group, limiting adaptability to diverse partial symmetries in real-world datasets, such as limited rotation symmetry of handwritten digit images and limited color-shift symmetry of flower images. Recent efforts address this limitation, one example being Partial G-CNN which restricts the output group space of convolution layers to break full equivariance. However, such an approach still fails to adjust equivariance levels across data. In this paper, we propose a novel approach, Variational Partial G-CNN (VP G-CNN), to capture varying levels of partial equivariance specific to each data instance. VP G-CNN redesigns the distribution of the output group elements to be conditioned on input data, leveraging variational inference to avoid overfitting. This enables the model to adjust its equivariance levels according to the needs of individual data points. Additionally, we address training instability inherent in discrete group equivariance models by redesigning the reparametrizable distribution. We demonstrate the effectiveness of VP G-CNN on both toy and real-world datasets, including MNIST67-180, CIFAR10, ColorMNIST, and Flowers102. Our results show robust performance, even in uncertainty metrics.
Submitted 5 July, 2024;
originally announced July 2024.
-
Multiomics-based Outcome Prediction for the Treatment of Brain Metastases with Personalized Ultra-fractionated Stereotactic Adaptive Radiotherapy (PULSAR)
Authors:
Haozhao Zhang,
Michael Dohopolski,
Strahinja Stojadinovic,
Luiza Giuliani,
Soummitra Anand,
Heejung Kim,
Arnold Pompos,
Andrew Godley,
Steve Jiang,
Tu Dan,
Zabi Wardak,
Robert Timmerman,
Hao Peng
Abstract:
Purpose: We aimed to develop a data-driven multiomics approach integrating radiomics, dosiomics, and delta features to predict treatment response at an earlier stage (intra-treatment) for brain metastases (BMs) patients treated with PULSAR. Methods: We conducted a retrospective study of 39 patients with 69 BMs treated with PULSAR. Radiomics, dosiomics, and delta features were extracted from pretreatment and intra-treatment MRI scans and dose distributions. Six individual models and an ensemble feature selection (EFS) model were constructed using SVM and evaluated via stratified 5-fold cross-validation. The classification task distinguished lesions with >20% volume reduction at follow-up. We assessed performance metrics including sensitivity, specificity, accuracy, precision, F1 score, and AUC. Various feature extraction and ensemble selection scenarios were explored to enhance model robustness and reduce overfitting. Results: The EFS model, integrating features from pre-treatment radiomics and dosiomics, intra-treatment radiomics, and delta-radiomics, outperformed six individual models. It achieved an AUC of 0.979, accuracy of 0.917, and F1 score of 0.821. Among the top 9 features, six were derived from post-wavelet transformation, and three from original images. The discrete wavelet transform decomposes volumetric images into multi-resolution components, offering a more comprehensive characterization of underlying structures. Conclusion: Our study demonstrated the feasibility of employing a data-driven multiomics approach to predict tumor volume changes in BMs patients with PULSAR. The EFS model demonstrates enhanced performance compared with six individual models, emphasizing the importance of integrating both pretreatment and intra-treatment data. This application holds promise for optimizing BMs management, potentially mitigating risks associated with under- or over-treatment.
Submitted 4 July, 2024;
originally announced July 2024.
-
Evidence of $h_{b}(\text{2P}) \to Υ(\text{1S})η$ decay and search for $h_{b}(\text{1P,2P}) \to Υ(\text{1S})π^0$ with the Belle detector
Authors:
Belle Collaboration,
E. Kovalenko,
I. Adachi,
H. Aihara,
D. M. Asner,
T. Aushev,
R. Ayad,
V. Babu,
Sw. Banerjee,
K. Belous,
J. Bennett,
M. Bessner,
T. Bilka,
D. Biswas,
A. Bobrov,
D. Bodrov,
A. Bondar,
A. Bozek,
M. Bračko,
P. Branchini,
T. E. Browder,
A. Budano,
M. Campajola,
M. -C. Chang,
B. G. Cheon
, et al. (142 additional authors not shown)
Abstract:
We report the first evidence for the $h_{b}(\text{2P}) \to Υ(\text{1S})η$ transition with a significance of $3.5$ standard deviations. The decay branching fraction is measured to be $\mathcal{B}[h_{b}(\text{2P}) \to Υ(\text{1S})η]=(7.1 ~^{+3.7} _{-3.2}\pm 0.8)\times10^{-3}$, which is noticeably smaller than expected. We also set upper limits on $π^0$ transitions of $\mathcal{B}[h_{b}(\text{2P}) \to Υ(\text{1S})π^0] < 1.8\times10^{-3}$, and $\mathcal{B}[h_{b}(\text{1P})\to Υ(\text{1S})π^0] < 1.8\times10^{-3}$, at the $90\%$ confidence level. These results are obtained with a $131.4$~fb$^{-1}$ data sample collected near the $Υ(\text{5S})$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+e^-$ collider.
Submitted 4 July, 2024;
originally announced July 2024.
-
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Authors:
Sungnyun Kim,
Kangwook Jang,
Sangmin Bae,
Hoirin Kim,
Se-Young Yun
Abstract:
Audio-visual speech recognition (AVSR) aims to transcribe human speech using both audio and video modalities. In practical environments with noise-corrupted audio, the role of video information becomes crucial. However, prior works have primarily focused on enhancing audio features in AVSR, overlooking the importance of video features. In this study, we strengthen the video features by learning three temporal dynamics in video data: context order, playback direction, and the speed of video frames. Cross-modal attention modules are introduced to enrich video features with audio information so that speech variability can be taken into account when training on the video temporal dynamics. With our approach, we achieve state-of-the-art performance on the LRS2 and LRS3 AVSR benchmarks in noise-dominant settings. Our approach excels especially under babble and speech noise, indicating the ability to distinguish the speech signal that should be recognized from lip movements in the video modality. We support the validity of our methodology with ablation experiments on the temporal dynamics losses and the cross-modal attention architecture design.
Submitted 3 July, 2024;
originally announced July 2024.
-
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory
Authors:
Suyeon Lee,
Sunghwan Kim,
Minju Kim,
Dongjin Kang,
Dongil Yang,
Harim Kim,
Minseok Kang,
Dayi Jung,
Min Hee Kim,
Seungbeen Lee,
Kyoung-Mee Chung,
Youngjae Yu,
Dongha Lee,
Jinyoung Yeo
Abstract:
Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To address this, we introduce Cactus, a multi-turn dialogue dataset that emulates real-life interactions using the goal-oriented and structured approach of Cognitive Behavioral Therapy (CBT). We create a diverse and realistic dataset by designing clients with varied, specific personas, and having counselors systematically apply CBT techniques in their interactions. To assess the quality of our data, we benchmark against established psychological criteria used to evaluate real counseling sessions, ensuring alignment with expert evaluations. Experimental results demonstrate that Camel, a model trained with Cactus, outperforms other models in counseling skills, highlighting its effectiveness and potential as a counseling agent. We make our data, model, and code publicly available.
Submitted 3 July, 2024;
originally announced July 2024.
-
Effective Heterogeneous Federated Learning via Efficient Hypernetwork-based Weight Generation
Authors:
Yujin Shin,
Kichang Lee,
Sungmin Lee,
You Rim Choi,
Hyung-Sin Kim,
JeongGil Ko
Abstract:
While federated learning leverages distributed client resources, it faces challenges due to heterogeneous client capabilities. This necessitates allocating models suited to clients' resources and careful parameter aggregation to accommodate this heterogeneity. We propose HypeMeFed, a novel federated learning framework for supporting client heterogeneity by combining a multi-exit network architecture with hypernetwork-based model weight generation. This approach aligns the feature spaces of heterogeneous model layers and resolves per-layer information disparity during weight aggregation. To practically realize HypeMeFed, we also propose a low-rank factorization approach to minimize computation and memory overhead associated with hypernetworks. Our evaluations on a real-world heterogeneous device testbed indicate that HypeMeFed enhances accuracy by 5.12% over FedAvg, reduces the hypernetwork memory requirements by 98.22%, and accelerates its operations by 1.86 times compared to a naive hypernetwork approach. These results demonstrate HypeMeFed's effectiveness in leveraging and engaging heterogeneous clients for federated learning.
Submitted 3 July, 2024;
originally announced July 2024.
-
RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing
Authors:
Won Hyeok Kim,
Hyeong Jin Kim,
Tae Hee Han
Abstract:
The proliferation of edge devices necessitates efficient computational architectures for lightweight tasks, particularly deep neural network (DNN) inference. Traditional NPUs, though effective for such operations, face challenges in power, cost, and area when integrated into lightweight edge devices. The RISC-V architecture, known for its modularity and open-source nature, offers a viable alternative. This paper introduces the RISC-V R-extension, a novel approach to enhancing DNN process efficiency on edge devices. The extension features rented-pipeline stages and architectural pipeline registers (APR), which optimize critical operation execution, thereby reducing latency and memory access frequency. Furthermore, this extension includes new custom instructions to support these architectural improvements. Through comprehensive analysis, this study demonstrates the boost of R-extension in edge device processing, setting the stage for more responsive and intelligent edge applications.
Submitted 2 July, 2024;
originally announced July 2024.
-
$C^{1,α}$ regularity for degenerate fully nonlinear elliptic equations with oblique boundary conditions on $C^1$ domains
Authors:
Sun-Sig Byun,
Hongsoo Kim,
Jehan Oh
Abstract:
We provide a sharp $C^{1,α}$ estimate up to the boundary for a viscosity solution of a degenerate fully nonlinear elliptic equation with the oblique boundary condition on a $C^1$ domain. To this end, we first obtain a uniform boundary Hölder estimate with the oblique boundary condition in an "almost $C^1$-flat" domain for equations that are uniformly elliptic only where the gradient is far from some point, and then we establish the desired $C^{1,α}$ regularity based on perturbation and compactness arguments.
Submitted 1 July, 2024;
originally announced July 2024.
-
Study of $χ_{bJ}(2P)\toωΥ(1S)$ at Belle
Authors:
Belle Collaboration,
Z. S. Stottler,
T. K. Pedlar,
B. G. Fulsom,
I. Adachi,
K. Adamczyk,
H. Aihara,
S. Al Said,
D. M. Asner,
H. Atmacan,
T. Aushev,
R. Ayad,
V. Babu,
Sw. Banerjee,
M. Bauer,
P. Behera,
K. Belous,
J. Bennett,
F. Bernlochner,
M. Bessner,
T. Bilka,
D. Biswas,
A. Bobrov,
D. Bodrov,
G. Bonvicini
, et al. (157 additional authors not shown)
Abstract:
We report a study of the hadronic transitions $χ_{bJ}(2P)\toωΥ(1S)$, with $ω\toπ^{+}π^{-}π^{0}$, using $28.2\times10^6~Υ(3S)$ mesons recorded by the Belle detector. We present the first evidence for the near--threshold transition $χ_{b0}(2P)\toωΥ(1S)$, the analog of the charm sector decay $χ_{c1}(3872)\toωJ/ψ$, with a branching fraction of $B\big(χ_{b0}(2P)\toωΥ(1S)\big) = \big(0.55\pm0.19\pm0.07\big)\%$. We also obtain branching fractions of $B\big(χ_{b1}(2P)\toωΥ(1S)\big) = \big(2.39{}^{+0.20}_{-0.19}\pm0.24\big)\%$ and $B\big(χ_{b2}(2P)\toωΥ(1S)\big) = \big(0.47{}^{+0.13}_{-0.12}\pm0.06\big)\%$, confirming the measurement of the $ω$ transitions of the $J=1,2~P$--wave states. The ratio for the $J=2$ to $J=1$ transitions is also measured and found to differ by 3.3 standard deviations from the expected value in the QCD multipole expansion.
Submitted 8 July, 2024; v1 submitted 30 June, 2024;
originally announced July 2024.
-
Resilient Estimator-based Control Barrier Functions for Dynamical Systems with Disturbances and Noise
Authors:
Chuyuan Tao,
Wenbin Wan,
Junjie Gao,
Bihao Mo,
Hunmin Kim,
Naira Hovakimyan
Abstract:
The Control Barrier Function (CBF) is an emerging method that guarantees safety in path planning problems by generating a control command to ensure the forward invariance of a safety set. Most developments to date assume the availability of correct state measurements and the absence of disturbances on the system. However, if the system incurs disturbances and is subject to noise, the CBF cannot guarantee safety due to the distorted state estimate. To improve the resilience and adaptability of the CBF, we propose a resilient estimator-based control barrier function (RE-CBF), which is based on a novel stochastic CBF optimization and a resilient estimator, to guarantee the safety of systems with disturbances and noise in path planning problems. The proposed algorithm uses the resilient estimation algorithm to estimate disturbances and counteracts their effect using the novel stochastic CBF optimization, providing safe control inputs for dynamical systems with disturbances and noise. To demonstrate the effectiveness of our algorithm in handling both noise and disturbances in dynamics and measurement, we design a quadrotor testing pipeline to simulate the proposed algorithm and then implement the algorithm on a real drone in our flying arena. Both simulations and real-world experiments show that the proposed method can guarantee safety for systems with disturbances and noise.
Submitted 28 June, 2024;
originally announced July 2024.
-
Direct observation of layer skyrmions in twisted WSe2 bilayers
Authors:
Fan Zhang,
Nicolás Morales-Durán,
Yanxing Li,
Wang Yao,
Jung-Jung Su,
Yu-Chuan Lin,
Chengye Dong,
Hyunsue Kim,
Joshua A. Robinson,
Allan H. Macdonald,
Chih-Kang Shih
Abstract:
Transition metal dichalcogenide (TMD) twisted homobilayers have been established as an ideal platform for studying strong correlation phenomena, as exemplified by the recent discovery of fractional Chern insulator (FCI) states in twisted MoTe2 and Chern insulators (CI) and unconventional superconductivity in twisted WSe2. In these systems, nontrivial topology in the strongly layer-hybridized regime can arise from a spatial patterning of interlayer tunneling amplitudes and layer-dependent potentials that yields a lattice of layer skyrmions. Here we report the direct observation of skyrmion textures in the layer degree of freedom of rhombohedral-stacked (R-stacked) twisted WSe2 homobilayers. This observation is based on scanning tunneling spectroscopy that separately resolves the Γ-valley and K-valley moiré electronic states. We show that Γ-valley states are subjected to a moiré potential with an amplitude of ~120 meV. At ~150 meV above the Γ-valley, the K-valley states are subjected to a weaker moiré potential of ~30 meV. Most significantly, we reveal opposite layer polarization of the K-valley at the MX and XM sites within the moiré unit cell, confirming the theoretically predicted skyrmion layer-texture. The dI/dV mappings allow the parameters that enter the continuum model for the description of moiré bands in twisted TMD bilayers to be determined experimentally, further establishing a direct correlation between the shape of the LDOS profile in real space and the topology of the topmost moiré band.
Submitted 28 June, 2024;
originally announced June 2024.
-
Unstable Retention Behavior in MIFIS FEFET: Accurate Analysis of the Origin by Absolute Polarization Measurement
Authors:
Song-Hyeon Kuk,
Kyul Ko,
Bong Ho Kim,
Jae-Hoon Han,
Sang-Hyeon Kim
Abstract:
The ferroelectric field-effect transistor (FEFET) has emerged as a scalable solution for 3D NAND and embedded flash (eFlash), with recent progress in achieving a large memory window (MW) using metal-insulator-ferroelectric-insulator-semiconductor (MIFIS) gate stacks. Although the physical origin of the large MW in the MIFIS stack has already been discussed, its retention characteristics have not been explored yet. Here, we demonstrate a MIFIS FEFET with a maximum MW of 9.7 V, and show that the MIFIS FEFET has unstable retention characteristics, especially after erase. We discover the origin of the unstable retention characteristics and prove our hypothesis with absolute polarization measurement and different operation modes, showing that the unstable retention is a fundamental issue. Based on this understanding, we discuss a novel charge compensation model and promising engineering methodologies to achieve stable retention in MIFIS FEFET.
Submitted 27 June, 2024;
originally announced June 2024.
-
Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He, et al. (118 additional authors not shown)
Abstract:
We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields (EGMF), the results are consistent with a relatively heavy injected composition at E ~ 10 EeV that becomes lighter up to E ~ 100 EeV, while the composition at E > 100 EeV is very heavy. The latter is true even in the presence of the highest experimentally allowed extra-galactic magnetic fields, while the composition at lower energies can be light if a strong EGMF is present. The effect of the uncertainty in the galactic magnetic field on these results is subdominant.
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He, et al. (118 additional authors not shown)
Abstract:
We use a new method to estimate the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on a comparison of the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECR under the assumption that sources trace the large-scale structure (LSS) of the Universe. As we report in the companion letter, the TA data show large deflections with respect to the LSS which can be explained, assuming small extra-galactic magnetic fields (EGMF), by an intermediate composition changing to a heavy one (iron) in the highest energy bin. Here we show that these results are robust to uncertainties in UHECR injection spectra, the energy scale of the experiment and galactic magnetic fields (GMF). The assumption of weak EGMF, however, strongly affects this interpretation at all but the highest energies E > 100 EeV, where the remarkable isotropy of the data implies a heavy injected composition even in the case of strong EGMF. This result also holds if UHECR sources are as rare as $2 \times 10^{-5}$ Mpc$^{-3}$, which is the conservative lower limit for the source number density.
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Spin-orbit entangled moments and magnetic exchange interactions in cobalt-based honeycomb magnets BaCo$_2$($X$O$_4$)$_2$ ($X$ = P, As, Sb)
Authors:
Subhasis Samanta,
Fabrizio Cossu,
Heung-Sik Kim
Abstract:
Co-based honeycomb magnets have been actively studied recently for the potential realization of emergent quantum magnetism, such as the Kitaev spin liquid. Here we employ density functional and dynamical mean-field theory methods to examine a family of Kitaev magnet candidates BaCo$_2$($X$O$_4$)$_2$ ($X$ = P, As, Sb), where the compound with $X$ = Sb has not yet been synthesized. Our study confirms the formation of a Mott insulating phase and the $J_{\rm eff}$ = 1/2 spin moments at Co$^{2+}$ sites despite the presence of a sizable trigonal crystal field in all three compounds. The pnictogen substitution from phosphorus to antimony significantly changes the in-plane lattice parameters and the direct overlap integral between neighboring Co ions, leading to the suppression of the Heisenberg interaction. More interestingly, the marginal antiferromagnetic nearest-neighbor Kitaev term changes sign to a ferromagnetic one and becomes sizable at the $X$ = Sb limit. Our study suggests that the pnictogen substitution can be a viable route to continuously tune magnetic exchange interactions and to promote magnetic frustration for the realization of potential spin liquid phases in BaCo$_2$($X$O$_4$)$_2$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Topological Classification of Symmetry Breaking and Vacuum Degeneracy
Authors:
Simon-Raphael Fischer,
Mehran Jalali Farahani,
Hyungrok Kim,
Christian Saemann
Abstract:
We argue that a general system of scalar fields and gauge fields manifesting vacuum degeneracy induces a principal groupoid bundle over spacetime and that the pattern of spontaneous symmetry breaking and the Higgs mechanism are encoded by the singular foliation canonically induced on the moduli space of scalar vacuum expectation values by the Lie groupoid structure. Recent mathematical results in the classification of singular foliations then provide a qualitative classification of the possible patterns of vacuum degeneracy.
Submitted 25 June, 2024;
originally announced June 2024.
-
Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition
Authors:
Parham Zolfaghari,
Vitor Fortes Rey,
Lala Ray,
Hyun Kim,
Sungho Suh,
Paul Lukowicz
Abstract:
The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from users, the volume of labeled data remains insufficient compared to domains where deep learning has achieved remarkable success. Addressing this gap, in this paper, we propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences. Our method simultaneously trains the pose-to-sensor network and a human activity classifier, optimizing both data reconstruction and activity recognition. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset. Experimental results demonstrate the superiority of our framework with significant performance improvements over baseline methods.
Submitted 25 April, 2024;
originally announced June 2024.
-
Adjusted Connections I: Differential Cocycles for Principal Groupoid Bundles with Connection
Authors:
Simon-Raphael Fischer,
Mehran Jalali Farahani,
Hyungrok Kim,
Christian Saemann
Abstract:
We develop a new perspective on principal bundles with connection as morphisms from the tangent bundle of the underlying manifold to a classifying dg-Lie groupoid. This groupoid can be identified with a lift of the inner homomorphisms groupoid arising in Ševera's differentiation procedure of Lie quasi-groupoids. Our new perspective readily extends to principal groupoid bundles, but requires an adjustment, an additional datum familiar from higher gauge theory. The resulting adjusted connections naturally provide a global formulation of the kinematical data of curved Yang-Mills-Higgs theories as described by Kotov-Strobl (arXiv:1510.07654) and Fischer (arXiv:2104.02175).
Submitted 24 June, 2024;
originally announced June 2024.
-
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection
Authors:
Hyun Myung Kim,
Kangwook Jang,
Hoirin Kim
Abstract:
As speech synthesis systems have made remarkable advances in recent years, the importance of robust deepfake detection systems that perform well on unseen systems has grown. In this paper, we propose a novel adaptive centroid shift (ACS) method that updates the centroid representation by continually shifting it as the weighted average of bonafide representations. Our approach uses only bonafide samples to define their centroid, which can yield a specialized centroid for one-class learning. Integrating our ACS with one-class learning gathers bonafide representations into a single cluster, forming well-separated embeddings robust to unseen spoofing attacks. Our proposed method achieves an equal error rate (EER) of 2.19% on the ASVspoof 2021 deepfake dataset, outperforming all existing systems. Furthermore, the t-SNE visualization illustrates that our method effectively maps the bonafide embeddings into a single cluster and successfully disentangles the bonafide and spoof classes.
Submitted 24 June, 2024;
originally announced June 2024.
-
Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling
Authors:
Min-Seop Kwak,
Donghoon Ahn,
Ines Hyeonsu Kim,
Jin-Hwa Kim,
Seungryong Kim
Abstract:
Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into a 3D representation, has recently brought significant advancements in the text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may be induced by multiview inconsistencies between 2D scores predicted from various viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency and therefore geometry awareness into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution; geometry-based gradient warping for identifying correspondences between predicted gradients of different viewpoints; and a novel gradient consistency loss to optimize the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing the geometric inconsistency problems in the text-to-3D generation task with minimal computation cost while remaining compatible with existing score distillation-based models. Our project page is available at https://ku-cvlab.github.io/GSD/.
Submitted 30 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Project Management for Ground-based Telescope Array Development
Authors:
Ji Hoon Kim,
Myungshin Im,
Hyung Mok Lee,
Seo-Won Chang
Abstract:
The Center for the Gravitational-Wave Universe at Seoul National University has been operating its main observational facility, the 7-Dimensional Telescope (7DT), since October 2023. Located at El Sauce Observatory in the Chilean Rio Hurtado Valley, 7DT consists of 20 50-cm telescopes equipped with 40 medium-band filters of 25 nm full width at half maximum along with a CMOS camera of 61 megapixels. 7DT will produce about 1 TB per night of spectral mapping image data, including calibration data and byproducts of the data reduction pipeline, once our planned three-layered surveys (Reference Imaging Survey, Wide Field Survey, and Intensive Monitoring Survey) start in 2024. We expect to generate 1 PB per year by combining raw data, reduced data, and data products (e.g. calibrated stacked images, spectral cubes, and object catalogs). To accommodate this huge amount of data, we now have 1 PB of data storage, which we will expand by 1 PB per year. We also have a high-performance computation facility equipped with 2 NVIDIA A100 GPU cards, since we plan to carry out real-time data reduction and analysis for follow-up observation data of gravitational wave events. To support this, we established a 400 Mbps network connection between the facilities in Korea and Chile. Taking advantage of the high-performance network, we have been carrying out fully remote operations since October 2023. In this talk, we present details of designing, planning, and executing the ground-based telescope facility project, especially within low-budget academic environments. While we cover as much ground as possible, we will emphasize human resource management, project risk management, and financial contingency management.
Submitted 24 June, 2024;
originally announced June 2024.
-
Introduction to the 7-Dimensional Telescope: Commissioning Procedures and Data Characteristics
Authors:
Ji Hoon Kim,
Myungshin Im,
Hyung Mok Lee,
Seo-Won Chang,
Hyeonho Choi,
Gregory S. H. Paek
Abstract:
The 7-Dimensional Telescope (7DT) is a multi-telescope system designed to identify electromagnetic (EM) counterparts of gravitational-wave (GW) sources. Consisting of 20 50-cm telescopes along with 40 medium-band filters of 25 nm width, 7DT can obtain spectral mapping images for a large field of view (~1.25 square degrees). Along with flexible operation and real-time data reduction and analysis, the spectral mapping capability enables 7DT to follow up GW events quickly and discover EM counterparts. Among the 20 planned telescopes, 12 units are deployed at the El Sauce Observatory located in the Rio Hurtado Valley in Chile. Since 7DT obtained first light in October 2023, we have been carrying out its commissioning procedures, including examination of bias levels, master flat production, and spectrophotometric standardization. In this talk, we present the 7DT instruments and their set-up, commissioning procedures, and data characteristics of 7DT, along with our three-layered surveys, which are expected to begin in early 2024.
Submitted 24 June, 2024;
originally announced June 2024.
-
Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection
Authors:
Choonghyun Park,
Hyuhng Joon Kim,
Junyeob Kim,
Youna Kim,
Taeuk Kim,
Hyunsoo Cho,
Hwiyeol Jo,
Sang-goo Lee,
Kang Min Yoo
Abstract:
AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs on common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper, we analyze the impact of such shortcuts in AIGT detection. We propose Feedback-based Adversarial Instruction List Optimization (FAILOpt), an attack that searches for instructions deceptive to AIGT detectors by exploiting prompt-specific shortcuts. FAILOpt effectively reduces the detection performance of the target detector, to a degree comparable to other attacks based on adversarial in-context examples. We also utilize our method to enhance the robustness of the detector by mitigating the shortcuts. Based on the findings, we further train the classifier with the dataset augmented by FAILOpt prompts. The augmented classifier exhibits improvements across generation models, tasks, and attacks. Our code will be available at https://github.com/zxcvvxcz/FAILOpt.
Submitted 23 June, 2024;
originally announced June 2024.
-
Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification
Authors:
Inès Hyeonsu Kim,
JoungBin Lee,
Soowon Son,
Woojeong Jin,
Kyusun Cho,
Junyoung Seo,
Min-Seop Kwak,
Seokju Cho,
JeongYeol Baek,
Byeongwon Lee,
Seungryong Kim
Abstract:
Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data augmentation; however, they rely on human poses already present in the training dataset, failing to effectively reduce the human pose bias in the dataset. We propose Diff-ID, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment a training dataset that enables existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. Using the SMPL model, we simultaneously capture both the desired human poses and camera viewpoints, enabling realistic human rendering. The depth information provided by the SMPL model indirectly conveys the camera viewpoints. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate realistic images with diverse human poses and camera viewpoints. Qualitative results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches. The performance gains achieved by training Re-ID models on our offline augmented dataset highlight the potential of our proposed framework in improving the scalability and generalizability of person Re-ID models.
Submitted 23 June, 2024;
originally announced June 2024.
-
I Experienced More than 10 DeFi Scams: On DeFi Users' Perception of Security Breaches and Countermeasures
Authors:
Mingyi Liu,
Jun Ho Huh,
HyungSeok Han,
Jaehyuk Lee,
Jihae Ahn,
Frank Li,
Hyoungshick Kim,
Taesoo Kim
Abstract:
Decentralized Finance (DeFi) offers a whole new investment experience and has quickly emerged as an enticing alternative to Centralized Finance (CeFi). Rapidly growing market size and active users, however, have also made DeFi a lucrative target for scams and hacks, with 1.95 billion USD lost in 2023. Unfortunately, no prior research thoroughly investigates DeFi users' security risk awareness levels and the adequacy of their risk mitigation strategies.
Based on a semi-structured interview study (N = 14) and a follow-up survey (N = 493), this paper investigates DeFi users' security perceptions and commonly adopted practices, and how those affected by previous scams or hacks (DeFi victims) respond and try to recover their losses. Our analysis shows that users often prefer DeFi over CeFi due to its decentralized nature and strong profitability. Despite being aware that DeFi, compared to CeFi, is prone to more severe attacks, users are willing to take those risks to explore new investment opportunities. Worryingly, most victims do not learn from previous experiences; unlike victims studied through traditional systems, DeFi victims tend to find new services, without revising their security practices, to recover their losses quickly. The abundance of various DeFi services and opportunities allows victims to continuously explore new financial opportunities, and this reality seems to cloud their security priorities. Indeed, our results indicate that DeFi users' strong financial motivations outweigh their security concerns, much like those who are addicted to gambling. Our observations about victims' post-incident behaviors suggest that stronger control in the form of industry regulations would be necessary to protect DeFi users from future breaches.
Submitted 21 June, 2024;
originally announced June 2024.