-
Calibration and simulation of ionization signal and electronics noise in the ICARUS liquid argon time projection chamber
Authors:
ICARUS collaboration,
P. Abratenko,
N. Abrego-Martinez,
A. Aduszkiewicz,
F. Akbar,
L. Aliaga Soplin,
M. Artero Pons,
J. Asaadi,
W. F. Badgett,
B. Baibussinov,
B. Behera,
V. Bellini,
R. Benocci,
S. Berkman,
S. Bertolucci,
M. Betancourt,
M. Bonesini,
T. Boone,
B. Bottino,
A. Braggiotti,
D. Brailsford,
S. J. Brice,
V. Brio,
C. Brizzolari,
H. S. Budd,
A. Campani
, et al. (153 additional authors not shown)
Abstract:
The ICARUS liquid argon time projection chamber (LArTPC) neutrino detector has been taking physics data since 2022 as part of the Short-Baseline Neutrino (SBN) Program. This paper details the equalization of the response to charge in the ICARUS time projection chamber (TPC), as well as data-driven tuning of the simulation of ionization charge signals and electronics noise. The equalization procedure removes non-uniformities in the ICARUS TPC response to charge in space and time. This work leverages the copious number of cosmic ray muons available to ICARUS at the surface. The ionization signal shape simulation applies a novel procedure that tunes the simulation to match what is measured in data. The end result of the equalization procedure and simulation tuning allows for a comparison of charge measurements in ICARUS between Monte Carlo simulation and data, showing good performance with minimal residual bias between the two.
Submitted 16 July, 2024;
originally announced July 2024.
-
Learning to Make Keypoints Sub-Pixel Accurate
Authors:
Shinjeong Kim,
Marc Pollefeys,
Daniel Barath
Abstract:
This work addresses the challenge of sub-pixel accuracy in detecting 2D local features, a cornerstone problem in computer vision. Despite the advancements brought by neural network-based methods like SuperPoint and ALIKED, these modern approaches lag behind classical ones such as SIFT in keypoint localization accuracy due to their lack of sub-pixel precision. We propose a novel network that enhances any detector with sub-pixel precision by learning an offset vector for detected features, thereby eliminating the need for designing specialized sub-pixel accurate detectors. This optimization directly minimizes test-time evaluation metrics like relative pose error. Through extensive testing with both nearest neighbors matching and the recent LightGlue matcher across various real-world datasets, our method consistently outperforms existing methods in accuracy. Moreover, it adds only around 7 ms to the time of a particular detector. The code is available at https://github.com/KimSinjeong/keypt2subpx .
Submitted 16 July, 2024;
originally announced July 2024.
-
Isometric Representation Learning for Disentangled Latent Space of Diffusion Models
Authors:
Jaehoon Hahm,
Junho Lee,
Sunghyun Kim,
Joonseok Lee
Abstract:
The latent space of diffusion models remains largely unexplored, despite their great success and potential in the field of generative modeling. In fact, the latent space of existing diffusion models is entangled, with a distorted mapping from latent space to image space. To tackle this problem, we present Isometric Diffusion, equipping a diffusion model with a geometric regularizer to guide the model to learn a geometrically sound latent space of the training data manifold. This approach allows diffusion models to learn a more disentangled latent space, which enables smoother interpolation, more accurate inversion, and more precise control over attributes directly in the latent space. Our extensive experiments consisting of image interpolations, image inversions, and linear editing show the effectiveness of our method.
Submitted 16 July, 2024;
originally announced July 2024.
-
Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain
Authors:
Hyeon Bae Kim,
Yong Hyun Ahn,
Seong Tae Kim
Abstract:
Recent advancements in deep neural networks have shown promise in aiding disease diagnosis and medical decision-making. However, ensuring transparent decision-making processes of AI models in compliance with regulations requires a comprehensive understanding of the model's internal workings. Previous methods rely heavily on expensive pixel-wise annotated datasets for interpreting the model, which is a significant drawback in medical domains. In this paper, we propose a novel medical neuron concept annotation method, named Mask-free Medical Model Interpretation (MAMMI), that addresses these challenges. By using a vision-language model, our method relaxes the need for pixel-level masks for neuron concept annotation. MAMMI achieves superior performance compared to other interpretation methods, demonstrating its efficacy in providing rich representations for neurons in medical image analysis. Our experiments on a model trained on NIH chest X-rays validate the effectiveness of MAMMI, showcasing its potential for transparent clinical decision-making in the medical domain. The code is available at https://github.com/ailab-kyunghee/MAMMI.
Submitted 16 July, 2024;
originally announced July 2024.
-
Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach
Authors:
Sojung Lucia Kim,
Taehong Jang,
Joonmo Ahn
Abstract:
This study compares three methods for translating ancient texts with sparse corpora: (1) the traditional statistical translation method of phrase alignment, (2) in-context LLM learning, and (3) a proposed inter-methodological approach, namely a statistical machine translation method using sentence piece tokens derived from a unified set of source-target corpora. The proposed approach achieves a BLEU score of 36.71, surpassing the scores of SOLAR-10.7B context learning and the best existing Seq2Seq model. Further analysis and discussion are presented.
Submitted 16 July, 2024;
originally announced July 2024.
-
I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM
Authors:
Gwangtak Bae,
Changwoon Choi,
Hyeongjun Heo,
Sang Min Kim,
Young Min Kim
Abstract:
We present an inverse image-formation module that can enhance the robustness of existing visual SLAM pipelines for casually captured scenarios. Casual video captures often suffer from motion blur and varying appearances, which degrade the final quality of coherent 3D visual representation. We propose integrating the physical imaging into the SLAM system, which employs linear HDR radiance maps to collect measurements. Specifically, individual frames aggregate images of multiple poses along the camera trajectory to explain prevalent motion blur in hand-held videos. Additionally, we accommodate per-frame appearance variation by dedicating explicit variables for image formation steps, namely white balance, exposure time, and camera response function. Through joint optimization of additional variables, the SLAM pipeline produces high-quality images with more accurate trajectories. Extensive experiments demonstrate that our approach can be incorporated into recent visual SLAM pipelines using various scene representations, such as neural radiance fields or Gaussian splatting.
Submitted 15 July, 2024;
originally announced July 2024.
-
S-confinement of 3d Argyres-Douglas theories and the Seiberg-like duality with an adjoint matter
Authors:
Chiung Hwang,
Sungjoon Kim
Abstract:
We propose an $\mathcal{N}=2$ preserving deformation that leads to the confining phase of the 3d reduction of the $D_p[SU(N)]$ Argyres-Douglas theories, referred to as $\mathbb{D}_p[SU(N)]$. This deformation incorporates monopole superpotential terms, which have recently played interesting roles in exploring possible RG fixed points of 3d supersymmetric gauge theories. Employing this confining phenomenon in 3d $\mathbb{D}_p[SU(N)]$ theories, we also propose a deconfined version of the Kim-Park duality, an IR duality for 3d $\mathcal{N}=2$ adjoint SQCDs, where an adjoint matter field is replaced by a linear quiver tail of $\mathbb{D}_p[SU(N)]$. Surprisingly, both the confinement of deformed $\mathbb{D}_p[SU(N)]$ and the deconfined Kim-Park duality can be proven only assuming some basic 3d $\mathcal{N}=2$ IR dualities. Finally, we propose a variant of the Kim-Park duality deformed by a single monopole superpotential term, which can also be derived using the same method.
Submitted 15 July, 2024;
originally announced July 2024.
-
X-ray and multiwavelength polarization of Mrk 501 from 2022 to 2023
Authors:
Chien-Ting J. Chen,
Ioannis Liodakis,
Riccardo Middei,
Dawoon E. Kim,
Laura Di Gesu,
Alessandro Di Marco,
Steven R. Ehlert,
Manel Errando,
Michela Negro,
Svetlana G. Jorstad,
Alan P. Marscher,
Kinwah Wu,
Iván Agudo,
Juri Poutanen,
Tsunefumi Mizuno,
Pouya M. Kouch,
Elina Lindfors,
George A. Borman,
Tatiana S. Grishina,
Evgenia N. Kopatskaya,
Elena G. Larionova,
Daria A. Morozova,
Sergey S. Savchenko,
Ivan S. Troitsky,
Yulia V. Troitskaya
, et al. (121 additional authors not shown)
Abstract:
We present multiwavelength polarization measurements of the luminous blazar Mrk 501 over a 14-month period. The 2-8 keV X-ray polarization was measured with the Imaging X-ray Polarimetry Explorer (IXPE) with six 100-ks observations spanning from 2022 March to 2023 April. Each IXPE observation was accompanied by simultaneous X-ray data from NuSTAR, Swift/XRT, and/or XMM-Newton. Complementary optical-infrared polarization measurements were also available in the B, V, R, I, and J bands, as were radio polarization measurements from 4.85 GHz to 225.5 GHz. Among the first five IXPE observations, we did not find significant variability in the X-ray polarization degree and angle with IXPE. However, the most recent sixth observation found an elevated polarization degree at $>3\sigma$ above the average of the other five observations. The optical and radio measurements show no apparent correlations with the X-ray polarization properties. Throughout the six IXPE observations, the X-ray polarization degree remained higher than, or similar to, the R-band optical polarization degree, which remained higher than the radio value. This is consistent with the energy-stratified shock scenario proposed to explain the first two IXPE observations, in which the polarized X-ray, optical, and radio emission arises from different regions.
Submitted 15 July, 2024;
originally announced July 2024.
-
Tailoring Solution Accuracy for Fast Whole-body Model Predictive Control of Legged Robots
Authors:
Charles Khazoom,
Seungwoo Hong,
Matthew Chignoli,
Elijah Stanger-Jones,
Sangbae Kim
Abstract:
Thanks to recent advancements in accelerating non-linear model predictive control (NMPC), it is now feasible to deploy whole-body NMPC at real-time rates for humanoid robots. However, enforcing inequality constraints in real time for such high-dimensional systems remains challenging due to the need for additional iterations. This paper presents an implementation of whole-body NMPC for legged robots that provides low-accuracy solutions to NMPC with general equality and inequality constraints. Instead of aiming for highly accurate optimal solutions, we leverage the alternating direction method of multipliers to rapidly provide low-accuracy solutions to quadratic programming subproblems. Our extensive simulation results indicate that real robots often cannot benefit from highly accurate solutions due to dynamics discretization errors, inertial modeling errors and delays. We incorporate control barrier functions (CBFs) at the initial timestep of the NMPC for the self-collision constraints, resulting in up to a 26-fold reduction in the number of self-collisions without adding computational burden. The controller is reliably deployed on hardware at 90 Hz for a problem involving 32 timesteps, 2004 variables, and 3768 constraints. The NMPC delivers sufficiently accurate solutions, enabling the MIT Humanoid to plan complex crossed-leg and arm motions that enhance stability when walking and recovering from significant disturbances.
Submitted 15 July, 2024;
originally announced July 2024.
-
3D Geometric Shape Assembly via Efficient Point Cloud Matching
Authors:
Nahyuk Lee,
Juhong Min,
Junha Lee,
Seungwook Kim,
Kanghee Lee,
Jaesik Park,
Minsu Cho
Abstract:
Learning to assemble geometric shapes into a larger target structure is a pivotal task in various practical applications. In this work, we tackle this problem by establishing local correspondences between point clouds of part shapes at both coarse and fine levels. To this end, we introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matching between mating surfaces of parts while incurring low costs in memory and computation. Building upon PMT, we introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task. We evaluate the proposed PMTR on the large-scale 3D geometric shape assembly benchmark dataset of Breaking Bad and demonstrate its superior performance and efficiency compared to state-of-the-art methods. Project page: https://nahyuklee.github.io/pmtr.
Submitted 15 July, 2024;
originally announced July 2024.
-
Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights
Authors:
Seyong Kim,
Jinseok Choi,
Wonjae Shin,
Namyoon Lee,
Jeonghun Park
Abstract:
To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed beams without relying on CSI and then selects a suitable user set for each beam. Building on this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by which inter-beam interference is efficiently mitigated by narrowing the corresponding beam width. By modeling the ground users' locations via a Poisson point process, we rigorously analyze the achievable performance of the presented multibeam satellite system. In particular, we investigate the asymptotic scaling laws that reveal the interplay between the user density, the number of beams, and the number of antennas. Our analysis offers critical design insights for the multibeam satellite with massive MIMO: i) If the user density scales in power with the number of antennas, the considered precoding can achieve a linear fraction of the optimal rate in the asymptotic regime. ii) A certain additional scaling factor for the user density is needed as the number of beams increases to maintain the asymptotic optimality.
Submitted 15 July, 2024;
originally announced July 2024.
-
LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
Authors:
Sanmin Kim,
Youngseok Kim,
Sihwan Hwang,
Hyeonjun Jeong,
Dongsuk Kum
Abstract:
Recent advancements in camera-based 3D object detection have introduced cross-modal knowledge distillation to bridge the performance gap with LiDAR 3D detectors, leveraging the precise geometric information in LiDAR point clouds. However, existing cross-modal knowledge distillation methods tend to overlook the inherent imperfections of LiDAR, such as the ambiguity of measurements on distant or occluded objects, which should not be transferred to the image detector. To mitigate these imperfections in the LiDAR teacher, we propose a novel method that leverages aleatoric uncertainty-free features from ground truth labels. In contrast to conventional label guidance approaches, we approximate the inverse function of the teacher's head to effectively embed label inputs into feature space. This approach provides additional accurate guidance alongside the LiDAR teacher, thereby boosting the performance of the image detector. Additionally, we introduce feature partitioning, which effectively transfers knowledge from the teacher modality while preserving the distinctive features of the student, thereby maximizing the potential of both modalities. Experimental results demonstrate that our approach improves mAP and NDS by 5.1 points and 4.9 points, respectively, compared to the baseline model, proving the effectiveness of our approach. The code is available at https://github.com/sanmin0312/LabelDistill
Submitted 14 July, 2024;
originally announced July 2024.
-
Grain boundaries control lithiation of solid solution substrates in lithium metal batteries
Authors:
Leonardo Shoji Aota,
Chanwon Jung,
Siyuan Zhang,
Ömer K. Büyükuslu,
Poonam Yadav,
Mahander Pratap Singh,
Xinren Chen,
Eric Woods,
Christina Scheu,
Se-Ho Kim,
Dierk Raabe,
Baptiste Gault
Abstract:
The development of sustainable transportation and communication systems requires an increase in both energy density and capacity retention of Li-batteries. Using substrates forming a solid solution with body-centered cubic Li enhances the cycle stability of anode-less batteries. However, it remains unclear how the substrate microstructure affects the lithiation behavior. Here, we deploy a correlative, near-atomic scale probing approach through combined ion- and electron-microscopy to examine the distribution of Li in Li-Ag diffusion couples as a model system. We reveal that Li regions with over 93.8 at.% nucleate within Ag at random high-angle grain boundaries, whereas grain interiors are not lithiated. We evidence the role of kinetics and mechanical constraint from the microstructure over equilibrium thermodynamics in dictating the lithiation process. The findings suggest that grain size and grain boundary character are critical to enhance the electrochemical performance of interlayers/electrodes, particularly for improving lithiation kinetics and hence reducing dendrite formation.
Submitted 12 July, 2024;
originally announced July 2024.
-
Topological Fermi-arc surface state covered by floating electrons on a two-dimensional electride
Authors:
Chan-young Lim,
Min-Seok Kim,
Dong Cheol Lim,
Sunghun Kim,
Yeonghoon Lee,
Jaehoon Cha,
Gyubin Lee,
Sang Yong Song,
Dinesh Thapa,
Jonathan D. Denlinger,
Seong-Gon Kim,
Sung Wng Kim,
Jungpil Seo,
Yeongkwan Kim
Abstract:
Two-dimensional electrides can acquire topologically non-trivial phases due to intriguing interplay between the cationic atomic layers and anionic electron layers. However, experimental evidence of topological surface states has yet to be verified. Here, via angle-resolved photoemission spectroscopy (ARPES) and scanning tunnelling microscopy (STM), we probe the magnetic Weyl states of the ferromagnetic electride $[\mathrm{Gd}_{2}\mathrm{C}]^{2+}\cdot 2e^{-}$. In particular, the presence of Weyl cones and Fermi-arc states is demonstrated through photon energy-dependent ARPES measurements, agreeing with theoretical band structure calculations. Notably, the STM measurements reveal that the Fermi-arc states exist underneath a floating quantum electron liquid on the top Gd layer, forming double-stacked surface states in a heterostructure. Our work thus not only unveils the non-trivial topology of the $[\mathrm{Gd}_{2}\mathrm{C}]^{2+}\cdot 2e^{-}$ electride but also realizes a surface heterostructure that can host phenomena distinct from the bulk.
Submitted 12 July, 2024;
originally announced July 2024.
-
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Authors:
Byeonghyun Pak,
Byeongju Woo,
Sunghwan Kim,
Dae-hwan Kim,
Hoseong Kim
Abstract:
In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. To leverage the power of textual object queries, we introduce a novel framework named the textual query-driven mask transformer (tqdm). Our tqdm aims to (1) generate textual object queries that maximally encode domain-invariant semantics and (2) enhance the semantic clarity of dense visual features. Additionally, we suggest three regularization losses to improve the efficacy of tqdm by aligning visual and textual features. With our method, the model can comprehend inherent semantic information for classes of interest, enabling it to generalize to extreme domains (e.g., sketch style). Our tqdm achieves 68.9 mIoU on GTA5$\rightarrow$Cityscapes, outperforming the prior state-of-the-art method by 2.5 mIoU. The project page is available at https://byeonghyunpak.github.io/tqdm.
Submitted 12 July, 2024;
originally announced July 2024.
-
Characterizing Prompt Compression Methods for Long Context Inference
Authors:
Siddharth Jha,
Lutfi Eren Erdogan,
Sehoon Kim,
Kurt Keutzer,
Amir Gholami
Abstract:
Long context inference presents challenges at the system level with increased compute and memory requirements, as well as from an accuracy perspective in being able to reason over long contexts. Recently, several methods have been proposed to compress the prompt to reduce the context length. However, there has been little work on comparing the different proposed methods across different tasks through a standardized analysis. This has led to conflicting results. To address this, here we perform a comprehensive characterization and evaluation of different prompt compression methods. In particular, we analyze extractive compression, summarization-based abstractive compression, and token pruning methods. Surprisingly, we find that extractive compression often outperforms all the other approaches, and enables up to 10x compression with minimal accuracy degradation. Interestingly, we also find that despite several recent claims, token pruning methods often lag behind extractive compression. We only found marginal improvements on summarization tasks.
Submitted 11 July, 2024;
originally announced July 2024.
-
Near-order relation of power means
Authors:
Jinmi Hwang,
Sejong Kim
Abstract:
In the setting of positive definite operators, we study the near-order properties of power means such as the quasi-arithmetic mean (Hölder mean) and the Rényi power mean. We show the monotonicity of the spectral geometric mean and the Wasserstein mean in their parameters with respect to the near-order, as well as the near-order relationship between the spectral geometric mean and the Wasserstein mean. Furthermore, we establish the monotonicity of the quasi-arithmetic mean in its parameter and the convergence of the Rényi power mean to the log-Euclidean mean with respect to the near-order.
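For orientation, some of the objects named in this abstract can be written out using their usual definitions from the operator-mean literature. The notation below ($A_j$, $\omega_j$, $p$, $\#$, $\preceq$) is assumed here for illustration and is not taken from the abstract, and the near-order is given in the form it is commonly stated, which may differ in detail from the paper's convention.

```latex
% Assumed notation: A_1, ..., A_n positive definite operators,
% weights \omega_j > 0 with \sum_j \omega_j = 1, and I the identity.
\begin{align*}
  % Quasi-arithmetic (H\"older) mean of order p
  Q_p(\omega; A_1,\dots,A_n) &= \Bigl(\sum_{j=1}^{n} \omega_j A_j^{p}\Bigr)^{1/p}, \qquad p \neq 0, \\
  % Log-Euclidean mean
  \mathrm{LE}(\omega; A_1,\dots,A_n) &= \exp\Bigl(\sum_{j=1}^{n} \omega_j \log A_j\Bigr), \\
  % Metric geometric mean of two positive definite operators
  A \# B &= A^{1/2}\bigl(A^{-1/2} B A^{-1/2}\bigr)^{1/2} A^{1/2}, \\
  % Near-order (as commonly defined; assumed here)
  A \preceq B &\iff A^{-1} \# B \geq I .
\end{align*}
```

In this notation, the usual Loewner order $A \leq B$ implies $A \preceq B$, so statements with respect to the near-order are weaker than their Loewner-order counterparts.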
Submitted 10 July, 2024;
originally announced July 2024.
-
Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization
Authors:
Jeongseok Hyun,
Su Ho Han,
Hyolim Kang,
Joon-Young Lee,
Seon Joo Kim
Abstract:
The vocabulary size in temporal action localization (TAL) is constrained by the scarcity of large-scale annotated datasets. To address this, recent works incorporate powerful pre-trained vision-language models (VLMs), such as CLIP, to perform open-vocabulary TAL (OV-TAL). However, unlike VLMs trained on extensive image/video-text pairs, existing OV-TAL methods still rely on small, fully labeled TAL datasets for training an action localizer. In this paper, we explore the scalability of self-training with unlabeled YouTube videos for OV-TAL. Our self-training approach consists of two stages. First, a class-agnostic action localizer is trained on a human-labeled TAL dataset and used to generate pseudo-labels for unlabeled videos. Second, the large-scale pseudo-labeled dataset is combined with the human-labeled dataset to train the localizer. Extensive experiments demonstrate that leveraging web-scale videos in self-training significantly enhances the generalizability of an action localizer. Additionally, we highlight issues with existing OV-TAL evaluation schemes and propose a new evaluation protocol. Code is released at https://github.com/HYUNJS/STOV-TAL
Submitted 9 July, 2024;
originally announced July 2024.
-
Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders
Authors:
Jinseok Kim,
Jaewon Jung,
Sangyeop Kim,
Sohyung Park,
Sungzoon Cho
Abstract:
Despite the impressive capabilities of Large Language Models (LLMs) in various tasks, their vulnerability to unsafe prompts remains a critical issue. These prompts can lead LLMs to generate responses on illegal or sensitive topics, posing a significant threat to their safe and ethical use. Existing approaches attempt to address this issue using classification models, but they have several drawbacks. With the increasing complexity of unsafe prompts, similarity search-based techniques that identify specific features of unsafe prompts provide a more robust and effective solution to this evolving problem. This paper investigates the potential of sentence encoders to distinguish safe from unsafe prompts, and the ability to classify various unsafe prompts according to a safety taxonomy. We introduce new pairwise datasets and the Categorical Purity (CP) metric to measure this capability. Our findings reveal both the effectiveness and limitations of existing sentence encoders, proposing directions to improve sentence encoders to operate as more robust safety detectors. Our code is available at https://github.com/JwdanielJung/Safe-Embed.
Submitted 9 July, 2024;
originally announced July 2024.
-
A Survey on Mixture of Experts
Authors:
Weilin Cai,
Juyong Jiang,
Fan Wang,
Jing Tang,
Sunghun Kim,
Jiayi Huang
Abstract:
Large language models (LLMs) have garnered unprecedented advancements across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to the emergent abilities of LLMs (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, there lacks a systematic and comprehensive review of the literature on MoE. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE. We first briefly introduce the structure of the MoE layer, followed by proposing a new taxonomy of MoE. Next, we overview the core designs for various MoE models including both algorithmic and systemic aspects, alongside collections of available open-source implementations, hyperparameter configurations and empirical evaluations. Furthermore, we delineate the multifaceted applications of MoE in practice, and outline some potential directions for future research. To facilitate ongoing updates and the sharing of cutting-edge developments in MoE research, we have established a resource repository accessible at https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts.
Submitted 26 June, 2024;
originally announced July 2024.
-
Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition
Authors:
Seungju Kim,
Meounggun Jo
Abstract:
Large Language Models (LLMs) have shown promise in Automated Essay Scoring (AES), but their zero-shot and few-shot performance often falls short compared to state-of-the-art models and human raters. However, fine-tuning LLMs for each specific task is impractical due to the variety of essay prompts and rubrics used in real-world educational contexts. This study proposes a novel approach combining LLMs and Comparative Judgment (CJ) for AES, using zero-shot prompting to choose between two essays. We demonstrate that a CJ method surpasses traditional rubric-based scoring in essay scoring using LLMs.
Submitted 8 July, 2024;
originally announced July 2024.
-
Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (83 additional authors not shown)
Abstract:
AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0νββ$ decay and report a new lower limit of the half-life of $^{100}$Mo $0νββ$ decay as $ T^{0ν}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90\% confidence level. The effective Majorana mass limit range is $m_{ββ}<$(210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.
Submitted 8 July, 2024;
originally announced July 2024.
-
Exploring the role of nonlocal Coulomb interactions in perovskite transition metal oxides
Authors:
Indukuru Ramesh Reddy,
Chang-Jong Kang,
Sooran Kim,
Bongjae Kim
Abstract:
Employing the density functional theory incorporating on-site and inter-site Coulomb interactions (DFT+U+V), we have investigated the role of the nonlocal interactions on the electronic structures of the transition metal oxide perovskites. Using constrained random phase approximation calculations, we derived screened Coulomb interaction parameters and revealed a competition between localization and screening effects, which results in nonmonotonic behavior with d-orbital occupation. We highlight the significant role and nonlocality of inter-site Coulomb interactions, V, comparable in magnitude to the local interaction, U. Our DFT+U+V results exemplarily show the representative band renormalization, and deviations from ideal extended Hubbard models due to increased hybridization between transition metal d and oxygen p orbitals as occupation increases. We further demonstrate that the inclusion of the inter-site V is essential for accurately reproducing the experimental magnetic order in transition metal oxides.
Submitted 7 July, 2024;
originally announced July 2024.
-
Understanding Political Communication and Political Communicators on Twitch
Authors:
Sangyeon Kim
Abstract:
As new technologies rapidly reshape patterns of political communication, platforms like Twitch are transforming how people consume political information. This entertainment-oriented live streaming platform allows us to observe the impact of technologies such as ``live-streaming'' and ``streaming-chat'' on political communication. Despite its entertainment focus, Twitch hosts a variety of political actors, including politicians and pundits. This study explores Twitch politics by addressing three main questions: 1) Who are the political Twitch streamers? 2) What content is covered in political streams? 3) How do audiences of political streams interact with each other? To identify political streamers, I leveraged the Twitch API and supervised machine-learning techniques, identifying 574 political streamers. I used topic modeling to analyze the content of political streams, revealing seven broad categories of political topics and a unique pattern of communication involving context-specific ``emotes.'' Additionally, I created user-reference networks to examine interaction patterns, finding that a small number of users dominate the communication network. This research contributes to our understanding of how new social media technologies influence political communication, particularly among younger audiences.
Submitted 6 July, 2024;
originally announced July 2024.
-
Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection
Authors:
YeongHyeon Park,
Sungho Kang,
Myung Jin Kim,
Hyeong Seok Kim,
Juneho Yi
Abstract:
In unsupervised anomaly detection (UAD) research, while state-of-the-art models have reached a saturation point with extensive studies on public benchmark datasets, they adopt large-scale tailor-made neural networks (NN) for detection performance or pursue unified models for various tasks. Towards edge computing, it is necessary to develop a computationally efficient and scalable solution that avoids large-scale complex NNs. Motivated by this, we aim to optimize the UAD performance with minimal changes to NN settings. Thus, we revisit the reconstruction-by-inpainting approach and improve it by analyzing its strengths and weaknesses. The strength of the SOTA methods is a single deterministic masking approach that addresses the challenges of random multiple masking, namely inference latency and output inconsistency. Nevertheless, the failure to provide a mask that completely covers anomalous regions remains a weakness. To mitigate this issue, we propose Feature Attenuation of Defective Representation (FADeR), which employs only two MLP layers that attenuate feature information of anomaly reconstruction during decoding. By leveraging FADeR, features of unseen anomaly patterns are reconstructed into seen normal patterns, reducing false alarms. Experimental results demonstrate that FADeR achieves enhanced performance compared to similar-scale NNs. Furthermore, our approach exhibits scalability in performance enhancement when integrated with other single deterministic masking methods in a plug-and-play manner.
Submitted 5 July, 2024;
originally announced July 2024.
-
LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech
Authors:
Haechan Kim,
Junho Myung,
Seoyoung Kim,
Sungpah Lee,
Dongyeop Kang,
Juho Kim
Abstract:
Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis reveals that transcriptions in our dataset contain L2S (L2 learner's Spontaneous speech) features, consisting of ungrammatical expressions and disfluencies (e.g., filler words, word repetitions, self-repairs, false starts), significantly more than native speech datasets. Fine-tuning whisper-small.en with LearnerVoice achieves a WER of 10.26%, 44.2% lower than vanilla whisper-small.en. Furthermore, our qualitative analysis indicates that 54.2% of errors from the vanilla model on LearnerVoice are attributable to L2S features, with 48.1% of them being reduced in the fine-tuned model.
Submitted 5 July, 2024;
originally announced July 2024.
-
KAN-ODEs: Kolmogorov-Arnold Network Ordinary Differential Equations for Learning Dynamical Systems and Hidden Physics
Authors:
Benjamin C. Koenig,
Suyong Kim,
Sili Deng
Abstract:
Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-layer perceptrons (MLPs) are a recent development demonstrating strong potential for data-driven modeling. This work applies KANs as the backbone of a Neural Ordinary Differential Equation framework, generalizing their use to the time-dependent and grid-sensitive cases often seen in scientific machine learning applications. The proposed KAN-ODEs retain the flexible dynamical system modeling framework of Neural ODEs while leveraging the many benefits of KANs, including faster neural scaling, stronger interpretability, and lower parameter counts when compared against MLPs. We demonstrate these benefits in three test cases: the Lotka-Volterra predator-prey model, Burgers' equation, and the Fisher-KPP PDE. We showcase the strong performance of parameter-lean KAN-ODE systems generally in reconstructing entire dynamical systems, and also in targeted applications to the inference of a source term in an otherwise known flow field. We additionally demonstrate the interpretability of KAN-ODEs via activation function visualization and symbolic regression of trained results. The successful training of KAN-ODEs and their improved performance when compared to traditional Neural ODEs implies significant potential in leveraging this novel network architecture in myriad scientific machine learning applications.
Submitted 4 July, 2024;
originally announced July 2024.
-
Domain Wall Networks as Skyrmion Crystals in Chiral Magnets
Authors:
Seungho Lee,
Toshiaki Fujimori,
Muneto Nitta,
Se Kwon Kim
Abstract:
We theoretically investigate the ground states of a chiral magnet with a square anisotropy and show that it supports domain wall networks as stable ground states. A domain wall junction in the domain wall network turns out to be a skyrmion with half topological charge and, therefore, the found domain wall network has a second topological nature, a skyrmion crystal. More specifically, we present a ground-state phase diagram of the chiral magnet with varying anisotropy parameters consisting of skyrmion lattices, chiral soliton lattices, and ferromagnetic states. In the presence of the square anisotropy, the skyrmion crystal forms a domain wall network. The size of domains in the domain wall network is shown to be tunable by an external magnetic field, offering a way to realize experimentally detectable domain wall networks.
Submitted 4 July, 2024;
originally announced July 2024.
-
NuSTAR as an Axion Helioscope
Authors:
J. Ruz,
E. Todarello,
J. K. Vogel,
M. Giannotti,
B. Grefenstette,
H. S. Hudson,
I. G. Hannah,
I. G. Irastorza,
C. S. Kim,
T. O'Shea,
M. Regis,
D. M. Smith,
M. Taoso,
J. Trujillo Bueno
Abstract:
The nature of dark matter in the Universe is still an open question in astrophysics and cosmology. Axions and axion-like particles (ALPs) offer a compelling solution, and traditionally ground-based experiments have eagerly, but to date unsuccessfully, searched for these hypothetical low-mass particles that are expected to be produced in large quantities in the strong electromagnetic fields in the interior of stars. This work offers a fresh look at axions and ALPs by leveraging their conversion into X-rays in the magnetic field of the Sun's atmosphere rather than a laboratory magnetic field. Unique data acquired with the Nuclear Spectroscopic Telescope Array (NuSTAR) during the solar minimum in 2020 allows us to set stringent limits on the coupling of axions to photons using state-of-the-art magnetic field models of the solar atmosphere. We report pioneering limits on the axion-photon coupling strength of $6.9\times 10^{-12}$ GeV$^{-1}$ at 95\% confidence level for axion masses $m_a \lesssim 2\times 10^{-7}$ eV, surpassing current ground-based searches and further probing unexplored regions of the axion-photon coupling parameter space up to axion masses of $m_a \lesssim 5\times 10^{-4}$ eV.
Submitted 4 July, 2024;
originally announced July 2024.
-
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Authors:
Sungnyun Kim,
Kangwook Jang,
Sangmin Bae,
Hoirin Kim,
Se-Young Yun
Abstract:
Audio-visual speech recognition (AVSR) aims to transcribe human speech using both audio and video modalities. In practical environments with noise-corrupted audio, the role of video information becomes crucial. However, prior works have primarily focused on enhancing audio features in AVSR, overlooking the importance of video features. In this study, we strengthen the video features by learning three temporal dynamics in video data: context order, playback direction, and the speed of video frames. Cross-modal attention modules are introduced to enrich video features with audio information so that speech variability can be taken into account when training on the video temporal dynamics. Based on our approach, we achieve the state-of-the-art performance on the LRS2 and LRS3 AVSR benchmarks for the noise-dominant settings. Our approach excels in scenarios especially for babble and speech noise, indicating the ability to distinguish the speech signal that should be recognized from lip movements in the video modality. We support the validity of our methodology by offering the ablation experiments for the temporal dynamics losses and the cross-modal attention architecture design.
Submitted 3 July, 2024;
originally announced July 2024.
-
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory
Authors:
Suyeon Lee,
Sunghwan Kim,
Minju Kim,
Dongjin Kang,
Dongil Yang,
Harim Kim,
Minseok Kang,
Dayi Jung,
Min Hee Kim,
Seungbeen Lee,
Kyoung-Mee Chung,
Youngjae Yu,
Dongha Lee,
Jinyoung Yeo
Abstract:
Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To address this, we introduce Cactus, a multi-turn dialogue dataset that emulates real-life interactions using the goal-oriented and structured approach of Cognitive Behavioral Therapy (CBT). We create a diverse and realistic dataset by designing clients with varied, specific personas, and having counselors systematically apply CBT techniques in their interactions. To assess the quality of our data, we benchmark against established psychological criteria used to evaluate real counseling sessions, ensuring alignment with expert evaluations. Experimental results demonstrate that Camel, a model trained with Cactus, outperforms other models in counseling skills, highlighting its effectiveness and potential as a counseling agent. We make our data, model, and code publicly available.
Submitted 3 July, 2024;
originally announced July 2024.
-
Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data
Authors:
Younghun Lee,
Sungchul Kim,
Ryan A. Rossi,
Tong Yu,
Xiang Chen
Abstract:
Large Language Models (LLMs) have been achieving competent performance on a wide range of downstream tasks, yet existing work shows that inference on structured data is challenging for LLMs. This is because LLMs need to either understand long structured data or select the most relevant evidence before inference, and both approaches are not trivial. This paper proposes a framework, Learning to Reduce, that fine-tunes a language model with On-Policy Learning to generate a reduced version of an input structured data. When compared to state-of-the-art LLMs like GPT-4, Learning to Reduce not only achieves outstanding performance in reducing the input, but shows generalizability on different datasets. We further show that the model fine-tuned with our framework helps LLMs better perform on table QA tasks especially when the context is longer.
Submitted 2 July, 2024;
originally announced July 2024.
-
Addressing Prediction Delays in Time Series Forecasting: A Continuous GRU Approach with Derivative Regularization
Authors:
Sheo Yon Jhin,
Seojin Kim,
Noseong Park
Abstract:
Time series forecasting has been an essential field in many different application areas, including economic analysis, meteorology, and so forth. The majority of time series forecasting models are trained using the mean squared error (MSE). However, this training based on MSE causes a limitation known as prediction delay. The prediction delay, which implies the ground-truth precedes the prediction, can cause serious problems in a variety of fields, e.g., finance and weather forecasting -- as a matter of fact, predictions succeeding ground-truth observations are not practically meaningful although their MSEs can be low. This paper proposes a new perspective on traditional time series forecasting tasks and introduces a new solution to mitigate the prediction delay. We introduce a continuous-time gated recurrent unit (GRU) based on the neural ordinary differential equation (NODE) which can supervise explicit time-derivatives. We generalize the GRU architecture in a continuous-time manner and minimize the prediction delay through our time-derivative regularization. Our method outperforms in metrics such as MSE, Dynamic Time Warping (DTW) and Time Distortion Index (TDI). In addition, we demonstrate the low prediction delay of our method in a variety of datasets.
Submitted 29 June, 2024;
originally announced July 2024.
-
No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection
Authors:
Soojin Woo,
Donghwi Jung,
Seong-Woo Kim
Abstract:
In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. Static objects include buildings and trees, whereas dynamic objects include objects such as parked cars that change their position over time. Removing dynamic objects from the point cloud map is crucial as they can degrade the quality and localization accuracy of the map. To address this issue, we propose an algorithm that creates a map consisting only of static objects. To implement our pipeline, we apply a 3D object detection algorithm to the point cloud data obtained from LiDAR. We then stack the points to create the map after performing ground segmentation and projection. As a result, we can eliminate not only objects that are dynamic at the time of map generation but also potentially dynamic objects such as parked vehicles. We validate the performance of our method using two kinds of datasets collected on real roads: KITTI and our dataset. The result demonstrates the capability of our proposal to create an accurate static map excluding dynamic objects from input point clouds. We also verify improved localization performance when using a map generated by our method.
Submitted 1 July, 2024;
originally announced July 2024.
-
Statistical inference on partially shape-constrained function-on-scalar linear regression models
Authors:
Kyunghee Han,
Yeonjoo Park,
Soo-Young Kim
Abstract:
We consider functional linear regression models where functional outcomes are associated with scalar predictors by coefficient functions with shape constraints, such as monotonicity and convexity, that apply to sub-domains of interest. To validate the partial shape constraints, we propose testing a composite hypothesis of linear functional constraints on regression coefficients. Our approach employs kernel- and spline-based methods within a unified inferential framework, evaluating the statistical significance of the hypothesis by measuring an $L^2$-distance between constrained and unconstrained model fits. In the theoretical study of large-sample analysis under mild conditions, we show that both methods achieve the standard rate of convergence observed in the nonparametric estimation literature. Through numerical experiments of finite-sample analysis, we demonstrate that the type I error rate keeps the significance level as specified across various scenarios and that the power increases with sample size, confirming the consistency of the test procedure under both estimation methods. Our theoretical and numerical results provide researchers the flexibility to choose a method based on computational preference. The practicality of partial shape-constrained inference is illustrated by two data applications: one involving clinical trials of NeuroBloc in type A-resistant cervical dystonia and the other with the National Institute of Mental Health Schizophrenia Study.
Submitted 30 June, 2024;
originally announced July 2024.
-
BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models
Authors:
Gihun Lee,
Minchan Jeong,
Yujin Kim,
Hojung Jung,
Jaehoon Oh,
Sangmook Kim,
Se-Young Yun
Abstract:
While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models to meet the diverse user preferences presents further challenges in preserving previous knowledge. This paper examines the impact of personalized preference optimization on LLMs, revealing that the extent of knowledge loss varies significantly with preference heterogeneity. Although previous approaches have utilized the KL constraint between the reference model and the policy model, we observe that they fail to maintain general knowledge and alignment when facing personalized preferences. To this end, we introduce Base-Anchored Preference Optimization (BAPO), a simple yet effective approach that utilizes the initial responses of reference model to mitigate forgetting while accommodating personalized alignment. BAPO effectively adapts to diverse user preferences while minimally affecting global knowledge or general alignment. Our experiments demonstrate the efficacy of BAPO in various setups.
Submitted 30 June, 2024;
originally announced July 2024.
-
Probabilistic multi-Stirling numbers of the second kind and probabilistic multi-Lah numbers
Authors:
Taekyun Kim,
Dae san Kim
Abstract:
Assume that the moment generating function of the random variable Y exists in a neighborhood of the origin. We introduce the probabilistic multi-Stirling numbers of the second kind associated with Y and the probabilistic multi-Lah numbers associated with Y, both of indices $(k_1, k_2, \dots, k_r)$, by means of the multiple logarithm. Those numbers are respectively probabilistic extensions of the multi-Stirling numbers of the second kind and the multi-Lah numbers which, for $(k_1, k_2, \dots, k_r) = (1, 1, \dots, 1)$, boil down respectively to the Stirling numbers of the second kind and the unsigned Lah numbers. The aim of this paper is to study some properties, related identities, recurrence relations and explicit expressions of those probabilistic extension numbers in connection with several other special numbers.
Submitted 17 June, 2024;
originally announced July 2024.
-
Adaptive and Parallel Multiscale Framework for Modeling Cohesive Failure in Engineering Scale Systems
Authors:
Sion Kim,
Ezra Kissel,
Karel Matous
Abstract:
The high computational demands of multiscale modeling necessitate advanced parallel and adaptive strategies. To address this challenge, we introduce an adaptive method that utilizes two microscale models based on an offline database for multiscale modeling of curved interfaces (e.g., adhesive layers). This database employs nonlinear classifiers, developed using Support Vector Machines from microscale sampling data, as a preprocessing step for multiscale simulations. Next, we develop a new parallel network library that enables seamless model selection with customized communication layers, ensuring scalability in parallel computing environments. The correctness and effectiveness of the hierarchically parallel solver are verified on a crack propagation problem within the curved adhesive layer. Finally, we predict the ultimate bending moment and adhesive layer failure of a wind turbine blade and validate the solver on a difficult large-scale engineering problem.
Submitted 18 April, 2024;
originally announced July 2024.
-
3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints
Authors:
Yoonkyu Yoo,
Donghwi Jung,
Seong-Woo Kim
Abstract:
In this paper, we propose a control algorithm based on reinforcement learning, employing independent rewards for each joint to control excavators in a 3D space. The aim of this research is to address the challenges associated with achieving precise control of excavators, which are extensively utilized in construction sites but prove challenging to control with precision due to their hydraulic structures. Traditional methods relied on operator expertise for precise excavator operation, occasionally resulting in safety accidents. Therefore, there have been endeavors to attain precise excavator control through equation-based control algorithms. However, these methods had the limitation of necessitating prior information related to physical values of the excavator, rendering them unsuitable for the diverse range of excavators used in the field. To overcome these limitations, we have explored reinforcement learning-based control methods that do not demand prior knowledge of specific equipment but instead utilize data to train models. Nevertheless, existing reinforcement learning-based methods overlooked cabin swing rotation and confined the bucket's workspace to a 2D plane. Control confined within such a limited area diminishes the applicability of the algorithm in construction sites. We address this issue by expanding the previous 2D plane workspace of the bucket operation into a 3D space, incorporating cabin swing rotation. By expanding the workspace into 3D, excavators can execute continuous operations without requiring human intervention. To accomplish this objective, distinct targets were established for each joint, facilitating the training of action values for each joint independently, regardless of the progress of other joint learning.
Submitted 28 June, 2024;
originally announced June 2024.
-
Unstable Retention Behavior in MIFIS FEFET: Accurate Analysis of the Origin by Absolute Polarization Measurement
Authors:
Song-Hyeon Kuk,
Kyul Ko,
Bong Ho Kim,
Jae-Hoon Han,
Sang-Hyeon Kim
Abstract:
Ferroelectric field-effect-transistor (FEFET) has emerged as a scalable solution for 3D NAND and embedded flash (eFlash), with recent progress in achieving a large memory window (MW) using metal-insulator-ferroelectric-insulator-semiconductor (MIFIS) gate stacks. Although the physical origin of the large MW in the MIFIS stack has already been discussed, its retention characteristics have not been explored yet. Here, we demonstrate MIFIS FEFET with a maximum MW of 9.7 V, and show that MIFIS FEFET has unstable retention characteristics, especially after erase. We discover the origin of the unstable retention characteristics and prove our hypothesis with absolute polarization measurement and different operation modes, showing that the unstable retention characteristics are a fundamental issue. Based on this understanding, we discuss a novel charge compensation model and promising engineering methodologies to achieve stable retention in MIFIS FEFET.
Submitted 27 June, 2024;
originally announced June 2024.
-
AR-PPF: Advanced Resolution-Based Pixel Preemption Data Filtering for Efficient Time-Series Data Analysis
Authors:
Taewoong Kim,
Kukjin Choi,
Sungjun Kim
Abstract:
With the advent of automation, many manufacturing industries have transitioned to data-centric methodologies, giving rise to an unprecedented influx of data during the manufacturing process. This data has become instrumental in analyzing the quality of manufacturing processes and equipment. Engineers and data analysts, in particular, require extensive time-series data for seasonal cycle analysis. However, due to computational resource constraints, they are often limited to querying short-term data multiple times or resorting to summarized data in which key patterns may be overlooked. This study proposes a novel solution to overcome these limitations: the advanced resolution-based pixel preemption data filtering (AR-PPF) algorithm. This technology allows for efficient visualization of time-series charts over long periods while significantly reducing the time required to retrieve data. We also demonstrate how this approach not only enhances the efficiency of data analysis but also ensures that key features are not lost, thereby providing a more accurate and comprehensive understanding of the data.
Submitted 27 June, 2024;
originally announced June 2024.
-
Subtractive Training for Music Stem Insertion using Latent Diffusion Models
Authors:
Ivan Villa-Renteria,
Mason L. Wang,
Zachary Shah,
Zhe Li,
Soohyun Kim,
Neelesh Ramachandran,
Mert Pilanci
Abstract:
We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusion model to generate the missing instrument stem, guided by both the existing stems and the text instruction. Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks. We also show that we can use the text instruction to control the generation of the inserted stem in terms of rhythm, dynamics, and genre, allowing us to modify the style of a single instrument in a full song while keeping the remaining instruments the same. Lastly, we extend this technique to MIDI formats, successfully generating compatible bass, drum, and guitar parts for incomplete arrangements.
Submitted 27 June, 2024;
originally announced June 2024.
-
Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He, et al. (118 additional authors not shown)
Abstract:
We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields (EGMF), the results are consistent with a relatively heavy injected composition at E ~ 10 EeV that becomes lighter up to E ~ 100 EeV, while the composition at E > 100 EeV is very heavy. The latter is true even in the presence of the highest experimentally allowed EGMF, while the composition at lower energies can be light if a strong EGMF is present. The effect of the uncertainty in the galactic magnetic field on these results is subdominant.
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He, et al. (118 additional authors not shown)
Abstract:
We use a new method to estimate the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on a comparison of the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECR under the assumption that sources trace the large-scale structure (LSS) of the Universe. As we report in the companion letter, the TA data show large deflections with respect to the LSS which can be explained, assuming small extra-galactic magnetic fields (EGMF), by an intermediate composition changing to a heavy one (iron) in the highest energy bin. Here we show that these results are robust to uncertainties in UHECR injection spectra, the energy scale of the experiment and galactic magnetic fields (GMF). The assumption of weak EGMF, however, strongly affects this interpretation at all but the highest energies E > 100 EeV, where the remarkable isotropy of the data implies a heavy injected composition even in the case of strong EGMF. This result also holds if UHECR sources are as rare as $2 \times 10^{-5}$ Mpc$^{-3}$, which is the conservative lower limit for the source number density.
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Authors:
Hyun Joon Park,
Jin Sob Kim,
Wooseok Shin,
Sung Won Han
Abstract:
Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a general diffusion TTS framework, DEX-TTS includes encoders and adapters to handle styles extracted from reference speech. Key innovations include the differentiation of styles into time-invariant and time-variant categories for effective style extraction, as well as the design of encoders and adapters with high generalization ability. In addition, we introduce overlapping patchify and convolution-frequency patch embedding strategies to improve DiT-based diffusion networks for TTS. DEX-TTS yields outstanding performance in terms of objective and subjective evaluation in English multi-speaker and emotional multi-speaker datasets, without relying on pre-training strategies. Lastly, the comparison results for the general TTS on a single-speaker dataset verify the effectiveness of our enhanced diffusion backbone. Demos are available here.
Submitted 27 June, 2024;
originally announced June 2024.
-
Origin of extended Main Sequence Turn Off in open cluster NGC 2355
Authors:
Jayanand Maurya,
M. R. Samal,
Louis Amard,
Yu Zhang,
Hubiao Niu,
Sang Chul Kim,
Y. C. Joshi,
B. Kumar
Abstract:
The presence of an extended Main Sequence Turn-Off (eMSTO) in open clusters has been attributed to various factors, such as spread in rotation rates, binary stars, and dust-like extinction from stellar excretion discs. We present a comprehensive analysis of the eMSTO in the open cluster NGC 2355. Using spectra from the Gaia-ESO archives, we find that the stars in the red part of the eMSTO have a higher mean v sin i value of 135.3$\pm$4.6 km s$^{-1}$ compared to the stars in the blue part that have an average v sin i equal to 81.3$\pm$5.6 km s$^{-1}$. This suggests that the eMSTO in NGC 2355 is possibly caused by the spread in rotation rates of stars. We do not find any substantial evidence of the dust-like extinction from the eMSTO stars using ultraviolet data from the Swift survey. The estimated synchronization time for low mass ratio close binaries in the blue part of the eMSTO suggests that they would be mostly slow-rotating if present. However, the stars in the blue part of the eMSTO are preferentially located in the outer region of the cluster, indicating that they may lack low mass ratio close binaries. The spread in rotation rates of eMSTO stars in NGC 2355 is most likely caused by the star-disc interaction mechanism. The stars in the lower main sequence beyond the eMSTO region of NGC 2355 are slow-rotating (mean v sin i = 26.5$\pm$1.3 km s$^{-1}$), possibly due to the magnetic braking of their rotations.
Submitted 27 June, 2024;
originally announced June 2024.
-
On-off switchable nonreciprocal negative refraction in non-Hermitian photon-magnon hybrid systems
Authors:
Junyoung Kim,
Bosung Kim,
Bo-Jong Kim,
Haechan Jeon,
Sang-Koog Kim
Abstract:
Photon-magnon coupling, where electromagnetic waves interact with spin waves, and negative refraction, which bends the direction of electromagnetic waves unnaturally, constitute critical foundations and advancements in the realms of optics, spintronics, and quantum information technology. Here, we explore a magnetic-field-controlled, on-off switchable, nonreciprocal negative refraction within a non-Hermitian photon-magnon hybrid system. By integrating an yttrium iron garnet film with an inverted split-ring resonator, we discover pronounced negative refraction driven by the system's non-Hermitian properties. This phenomenon exhibits unique nonreciprocal behavior dependent on the signal's propagation direction. Our analytical model sheds light on the crucial interplay between coherent and dissipative coupling, significantly altering permittivity and permeability's imaginary components, crucial for negative refraction's emergence. This work pioneers new avenues for employing negative refraction in photon-magnon hybrid systems, signaling substantial advancements in quantum hybrid systems.
Submitted 26 June, 2024;
originally announced June 2024.
-
GFFE: G-buffer Free Frame Extrapolation for Low-latency Real-time Rendering
Authors:
Songyin Wu,
Deepak Vembar,
Anton Sochenov,
Selvakumar Panneer,
Sungye Kim,
Anton Kaplanyan,
Ling-Qi Yan
Abstract:
Real-time rendering has been embracing ever-demanding effects, such as ray tracing. However, rendering such effects in high resolution and high frame rate remains challenging. Frame extrapolation methods, which don't introduce additional latency as opposed to frame interpolation methods such as DLSS 3 and FSR 3, boost the frame rate by generating future frames based on previous frames. However, it is a more challenging task because of the lack of information in the disocclusion regions, and recent methods also have a high engine integration cost due to requiring G-buffers as input. We propose a G-buffer free frame extrapolation, GFFE, with a novel heuristic framework and an efficient neural network, to plausibly generate new frames in real-time without introducing additional latency. We analyze the motion of dynamic fragments and different types of disocclusions, and design the corresponding modules of the extrapolation block to handle them. After filling disocclusions, a light-weight shading correction network is used to correct shading and improve overall quality. GFFE achieves comparable or better results compared to previous interpolation as well as G-buffer-dependent extrapolation methods, with more efficient performance and easier game integration.
Submitted 23 May, 2024;
originally announced June 2024.
-
Burst Image Super-Resolution with Base Frame Selection
Authors:
Sanghyun Kim,
Min Jung Lee,
Woohyeok Kim,
Deunsol Jung,
Jaesung Rim,
Sunghyun Cho,
Minsu Cho
Abstract:
Burst image super-resolution has been a topic of active research in recent years due to its ability to obtain a high-resolution image by using complementary information between multiple frames in the burst. In this work, we explore using burst shots with non-uniform exposures to confront real-world practical scenarios by introducing a new benchmark dataset, dubbed Non-uniformly Exposed Burst Image (NEBI), that includes the burst frames at varying exposure times to obtain a broader range of irradiance and motion characteristics within a scene. As burst shots with non-uniform exposures exhibit varying levels of degradation, fusing information of the burst shots into the first frame as a base frame may not result in optimal image quality. To address this limitation, we propose a Frame Selection Network (FSN) for non-uniform scenarios. This network seamlessly integrates into existing super-resolution methods in a plug-and-play manner with low computational costs. The comparative analysis reveals the effectiveness of the non-uniform setting for the practical scenario and of our FSN on synthetic and real NEBI datasets.
Submitted 25 June, 2024;
originally announced June 2024.
-
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model
Authors:
Joun Yeop Lee,
Myeonghun Jeong,
Minchan Kim,
Ji-Hyun Lee,
Hoon-Young Cho,
Nam Soo Kim
Abstract:
We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic contents and alignment, and the Speaking module, which captures the timbre of the target voice to generate acoustic tokens from semantic tokens, enriching speech reconstruction. The Interpreting stage employs a transducer for its robustness in aligning text to speech. In contrast, the Speaking stage utilizes a Conformer-based architecture integrated with a Grouped Masked Language Model (G-MLM) to boost computational efficiency. Our experiments verify that this innovative structure surpasses the conventional models in the zero-shot scenario in terms of speech quality and speaker similarity.
Submitted 25 June, 2024;
originally announced June 2024.