Skip to main content

Showing 1–50 of 1,322 results for author: Chung

  1. arXiv:2407.13676  [pdf, other

    cs.MM cs.CV cs.SD eess.AS

    Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

    Authors: Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

    Abstract: Recent studies on learning-based sound source localization have mainly focused on the localization performance perspective. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interactive sound source localization. Cross-modal interaction is vital for understanding semantically matched or mismatched audio-visual events, such as sil… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Journal Extension of ICCV 2023 paper (arXiV:2309.10724). Code is available at https://github.com/kaistmm/SSLalignment

  2. arXiv:2407.12604  [pdf, ps, other

    cs.IT cs.DS cs.SI

    Exact Graph Matching in Correlated Gaussian-Attributed Erdős-Rényi Model

    Authors: Joonhyuk Yang, Hye Won Chung

    Abstract: Graph matching problem aims to identify node correspondence between two or more correlated graphs. Previous studies have primarily focused on models where only edge information is provided. However, in many social networks, not only the relationships between users, represented by edges, but also their personal information, represented by features, are present. In this paper, we address the challen… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: IEEE International Symposium on Information Theory (ISIT) 2024

  3. arXiv:2407.12304  [pdf, other

    cs.RO

    MAGIC-VFM: Meta-learning Adaptation for Ground Interaction Control with Visual Foundation Models

    Authors: Elena Sorina Lupu, Fengze Xie, James A. Preiss, Jedidiah Alindogan, Matthew Anderson, Soon-Jo Chung

    Abstract: Control of off-road vehicles is challenging due to the complex dynamic interactions with the terrain. Accurate modeling of these interactions is important to optimize driving performance, but the relevant physical phenomena are too complex to model from first principles. Therefore, we present an offline meta-learning algorithm to construct a rapidly-tunable model of residual dynamics and disturban… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.12207  [pdf, other

    cs.CV cs.RO

    NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models

    Authors: Francesco Milano, Jen Jen Chung, Hermann Blum, Roland Siegwart, Lionel Ott

    Abstract: State-of-the-art approaches for 6D object pose estimation assume the availability of CAD models and require the user to manually set up physically-based rendering (PBR) pipelines for synthetic training data generation. Both factors limit the application of these methods in real-world scenarios. In this work, we present a pipeline that does not require CAD models and allows training a state-of-the-… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024. 8 pages, 4 figures, 5 tables

  5. arXiv:2407.11376  [pdf, other

    quant-ph cs.NI cs.PF

    Analytical Performance Estimations for Quantum Repeater Network Scenarios

    Authors: Allen Zang, Joaquin Chung, Rajkumar Kettimuthu, Martin Suchara, Tian Zhong

    Abstract: Quantum repeater chains will form the backbone of future quantum networks that distribute entanglement between network nodes. Therefore, it is important to understand the entanglement distribution performance of quantum repeater chains, especially their throughput and latency. By using Markov chains to model the stochastic dynamics in quantum repeater chains, we offer analytical estimations for lo… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 6 pages, 5 figures, accepted to QCE24

  6. arXiv:2407.11245  [pdf, other

    cs.IR cs.AI

    Pacer and Runner: Cooperative Learning Framework between Single- and Cross-Domain Sequential Recommendation

    Authors: Chung Park, Taesan Kim, Hyungjun Yoon, Junui Hong, Yelim Yu, Mincheol Cho, Minsung Choi, Jaegul Choo

    Abstract: Cross-Domain Sequential Recommendation (CDSR) improves recommendation performance by utilizing information from multiple domains, which contrasts with Single-Domain Sequential Recommendation (SDSR) that relies on a historical interaction within a specific domain. However, CDSR may underperform compared to the SDSR approach in certain domains due to negative transfer, which occurs when there is a l… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted at SIGIR'24

  7. arXiv:2407.10641  [pdf, other

    cs.CV cs.LG

    Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems

    Authors: Hyungjin Chung, Jong Chul Ye

    Abstract: Recent inverse problem solvers that leverage generative diffusion priors have garnered significant attention due to their exceptional quality. However, adaptation of the prior is necessary when there exists a discrepancy between the training and testing distributions. In this work, we propose deep diffusion image prior (DDIP), which generalizes the recent adaptation method of SCD by introducing a… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, 25 pages, 8 figures

  8. arXiv:2407.10589  [pdf, other

    cs.CR

    The Medium is the Message: How Secure Messaging Apps Leak Sensitive Data to Push Notification Services

    Authors: Nikita Samarin, Alex Sanchez, Trinity Chung, Akshay Dan Bhavish Juleemun, Conor Gilsenan, Nick Merrill, Joel Reardon, Serge Egelman

    Abstract: Like most modern software, secure messaging apps rely on third-party components to implement important app functionality. Although this practice reduces engineering costs, it also introduces the risk of inadvertent privacy breaches due to misconfiguration errors or incomplete documentation. Our research investigated secure messaging apps' usage of Google's Firebase Cloud Messaging (FCM) service to… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: To be presented at the 24th Privacy Enhancing Technologies Symposium (PETS 2024)

  9. arXiv:2407.08991  [pdf

    eess.AS cs.AI cs.CC

    Optimization of DNN-based speaker verification model through efficient quantization technique

    Authors: Yeona Hong, Woo-Jin Chung, Hong-Goo Kang

    Abstract: As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: in Korean language, Accepted at Society of Electronic Engineers of Korea Conference 2024

  10. arXiv:2407.08855  [pdf, other

    eess.IV cs.CV

    BraTS-PEDs: Results of the Multi-Consortium International Pediatric Brain Tumor Segmentation Challenge 2023

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Xinyang Liu, Debanjan Haldar, Zhifan Jiang, Anna Zapaishchykova, Julija Pavaine, Lubdha M. Shah, Blaise V. Jones, Nakul Sheth, Sanjay P. Prabhu, Aaron S. McAllister, Wenxin Tu, Khanak K. Nandolia, Andres F. Rodriguez, Ibraheem Salman Shaikh, Mariana Sanchez Montano, Hollie Anne Lai, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Hannah Anderson, Syed Muhammed Anwar, Alejandro Aristizabal, Sina Bagheri , et al. (55 additional authors not shown)

    Abstract: Pediatric central nervous system tumors are the leading cause of cancer-related deaths in children. The five-year survival rate for high-grade glioma in children is less than 20%. The development of new treatments is dependent upon multi-institutional collaborative clinical trials requiring reproducible and accurate centralized response assessment. We present the results of the BraTS-PEDs 2023 cha… ▽ More

    Submitted 16 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.08691  [pdf, other

    cs.SD cs.AI eess.AS

    ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions

    Authors: Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

    Abstract: Transformers have rapidly overtaken CNN-based architectures as the new standard in audio classification. Transformer-based models, such as the Audio Spectrogram Transformers (AST), also inherit the fixed-size input paradigm from CNNs. However, this leads to performance degradation for ASTs in the inference when input lengths vary from the training. This paper introduces an approach that enables th… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024. Code is available at https://github.com/JiuFengSC/ElasticAST

  12. arXiv:2407.06025  [pdf, other

    cs.AI

    iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement

    Authors: Aoyu Pang, Maonan Wang, Man-On Pun, Chung Shue Chen, Xi Xiong

    Abstract: Urban congestion remains a critical challenge, with traffic signal control (TSC) emerging as a potent solution. TSC is often modeled as a Markov Decision Process problem and then solved using reinforcement learning (RL), which has proven effective. However, the existing RL-based TSC system often overlooks imperfect observations caused by degraded communication, such as packet loss, delays, and noi… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  13. arXiv:2407.05467  [pdf, other

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba , et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam,Brian Belgodere, Milton Bonilla

  14. arXiv:2407.04307  [pdf, other

    cs.CL cs.LG

    Crafting Large Language Models for Enhanced Interpretability

    Authors: Chung-En Sun, Tuomas Oikarinen, Tsui-Wei Weng

    Abstract: We introduce the Concept Bottleneck Large Language Model (CB-LLM), a pioneering approach to creating inherently interpretable Large Language Models (LLMs). Unlike traditional black-box LLMs that rely on post-hoc interpretation methods with limited neuron function insights, CB-LLM sets a new standard with its built-in interpretability, scalability, and ability to provide clear, accurate explanation… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Present at ICML 2024 Mechanistic Interpretability (MI) Workshop

  15. arXiv:2407.03103  [pdf, other

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Under Review

  16. arXiv:2407.00568  [pdf, other

    cs.LG cs.AI

    Divide And Conquer: Learning Chaotic Dynamical Systems With Multistep Penalty Neural Ordinary Differential Equations

    Authors: Dibyajyoti Chakraborty, Seung Whan Chung, Romit Maulik

    Abstract: Forecasting high-dimensional dynamical systems is a fundamental challenge in various fields, such as the geosciences and engineering. Neural Ordinary Differential Equations (NODEs), which combine the power of neural networks and numerical solvers, have emerged as a promising algorithm for forecasting complex nonlinear dynamical systems. However, classical techniques used for NODE training are inef… ▽ More

    Submitted 1 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: 20 pages, 10 Figures, submitted to Journal of Computational Physics

  17. arXiv:2406.19622  [pdf, other

    cs.LG cs.AI

    Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness

    Authors: Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee

    Abstract: The security and robustness of deep neural networks (DNNs) have become increasingly concerning. This paper aims to provide both a theoretical foundation and a practical solution to ensure the reliability of DNNs. We explore the concept of Lipschitz continuity to certify the robustness of DNNs against adversarial attacks, which aim to mislead the network with adding imperceptible perturbations into… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  18. arXiv:2406.18925  [pdf, other

    cs.CL cs.CV

    Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

    Authors: Jiwan Chung, Sungjae Lee, Minseo Kim, Seungju Han, Ashkan Yousefpour, Jack Hessel, Youngjae Yu

    Abstract: Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  19. arXiv:2406.18561  [pdf, other

    cs.CV cs.LG

    SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching

    Authors: Yongmin Lee, Hye Won Chung

    Abstract: Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective, even underperforming random sample selection, as IPC increases. Our examination of state-of-the-art trajectory-matching based distillation… ▽ More

    Submitted 28 May, 2024; originally announced June 2024.

    Comments: ICML 2024

  20. arXiv:2406.18375  [pdf, other

    cs.CV

    From Majority to Minority: A Diffusion-based Augmentation for Underrepresented Groups in Skin Lesion Analysis

    Authors: Janet Wang, Yunsung Chung, Zhengming Ding, Jihun Hamm

    Abstract: AI-based diagnoses have demonstrated dermatologist-level performance in classifying skin cancer. However, such systems are prone to under-performing when tested on data from minority groups that lack sufficient representation in the training sets. Although data collection and annotation offer the best means for promoting minority groups, these processes are costly and time-consuming. Prior works h… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  21. arXiv:2406.18062  [pdf, other

    cs.LG cs.AI

    Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

    Authors: Chung-En Sun, Sicun Gao, Tsui-Wei Weng

    Abstract: Robustness remains a paramount concern in deep reinforcement learning (DRL), with randomized smoothing emerging as a key technique for enhancing this attribute. However, a notable gap exists in the performance of current smoothed DRL agents, often characterized by significantly low clean rewards and weak robustness. In response to this challenge, our study introduces innovative algorithms aimed at… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Published in ICML 2024

  22. arXiv:2406.17329  [pdf, other

    eess.SP cs.SD eess.AS physics.bio-ph

    Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

    Authors: Woo-Jin Chung, Hong-Goo Kang

    Abstract: We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern in… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  23. arXiv:2406.16994  [pdf, other

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  24. arXiv:2406.16896  [pdf, other

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due… ▽ More

    Submitted 15 May, 2024; originally announced June 2024.

  25. arXiv:2406.16313  [pdf, other

    cs.CR

    Thinking Inside The Box: Privacy Against Stronger Adversaries

    Authors: Eldon Chung

    Abstract: In this thesis, we study extensions of statistical cryptographic primitives. In particular we study leakage-resilient secret sharing, non-malleable extractors, and immunized ideal one-way functions. The thesis is divided into three main chapters. In the first chapter, we show that 2-out-of-2 leakage resilient (and also non-malleable) secret sharing requires randomness sources that are also extract… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: PhD thesis

  26. arXiv:2406.15451  [pdf, other

    cs.CV cs.AI cs.LG

    Deep Vision-Based Framework for Coastal Flood Prediction Under Climate Change Impacts and Shoreline Adaptations

    Authors: Areg Karapetyan, Aaron Chung Hin Chow, Samer Madanat

    Abstract: In light of growing threats posed by climate change in general and sea level rise (SLR) in particular, the necessity for computationally efficient means to estimate and analyze potential coastal flood hazards has become increasingly pressing. Data-driven supervised learning methods serve as promising candidates that can dramatically expedite the process, thereby eliminating the computational bottl… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  27. arXiv:2406.14703  [pdf, other

    cs.CL cs.AI

    Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

    Authors: Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

    Abstract: The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliabilit… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint; Under review

  28. arXiv:2406.14559  [pdf, other

    cs.SD eess.AS

    Disentangled Representation Learning for Environment-agnostic Speaker Recognition

    Authors: KiHyun Nam, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung

    Abstract: This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into components related to the speaker and other residual information. We employ a group of objective functions to ensure that the auto-encoder's code representation -… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024. The official webpage can be found at https://mm.kaist.ac.kr/projects/voxceleb-disentangler/

  29. arXiv:2406.14190  [pdf, other

    cs.LO

    Extended Resolution Clause Learning via Dual Implication Points

    Authors: Sam Buss, Jonathan Chung, Vijay Ganesh, Albert Oliveras

    Abstract: We present a new extended resolution clause learning (ERCL) algorithm, implemented as part of a conflict-driven clause-learning (CDCL) SAT solver, wherein new variables are dynamically introduced as definitions for {\it Dual Implication Points} (DIPs) in the implication graph constructed by the solver at runtime. DIPs are generalizations of unique implication points and can be informally viewed as… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  30. arXiv:2406.13144  [pdf, other

    cs.CL cs.AI

    DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents

    Authors: Jiho Kim, Woosog Chay, Hyeonji Hwang, Daeun Kyung, Hyunseung Chung, Eunbyeol Cho, Yohan Jo, Edward Choi

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of the agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  31. arXiv:2406.12632  [pdf, other

    eess.IV cs.CV

    Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Image Synthesis: T1 MRI to Tau-PET

    Authors: Symac Kim, Junho Moon, Haejun Chung, Ikbeom Jang

    Abstract: Alzheimer's Disease (AD) is the most common form of dementia, characterised by cognitive decline and biomarkers such as tau-proteins. Tau-positron emission tomography (tau-PET), which employs a radiotracer to selectively bind, detect, and visualise tau protein aggregates within the brain, is valuable for early AD diagnosis but is less accessible due to high costs, limited availability, and its inv… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures

  32. arXiv:2406.12246  [pdf, other

    cs.LG cs.CL cs.CV

    TroL: Traversal of Layers for Large Language and Vision Models

    Authors: Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro

    Abstract: Large language and vision models (LLVMs) have been driven by the generalization power of large language models (LLMs) and the advent of visual instruction tuning. Along with scaling them up directly, these models enable LLVMs to showcase powerful vision language (VL) performances by covering diverse tasks via natural language instructions. However, existing open-source LLVMs that perform comparabl… ▽ More

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Code is available in https://github.com/ByungKwanLee/TroL

  33. arXiv:2406.10549  [pdf, other

    eess.AS cs.CL cs.SD

    Lightweight Audio Segmentation for Long-form Speech Translation

    Authors: Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung

    Abstract: Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performan… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  34. arXiv:2406.09366  [pdf, other

    cs.LG cs.CV q-bio.NC

    Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

    Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

    Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to impro… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  35. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  36. arXiv:2406.09286  [pdf, other

    eess.AS cs.SD

    FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching

    Authors: Chaeyoung Jung, Suyeon Lee, Ji-Hoon Kim, Joon Son Chung

    Abstract: This work proposes an efficient method to enhance the quality of corrupted speech signals by leveraging both acoustic and visual cues. While existing diffusion-based approaches have demonstrated remarkable quality, their applicability is limited by slow inference speeds and computational complexity. To address this issue, we present FlowAVSE which enhances the inference speed and reduces the numbe… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  37. arXiv:2406.08719  [pdf, other

    cs.CR

    TikTag: Breaking ARM's Memory Tagging Extension with Speculative Execution

    Authors: Juhee Kim, Jinbum Park, Sihyeon Roh, Jaeyoung Chung, Youngjoo Lee, Taesoo Kim, Byoungyoung Lee

    Abstract: ARM Memory Tagging Extension (MTE) is a new hardware feature introduced in ARMv8.5-A architecture, aiming to detect memory corruption vulnerabilities. The low overhead of MTE makes it an attractive solution to mitigate memory corruption attacks in modern software systems and is considered the most promising path forward for improving C/C++ software security. This paper explores the potential secur… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  38. arXiv:2406.08070  [pdf, ps, other

    cs.CV cs.AI cs.LG

    CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

    Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye

    Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  39. arXiv:2406.07735  [pdf, other

    cs.CL cs.LG

    REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy

    Authors: Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung

    Abstract: Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. For example, a higher p threshold in the nucleus (top-p) sampling increases the diversity but decreases the factuality, and vice versa. In this paper, we propose REAL (Residual Entropy from Asymptotic Line) sampling, a decoding method that achieves improved fa… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  40. arXiv:2406.06899  [pdf, other

    cs.RO

    Developing, Analyzing, and Evaluating Vehicular Lane Keeping Algorithms Under Dynamic Lighting and Weather Conditions Using Electric Vehicles

    Authors: Michael Khalfin, Jack Volgren, Matthew Jones, Luke LeGoullon, Joshua Siegel, Chan-Jin Chung

    Abstract: Self-driving vehicles have the potential to reduce accidents and fatalities on the road. Many production vehicles already come equipped with basic self-driving capabilities, but have trouble following lanes in adverse lighting and weather conditions. Therefore, we develop, analyze, and evaluate two vehicular lane-keeping algorithms under dynamic weather conditions using a combined deep learning- a… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Supported by the National Science Foundation under Grants No. 2150292 and 2150096

  41. arXiv:2406.06014  [pdf, other

    math.ST cs.SI stat.ME stat.ML

    Network two-sample test for block models

    Authors: Chung Kyong Nguen, Oscar Hernan Madrid Padilla, Arash A. Amini

    Abstract: We consider the two-sample testing problem for networks, where the goal is to determine whether two sets of networks originated from the same stochastic model. Assuming no vertex correspondence and allowing for different numbers of nodes, we address a fundamental network testing problem that goes beyond simple adjacency matrix comparisons. We adopt the stochastic block model (SBM) for network dist… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  42. arXiv:2406.05339  [pdf, other

    eess.AS cs.AI

    To what extent can ASV systems naturally defend against spoofing attacks?

    Authors: Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, Joon Son Chung

    Abstract: The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically ex… ▽ More

    Submitted 14 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 3 tables, Interspeech 2024

  43. arXiv:2406.04627  [pdf, other

    cs.LG cs.AI

    Denoising-Aware Contrastive Learning for Noisy Time Series

    Authors: Shuang Zhou, Daochen Zha, Xiao Shen, Xiao Huang, Rui Zhang, Fu-Lai Chung

    Abstract: Time series self-supervised learning (SSL) aims to exploit unlabeled data for pre-training to mitigate the reliance on labels. Despite the great success in recent years, there is limited discussion on the potential noise in the time series, which can severely impair the performance of existing SSL methods. To mitigate the noise, the de facto strategy is to apply conventional denoising methods befo… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

  44. arXiv:2406.03344  [pdf, other

    cs.SD cs.AI eess.AS

    Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

    Authors: Mehmet Hamza Erol, Arda Senocak, Jiu Feng, Joon Son Chung

    Abstract: Transformers have rapidly become the preferred choice for audio classification, surpassing methods based on CNNs. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention. The removal of this quadratic self-attention cost presents an appealing direction. Recently, state space models (SSMs), such as Mamba, have demonstrated potential in language and vision task… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/mhamzaerol/Audio-Mamba-AuM

  45. arXiv:2406.03057  [pdf, other

    cs.LG stat.ML

    BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges

    Authors: Hoyong Choi, Nohyun Ki, Hye Won Chung

    Abstract: Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training, addressing challenges associated with training neural networks on large-scale datasets. However, existing methods tend to specialize in either high or low selection ratio regimes, lacking a universal approach that consistently achieves competitive performance acros… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  46. arXiv:2406.01480  [pdf, other

    cs.CV

    Towards Automating the Retrospective Generation of BIM Models: A Unified Framework for 3D Semantic Reconstruction of the Built Environment

    Authors: Ka Lung Cheung, Chi Chung Lee

    Abstract: The adoption of Building Information Modeling (BIM) is beneficial in construction projects. However, it faces challenges due to the lack of a unified and scalable framework for converting 3D model details into BIM. This paper introduces SRBIM, a unified semantic reconstruction architecture for BIM generation. Our approach's effectiveness is demonstrated through extensive qualitative and quantitati… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: CVPRW 2024, Oral

  47. arXiv:2406.01337  [pdf, other

    cs.CV

    ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds

    Authors: Ka Lung Cheung, Chi Chung Lee

    Abstract: Precise segmentation of architectural structures provides detailed information about various building components, enhancing our understanding and interaction with our built environment. Nevertheless, existing outdoor 3D point cloud datasets have limited and detailed annotations on architectural exteriors due to privacy concerns and the expensive costs of data acquisition and annotation. To overcom… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: CVPRW 2024 (Oral)

  48. arXiv:2405.19595  [pdf

    cs.CV

    The RSNA Abdominal Traumatic Injury CT (RATIC) Dataset

    Authors: Jeffrey D. Rudie, Hui-Ming Lin, Robyn L. Ball, Sabeena Jalal, Luciano M. Prevedello, Savvas Nicolaou, Brett S. Marinelli, Adam E. Flanders, Kirti Magudia, George Shih, Melissa A. Davis, John Mongan, Peter D. Chang, Ferco H. Berger, Sebastiaan Hermans, Meng Law, Tyler Richards, Jan-Peter Grunz, Andreas Steven Kunz, Shobhit Mathur, Sandro Galea-Soler, Andrew D. Chung, Saif Afat, Chin-Chi Kuo, Layal Aweidah , et al. (15 additional authors not shown)

    Abstract: The RSNA Abdominal Traumatic Injury CT (RATIC) dataset is the largest publicly available collection of adult abdominal CT studies annotated for traumatic injuries. This dataset includes 4,274 studies from 23 institutions across 14 countries. The dataset is freely available for non-commercial use via Kaggle at https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection. Created for the… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 40 pages, 2 figures, 3 tables

  49. arXiv:2405.18698  [pdf, other

    cs.LG cs.AI

    Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees

    Authors: Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, Songhwai Oh

    Abstract: The field of risk-constrained reinforcement learning (RCRL) has been developed to effectively reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints. However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality. To overcome the difficulties posed by the nonlinearity, we propose a spectral risk measure-constrained… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 26 pages

  50. arXiv:2405.18383  [pdf, other

    cs.CV cs.AI cs.HC cs.LG

    Brain Tumor Segmentation (BraTS) Challenge 2024: Meningioma Radiotherapy Planning Automated Segmentation

    Authors: Dominic LaBella, Katherine Schumacher, Michael Mix, Kevin Leu, Shan McBurney-Lin, Pierre Nedelec, Javier Villanueva-Meyer, Jonathan Shapey, Tom Vercauteren, Kazumi Chia, Omar Al-Salihi, Justin Leu, Lia Halasz, Yury Velichko, Chunhao Wang, John Kirkpatrick, Scott Floyd, Zachary J. Reitman, Trey Mullikin, Ulas Bagci, Sean Sachdev, Jona A. Hattangadi-Gluth, Tyler Seibert, Nikdokht Farid, Connor Puett , et al. (45 additional authors not shown)

    Abstract: The 2024 Brain Tumor Segmentation Meningioma Radiotherapy (BraTS-MEN-RT) challenge aims to advance automated segmentation algorithms using the largest known multi-institutional dataset of radiotherapy planning brain MRIs with expert-annotated target labels for patients with intact or post-operative meningioma that underwent either conventional external beam radiotherapy or stereotactic radiosurger… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 13 pages, 9 figures, 1 table