Showing 1–50 of 401 results for author: Choi, S

  1. arXiv:2407.11793  [pdf, other]

    cs.CV cs.AI cs.GR

    Click-Gaussian: Interactive Segmentation to Any 3D Gaussians

    Authors: Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, Hoseok Do

    Abstract: Interactive segmentation of 3D Gaussians opens a great opportunity for real-time manipulation of 3D scenes thanks to the real-time rendering capability of 3D Gaussian Splatting. However, the current methods suffer from time-consuming post-processing to deal with noisy segmentation output. Also, they struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D s…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The first two authors contributed equally to this work

  2. arXiv:2407.11781  [pdf, other]

    cs.CV

    SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

    Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

    Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse view or limited view has long been challenging. Traditional 3D iterative-based reconstruction methods suffer from both slow speed and high memory consumption. Recently, in computer graphics, the differentiable rendering has made significant progress, particularly with the rise of 3D Gaussian Splatting. Inspired by these, we in…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.11057  [pdf, other]

    cs.LG cs.AI q-bio.BM

    SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity Prediction

    Authors: Seungyeon Choi, Sangmin Seo, Sanghyun Park

    Abstract: Accurate prediction of protein-ligand binding affinity is crucial for rapid and efficient drug development. Recently, the importance of predicting binding affinity has led to increased attention on research that models the three-dimensional structure of protein-ligand complexes using graph neural networks to predict binding affinity. However, traditional methods often fail to accurately model the…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to ECAI 2024

  4. arXiv:2407.10206  [pdf]

    cs.CE cs.AI cs.NE cs.SI

    Dominant Design Prediction with Phylogenetic Networks

    Authors: Youwei He, Jeong-Dong Lee, Dawoon Jeong, Sungjun Choi, Jiyong Kim

    Abstract: This study proposes an effective method to predict technology development from an evolutionary perspective. Product evolution is the result of technological evolution and market selection. A phylogenetic network is the main method to study product evolution. The formation of the dominant design determines the trajectory of technology development. How to predict future dominant design has become a…

    Submitted 14 July, 2024; originally announced July 2024.

  5. arXiv:2407.09434  [pdf, other]

    cs.LG cs.AI cs.CE eess.SY

    A Perspective on Foundation Models for the Electric Power Grid

    Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belvi, Ricardo J. Bessa, Bishnu Prasad Bhattari, et al. (2 additional authors not shown)

    Abstract: Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transi…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Lead contact: H.F.H.; Major equal contributors: H.F.H., T.B., B.G., L.S.A.M., A.P., A.V., J.W.; Significant equal contributors: J.B., A.B.M., S.C., I.F., B.H., R.J., K.K., V.M., F.M., M.D.M., O.R., H.S., L.X., E.S.Y., A.Z.; Other equal contributors: A.J.B., R.J.B., B.P.B., J.S., S.S

  6. arXiv:2407.08964  [pdf, other]

    cs.LG cs.RO

    Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control

    Authors: Sicong Jiang, Seongjin Choi, Lijun Sun

    Abstract: Cooperative Adaptive Cruise Control (CACC) plays a pivotal role in enhancing traffic efficiency and safety in Connected and Autonomous Vehicles (CAVs). Reinforcement Learning (RL) has proven effective in optimizing complex decision-making processes in CACC, leading to improved system performance and adaptability. Among RL approaches, Multi-Agent Reinforcement Learning (MARL) has shown remarkable p…

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.08245  [pdf, other]

    cs.LG cs.CV

    Feature Diversification and Adaptation for Federated Domain Generalization

    Authors: Seunghan Yang, Seokeon Choi, Hyunsin Park, Sungha Choi, Simyung Chang, Sungrack Yun

    Abstract: Federated learning, a distributed learning paradigm, utilizes multiple clients to build a robust global model. In real-world applications, local clients often operate within their limited domains, leading to a 'domain shift' across clients. Privacy concerns limit each client's learning to its own domain data, which increases the risk of overfitting. Moreover, the process of aggregating models train…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  8. arXiv:2407.06551  [pdf, other]

    cs.CL

    OffsetBias: Leveraging Debiased Data for Tuning Evaluators

    Authors: Junsoo Park, Seungyeon Jwa, Meiying Ren, Daeyoung Kim, Sanghyuk Choi

    Abstract: Employing Large Language Models (LLMs) to assess the quality of generated responses, such as prompting instruct-tuned models or fine-tuning judge models, has become a widely adopted evaluation method. It is also known that such evaluators are vulnerable to biases, such as favoring longer responses. While it is important to overcome this problem, the specifics of these biases remain under-explored.…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Work in Progress

  9. arXiv:2407.05664  [pdf, other]

    stat.ML cs.AI cs.LG

    How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning

    Authors: Arthur Jacot, Seok Hoan Choi, Yuxiao Wen

    Abstract: We show that deep neural networks (DNNs) can efficiently learn any composition of functions with bounded $F_{1}$-norm, which allows DNNs to break the curse of dimensionality in ways that shallow networks cannot. More specifically, we derive a generalization bound that combines a covering number argument for compositionality, and the $F_{1}$-norm (or the related Barron norm) for large width adaptiv…

    Submitted 8 July, 2024; originally announced July 2024.

  10. arXiv:2406.15725  [pdf, other]

    eess.AS cs.SD

    Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

    Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

    Abstract: To tackle the sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 4 technical report

  11. arXiv:2406.08796  [pdf, other]

    cs.CL

    Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning

    Authors: Janghoon Han, Changho Lee, Joongbo Shin, Stanley Jungkyu Choi, Honglak Lee, Kynghoon Bae

    Abstract: Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in ins…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024 (Camera-ready), by Janghoon Han and Changho Lee, with equal contribution

  12. arXiv:2406.06246  [pdf, other]

    cs.LG

    Data-Efficient Learning with Neural Programs

    Authors: Alaia Solko-Breslin, Seewon Choi, Ziyang Li, Neelay Velingker, Rajeev Alur, Mayur Naik, Eric Wong

    Abstract: Many computational tasks can be naturally expressed as a composition of a DNN followed by a program written in a traditional programming language or an API call to an LLM. We call such composites "neural programs" and focus on the problem of learning the DNN parameters when the training data consist of end-to-end input-output labels for the composite. When the program is written in a differentiabl…

    Submitted 10 June, 2024; originally announced June 2024.

  13. arXiv:2406.05472  [pdf, other]

    cs.CR eess.SY

    A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications

    Authors: Aydin Zaboli, Seong Lok Choi, Tai-Jin Song, Junho Hong

    Abstract: Cybersecurity breaches in digital substations can pose significant challenges to the stability and reliability of power system operations. To address these challenges, defense and mitigation techniques are required. Identifying and detecting anomalies in information and communication technology (ICT) is crucial to ensure secure device interactions within digital substations. This paper proposes a…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures, Submitted to IEEE Transactions on Information Forensics and Security

  14. arXiv:2406.04000  [pdf, other]

    physics.optics cs.ET

    Stochastic logic in biased coupled photonic probabilistic bits

    Authors: Michael Horodynski, Charles Roques-Carmes, Yannick Salamin, Seou Choi, Jamison Sloan, Di Luo, Marin Soljačić

    Abstract: Optical computing often employs tailor-made hardware to implement specific algorithms, trading generality for improved performance in key aspects like speed and power efficiency. An important computing approach that is still missing its corresponding optical hardware is probabilistic computing, used e.g. for solving difficult combinatorial optimization problems. In this study, we propose an experi…

    Submitted 6 June, 2024; originally announced June 2024.

  15. arXiv:2406.03234  [pdf, other]

    cs.LG cs.AI

    Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning

    Authors: Inwoo Hwang, Yunhyeok Kwak, Suhyung Choi, Byoung-Tak Zhang, Sanghack Lee

    Abstract: Causal dynamics learning has recently emerged as a promising approach to enhancing robustness in reinforcement learning (RL). Typically, the goal is to build a dynamics model that makes predictions based on the causal relationships among the entities. Despite the fact that causal connections often manifest only under certain contexts, existing approaches overlook such fine-grained relationships an…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  16. arXiv:2405.16155  [pdf, other]

    cs.CL

    Improving Multi-lingual Alignment Through Soft Contrastive Learning

    Authors: Minsu Park, Seyeon Choi, Chanyeol Choi, Jun-Seong Kim, Jy-yong Sohn

    Abstract: Making decent multi-lingual sentence representations is critical to achieve high performances in cross-lingual downstream tasks. In this work, we propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model. Given translation sentence pairs, we train a multi-lingual model in a way that the similarity between cr…

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 8 pages, 1 figure, Accepted at NAACL SRW 2024

  17. arXiv:2405.01591  [pdf, other]

    cs.CL cs.AI eess.IV

    Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model

    Authors: Seonhee Cho, Choonghan Kim, Jiho Lee, Chetan Chilkunda, Sujin Choi, Joo Heung Yoon

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: Under review

  18. arXiv:2405.01016  [pdf, other]

    cs.CV cs.AI

    Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction

    Authors: Minsu Kim, Giseop Kim, Sunwook Choi

    Abstract: Recent advancements in Bird's Eye View (BEV) fusion for map construction have demonstrated remarkable mapping of urban environments. However, their deep and bulky architecture incurs substantial amounts of backpropagation memory and computing latency. Consequently, the problem poses an unavoidable bottleneck in constructing high-resolution (HR) BEV maps, as their large-sized features cause signifi…

    Submitted 3 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  19. arXiv:2404.18423  [pdf, other]

    cs.CV cs.AI

    Unsupervised Dynamics Prediction with Object-Centric Kinematics

    Authors: Yeon-Ji Song, Suhyung Choi, Jaein Kim, Jin-Hwa Kim, Byoung-Tak Zhang

    Abstract: Human perception involves discerning complex multi-object scenes into time-static object appearance (i.e., size, shape, color) and time-varying object motion (i.e., location, velocity, acceleration). This innate ability to unconsciously understand the environment is the motivation behind the success of dynamics modeling. Object-centric representations have emerged as a promising tool for dynamics pred…

    Submitted 6 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 15 pages, 6 figures, 4 tables

  20. arXiv:2404.17179  [pdf, other]

    cs.HC cs.ET

    Meta-Object: Interactive and Multisensory Virtual Object Learned from the Real World for the Post-Metaverse

    Authors: Dooyoung Kim, Taewook Ha, Jinseok Hong, Seonji Kim, Selin Choi, Heejeong Ko, Woontack Woo

    Abstract: With the proliferation of wearable Augmented Reality/Virtual Reality (AR/VR) devices, ubiquitous virtual experiences seamlessly integrate into daily life through metaverse platforms. To support immersive metaverse experiences akin to reality, we propose a next-generation virtual object, a meta-object, a property-embedded virtual object that contains interactive and multisensory characteristics lea…

    Submitted 28 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 12 pages, 4 figures, under review in the IEEE CG&A magazine

  21. arXiv:2404.16418  [pdf, other]

    cs.CL

    Instruction Matters, a Simple yet Effective Task Selection Approach in Instruction Tuning for Specific Tasks

    Authors: Changho Lee, Janghoon Han, Seonghyeon Ye, Stanley Jungkyu Choi, Honglak Lee, Kyunghoon Bae

    Abstract: Instruction tuning has shown its ability to not only enhance zero-shot generalization across various tasks but also its effectiveness in improving the performance of specific tasks. A crucial aspect in instruction tuning for a particular task is a strategic selection of related tasks that offer meaningful supervision, thereby enhancing efficiency and preventing performance degradation from irrelev…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 21 pages, 6 figures, 16 tables

  22. arXiv:2404.15635  [pdf, other]

    cs.CV cs.LG

    A Real-time Evaluation Framework for Pedestrian's Potential Risk at Non-Signalized Intersections Based on Predicted Post-Encroachment Time

    Authors: Tengfeng Lin, Zhixiong Jin, Seongjin Choi, Hwasoo Yeo

    Abstract: Addressing pedestrian safety at intersections is one of the paramount concerns in the field of transportation research, driven by the urgency of reducing traffic-related injuries and fatalities. With advances in computer vision technologies and predictive models, the pursuit of developing real-time proactive protection systems is increasingly recognized as vital to improving pedestrian safety at i…

    Submitted 24 April, 2024; originally announced April 2024.

  23. arXiv:2404.13272  [pdf, other]

    cs.HC

    DinAR: Augmenting Reality for Sustainable Dining

    Authors: MJ Johns, Eunsol Sol Choi, Derusha Baskaran

    Abstract: Sustainable food is among the many challenges associated with climate change. The resources required to grow or gather the food and the distance it travels to reach the consumer are two key factors of an ingredient's sustainability. Food that is grown locally and is currently "in-season" will have a lower carbon footprint, but when dining out these details unfortunately may not affect one's orderi…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Presented at CHI 2024 (arXiv:2404.05889), 5 pages, and 4 figures

    Report number: ARSJ/2024/10

  24. arXiv:2404.13028  [pdf]

    cs.CE cs.AI

    When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering

    Authors: Stephen Choi, William Gazeley

    Abstract: This paper presents the LLM-ADE framework, a novel methodology for continued pre-training of large language models (LLMs) that addresses the challenges of catastrophic forgetting and double descent. LLM-ADE employs dynamic architectural adjustments, including selective block freezing and expansion, tailored to specific datasets. This strategy enhances model adaptability to new data while preservin…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 6 pages, 3 tables and 3 figures

  25. arXiv:2404.12168  [pdf, other]

    cs.CV cs.AI

    Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization

    Authors: Insoo Kim, Jae Seok Choi, Geonseok Seo, Kinam Kwon, Jinwoo Shin, Hyong-Euk Lee

    Abstract: As recent advances in mobile camera technology have enabled the capability to capture high-resolution images, such as 4K images, the demand for an efficient deblurring model handling large motion has increased. In this paper, we discover that the image residual errors, i.e., blur-sharp pixel differences, can be grouped into some categories according to their motion blur type and how complex their…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: CVPR2024 Camera-Ready

  26. arXiv:2404.11936  [pdf, other]

    cs.LG cs.AI cs.CV

    LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

    Authors: Thibault Castells, Hyoung-Kyu Song, Bo-Kyeong Kim, Shinkook Choi

    Abstract: Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains a complex issue, presenting challenges such as memory consumption and inference speed. To address this issue, we introduce LD-Pruner, a novel performance-preserving structured prunin…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 8 pages, accepted to CVPR24 First Workshop on Efficient and On-Device Generation (EDGE)

  27. arXiv:2404.11925  [pdf, other]

    cs.LG cs.AI cs.CV

    EdgeFusion: On-Device Text-to-Image Generation

    Authors: Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim

    Abstract: The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application. To tackle this challenge, recent research focuses on methods to reduce sampling steps, such as Latent Consistency Model (LCM), and on employing architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 4 pages, accepted to CVPR24 First Workshop on Efficient and On-Device Generation (EDGE)

  28. arXiv:2404.11810  [pdf, other]

    cs.GR

    Holographic Parallax Improves 3D Perceptual Realism

    Authors: Dongyeon Kim, Seung-Woo Nam, Suyeon Choi, Jong-Mo Seo, Gordon Wetzstein, Yoonchan Jeong

    Abstract: Holographic near-eye displays are a promising technology to solve long-standing challenges in virtual and augmented reality display systems. Over the last few years, many different computer-generated holography (CGH) algorithms have been proposed that are supervised by different types of target content, such as 2.5D RGB-depth maps, 3D focal stacks, and 4D light fields. It is unclear, however, what…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 33 pages, 34 figures

  29. arXiv:2404.11630  [pdf, other]

    cs.CV cs.AI

    SNP: Structured Neuron-level Pruning to Preserve Attention Scores

    Authors: Kyunghwan Shim, Jaewoong Yun, Shinkook Choi

    Abstract: Multi-head self-attention (MSA) is a key component of Vision Transformers (ViTs), which have achieved great success in various vision tasks. However, their high computational cost and memory footprint hinder their deployment on resource-constrained devices. Conventional pruning approaches can only compress and accelerate the MSA module using head pruning, although the head is not an atomic unit. T…

    Submitted 17 April, 2024; originally announced April 2024.

  30. arXiv:2404.11557  [pdf, other]

    cs.RO

    Spatio-Temporal Motion Retargeting for Quadruped Robots

    Authors: Taerim Yoon, Dongho Kang, Seungmin Kim, Minsung Ahn, Stelian Coros, Sungjoon Choi

    Abstract: This work introduces a motion retargeting approach for legged robots, which aims to create motion controllers that imitate the fine behavior of animals. Our approach, namely spatio-temporal motion retargeting (STMR), guides imitation learning procedures by transferring motion from source to target, effectively bridging the morphological disparities by ensuring the feasibility of imitation on the t…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 34 pages, 7 figures, videos/code available at https://terry97-guel.github.io/STMR-RL.github.io/

  31. arXiv:2404.11343  [pdf, other]

    cs.IR cs.AI

    Large Language Models meet Collaborative Filtering: An Efficient All-round LLM-based Recommender System

    Authors: Sein Kim, Hongseok Kang, Seungyoon Choi, Donghyun Kim, Minchul Yang, Chanyoung Park

    Abstract: Collaborative filtering recommender systems (CF-RecSys) have shown successful results in enhancing the user experience on social media and e-commerce platforms. However, as CF-RecSys struggles under cold scenarios with sparse user-item interactions, recent strategies have focused on leveraging modality information of user/items (e.g., text or images) based on pre-trained modality encoders and Larg…

    Submitted 1 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: KDD 2024

  32. arXiv:2404.09451  [pdf, other]

    cs.CV

    Contrastive Mean-Shift Learning for Generalized Category Discovery

    Authors: Sua Choi, Dahyun Kang, Minsu Cho

    Abstract: We address the problem of generalized category discovery (GCD) that aims to partition a partially labeled collection of images; only a small part of the collection is labeled and the total number of target classes is unknown. To address this generalized image clustering problem, we revisit the mean-shift algorithm, i.e., a classic, powerful technique for mode seeking, and incorporate it into a con…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  33. arXiv:2404.02684  [pdf, other]

    cs.CL cs.AI cs.LG

    Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers

    Authors: Sehyun Choi

    Abstract: Recently, multiple architectures have been proposed to improve the efficiency of the Transformer Language Models through changing the design of the self-attention block to have a linear-cost inference (LCI). A notable approach in this realm is the State-Space Machines (SSMs) architecture, which showed on-par performance on language modeling tasks with the self-attention transformers. However, such…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Preprint

  34. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  35. arXiv:2404.01628  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning Equi-angular Representations for Online Continual Learning

    Authors: Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi

    Abstract: Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so th…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  36. arXiv:2404.01524  [pdf, other]

    cs.CV cs.AI

    On Train-Test Class Overlap and Detection for Image Retrieval

    Authors: Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang, Shunghyun Choi, Yeong Hyeon Gu, Yannis Avrithis

    Abstract: How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris [34], the most popular evaluation set. By comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, our findings…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR2024 Accepted

  37. arXiv:2404.01156  [pdf, other]

    cs.CV cs.AI

    SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

    Authors: Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon, Shunghyun Choi, Yeong Hyeon Gu

    Abstract: Vision-language models (VLMs) have made significant strides in cross-modal understanding through large-scale paired datasets. However, in the fashion domain, datasets often exhibit a disparity between the information conveyed in image and text. This issue stems from datasets containing multiple images of a single fashion item all paired with one text, leading to cases where some textual details are no…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR2024 Accepted

  38. Contextual AI Journaling: Integrating LLM and Time Series Behavioral Sensing Technology to Promote Self-Reflection and Well-being using the MindScape App

    Authors: Subigya Nepal, Arvind Pillai, William Campbell, Talie Massachi, Eunsol Soul Choi, Orson Xu, Joanna Kuc, Jeremy Huckins, Jason Holden, Colin Depp, Nicholas Jacobson, Mary Czerwinski, Eric Granholm, Andrew T. Campbell

    Abstract: MindScape aims to study the benefits of integrating time series behavioral patterns (e.g., conversational engagement, sleep, location) with Large Language Models (LLMs) to create a new form of contextual AI journaling, promoting self-reflection and well-being. We argue that integrating behavioral sensing in LLMs will likely lead to a new frontier in AI. In this Late-Breaking Work paper, we discuss…

    Submitted 30 March, 2024; originally announced April 2024.

    ACM Class: H.5.0; H.5.3; H.5.m; J.0

  39. arXiv:2403.17863  [pdf, other]

    cs.DC

    An AI-Native Runtime for Multi-Wearable Environments

    Authors: Chulhong Min, Utku Günay Acer, SiYoung Jang, Sangwon Choi, Diana A. Vasile, Taesik Gong, Juheon Yi, Fahim Kawsar

    Abstract: The miniaturization of AI accelerators is paving the way for next-generation wearable applications within wearable technologies. We introduce Mojito, an AI-native runtime with advanced MLOps designed to facilitate the development and deployment of these applications on wearable devices. It emphasizes the necessity of dynamic orchestration of distributed resources equipped with ultra-low-power AI a…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 7 pages, 4 figures

  40. arXiv:2403.16167  [pdf, other]

    cs.CV cs.CL

    Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models

    Authors: Minchan Kim, Minyeong Kim, Junik Bae, Suhwan Choi, Sungkyung Kim, Buru Chang

    Abstract: Hallucinations in vision-language models pose a significant challenge to their reliability, particularly in the generation of long captions. Current methods fall short of accurately identifying and mitigating these hallucinations. To address this issue, we introduce ESREAL, a novel unsupervised learning framework designed to suppress the generation of hallucinations through accurate localization a…

    Submitted 5 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  41. arXiv:2403.15049  [pdf, other]

    cs.CV cs.AI

    Continual Vision-and-Language Navigation

    Authors: Seongjun Jeong, Gi-Cheon Kang, Seongho Choi, Joochan Kim, Byoung-Tak Zhang

    Abstract: Vision-and-Language Navigation (VLN) agents navigate to a destination using natural language instructions and the visual information they observe. Existing methods for training VLN agents presuppose fixed datasets, leading to a significant limitation: the introduction of new environments necessitates retraining with previously encountered environments to preserve their knowledge. This makes it dif…

    Submitted 22 March, 2024; originally announced March 2024.

  42. arXiv:2403.11513  [pdf, other]

    cs.RO

    Visual Preference Inference: An Image Sequence-Based Preference Reasoning in Tabletop Object Manipulation

    Authors: Joonhyung Lee, Sangbeom Park, Yongin Kwon, Jemin Lee, Minwook Ahn, Sungjoon Choi

    Abstract: In robotic object manipulation, human preferences can often be influenced by the visual attributes of objects, such as color and shape. These properties play a crucial role in operating a robot to interact with objects and align with human intention. In this paper, we focus on the problem of inferring underlying human preferences from a sequence of raw visual observations in tabletop manipulation…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 8 pages

  43. arXiv:2403.10041  [pdf, other]

    cs.RO cs.AI

    Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematics (MASK)

    Authors: Jeongeun Park, Taemoon Jeong, Hyeonseong Kim, Taehyun Byun, Seungyoon Shin, Keunjun Choi, Jaewoon Kwon, Taeyoon Lee, Matthew Pan, Sungjoon Choi

    Abstract: This paper presents the design and development of an innovative interactive robotic system to enhance audience engagement using character-like personas. Built upon the foundations of persona-driven dialog agents, this work extends the agent application to the physical realm, employing robots to provide a more immersive and interactive experience. The proposed system, named the Masquerading Animate…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 4 pages, 3 figures

  44. arXiv:2403.09168  [pdf, other]

    cs.HC

    VIVID: Human-AI Collaborative Authoring of Vicarious Dialogues from Lecture Videos

    Authors: Seulgi Choi, Hyewon Lee, Yoonjoo Lee, Juho Kim

    Abstract: Lengthy monologue-style online lectures cause learners to lose engagement easily. Designing lectures in a "vicarious dialogue" format can foster learners' cognitive activities more than a monologue style. However, designing online lectures in a dialogue style catered to the diverse needs of learners is laborious for instructors. We conducted a design workshop with eight educational experts and s…

    Submitted 10 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  45. arXiv:2403.08061  [pdf, other]

    cs.RO

    Gaze-based Human-Robot Interaction System for Infrastructure Inspections

    Authors: Sunwoong Choi, Zaid Abbas Al-Sabbag, Sriram Narasimhan, Chul Min Yeum

    Abstract: Routine inspections for critical infrastructures such as bridges are required in most jurisdictions worldwide. Such routine inspections are largely visual in nature, which are qualitative, subjective, and not repeatable. Although robotic infrastructure inspections address such limitations, they cannot replace the superior ability of experts to make decisions in complex situations, thus making huma…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 7 pages, 8 figures, 1 supplementary video; Accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA)

  46. arXiv:2403.07041  [pdf, other]

    cs.LG cs.NE

    Ant Colony Sampling with GFlowNets for Combinatorial Optimization

    Authors: Minsu Kim, Sanghyeok Choi, Hyeonah Kim, Jiwoo Son, Jinkyoo Park, Yoshua Bengio

    Abstract: This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a neural-guided probabilistic search algorithm for solving combinatorial optimization (CO). GFACS integrates generative flow networks (GFlowNets), an emerging amortized inference method, with ant colony optimization (ACO), a promising probabilistic search algorithm. Specifically, we use GFlowNets to learn a constructive policy i…

    Submitted 22 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 23 pages, 5 figures

  47. arXiv:2403.04760  [pdf, other]

    cs.HC cs.AI cs.CY cs.LG

    iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries

    Authors: Adam Coscia, Langdon Holmes, Wesley Morris, Joon Suh Choi, Scott Crossley, Alex Endert

    Abstract: The recent explosion in popularity of large language models (LLMs) has inspired learning engineers to incorporate them into adaptive educational tools that automatically score summary writing. Understanding and evaluating LLMs is vital before deploying them in critical learning environments, yet their unprecedented size and expanding number of parameters inhibit transparency and impede trust whe…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted to IUI 2024. 16 pages, 5 figures, 1 table. For a demo video, see https://youtu.be/EYJX-_fQPf0 . For a live demo, visit https://adamcoscia.com/papers/iscore/demo/ . The source code is available at https://github.com/AdamCoscia/iScore

  48. arXiv:2403.01827  [pdf]

    cs.NE cs.AI

    Analysis and Fully Memristor-based Reservoir Computing for Temporal Data Classification

    Authors: Ankur Singh, Sanghyeon Choi, Gunuk Wang, Maryaradhiya Daimari, Byung-Geun Lee

    Abstract: Reservoir computing (RC) offers a neuromorphic framework that is particularly effective for processing spatiotemporal signals. Known for its temporal processing prowess, RC significantly lowers training costs compared to conventional recurrent neural networks. A key component in its hardware deployment is the ability to generate dynamic reservoir states. Our research introduces a novel dual-memory…

    Submitted 16 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 22 pages, 20 figures, Journal, Typo corrected and updated reference

  49. arXiv:2402.18362  [pdf, other]

    cs.CV cs.AI

    Objective and Interpretable Breast Cosmesis Evaluation with Attention Guided Denoising Diffusion Anomaly Detection Model

    Authors: Sangjoon Park, Yong Bae Kim, Jee Suk Chang, Seo Hee Choi, Hyungjin Chung, Ik Jae Lee, Hwa Kyung Byun

    Abstract: As advancements in the field of breast cancer treatment continue to progress, the assessment of post-surgical cosmetic outcomes has gained increasing significance due to its substantial impact on patients' quality of life. However, evaluating breast cosmesis presents challenges due to the inherently subjective nature of expert labeling. In this study, we present a novel automated approach, Attenti…

    Submitted 28 February, 2024; originally announced February 2024.

  50. arXiv:2402.18046  [pdf, other]

    cs.LG

    Data augmentation method for modeling health records with applications to clopidogrel treatment failure detection

    Authors: Sunwoong Choi, Samuel Kim

    Abstract: We present a novel data augmentation method to address the challenge of data scarcity in modeling longitudinal patterns in Electronic Health Records (EHR) of patients using natural language processing (NLP) algorithms. The proposed method generates augmented data by rearranging the order of medical records within a visit where the order of elements, if any, is not obvious. Applying the proposed m…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.08757