Showing 1–50 of 226 results for author: Yon, J

  1. arXiv:2407.04345

    cs.CV

    CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images

    Authors: Jisu Shin, Junmyeong Lee, Seongmin Lee, Min-Gyu Park, Ju-Mi Kang, Ju Hong Yoon, Hae-Gon Jeon

    Abstract: We present a novel framework for reconstructing animatable human avatars from multiple images, termed CanonicalFusion. Our central concept involves integrating individual reconstruction results into the canonical space. To be specific, we first predict Linear Blend Skinning (LBS) weight maps and depth maps using a shared-encoder-dual-decoder network, enabling direct canonicalization of the 3D mesh…

    Submitted 15 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 Accepted (18 pages, 9 figures)

  2. arXiv:2406.18388

    cs.RO cs.AI

    SAM: Semi-Active Mechanism for Extensible Continuum Manipulator and Real-time Hysteresis Compensation Control Algorithm

    Authors: Junhyun Park, Seonghyeok Jang, Myeongbo Park, Hyojae Park, Jeonghyeon Yoon, Minho Hwang

    Abstract: Cable-Driven Continuum Manipulators (CDCMs) enable scar-free procedures via natural orifices and improve target lesion accessibility through curved paths. However, CDCMs face limitations in workspace and control accuracy due to non-linear cable effects causing hysteresis. This paper introduces an extensible CDCM with a Semi-active Mechanism (SAM) to expand the workspace via translational motion wi…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 12 pages, 14 figures, 6 tables

  3. arXiv:2406.15659

    cs.LG cs.MA

    Contextual Sprint Classification in Soccer Based on Deep Learning

    Authors: Hyunsung Kim, Gun-Hee Joe, Jinsung Yoon, Sang-Ki Ko

    Abstract: The analysis of high-intensity runs (or sprints) in soccer has long been a topic of interest for sports science researchers and practitioners. In particular, recent studies suggested contextualizing sprints based on their tactical purposes to better understand the physical-tactical requirements of modern match-play. However, they have a limitation in scalability, as human experts have to manually…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at IJCAI 2024 Workshop on Intelligent Technologies for Precision Sports Science (IT4PSS 2024)

  4. arXiv:2406.09716

    cs.CR cs.AI cs.DC cs.LG

    Speed-up of Data Analysis with Kernel Trick in Encrypted Domain

    Authors: Joon Soo Yoo, Baek Kyung Song, Tae Min Ahn, Ji Won Heo, Ji Won Yoon

    Abstract: Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performanc…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted as a preprint

  5. arXiv:2406.05446

    cs.CL cs.AI

    Design of reliable technology valuation model with calibrated machine learning of patent indicators

    Authors: Seunghyun Lee, Janghyeok Yoon, Jaewoong Choi

    Abstract: Machine learning (ML) has revolutionized the digital transformation of technology valuation by predicting the value of patents with high accuracy. However, the lack of validation regarding the reliability of these models hinders experts from fully trusting the confidence of model predictions. To address this issue, we propose an analytical framework for reliable technology valuation using calibrat…

    Submitted 8 June, 2024; originally announced June 2024.

  6. arXiv:2405.19209

    cs.CV cs.AI cs.CL

    VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

    Authors: Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal

    Abstract: Video-language understanding tasks have focused on short video clips, often struggling with long-form video understanding tasks. Recently, many long video-language understanding approaches have leveraged the reasoning capabilities of Large Language Models (LLMs) to perform long video QA, transforming videos into densely sampled frame captions, and asking LLMs to respond to text queries over captio…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 20 pages, first three authors contributed equally; Project page: https://videotree2024.github.io/

  7. arXiv:2405.18406

    cs.CV cs.AI cs.CL

    RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives

    Authors: Jaehong Yoon, Shoubin Yu, Mohit Bansal

    Abstract: Recent video generative models primarily rely on carefully written text prompts for specific tasks, like inpainting or style editing. They require labor-intensive textual descriptions for input videos, hindering their flexibility to adapt personal/raw videos to user specifications. This paper proposes RACCooN, a versatile and user-friendly video-to-paragraph-to-video generative framework that supp…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally. Project Page: https://raccoon-mllm-gen.github.io/

  8. arXiv:2405.14985

    cs.SI physics.soc-ph

    Implicit degree bias in the link prediction task

    Authors: Rachith Aiyappa, Xin Wang, Munjung Kim, Ozgur Can Seckin, Jisung Yoon, Yong-Yeol Ahn, Sadamori Kojaku

    Abstract: Link prediction -- a task of distinguishing actual hidden edges from random unconnected node pairs -- is one of the quintessential tasks in graph machine learning. Despite being widely accepted as a universal benchmark and a downstream task for representation learning, the validity of the link prediction benchmark itself has rarely been questioned. Here, we show that the common edge sampling proce…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 13 pages, 3 figures

  9. arXiv:2405.01591

    cs.CL cs.AI eess.IV

    Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model

    Authors: Seonhee Cho, Choonghan Kim, Jiho Lee, Chetan Chilkunda, Sujin Choi, Joo Heung Yoon

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: Under review

  10. arXiv:2405.00664

    cs.CL cs.AI cs.LG

    Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3

    Authors: Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli

    Abstract: This study presents a targeted model editing analysis focused on the latest large language model, Llama-3. We explore the efficacy of popular model editing techniques (ROME, MEMIT, and EMMET), which are designed for precise layer interventions. We identify the most effective layers for targeted edits through an evaluation that encompasses up to 4096 edits across three distinct strategies: sequenti…

    Submitted 1 May, 2024; originally announced May 2024.

  11. arXiv:2404.15814

    cs.LG cs.AI

    Fast Ensembling with Diffusion Schrödinger Bridge

    Authors: Hyunsu Kim, Jongmin Yoon, Juho Lee

    Abstract: The Deep Ensemble (DE) approach is a straightforward technique used to enhance the performance of deep neural networks by training them from different initial points, converging towards various local optima. However, a limitation of this methodology lies in its high computational overhead for inference, arising from the necessity to store numerous learned parameters and execute individual forward pass…

    Submitted 24 April, 2024; originally announced April 2024.

    Journal ref: ICLR 2024

  12. arXiv:2404.09491

    cs.LG

    Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

    Authors: Sungwon Han, Jinsung Yoon, Sercan O Arik, Tomas Pfister

    Abstract: Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, which is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. T…

    Submitted 6 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML 2024

  13. arXiv:2404.01954

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  14. arXiv:2404.01537

    cs.RO

    Are Doppler Velocity Measurements Useful for Spinning Radar Odometry?

    Authors: Daniil Lisus, Keenan Burnett, David J. Yoon, Richard Poulton, John Marshall, Timothy D. Barfoot

    Abstract: Spinning, frequency-modulated continuous-wave (FMCW) radars with 360-degree coverage have been gaining popularity for autonomous-vehicle navigation. However, unlike 'fixed' automotive radar, commercially available spinning radar systems typically do not produce radial velocities due to the lack of repeated measurements in the same direction and the fundamental hardware setup. To make these radial…

    Submitted 12 July, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 8 pages, 7 figures, 2 tables, submitted to Robotics and Automation Letters (RA-L)

  15. arXiv:2404.01524

    cs.CV cs.AI

    On Train-Test Class Overlap and Detection for Image Retrieval

    Authors: Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang, Shunghyun Choi, Yeong Hyeon Gu, Yannis Avrithis

    Abstract: How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris [34], the most popular evaluation set. By comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, our findings…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  16. arXiv:2404.01156

    cs.CV cs.AI

    SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

    Authors: Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon, Shunghyun Choi, Yeong Hyeon Gu

    Abstract: Vision-language models (VLMs) have made significant strides in cross-modal understanding through large-scale paired datasets. However, in the fashion domain, datasets often exhibit a disparity between the information conveyed in image and text. This issue stems from datasets containing multiple images of a single fashion item all paired with one text, leading to cases where some textual details are no…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  17. arXiv:2404.00562

    cs.CV

    Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

    Authors: Junuk Cha, Jihyeon Kim, Jae Shin Yoon, Seungryul Baek

    Abstract: This paper introduces the first text-guided work for generating the sequence of hand-object interaction in 3D. The main challenge arises from the lack of labeled data where existing ground-truth datasets are nowhere near generalizable in interaction type and object category, which inhibits the modeling of diverse 3D hand-object interaction with the correct physical implication (e.g., contacts and…

    Submitted 1 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  18. arXiv:2403.12327

    cs.CV cs.LG

    GT-Rain Single Image Deraining Challenge Report

    Authors: Howard Zhang, Yunhao Ba, Ethan Yang, Rishi Upadhyay, Alex Wong, Achuta Kadambi, Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan, Chaochao Zheng, Luping Wang, Bin Liu, Sunder Ali Khowaja, Jiseok Yoon, Ik-Hyun Lee, Zhao Zhang, Yanyan Wei, Jiahuan Ren, Suiyi Zhao, Huan Zheng

    Abstract: This report reviews the results of the GT-Rain challenge on single-image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real-world scenarios, to provide a novel real-world rainy image dataset, and to spark innovative ideas that will further the development of single-image deraining methods on real images. Submissions were trained o…

    Submitted 18 March, 2024; originally announced March 2024.

  19. arXiv:2403.12014

    cs.CL cs.AI cs.LG

    EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents

    Authors: Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal

    Abstract: Recent SOTA approaches for embodied learning via interaction directly employ large language models (LLMs) as agents to determine the next steps in an environment. Due to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of direct…

    Submitted 12 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: COLM 2024; First two authors contributed equally; Project website: https://envgen-llm.github.io/

  20. arXiv:2403.06952

    cs.CV cs.AI cs.CL cs.LG

    SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

    Authors: Jialu Li, Jaemin Cho, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal

    Abstract: Recent text-to-image (T2I) generation models have demonstrated impressive capabilities in creating images from text descriptions. However, these T2I generation models often fall short of generating images that precisely match the details of the text inputs, producing errors such as incorrect spatial relationships or missing objects. In this paper, we introduce SELMA: Skill-Specific Expert Learning and Merging with…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: First two authors contributed equally; Project website: https://selma-t2i.github.io/

  21. arXiv:2402.18866

    cs.LG

    Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming

    Authors: Hany Hamed, Subin Kim, Dongyeong Kim, Jaesik Yoon, Sungjin Ahn

    Abstract: Model-based reinforcement learning (MBRL) has been a primary approach to ameliorating the sample efficiency issue as well as to making a generalist agent. However, there has not been much effort toward enhancing the strategy of dreaming itself. Therefore, it remains an open question whether and how an agent can "dream better" in a more structured and strategic way. In this paper, inspired by the observation fr…

    Submitted 4 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: First two authors contributed equally

  22. arXiv:2402.16101

    cs.RO

    Optimizing Base Placement of Surgical Robot: Kinematics Data-Driven Approach by Analyzing Working Pattern

    Authors: Jeonghyeon Yoon, Junhyun Park, Hyojae Park, Hakyoon Lee, Sangwon Lee, Minho Hwang

    Abstract: In robot-assisted minimally invasive surgery (RAMIS), optimal placement of the surgical robot base is crucial for successful surgery. Improper placement can hinder performance because of manipulator limitations and inaccessible workspaces. Conventional base placement relies on the experience of trained medical staff. This study proposes a novel method for determining the optimal base pose based on…

    Submitted 10 April, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: 8 pages, 7 figures, 2 tables

  23. arXiv:2402.15160

    cs.LG cs.AI

    Spatially-Aware Transformer for Embodied Agents

    Authors: Junmo Cho, Jaesik Yoon, Sungjin Ahn

    Abstract: Episodic memory plays a crucial role in various cognitive processes, such as the ability to mentally recall past events. While cognitive science emphasizes the significance of spatial context in the formation and retrieval of episodic memory, the current primary approach to implementing episodic memory in AI systems is through transformers that store temporally ordered experiences, which overlooks…

    Submitted 29 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: ICLR 2024 Spotlight. First two authors contributed equally

  24. arXiv:2402.11604

    cs.LG

    Self-evolving Autoencoder Embedded Q-Network

    Authors: J. Senthilnath, Bangjian Zhou, Zhen Wei Ng, Deeksha Aggarwal, Rajdeep Dutta, Ji Wei Yoon, Aye Phyu Phyu Aung, Keyu Wu, Min Wu, Xiaoli Li

    Abstract: In the realm of sequential decision-making tasks, the exploration capability of a reinforcement learning (RL) agent is paramount for achieving high rewards through interactions with the environment. To enhance this crucial ability, we propose SAQN, a novel approach wherein a self-evolving autoencoder (SA) is embedded with a Q-Network (QN). In SAQN, the self-evolving autoencoder architecture adapts…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 11 pages, 9 figures, 3 tables

  25. arXiv:2402.08712

    cs.LG cs.CV

    BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation

    Authors: Daeun Lee, Jaehong Yoon, Sung Ju Hwang

    Abstract: Continual Test Time Adaptation (CTTA) is required to adapt efficiently to continuous unseen domains while retaining previously learned knowledge. However, despite the progress of CTTA, it is still challenging to deploy the model with improved forgetting-adaptation trade-offs and efficiency. In addition, current CTTA scenarios assume only the disjoint situation, even though real-world domains are s…

    Submitted 31 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024, 22 pages, Project page: https://becotta-ctta.github.io/

  26. arXiv:2402.06461

    cs.LG cs.CV stat.ML

    Sequential Flow Straightening for Generative Modeling

    Authors: Jongmin Yoon, Juho Lee

    Abstract: Straightening the probability flow of continuous-time generative models, such as diffusion models or flow-based models, is key to fast sampling with numerical solvers. Existing methods learn a linear path by directly generating the probability path from the joint distribution between the noise and data distributions. One key reason for the slow sampling speed of the ODE-based solvers that…

    Submitted 14 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: 21 pages, 13 figures

  27. arXiv:2402.05889

    cs.CV cs.AI cs.CL

    CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion

    Authors: Shoubin Yu, Jaehong Yoon, Mohit Bansal

    Abstract: Despite impressive advancements in recent multimodal reasoning approaches, they are still limited in flexibility and efficiency, as these models typically process only a few fixed modality inputs and require updates to numerous parameters. This paper tackles these critical challenges and proposes CREMA, a generalizable, highly efficient, and modular modality-fusion framework that can incorporate a…

    Submitted 12 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: First two authors contributed equally. Project page: https://CREMA-VideoLLM.github.io/

  28. arXiv:2401.15938

    cs.CV eess.SY

    Motion-induced error reduction for high-speed dynamic digital fringe projection system

    Authors: Sanghoon Jeon, Hyo-Geon Lee, Jae-Sung Lee, Bo-Min Kang, Byung-Wook Jeon, Jun Young Yoon, Jae-Sang Hyun

    Abstract: In phase-shifting profilometry (PSP), any motion during the acquisition of fringe patterns can introduce errors because PSP assumes that both the object and the measurement system are stationary. Therefore, we propose a method to reduce these errors pixel-wise when the measurement system is in motion due to a motorized linear stage. The proposed method introduces a motion-induced error reduction algorithm, whic…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 9 pages, 7 figures

  29. arXiv:2401.13921

    eess.AS cs.SD

    Intelli-Z: Toward Intelligible Zero-Shot TTS

    Authors: Sunghee Jung, Won Jang, Jaesam Yoon, Bongwan Kim

    Abstract: Although numerous recent studies have suggested new frameworks for zero-shot TTS using large-scale, real-world data, studies that focus on the intelligibility of zero-shot TTS are relatively scarce. Zero-shot TTS demands additional efforts to ensure clear pronunciation and speech quality due to its inherent requirement of replacing a core parameter (speaker embedding or acoustic prompt) with a new…

    Submitted 24 January, 2024; originally announced January 2024.

  30. arXiv:2401.13588

    cs.CL cs.AI cs.SE

    Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes

    Authors: Darren Liu, Cheng Ding, Delgersuren Bold, Monique Bouvier, Jiaying Lu, Benjamin Shickel, Craig S. Jabaley, Wenhui Zhang, Soojin Park, Michael J. Young, Mark S. Wainwright, Gilles Clermont, Parisa Rashidi, Eric S. Rosenthal, Laurie Dimisko, Ran Xiao, Joo Heung Yoon, Carl Yang, Xiao Hu

    Abstract: The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks don't fully capture the nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in r…

    Submitted 24 January, 2024; originally announced January 2024.

  31. arXiv:2401.10529

    cs.CV cs.AI cs.CL cs.LG

    Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

    Authors: Xiyao Wang, Yuhang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated proficiency in handling a variety of visual-language tasks. However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less inve…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: 27 pages, 23 figures

  32. arXiv:2401.06415

    cs.CV

    3D Reconstruction of Interacting Multi-Person in Clothing from a Single Image

    Authors: Junuk Cha, Hansol Lee, Jaewon Kim, Nhat Nguyen Bao Truong, Jae Shin Yoon, Seungryul Baek

    Abstract: This paper introduces a novel pipeline to reconstruct the geometry of interacting multi-person in clothing on a globally coherent scene space from a single image. The main challenge arises from the occlusion: a part of a human body is not visible from a single view due to the occlusion by others or the self, which introduces missing geometry and physical implausibility (e.g., penetration). We over…

    Submitted 2 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted to WACV 2024

  33. arXiv:2312.16842

    cs.CV

    Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single Camera

    Authors: Hansol Lee, Junuk Cha, Yunhoe Ku, Jae Shin Yoon, Seungryul Baek

    Abstract: The appearance of a human in clothing is driven not only by the pose but also by its temporal context, i.e., motion. However, such context has been largely neglected by existing monocular human modeling methods whose neural networks often struggle to learn a video of a person with large dynamics due to the motion ambiguity, i.e., there exist numerous geometric configurations of clothes that are de…

    Submitted 28 December, 2023; originally announced December 2023.

  34. arXiv:2312.13565

    cs.LG cs.AI

    Automatic Curriculum Learning with Gradient Reward Signals

    Authors: Ryan Campbell, Junsang Yoon

    Abstract: This paper investigates the impact of using gradient norm reward signals in the context of Automatic Curriculum Learning (ACL) for deep reinforcement learning (DRL). We introduce a framework where the teacher model, utilizing the gradient norm information of a student model, dynamically adapts the learning curriculum. This approach is based on the hypothesis that gradient norms can provide a nuanc…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 11 pages, 15 figures

  35. arXiv:2312.11973

    cs.CV cs.AI cs.LG

    Continual Learning: Forget-free Winning Subnetworks for Video Representations

    Authors: Haeyong Kang, Jaehong Yoon, Sung Ju Hwang, Chang D. Yoo

    Abstract: Inspired by the Lottery Ticket Hypothesis (LTH), which highlights the existence of efficient subnetworks within larger, dense networks, a high-performing Winning Subnetwork (WSN) in terms of task performance under appropriate sparsity conditions is considered for various continual learning tasks. It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incrementa…

    Submitted 2 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.14962, arXiv:2306.11305

  36. arXiv:2312.11884

    cs.SE

    Teaching Software Ethics to Future Software Engineers

    Authors: Aastha Pant, Simone V. Spiegler, Rashina Hoda, Jeremy Yoon, Nabeeb Yusuf, Tian Er, Shenyi Hu

    Abstract: Teaching software ethics to software engineering (SE) students is more critical now than ever before, as software-related ethical issues continue to impact society at an alarming rate. Traditional classroom methods, vignettes, role-play games, and quizzes have been employed over the years to teach SE students about software ethics. Recognising the significance of incorporating sof…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 11 pages, 7 figures, 2 tables

  37. arXiv:2312.06886

    cs.CV

    Relightful Harmonization: Lighting-aware Portrait Background Replacement

    Authors: Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, He Zhang

    Abstract: Portrait harmonization aims to composite a subject into a new background, adjusting its lighting and color to ensure harmony with the background scene. Existing harmonization techniques often only focus on adjusting the global color and brightness of the foreground and ignore crucial illumination cues from the background such as apparent lighting direction, leading to unrealistic compositions. We…

    Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 camera ready

  38. arXiv:2311.16538

    cs.LG cs.CR

    Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks

    Authors: Ye Lin Tun, Chu Myaet Thwal, Ji Su Yoon, Sun Moo Kang, Chaoning Zhang, Choong Seon Hong

    Abstract: Diffusion models have shown great potential for vision-related tasks, particularly for image generation. However, their training is typically conducted in a centralized manner, relying on data collected from publicly available sources. This approach may not be feasible or practical in many domains, such as the medical field, which involves privacy concerns over data collection. Despite the challen…

    Submitted 28 November, 2023; originally announced November 2023.

  39. arXiv:2311.10707

    cs.LG cs.CV

    Multimodal Representation Learning by Alternating Unimodal Adaptation

    Authors: Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao

    Abstract: Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant than others during multimodal learning, resulting in suboptimal performance. To address this challenge, we propose MLA (Multimodal Learning with Alternating Uni…

    Submitted 1 April, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024

  40. arXiv:2311.08649

    cs.SE cs.AI

    Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing

    Authors: Juyeon Yoon, Robert Feldt, Shin Yoo

    Abstract: GUI testing checks if a software system behaves as expected when users interact with its graphical interface, e.g., testing specific functionality or validating relevant use case scenarios. Currently, deciding what to test at this high level is a manual task since automated GUI testing tools target lower-level adequacy metrics such as structural code coverage or activity coverage. We propose Droid…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 10 pages

  41. arXiv:2311.08106

    cs.CL

    Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models

    Authors: Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sangmin Bae, Namgyu Ho, Sung Ju Hwang, Se-young Yun

    Abstract: The dynamic nature of knowledge in an ever-changing world presents challenges for language models trained on static data; the model in the real world often requires not only acquiring new knowledge but also overwriting outdated information with updated information. To study the ability of language models to handle these time-dependent dynamics in human language, we introduce a novel task, EvolvingQA, a tempora…

    Submitted 20 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 15 pages, 10 figures, 5 tables; accepted to NAACL 2024

  42. arXiv:2311.04532

    cs.SE

    Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction

    Authors: Sungmin Kang, Juyeon Yoon, Nargiz Askarbekkyzy, Shin Yoo

    Abstract: Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. As a result, existing techniques mostly focused on crash bugs, which are easier to automatically detect and verify. In this work, we overcome this limitation by using large language models (LLMs), whi…

    Submitted 8 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: This work is an extension of our prior work, available at arXiv:2209.11515

  43. arXiv:2311.02194

    cs.LG cs.AI

    AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation

    Authors: Daiki E. Matsunaga, Jongmin Lee, Jaeseok Yoon, Stefanos Leonardos, Pieter Abbeel, Kee-Eung Kim

    Abstract: One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy. This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation. This challenge is amplified in the offline Multi-Agent RL (MARL) s…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: 31 pages, 12 figures, Accepted at NeurIPS 2023

  44. arXiv:2310.18688

    cs.LG

    Clairvoyance: A Pipeline Toolkit for Medical Time Series

    Authors: Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, Mihaela van der Schaar

    Abstract: Time-series learning is the bread and butter of data-driven *clinical decision support*, and the recent explosion in ML research has demonstrated great potential in various healthcare settings. At the same time, medical time-series problems in the wild are challenging due to their highly *composite* nature: They entail design choices and interactions among components that preprocess data, impute m…

    Submitted 28 October, 2023; originally announced October 2023.

    Journal ref: In Proc. 9th International Conference on Learning Representations (ICLR 2021)

  45. arXiv:2310.13229

    cs.SE

    The GitHub Recent Bugs Dataset for Evaluating LLM-based Debugging Applications

    Authors: Jae Yong Lee, Sungmin Kang, Juyeon Yoon, Shin Yoo

    Abstract: Large Language Models (LLMs) have demonstrated strong natural language processing and code synthesis capabilities, which has led to their rapid adoption in software engineering applications. However, details about LLM training data are often not made public, which has caused concern as to whether existing bug benchmarks are included. In lieu of the training data for the popular GPT models, we exam…

    Submitted 1 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

  46. arXiv:2310.11689

    cs.CL cs.LG

    Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

    Authors: Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan O Arik, Tomas Pfister, Somesh Jha

    Abstract: Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions wh…

    Submitted 11 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Paper published at Findings of the Association for Computational Linguistics: EMNLP, 2023

  47. arXiv:2310.08750

    cs.LG

    Search-Adaptor: Embedding Customization for Information Retrieval

    Authors: Jinsung Yoon, Sercan O Arik, Yanfei Chen, Tomas Pfister

    Abstract: Embeddings extracted by pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are being conventionally used, being able to take advantage of the information from the relevant query-corpus paired data can further boost the LLM capabilities. In this paper, we propose a novel method, Search-Adaptor, fo…

    Submitted 12 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  48. arXiv:2310.08598

    eess.IV cs.AI cs.CV

    Domain Generalization for Medical Image Analysis: A Survey

    Authors: Jee Seok Yoon, Kwanseok Oh, Yooseung Shin, Maciej A. Mazurowski, Heung-Il Suk

    Abstract: Medical image analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, deploying DL models for MedIA in real-world situations remains challenging due to their failure to generalize across the distributional gap bet…

    Submitted 15 February, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  49. arXiv:2310.08204

    cs.CV cs.LG

    STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment

    Authors: Jaewoo Lee, Jaehong Yoon, Wonjae Kim, Yunji Kim, Sung Ju Hwang

    Abstract: Continuously learning a variety of audio-video semantics over time is crucial for audio-related reasoning tasks in our ever-evolving world. However, this is a nontrivial problem and poses two critical challenges: sparse spatio-temporal correlation between audio-video pairs and multimodal correlation overwriting that forgets audio-video relations. To tackle this problem, we propose a new continual…

    Submitted 28 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  50. arXiv:2310.07174

    cs.LG stat.ML

    Generalized Neural Sorting Networks with Error-Free Differentiable Swap Functions

    Authors: Jungtaek Kim, Jeongbeen Yoon, Minsu Cho

    Abstract: Sorting is a fundamental operation of all computer systems, having been a long-standing significant research topic. Beyond the problem formulation of traditional sorting algorithms, we consider sorting problems for more abstract yet expressive inputs, e.g., multi-digit images and image fragments, through a neural sorting network. To learn a mapping from a high-dimensional input to an ordinal varia…

    Submitted 13 March, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at the 12th International Conference on Learning Representations (ICLR 2024)