Showing 1–50 of 90 results for author: Seo, H

  1. arXiv:2407.07412  [pdf, other]

    cs.CV cs.AI

    Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation

    Authors: Seonghoon Yu, Paul Hongsuck Seo, Jeany Son

    Abstract: We propose a new framework that automatically generates high-quality segmentation masks with their referring expressions as pseudo supervisions for referring image segmentation (RIS). These pseudo supervisions allow the training of any supervised RIS methods without the cost of manual labeling. To achieve this, we incorporate existing segmentation and image captioning foundation models, leveraging…

    Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2406.08718  [pdf, other]

    cs.CL

    Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations

    Authors: Jun-Woo Kim, Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang

    Abstract: We introduce a pipeline that leverages Large Language Models (LLMs) to transform single-turn psychotherapy counseling sessions into multi-turn interactions. While AI-supported online counseling services for individuals with mental disorders exist, they are often constrained by the limited availability of multi-turn training datasets and frequently fail to fully utilize therapists' expertise. Our p…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: IJCAI 2024 AI4Research workshop

  3. arXiv:2405.18581  [pdf, other]

    cs.AI

    Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models

    Authors: Hyunjin Seo, Taewon Kim, June Yong Yang, Eunho Yang

    Abstract: Recent advancements in text-attributed graphs (TAGs) have significantly improved the quality of node features by using the textual modeling capabilities of language models. Despite this success, utilizing text attributes to enhance the predefined graph structure remains largely unexplored. Our extensive analysis reveals that conventional edges on TAGs, treated as a single relation (e.g., hyperlink…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2405.06778  [pdf, other]

    cs.CV cs.GR

    Shape Conditioned Human Motion Generation with Diffusion Model

    Authors: Kebing Xue, Hyewon Seo

    Abstract: Human motion synthesis is an important task in computer graphics and computer vision. While focusing on various conditioning signals such as text, action class, or audio to guide the generation process, most existing methods utilize skeleton-based pose representation, requiring additional skinning to produce renderable meshes. Given that human motion is a complex interplay of bones, joints, and mu…

    Submitted 10 May, 2024; originally announced May 2024.

  5. arXiv:2404.14664  [pdf, ps, other]

    cs.LG cs.AI

    Employing Layerwised Unsupervised Learning to Lessen Data and Loss Requirements in Forward-Forward Algorithms

    Authors: Taewook Hwang, Hyein Seo, Sangkeun Jung

    Abstract: Recent deep learning models such as ChatGPT utilizing the back-propagation algorithm have exhibited remarkable performance. However, the disparity between the biological brain processes and the back-propagation algorithm has been noted. The Forward-Forward algorithm, which trains deep learning models solely through the forward pass, has emerged to address this. Although the Forward-Forward algorit…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures

  6. arXiv:2404.05144  [pdf, other]

    cs.CL cs.CV cs.LG

    Enhancing Clinical Efficiency through LLM: Discharge Note Generation for Cardiac Patients

    Authors: HyoJe Jung, Yunha Kim, Heejung Choi, Hyeram Seo, Minkyoung Kim, JiYe Han, Gaeun Kee, Seohyun Park, Soyoung Ko, Byeolhee Kim, Suyeon Kim, Tae Joon Jun, Young-Hak Kim

    Abstract: Medical documentation, including discharge notes, is crucial for ensuring patient care quality, continuity, and effective medical communication. However, the manual creation of these documents is not only time-consuming but also prone to inconsistencies and potential errors. The automation of this documentation process using artificial intelligence (AI) represents a promising area of innovation in…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure, 3 tables, conference

  7. arXiv:2404.04544  [pdf, other]

    cs.CV cs.AI

    BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

    Authors: Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun

    Abstract: Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods attempted to address training size limit only, they often…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Project page: https://janeyeon.github.io/beyond-scene

  8. arXiv:2404.03924  [pdf, other]

    cs.CV

    Learning Correlation Structures for Vision Transformers

    Authors: Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho

    Abstract: We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention. StructSA generates attention maps by recognizing space-time structures of key-query correlations via convolution and uses them to dynamically aggregate local contexts of value features. This effectively leverages ri…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  9. arXiv:2404.03745  [pdf, other]

    cs.HC cs.AI cs.CL

    Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations

    Authors: Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee

    Abstract: The widespread adoption and transformative effects of large language models (LLMs) have sparked concerns regarding their capacity to produce inaccurate and fictitious content, referred to as 'hallucinations'. Given the potential risks associated with hallucinations, humans should be able to identify them. This research aims to understand the human perception of LLM hallucinations by systematically…

    Submitted 4 April, 2024; originally announced April 2024.

  10. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  11. arXiv:2404.01339  [pdf, other]

    cs.CL cs.AI cs.HC

    Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation

    Authors: Rohan Chaudhury, Mihir Godbole, Aakash Garg, Jinsil Hwaryoung Seo

    Abstract: Contemporary conversational systems often present a significant limitation: their responses lack the emotional depth and disfluent characteristic of human interactions. This absence becomes particularly noticeable when users seek more personalized and empathetic interactions. Consequently, this makes them seem mechanical and less relatable to human users. Recognizing this gap, we embarked on a jou…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure, for associated code and media files, see https://github.com/Rohan-Chaudhury/Humane-Speech-Synthesis-through-Zero-Shot-Emotion-and-Disfluency-Generation

  12. arXiv:2404.00930  [pdf, other]

    cs.CL

    PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models

    Authors: Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang, Kyung-Ah Sohn

    Abstract: We present a novel end-to-end personality-based synthetic dialogue data generation pipeline, specifically designed to elicit responses from large language models via prompting. We design the prompts to generate more human-like dialogues considering real-world scenarios when users engage with chatbots. We introduce PSYDIAL, the first Korean dialogue dataset focused on personality-based dialogues, c…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: LREC-COLING 2024 Main

  13. arXiv:2403.19105  [pdf, ps, other]

    cs.IT eess.SP

    Pilot Signal and Channel Estimator Co-Design for Hybrid-Field XL-MIMO

    Authors: Yoonseong Kang, Hyowoon Seo, Wan Choi

    Abstract: This paper addresses the intricate task of hybrid-field channel estimation in extremely large-scale MIMO (XL-MIMO) systems, critical for the progression of 6G communications. Within these systems, comprising a line-of-sight (LoS) channel component alongside far-field and near-field scattering channel components, our objective is to tackle the channel estimation challenge. We encounter two central…

    Submitted 27 March, 2024; originally announced March 2024.

  14. arXiv:2403.13756  [pdf, other]

    cs.CV

    Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model

    Authors: Diwei Wang, Kun Yuan, Candice Muller, Frédéric Blanc, Nicolas Padoy, Hyewon Seo

    Abstract: We present a knowledge augmentation strategy for assessing the diagnostic groups and gait impairment from monocular gait videos. Based on a large-scale pre-trained Vision Language Model (VLM), our model learns and improves visual, textual, and numerical representations of patient gait videos, through a collective learning across three distinct modalities: gait videos, class-specific descriptions,…

    Submitted 20 March, 2024; originally announced March 2024.

  15. arXiv:2403.06841  [pdf, other]

    cs.GR

    Inverse Garment and Pattern Modeling with a Differentiable Simulator

    Authors: Boyang Yu, Frederic Cordier, Hyewon Seo

    Abstract: The capability to generate simulation-ready garment models from 3D shapes of clothed humans will significantly enhance the interpretability of captured geometry of real garments, as well as their faithful reproduction in the virtual world. This will have notable impact on fields like shape capture in social VR, and virtual try-on in the fashion industry. To align with the garment modeling process…

    Submitted 15 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  16. arXiv:2403.05093  [pdf, other]

    cs.CV eess.IV

    Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile

    Authors: Seokjun Lee, Seung-Won Jung, Hyunseok Seo

    Abstract: Currently, image generation and synthesis have remarkably progressed with generative models. Despite photo-realistic results, intrinsic discrepancies are still observed in the frequency domain. The spectral discrepancy appeared not only in generative adversarial networks but in diffusion models. In this study, we propose a framework to effectively mitigate the disparity in frequency domain of the…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to AAAI 2024

  17. arXiv:2402.16774  [pdf, ps, other]

    cs.CV

    Video-Based Autism Detection with Deep Learning

    Authors: M. Serna-Aguilera, X. B. Nguyen, A. Singh, L. Rockers, S. Park, L. Neely, H. Seo, K. Luu

    Abstract: Individuals with Autism Spectrum Disorder (ASD) often experience challenges in health, communication, and sensory processing; therefore, early diagnosis is necessary for proper treatment and care. In this work, we consider the problem of detecting or classifying ASD children to aid medical professionals in early diagnosis. We develop a deep learning model that analyzes video clips of children reac…

    Submitted 30 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Poster Abstract. Accepted into 2024 IEEE Green Technologies Conference

  18. arXiv:2402.10457  [pdf, other]

    cs.DS cs.LG

    Learning-Augmented Skip Lists

    Authors: Chunkai Fu, Jung Hoon Seo, Samson Zhou

    Abstract: We study the integration of machine learning advice into the design of skip lists to improve upon traditional data structure design. Given access to a possibly erroneous oracle that outputs estimated fractional frequencies for search queries on a set of items, we construct a skip list that provably provides the optimal expected search time, within nearly a factor of two. In fact, our learning-augm…

    Submitted 16 February, 2024; originally announced February 2024.

  19. arXiv:2402.09784  [pdf, other]

    cs.IR cs.AI

    Sequential Recommendation on Temporal Proximities with Contrastive Learning and Self-Attention

    Authors: Hansol Jung, Hyunwoo Seo, Chiehyeon Lim

    Abstract: Sequential recommender systems identify user preferences from their past interactions to predict subsequent items optimally. Although traditional deep-learning-based models and modern transformer-based models in previous studies capture unidirectional and bidirectional patterns within user-item interactions, the importance of temporal contexts, such as individual behavioral and societal trend patt…

    Submitted 17 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 10 pages, 9 figures

  20. arXiv:2402.04563  [pdf, other]

    cs.CV cs.AI

    Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention

    Authors: Saebom Leem, Hyunseok Seo

    Abstract: Vision Transformer (ViT) is one of the most widely used models in the computer vision field with its great performance on various tasks. In order to fully utilize the ViT-based architecture in various applications, proper visualization methods with a decent localization performance are necessary, but these methods employed in CNN-based models are still not available in ViT due to its unique structu…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: AAAI2024. Code available at https://github.com/LeemSaebom/Attention-Guided-CAM-Visual-Explanations-of-Vision-Transformer-Guided-by-Self-Attention.git

  21. arXiv:2402.01261  [pdf, other]

    cs.LG cs.AI

    TEDDY: Trimming Edges with Degree-based Discrimination strategY

    Authors: Hyunjin Seo, Jihun Yun, Eunho Yang

    Abstract: Since the pioneering work on the lottery ticket hypothesis for graph neural networks (GNNs) was proposed in Chen et al. (2021), the study on finding graph lottery tickets (GLT) has become one of the pivotal focus in the GNN community, inspiring researchers to discover sparser GLT while achieving comparable performance to original dense networks. In parallel, the graph structure has gained substant…

    Submitted 15 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  22. arXiv:2312.04861  [pdf, other]

    cs.CV cs.AI

    Exploring Radar Data Representations in Autonomous Driving: A Comprehensive Review

    Authors: Shanliang Yao, Runwei Guan, Zitian Peng, Chenhang Xu, Yilu Shi, Weiping Ding, Eng Gee Lim, Yong Yue, Hyungjoon Seo, Ka Lok Man, Jieming Ma, Xiaohui Zhu, Yutao Yue

    Abstract: With the rapid advancements of sensor technology and deep learning, autonomous driving systems are providing safe and efficient access to intelligent vehicles as well as intelligent transportation. Among these equipped sensors, the radar sensor plays a crucial role in providing robust perception information in diverse environmental conditions. This review focuses on exploring different radar data…

    Submitted 19 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: 24 pages, 10 figures, 5 tables. arXiv admin note: text overlap with arXiv:2304.10410

  23. arXiv:2311.18654  [pdf, other]

    cs.CV cs.AI

    Detailed Human-Centric Text Description-Driven Large Scene Synthesis

    Authors: Gwanghyun Kim, Dong Un Kang, Hoigi Seo, Hayeon Kim, Se Young Chun

    Abstract: Text-driven large scene image synthesis has made significant progress with diffusion models, but controlling it is challenging. While using additional spatial controls with corresponding texts has improved the controllability of large scene synthesis, it is still challenging to faithfully reflect detailed text descriptions without user-provided controls. Here, we propose DetText2Scene, a novel tex…

    Submitted 30 November, 2023; originally announced November 2023.

  24. arXiv:2310.16112  [pdf, other]

    cs.CV

    Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge

    Authors: Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng

    Abstract: Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" – there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of…

    Submitted 1 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Update after major revision

  25. arXiv:2309.16936  [pdf, other]

    cs.CV cs.AI cs.LG

    PC-Adapter: Topology-Aware Adapter for Efficient Domain Adaption on Point Clouds with Rectified Pseudo-label

    Authors: Joonhyung Park, Hyunjin Seo, Eunho Yang

    Abstract: Understanding point clouds captured from the real-world is challenging due to shifts in data distribution caused by varying object scales, sensor angles, and self-occlusion. Prior works have addressed this issue by combining recent learning principles such as self-supervised learning, self-training, and adversarial training, which leads to significant computational overhead. Toward succinct yet pow…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 11 pages; Accepted to ICCV 2023

  26. arXiv:2308.13564  [pdf, other]

    econ.EM cs.LG math.ST stat.CO stat.ML

    SGMM: Stochastic Approximation to Generalized Method of Moments

    Authors: Xiaohong Chen, Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin, Myunghyun Song

    Abstract: We introduce a new class of algorithms, Stochastic Generalized Method of Moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular Hansen (1982) (offline) GMM, and offers fast and scalable implementation with the ability to handle streaming datasets in real time. We establish the almost sure c…

    Submitted 30 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 46 pages, 4 tables, 2 figures

  27. arXiv:2307.06505  [pdf, other]

    cs.CV cs.RO

    WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmarks for Autonomous Driving on Water Surfaces

    Authors: Shanliang Yao, Runwei Guan, Zhaodong Wu, Yi Ni, Zile Huang, Ryan Wen Liu, Yong Yue, Weiping Ding, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Jieming Ma, Xiaohui Zhu, Yutao Yue

    Abstract: Autonomous driving on water surfaces plays an essential role in executing hazardous and time-consuming missions, such as maritime surveillance, survivors rescue, environmental monitoring, hydrography mapping and waste cleaning. This work presents WaterScenes, the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces. Equipped with a 4D radar and a monocular camer…

    Submitted 15 June, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems

  28. arXiv:2307.01753  [pdf, other]

    astro-ph.CO cs.LG physics.comp-ph physics.data-an

    Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

    Authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho, et al. (24 additional authors not shown)

    Abstract: We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $f_{\rm NL}$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2 < z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the…

    Submitted 25 June, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: 21 pages, 17 figures, 7 tables (Appendix excluded). Published in MNRAS

  29. arXiv:2306.06403  [pdf, other]

    cs.IT cs.LG

    Bayesian Inverse Contextual Reasoning for Heterogeneous Semantics-Native Communication

    Authors: Hyowoon Seo, Yoonseong Kang, Mehdi Bennis, Wan Choi

    Abstract: This work deals with the heterogeneous semantic-native communication (SNC) problem. When agents do not share the same communication context, the effectiveness of contextual reasoning (CR) is compromised calling for agents to infer other agents' context. This article proposes a novel framework for solving the inverse problem of CR in SNC using two Bayesian inference methods, namely: Bayesian invers…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: 14 pages, 7 figures, submitted for possible publication

  30. arXiv:2305.06310  [pdf, other]

    cs.CV

    SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition

    Authors: Naga VS Raviteja Chappa, Pha Nguyen, Alexander H Nelson, Han-Seok Seo, Xin Li, Page Daniel Dobbs, Khoa Luu

    Abstract: This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using Self-supervised Transformers network that can effectively utilize unlabeled video data. To extract spatio-temporal information, we created local and global views with varying frame rates. Our self-supervised objective ensures that features extracted from contrasting views of the same video were consistent acr…

    Submitted 28 August, 2023; v1 submitted 26 April, 2023; originally announced May 2023.

    Comments: Under review for PR journal; 32 pages, 7 figures. arXiv admin note: text overlap with arXiv:2303.12149

  31. arXiv:2304.10410  [pdf, other]

    cs.CV cs.AI cs.RO

    Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review

    Authors: Shanliang Yao, Runwei Guan, Xiaoyu Huang, Zhuoxiao Li, Xiangyu Sha, Yong Yue, Eng Gee Lim, Hyungjoon Seo, Ka Lok Man, Xiaohui Zhu, Yutao Yue

    Abstract: Driven by deep learning techniques, perception technology in autonomous driving has developed rapidly in recent years, enabling vehicles to accurately detect and interpret surrounding environment for safe and efficient navigation. To achieve accurate and robust perception capabilities, autonomous vehicles are often equipped with multiple sensors, making sensor fusion a crucial part of the percepti…

    Submitted 23 August, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE Transactions on Intelligent Vehicles (T-IV)

    Journal ref: IEEE Transactions on Intelligent Vehicles 2023

  32. arXiv:2304.03195  [pdf, other]

    cs.CV

    Micron-BERT: BERT-based Facial Micro-Expression Recognition

    Authors: Xuan-Bac Nguyen, Chi Nhan Duong, Xin Li, Susan Gauch, Han-Seok Seo, Khoa Luu

    Abstract: Micro-expression recognition is one of the most challenging topics in affective computing. It aims to recognize tiny facial movements difficult for humans to perceive in a brief period, i.e., 0.25 to 0.5 seconds. Recent advances in pre-training deep Bidirectional Transformers (BERT) have significantly improved self-supervised learning tasks in computer vision. However, the standard BERT in vision…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR2023

  33. arXiv:2304.02827  [pdf, other]

    cs.CV cs.AI

    DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

    Authors: Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun

    Abstract: The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or from a text prompt. However, the reconstructed 3D objects using state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are a…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project page: https://janeyeon.github.io/ditto-nerf/

  34. arXiv:2303.17811  [pdf, other]

    cs.CV cs.AI cs.CL

    Zero-shot Referring Image Segmentation with Global-Local Context Features

    Authors: Seonghoon Yu, Paul Hongsuck Seo, Jeany Son

    Abstract: Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP.…

    Submitted 3 April, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  35. 4D Facial Expression Diffusion Model

    Authors: Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo

    Abstract: Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. The challenging task, traditionally having relied heavily on digital craftspersons, remains yet to be explored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e. 4D faces) that can be conditioned on diff…

    Submitted 15 April, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

  36. arXiv:2303.16501  [pdf, other]

    cs.CV cs.SD eess.AS

    AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

    Authors: Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

    Abstract: Audiovisual automatic speech recognition (AV-ASR) aims to improve the robustness of a speech recognition system by incorporating visual information. Training fully supervised multimodal models for this task from scratch, however is limited by the need for large labelled audiovisual datasets (in each downstream domain of interest). We present AVFormer, a simple method for augmenting audio-only mode…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  37. arXiv:2303.14396  [pdf, other]

    cs.CV cs.AI cs.LG

    IFSeg: Image-free Semantic Segmentation via Vision-Language Model

    Authors: Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin

    Abstract: Vision-language (VL) pre-training has recently gained much attention for its transferability and flexibility in novel concepts (e.g., cross-modality transfer) across various visual tasks. However, VL-driven segmentation has been under-explored, and the existing approaches still have the burden of acquiring additional training images or even segmentation annotations to adapt a VL model to downstrea…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  38. arXiv:2303.12149  [pdf, other]

    cs.CV

    SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition

    Authors: Naga VS Raviteja Chappa, Pha Nguyen, Alexander H Nelson, Han-Seok Seo, Xin Li, Page Daniel Dobbs, Khoa Luu

    Abstract: In this paper, we propose a new, simple, and effective Self-supervised Spatio-temporal Transformers (SPARTAN) approach to Group Activity Recognition (GAR) using unlabeled video data. Given a video, we create local and global Spatio-temporal views with varying spatial patch sizes and frame rates. The proposed self-supervised objective aims to match the features of these contrasting views representi…

    Submitted 28 August, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPRW 2023; 11 pages, 5 figures

  39. arXiv:2303.11797  [pdf, other]

    cs.CV

    CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

    Authors: Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

    Abstract: Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions. In this work, we introduce a novel cost-based approach to adapt vision-language foundation models, notably CLIP, for the intricate task of semantic segmentation. Through aggregating the cosine similarity score, i.e., the cost volume between image and text…

    Submitted 31 March, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2024. Project page: https://ku-cvlab.github.io/CAT-Seg/

  40. arXiv:2302.14115  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

    Authors: Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid

    Abstract: In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pretrained on narrated videos which are readily-available at scale. The Vid2Seq architecture augments a language model with special time tokens, allowing it to seamlessly predict event boundaries and textual descriptions in the same output sequence. Such a unified model requires large-scale training data, w…

    Submitted 21 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: CVPR 2023 Camera-Ready; Project Webpage: https://antoyang.github.io/vid2seq.html ; 18 pages; 6 figures

  41. arXiv:2301.04811  [pdf]

    cs.CV eess.IV

    Deformation measurement of a soil mixing retaining wall using terrestrial laser scanning

    Authors: Yang Zhao, Lei Fan, Hyungjoon Seo

    Abstract: Retaining walls are often built to prevent excessive lateral movements of the ground surrounding an excavation site. During an excavation, failure of retaining walls could cause catastrophic accidents and hence their lateral deformations are monitored regularly. Laser scanning can rapidly acquire the spatial data of a relatively large area at fine spatial resolutions, which is ideal for monitoring…

    Submitted 11 January, 2023; originally announced January 2023.

    Comments: 22 pages

    Journal ref: Lasers in Engineering: Volume 54, Number 1-3 (2023)

  42. arXiv:2211.09966  [pdf, ps, other]

    cs.CV cs.MM cs.SD eess.AS eess.IV

    AVATAR submission to the Ego4D AV Transcription Challenge

    Authors: Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

    Abstract: In this report, we describe our submission to the Ego4D AudioVisual (AV) Speech Transcription Challenge 2022. Our pipeline is based on AVATAR, a state of the art encoder-decoder model for AV-ASR that performs early fusion of spectrograms and RGB images. We describe the datasets, experimental settings and ablations. Our final method achieves a WER of 68.40 on the challenge test set, outperforming t…

    Submitted 17 November, 2022; originally announced November 2022.

  43. arXiv:2211.03313  [pdf]

    cs.RO physics.med-ph

    Quasi-Static Analysis on Transoral Surgical Tendon-Driven Articulated Robot Units

    Authors: Hojin Seo, Yeoun-Jae Kim, Jaesoon Choi, Youngjin Moon

    Abstract: Wire actuation in tendon-driven continuum robots enables the transmission of force from a distance, but it is understood that tension control problems can arise when a pulley is used to actuate two cables in a push-pull mode. This paper analyzes the relationship between angle of rotation, pressure, as well as variables of a single continuum unit in a quasi-static equilibrium. The primary objective…

    Submitted 7 November, 2022; originally announced November 2022.

  44. arXiv:2209.09452  [pdf]

    cs.LG cs.AI eess.SP

    SleePyCo: Automatic Sleep Scoring with Feature Pyramid and Contrastive Learning

    Authors: Seongju Lee, Yeonguk Yu, Seunghyeok Back, Hogeon Seo, Kyoobin Lee

    Abstract: Automatic sleep scoring is essential for the diagnosis and treatment of sleep disorders and enables longitudinal sleep tracking in home environments. Conventionally, learning-based automatic sleep scoring on single-channel electroencephalogram (EEG) is actively studied because obtaining multi-channel signals during sleep is difficult. However, learning representation from raw EEG signals is challe…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: 14 pages, 3 figures, 8 tables

  45. arXiv:2209.06915  [pdf, other]

    cs.IT

    Predictive Closed-Loop Remote Control over Wireless Two-Way Split Koopman Autoencoder

    Authors: Abanoub M. Girgis, Hyowoon Seo, Jihong Park, Mehdi Bennis, Jinho Choi

    Abstract: Real-time remote control over wireless is an important-yet-challenging application in 5G and beyond due to its mission-critical nature under limited communication resources. Current solutions hinge on not only utilizing ultra-reliable and low-latency communication (URLLC) links but also predicting future states, which may consume enormous communication resources and struggle with a short predictio…

    Submitted 14 September, 2022; originally announced September 2022.

  46. arXiv:2208.05642  [pdf, other]

    cs.CV

    Self-Knowledge Distillation via Dropout

    Authors: Hyoje Lee, Yeachan Park, Hyun Seo, Myungjoo Kang

    Abstract: To boost the performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by distilling the internal knowledge of the model itself. Conventional self-knowledge distillation methods require additional trainable parameters or are dependent on the…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: 11 pages

  47. arXiv:2206.07684  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    AVATAR: Unconstrained Audiovisual Speech Recognition

    Authors: Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid

    Abstract: Audio-visual automatic speech recognition (AV-ASR) is an extension of ASR that incorporates visual cues, often from the movements of a speaker's mouth. Unlike works that simply focus on the lip motion, we investigate the contribution of entire visual frames (visual actions, objects, background etc.). This is particularly useful for unconstrained videos, where the speaker is not necessarily visible…

    Submitted 15 June, 2022; originally announced June 2022.

  48. arXiv:2206.04688  [pdf, other]

    cs.LG

    A New Frontier of AI: On-Device AI Training and Personalization

    Authors: Ji Joong Moon, Hyun Suk Lee, Jiho Chu, Donghak Park, Seungbaek Hong, Hyungjun Seo, Donghyeon Jeong, Sungsik Kong, MyungJoo Ham

    Abstract: Modern consumer electronic devices have started executing deep learning-based intelligence services on devices, not cloud servers, to keep personal data on devices and to reduce network and cloud costs. We find such a trend as the opportunity to personalize intelligence services by updating neural networks with user data without exposing the data out of devices: on-device training. However, the li…

    Submitted 4 January, 2024; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 12 pages, 16 figures, Accepted in ICSE 2024

  49. arXiv:2204.03863  [pdf, other]

    eess.AS cs.CL

    Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning

    Authors: Eesung Kim, Jae-Jin Jeon, Hyeji Seo, Hoon Kim

    Abstract: Self-supervised learning (SSL) approaches such as wav2vec 2.0 and HuBERT models have shown promising results in various downstream tasks in the speech community. In particular, speech representations learned by SSL models have been shown to be effective for encoding various speech-related characteristics. In this context, we propose a novel automatic pronunciation assessment method based on SSL mo…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  50. arXiv:2204.00679  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Learning Audio-Video Modalities from Image Captions

    Authors: Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid

    Abstract: A major challenge in text-video and text-audio retrieval is the lack of large-scale training data. This is unlike image-captioning, where datasets are in the order of millions of samples. To close this gap we propose a new video mining pipeline which involves transferring captions from image captioning datasets to video clips with no additional manual effort. Using this pipeline, we create a new l…

    Submitted 1 April, 2022; originally announced April 2022.