Showing 1–50 of 115 results for author: Woo, S

  1. arXiv:2407.01073  [pdf, other]

    cs.RO

    No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection

    Authors: Soojin Woo, Donghwi Jung, Seong-Woo Kim

    Abstract: In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. The static obje…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.16860  [pdf, other]

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Website at https://cambrian-mllm.github.io

  3. arXiv:2405.18012  [pdf, other]

    cs.CV eess.IV

    Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

    Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim

    Abstract: Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2405.17928  [pdf, other]

    cs.CV

    Relational Self-supervised Distillation with Compact Descriptors for Image Copy Detection

    Authors: Juntae Kim, Sungwon Woo, Jongho Nang

    Abstract: This paper addresses image copy detection, a task in online sharing platforms for copyright protection. While previous approaches have performed exceptionally well, the large size of their networks and descriptors remains a significant disadvantage, complicating their practical application. In this paper, we propose a novel method that achieves a competitive performance by using a lightweight netw…

    Submitted 7 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 12 pages, 8 figures

    ACM Class: I.4.0; I.4.10

  5. arXiv:2405.17825  [pdf, other]

    cs.CV cs.AI

    Diffusion Model Patching via Mixture-of-Prompts

    Authors: Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim

    Abstract: We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/DMP/

  6. arXiv:2405.17821  [pdf, other]

    cs.CV cs.AI

    RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs

    Authors: Sangmin Woo, Jaehyuk Jang, Donguk Kim, Yubin Choi, Changick Kim

    Abstract: Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs. Despite their impressive capabilities, they often produce "hallucinatory" outputs that do not accurately reflect the visual information, posing challenges in reliability and trustworthiness. Current methods such as contrastive decoding have…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/RITUAL/

  7. arXiv:2405.17820  [pdf, other]

    cs.CV cs.AI

    Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models

    Authors: Sangmin Woo, Donguk Kim, Jaehyuk Jang, Yubin Choi, Changick Kim

    Abstract: This study addresses the issue observed in Large Vision Language Models (LVLMs), where excessive attention on a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual objects. We found that tokens receiving lower attention weights often hold essential information for identifying nuanced object details -- ranging from…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/AvisC/

  8. arXiv:2405.01934  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Impact of Architectural Modifications on Deep Learning Adversarial Robustness

    Authors: Firuz Juraev, Mohammed Abuhamad, Simon S. Woo, George K Thiruvathukal, Tamer Abuhmed

    Abstract: Rapid advancements of deep learning are accelerating adoption in a wide variety of applications, including safety-critical applications such as self-driving vehicles, drones, robots, and surveillance systems. These advancements include applying variations of sophisticated techniques that improve the performance of models. However, such models are not immune to adversarial manipulations, which can…

    Submitted 3 May, 2024; originally announced May 2024.

  9. arXiv:2404.14617  [pdf, other]

    cs.AR

    TDRAM: Tag-enhanced DRAM for Efficient Caching

    Authors: Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael Miller, Taeksang Song, Thomas Vogelsang, Steven Woo, Jason Lowe-Power

    Abstract: As SRAM-based caches are hitting a scaling wall, manufacturers are integrating DRAM-based caches into system designs to continue increasing cache sizes. While DRAM caches can improve the performance of memory systems, existing DRAM cache designs suffer from high miss penalties, wasted data movement, and interference between misses and demand requests. In this paper, we propose TDRAM, a novel DRAM…

    Submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2403.20225  [pdf, other]

    cs.CV

    MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

    Authors: Sanghyun Woo, Kwanyong Park, Inkyu Shin, Myungchul Kim, In So Kweon

    Abstract: Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. This task has practical applications in various fields, such as visual surveillance, crowd behavior analysis, and anomaly detection. However, due to the difficulty and cost of collecting and labeling data, existing datasets for this task are e…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted on CVPR 2024

  11. arXiv:2403.14113  [pdf, other]

    cs.CV

    Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

    Authors: Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim

    Abstract: Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities in crowded panoramic scenes. PAR presents two major challenges: 1) recognizing the nuanced interactions among numerous individuals and 2) understanding multi-granular human activities. To address these, we propose Social Proximity-aw…

    Submitted 20 March, 2024; originally announced March 2024.

  12. arXiv:2403.11582  [pdf, other]

    cs.CV

    OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation

    Authors: Seungbeom Woo, Geonwoo Baek, Taehoon Kim, Jaemin Na, Joong-won Hwang, Wonjun Hwang

    Abstract: Multi-target domain adaptation (MTDA) for semantic segmentation poses a significant challenge, as it involves multiple target domains with varying distributions. The goal of MTDA is to minimize the domain discrepancies among a single source and multi-target domains, aiming to train a single model that excels across all target domains. Previous MTDA approaches typically employ multiple teacher arch…

    Submitted 18 March, 2024; originally announced March 2024.

  13. arXiv:2403.09176  [pdf, other]

    cs.CV

    Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

    Authors: Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim

    Abstract: Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a specific noise level. While these efforts have focused on parameter isolation and task routing, they fall short of capturing detailed inter-task relat…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Project Page: https://byeongjun-park.github.io/Switch-DiT/

  14. arXiv:2403.04981  [pdf, other]

    cs.ET

    Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate

    Authors: Zijian Zhao, Sola Woo, Khandker Akif Aabrar, Sharadindu Gopal Kirtania, Zhouhang Jiang, Shan Deng, Yi Xiao, Halid Mulaosmanovic, Stefan Duenkel, Dominik Kleimaier, Steven Soss, Sven Beyer, Rajiv Joshi, Scott Meninger, Mohamed Mohamed, Kijoon Kim, Jongho Woo, Suhwan Lim, Kwangsoo Kim, Wanki Kim, Daewon Ha, Vijaykrishnan Narayanan, Suman Datta, Shimeng Yu, Kai Ni

    Abstract: In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 29 pages, 7 figures

  15. arXiv:2402.18848  [pdf, other]

    cs.CV

    SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

    Authors: Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, Sanghyun Woo

    Abstract: We introduce a co-designed approach for human portrait relighting that combines a physics-guided architecture with a pre-training framework. Drawing on the Cook-Torrance reflectance model, we have meticulously configured the architecture design to precisely simulate light-surface interactions. Furthermore, to overcome the limitation of scarce high-quality lightstage data, we have developed a self-…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: CVPR2024. Live demos available at https://www.beeble.ai/

  16. arXiv:2402.18817  [pdf, other]

    cs.CV

    Gradient Alignment for Cross-Domain Face Anti-Spoofing

    Authors: Binh M. Le, Simon S. Woo

    Abstract: Recent advancements in domain generalization (DG) for face anti-spoofing (FAS) have garnered considerable attention. Traditional methods have focused on designing learning objectives and additional modules to isolate domain-specific features while retaining domain-invariant characteristics in their representations. However, such approaches often lack guarantees of consistent maintenance of domain-…

    Submitted 11 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  17. arXiv:2402.18293  [pdf, other]

    cs.CV

    Continuous Memory Representation for Anomaly Detection

    Authors: Joo Chan Lee, Taejune Kim, Eunbyung Park, Simon S. Woo, Jong Hwan Ko

    Abstract: There have been significant advancements in anomaly detection in an unsupervised manner, where only normal images are available for training. Several recent methods aim to detect anomalies based on a memory, comparing or reconstructing the input with directly stored normal features (or trained features with normal images). However, such memory-based approaches operate on a discrete feature space i…

    Submitted 10 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Project page: https://tae-mo.github.io/crad/

  18. arXiv:2402.17812  [pdf, other]

    cs.LG cs.CL

    DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation

    Authors: Sunghyeon Woo, Baeseong Park, Byeongwook Kim, Minjung Jo, Sejung Kwon, Dongsuk Jeon, Dongsoo Lee

    Abstract: Training deep neural networks typically involves substantial computational costs during both forward and backward propagation. The conventional layer dropping techniques drop certain layers during training for reducing the computations burden. However, dropping layers during forward propagation adversely affects the training process by degrading accuracy. In this paper, we propose Dropping Backwar…

    Submitted 27 February, 2024; originally announced February 2024.

  19. arXiv:2401.17690  [pdf, other]

    eess.AS cs.AI cs.SD

    EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

    Authors: Jaeyeon Kim, Jaeyoon Jung, Jinjoo Lee, Sang Hoon Woo

    Abstract: We propose EnCLAP, a novel framework for automated audio captioning. EnCLAP employs two acoustic representation models, EnCodec and CLAP, along with a pretrained language model, BART. We also introduce a new training objective called masked codec modeling that improves acoustic awareness of the pretrained language model. Experimental results on AudioCaps and Clotho demonstrate that our model surpa…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  20. arXiv:2401.16189  [pdf, other]

    cs.CV cs.RO

    FIMP: Future Interaction Modeling for Multi-Agent Motion Prediction

    Authors: Sungmin Woo, Minjung Kim, Donghyeong Kim, Sungjun Jang, Sangyoun Lee

    Abstract: Multi-agent motion prediction is a crucial concern in autonomous driving, yet it remains a challenge owing to the ambiguous intentions of dynamic agents and their intricate interactions. Existing studies have attempted to capture interactions between road entities by using the definite data in history timesteps, as future information is not available and involves high uncertainty. However, without…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by ICRA 2024

  21. arXiv:2401.04364  [pdf, other]

    cs.CV cs.CR cs.LG

    SoK: Facial Deepfake Detectors

    Authors: Binh M. Le, Jiwon Kim, Shahroz Tariq, Kristen Moore, Alsharif Abuadbba, Simon S. Woo

    Abstract: Deepfakes have rapidly emerged as a profound and serious threat to society, primarily due to their ease of creation and dissemination. This situation has triggered an accelerated development of deepfake detection technologies. However, many existing detectors rely heavily on lab-generated datasets for validation, which may not effectively prepare them for novel, emerging, and real-world deepfake t…

    Submitted 25 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: 18 pages, 6 figures, 5 table, under peer-review

  22. arXiv:2401.02113  [pdf, other]

    cs.CV

    Source-Free Online Domain Adaptive Semantic Segmentation of Satellite Images under Image Degradation

    Authors: Fahim Faisal Niloy, Kishor Kumar Bhaumik, Simon S. Woo

    Abstract: Online adaptation to distribution shifts in satellite image segmentation stands as a crucial yet underexplored problem. In this paper, we address source-free and online domain adaptation, i.e., test-time adaptation (TTA), for satellite images, with the focus on mitigating distribution shifts caused by various forms of image degradation. Towards achieving this goal, we propose a novel TTA approach…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  23. arXiv:2312.16823  [pdf, other]

    cs.LG cs.CR

    Layer Attack Unlearning: Fast and Accurate Machine Unlearning via Layer Level Attack and Knowledge Distillation

    Authors: Hyunjune Kim, Sangyong Lee, Simon S. Woo

    Abstract: Recently, serious concerns have been raised about the privacy issues related to training datasets in machine learning algorithms when including personal data. Various regulations in different countries, including the GDPR grant individuals to have personal data erased, known as 'the right to be forgotten' or 'the right to erasure'. However, there has been less research on effectively and practical…

    Submitted 27 December, 2023; originally announced December 2023.

  24. arXiv:2312.15980  [pdf, other]

    cs.CV cs.AI

    HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

    Authors: Sangmin Woo, Byeongjun Park, Hyojun Go, Jin-Young Kim, Changick Kim

    Abstract: Recent progress in single-image 3D generation highlights the importance of multi-view coherency, leveraging 3D priors from large-scale diffusion models pretrained on Internet-scale images. However, the aspect of novel-view diversity remains underexplored within the research landscape due to the ambiguity in converting a 2D image into 3D content, where numerous potential shapes can emerge. Here, we…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Project page: https://byeongjun-park.github.io/HarmonyView/

  25. arXiv:2312.12807  [pdf, other]

    cs.CV cs.AI

    All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models

    Authors: Seunghoo Hong, Juhun Lee, Simon S. Woo

    Abstract: Text-to-Image models such as Stable Diffusion have shown impressive image generation synthesis, thanks to the utilization of large-scale datasets. However, these datasets may contain sexually explicit, copyrighted, or undesirable content, which allows the model to directly generate them. Given that retraining these large models on individual concept deletion requests is infeasible, fine-tuning alg…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Main paper with supplementary materials

  26. Blind-Touch: Homomorphic Encryption-Based Distributed Neural Network Inference for Privacy-Preserving Fingerprint Authentication

    Authors: Hyunmin Choi, Simon Woo, Hyoungshick Kim

    Abstract: Fingerprint authentication is a popular security mechanism for smartphones and laptops. However, its adoption in web and cloud environments has been limited due to privacy concerns over storing and processing biometric data on servers. This paper introduces Blind-Touch, a novel machine learning-based fingerprint authentication system leveraging homomorphic encryption to address these privacy conce…

    Submitted 1 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: The 38th Annual AAAI Conference on Artificial Intelligence (AAAI) 2024

  27. arXiv:2311.12344  [pdf, other]

    cs.CV

    Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

    Authors: Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim

    Abstract: Due to the distinctive characteristics of sensors, each modality exhibits unique physical properties. For this reason, in the context of multi-modal action recognition, it is important to consider not only the overall action content but also the complementary nature of different modalities. In this paper, we propose a novel network, named Modality Mixer (M-Mixer) network, which effectively leverag…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2208.11314

  28. arXiv:2310.16354  [pdf]

    cs.AR

    RAMPART: RowHammer Mitigation and Repair for Server Memory Systems

    Authors: Steven C. Woo, Wendy Elsasser, Mike Hamburg, Eric Linstadt, Michael R. Miller, Taeksang Song, James Tringali

    Abstract: RowHammer attacks are a growing security and reliability concern for DRAMs and computer systems as they can induce many bit errors that overwhelm error detection and correction capabilities. System-level solutions are needed as process technology and circuit improvements alone are unlikely to provide complete protection against RowHammer attacks in the future. This paper introduces RAMPART, a nove…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 16 pages, 13 figures. A version of this paper will appear in the Proceedings of MEMSYS23

    ACM Class: B.3.1; B.3.4

  29. arXiv:2310.07138  [pdf, other]

    cs.CV cs.AI

    Denoising Task Routing for Diffusion Models

    Authors: Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim

    Abstract: Diffusion models generate highly realistic images by learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL). Despite the inherent connection between diffusion models and MTL, there remains an unexplored area in designing neural architectures that explicitly incorporate MTL into the framework of diffusion models. In this paper, we present Denoising…

    Submitted 20 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  30. arXiv:2309.05911  [pdf, other]

    cs.CV cs.AI

    Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning

    Authors: Binh M. Le, Simon S. Woo

    Abstract: Deepfake has recently raised a plethora of societal concerns over its possible security threats and dissemination of fake information. Much research on deepfake detection has been undertaken. However, detecting low quality as well as simultaneously detecting different qualities of deepfakes still remains a grave challenge. Most SOTA approaches are limited by using a single specific model for detec…

    Submitted 11 September, 2023; originally announced September 2023.

    Journal ref: International Conference on Computer Vision 2023

  31. Towards Understanding of Deepfake Videos in the Wild

    Authors: Beomsang Cho, Binh M. Le, Jiwon Kim, Simon Woo, Shahroz Tariq, Alsharif Abuadbba, Kristen Moore

    Abstract: Deepfakes have become a growing concern in recent years, prompting researchers to develop benchmark datasets and detection algorithms to tackle the issue. However, existing datasets suffer from significant drawbacks that hamper their effectiveness. Notably, these datasets fail to encompass the latest deepfake videos produced by state-of-the-art methods that are being shared across various platform…

    Submitted 6 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

    Journal ref: 32nd ACM International Conference on Information & Knowledge Management (CIKM), UK, 2023

  32. arXiv:2308.09322  [pdf, other]

    cs.CV cs.AI cs.MM

    Audio-Visual Glance Network for Efficient Video Recognition

    Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

    Abstract: Deep learning has made significant strides in video understanding tasks, but the computation required to classify lengthy and massive videos using clip-level video classifiers remains impractical and prohibitively expensive. To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temp…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  33. arXiv:2307.11906  [pdf, other]

    cs.CV cs.CR cs.LG

    Unveiling Vulnerabilities in Interpretable Deep Learning Systems with Query-Efficient Black-box Attacks

    Authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed

    Abstract: Deep learning has been rapidly employed in many applications revolutionizing many industries, but it is known to be vulnerable to adversarial attacks. Such attacks pose a serious threat to deep learning-based systems compromising their integrity, reliability, and trust. Interpretable Deep Learning Systems (IDLSes) are designed to make the system more transparent and explainable, but they are also…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: text overlap with arXiv:2307.06496

  34. arXiv:2307.11052  [pdf, other]

    cs.CV

    HRFNet: High-Resolution Forgery Network for Localizing Satellite Image Manipulation

    Authors: Fahim Faisal Niloy, Kishor Kumar Bhaumik, Simon S. Woo

    Abstract: Existing high-resolution satellite image forgery localization methods rely on patch-based or downsampling-based training. Both of these training methods have major drawbacks, such as inaccurate boundaries between pristine and forged regions, the generation of unwanted artifacts, etc. To tackle the aforementioned challenges, inspired by the high-resolution image segmentation literature, we propose…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: ICIP 2023

  35. arXiv:2307.06496  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems

    Authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed

    Abstract: Deep learning models are susceptible to adversarial samples in white and black-box environments. Although previous studies have shown high attack success rates, coupling DNN models with interpretation models could offer a sense of security when a human expert is involved, who can identify whether a given sample is benign or malicious. However, in white-box environments, interpretable deep learning…

    Submitted 12 July, 2023; originally announced July 2023.

  36. arXiv:2307.03558  [pdf, other]

    cs.RO

    We, Vertiport 6, are temporarily closed: Interactional Ontological Methods for Changing the Destination

    Authors: Seungwan Woo, Jeongseok Kim, Kangjin Kim

    Abstract: This paper presents a continuation of the previous research on the interaction between a human traffic manager and the UATMS. In particular, we focus on the automation of the process of handling a vertiport outage, which was partially covered in the previous work. Once the manager reports that a vertiport is out of service, which means landings for all corresponding agents are prohibited, the air…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: 8 pages, 1 figure, submitted to IEEE RO-MAN (RO-MAN 2023) Workshop on Ontologies for Autonomous Robotics (RobOntics)

  37. Integrating Psychometrics and Computing Perspectives on Bias and Fairness in Affective Computing: A Case Study of Automated Video Interviews

    Authors: Brandon M Booth, Louis Hickman, Shree Krishna Subburaj, Louis Tay, Sang Eun Woo, Sidney K. DMello

    Abstract: We provide a psychometric-grounded exposition of bias and fairness as applied to a typical machine learning pipeline for affective computing. We expand on an interpersonal communication framework to elucidate how to identify sources of bias that may arise in the process of inferring human emotions and other psychological constructs from observed behavior. Various methods and metrics for measuring…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 21 pages, 4 figures

    Journal ref: IEEE Signal Processing Magazine 38.6 (2021): 84-95

  38. arXiv:2304.00450  [pdf, other]

    cs.CV

    Sketch-based Video Object Localization

    Authors: Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

    Abstract: We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch. We first outline the challenges in the SVOL task and build the Sketch-Video Attention Network (SVANet) with the following design principles: (i) to consider temporal information of video and bridge the domain gap between sketch and video; (ii…

    Submitted 29 November, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: WACV 2024; Code: https://github.com/sangminwoo/SVOL

  39. arXiv:2303.11793  [pdf, other]

    cs.CV

    Bridging Optimal Transport and Jacobian Regularization by Optimal Trajectory for Enhanced Adversarial Defense

    Authors: Binh M. Le, Shahroz Tariq, Simon S. Woo

    Abstract: Deep neural networks, particularly in vision tasks, are notably susceptible to adversarial perturbations. To overcome this challenge, developing a robust classifier is crucial. In light of the recent advancements in the robustness of classifiers, we delve deep into the intricacies of adversarial training and Jacobian regularization, two pivotal defenses. Our work is the first carefully analyzes an…

    Submitted 12 February, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

  40. arXiv:2303.09779  [pdf, other]

    cs.CV

    Bidirectional Domain Mixup for Domain Adaptive Semantic Segmentation

    Authors: Daehan Kim, Minseok Seo, Kwanyong Park, Inkyu Shin, Sanghyun Woo, In-So Kweon, Dong-Geol Choi

    Abstract: Mixup provides interpolated training samples and allows the model to obtain smoother decision boundaries for better generalization. The idea can be naturally applied to the domain adaptation task, where we can mix the source and target samples to obtain domain-mixed samples for better adaptation. However, the extension of the idea from classification to segmentation (i.e., structured output) is no…

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: 10 pages, 3 figures, Accepted on AAAI 2023

  41. Why Do Facial Deepfake Detectors Fail?

    Authors: Binh Le, Shahroz Tariq, Alsharif Abuadbba, Kristen Moore, Simon Woo

    Abstract: Recent rapid advancements in deepfake technology have allowed the creation of highly realistic fake media, such as video, image, and audio. These materials pose significant challenges to human authentication, such as impersonation, misinformation, or even a threat to national security. To keep pace with these rapid advancements, several deepfake detection algorithms have been proposed, leading to…

    Submitted 10 September, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

    Comments: 5 pages, ACM ASIACCS 2023

  42. arXiv:2301.04333  [pdf, other]

    cs.LG cs.AI

    Learnable Path in Neural Controlled Differential Equations

    Authors: Sheo Yon Jhin, Minju Jo, Seungji Kook, Noseong Park, Sungpil Woo, Sunhwan Lim

    Abstract: Neural controlled differential equations (NCDEs), which are continuous analogues to recurrent neural networks (RNNs), are a specialized model in (irregular) time-series processing. In comparison with similar models, e.g., neural ordinary differential equations (NODEs), the key distinctive characteristics of NCDEs are i) the adoption of the continuous path created by an interpolation algorithm from…

    Submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted by AAAI 2023

  43. arXiv:2301.00808  [pdf, other]

    cs.CV

    ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

    Authors: Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie

    Abstract: Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can a…

    Submitted 2 January, 2023; originally announced January 2023.

    Comments: Code and models available at https://github.com/facebookresearch/ConvNeXt-V2

  44. arXiv:2212.10149  [pdf, other]

    cs.CV

    Tracking by Associating Clips

    Authors: Sanghyun Woo, Kwanyong Park, Seoung Wug Oh, In So Kweon, Joon-Young Lee

    Abstract: The tracking-by-detection paradigm today has become the dominant method for multi-object tracking and works by detecting objects in each frame and then performing data association across frames. However, its sequential frame-wise matching property fundamentally suffers from the intermediate interruptions in a video, such as object occlusions, fast camera movements, and abrupt light changes. Moreov…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: ECCV 2022

  45. arXiv:2212.10147  [pdf, other]

    cs.CV

    Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection

    Authors: Sanghyun Woo, Kwanyong Park, Seoung Wug Oh, In So Kweon, Joon-Young Lee

    Abstract: Scaling object taxonomies is one of the important steps toward a robust real-world deployment of recognition systems. We have faced remarkable progress in images since the introduction of the LVIS benchmark. To continue this success in videos, a new video benchmark, TAO, was recently presented. Given the recent encouraging results from both detection and tracking communities, we are interested in…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: ECCV 2022

  46. arXiv:2212.08356  [pdf, other]

    cs.CV

    Test-time Adaptation in the Dynamic World with Compound Domain Knowledge Management

    Authors: Junha Song, Kwanyong Park, InKyu Shin, Sanghyun Woo, Chaoning Zhang, In So Kweon

    Abstract: Prior to the deployment of robotic systems, pre-training the deep-recognition models on all potential visual cases is infeasible in practice. Hence, test-time adaptation (TTA) allows the model to adapt itself to novel environments and improve its performance during test time (i.e., lifelong adaptation). Several works for TTA have shown promising adaptation performances in continuously changing env…

    Submitted 15 April, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: 8 pages

  47. arXiv:2212.08355  [pdf, other]

    cs.CV

    Learning Classifiers of Prototypes and Reciprocal Points for Universal Domain Adaptation

    Authors: Sungsu Hur, Inkyu Shin, Kwanyong Park, Sanghyun Woo, In So Kweon

    Abstract: Universal Domain Adaptation aims to transfer the knowledge between the datasets by handling two shifts: domain-shift and category-shift. The main challenge is correctly distinguishing the unknown target samples while adapting the distribution of known class knowledge from source to target. Most existing methods approach this problem by first training the target adapted known classifier and then re…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted at WACV 2023

  48. arXiv:2212.04761  [pdf, other]

    cs.CV

    Leveraging Spatio-Temporal Dependency for Skeleton-Based Action Recognition

    Authors: Jungho Lee, Minhyeok Lee, Suhwan Cho, Sungmin Woo, Sungjun Jang, Sangyoun Lee

    Abstract: Skeleton-based action recognition has attracted considerable attention due to its compact representation of the human body's skeletal structure. Many recent methods have achieved remarkable performance using graph convolutional networks (GCNs) and convolutional neural networks (CNNs), which extract spatial and temporal features, respectively. Although spatial and temporal dependencies in the human…

    Submitted 18 July, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted by ICCV 2023

  49. arXiv:2212.04548  [pdf, other]

    cs.LG

    STLGRU: Spatio-Temporal Lightweight Graph GRU for Traffic Flow Prediction

    Authors: Kishor Kumar Bhaumik, Fahim Faisal Niloy, Saif Mahmud, Simon Woo

    Abstract: Reliable forecasting of traffic flow requires efficient modeling of traffic data. Indeed, different correlations and influences arise in a dynamic traffic network, making modeling a complicated task. Existing literature has proposed many different methods to capture traffic networks' complex underlying spatial-temporal relations. However, given the heterogeneity of traffic data, consistently captu…

    Submitted 19 February, 2024; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: PAKDD 2024 (Oral)

  50. arXiv:2211.15926  [pdf, other]

    cs.CR cs.CV cs.LG

    Interpretations Cannot Be Trusted: Stealthy and Effective Adversarial Perturbations against Interpretable Deep Learning

    Authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed

    Abstract: Deep learning methods have gained increased attention in various applications due to their outstanding performance. For exploring how this high performance relates to the proper use of data artifacts and the accurate problem formulation of a given task, interpretation models have become a crucial component in developing deep learning-based systems. Interpretation models enable the understanding of…

    Submitted 28 November, 2022; originally announced November 2022.