Showing 1–50 of 186 results for author: Woo, S

  1. arXiv:2407.11714  [pdf, other]

    cs.CV

    Improving Unsupervised Video Object Segmentation via Fake Flow Generation

    Authors: Suhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Seunghoon Lee, Sungmin Woo, Sangyoun Lee

    Abstract: Unsupervised video object segmentation (VOS), also known as video salient object detection, aims to detect the most prominent object in a video at the pixel level. Recently, two-stream approaches that leverage both RGB images and optical flow maps have gained significant attention. However, the limited amount of training data remains a substantial challenge. In this study, we propose a novel data…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.10784  [pdf, other]

    cs.LG cs.AI stat.ML

    AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler

    Authors: Changhun Kim, Taewon Kim, Seungyeon Woo, June Yong Yang, Eunho Yang

    Abstract: In real-world applications, tabular data often suffer from distribution shifts due to their widespread and abundant nature, leading to erroneous predictions of pre-trained machine learning models. However, addressing such distribution shifts in the tabular domain has been relatively underexplored due to unique challenges such as varying attributes and dataset sizes, as well as the limited represen…

    Submitted 15 July, 2024; originally announced July 2024.

  3. arXiv:2407.10399  [pdf, other]

    cs.CV

    Exploring the Impact of Moire Pattern on Deepfake Detectors

    Authors: Razaib Tariq, Shahroz Tariq, Simon S. Woo

    Abstract: Deepfake detection is critical in mitigating the societal threats posed by manipulated videos. While various algorithms have been developed for this purpose, challenges arise when detectors operate externally, such as on smartphones, when users take a photo of deepfake images and upload on the Internet. One significant challenge in such scenarios is the presence of Moiré patterns, which degrade im…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 7 pages, 4 figures, 1 table, Accepted for publication in IEEE International Conference on Image Processing (ICIP 2024)

  4. arXiv:2407.10277  [pdf, other]

    cs.CV cs.AI cs.LG

    Disrupting Diffusion-based Inpainters with Semantic Digression

    Authors: Geonho Son, Juhun Lee, Simon S. Woo

    Abstract: The fabrication of visual misinformation on the web and social media has increased exponentially with the advent of foundational text-to-image diffusion models. Namely, Stable Diffusion inpainters allow the synthesis of maliciously inpainted images of personal and private figures, and copyrighted contents, also known as deepfakes. To combat such generations, a disruption framework, namely Photogua…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 16 pages, 13 figures, IJCAI 2024

  5. arXiv:2407.09303  [pdf, other]

    cs.CV

    ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

    Authors: Sungmin Woo, Wonjoon Lee, Woo Jin Kim, Dogyoon Lee, Sangyoun Lee

    Abstract: Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training. In this paper, we propose a novel framework calle…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project Page: https://sungmin-woo.github.io/prodepth/

  6. arXiv:2407.01073  [pdf, other]

    cs.RO

    No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection

    Authors: Soojin Woo, Donghwi Jung, Seong-Woo Kim

    Abstract: In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. The static obje…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2406.16860  [pdf, other]

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Website at https://cambrian-mllm.github.io

  8. arXiv:2406.16344  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Effective Elastic Properties of Multilayer Graphene

    Authors: Yun Hwangbo, Seong-jae Jeon, Young-Woo Son, Sungjong Woo

    Abstract: We present experimental measurements on Young's modulus and Grüneisen parameter of multilayer graphene with varying number of layers using in situ bulge tests corroborated by atomic force microscopy and Raman spectroscopy. Due to the experimental challenges posed by the significant disparity between intra and interlayer mechanical strengths, measuring conclusive elastic parameters are proven diffi…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures

  9. arXiv:2406.10315  [pdf]

    cond-mat.mtrl-sci

    Quantum Confined Luminescence in Two dimensions

    Authors: Saiphaneendra Bachu, Fatimah Habis, Benjamin Huet, Steffi Y. Woo, Leixin Miao, Danielle Reifsnyder Hickey, Gwangwoo Kim, Nicholas Trainor, Kenji Watanabe, Takashi Taniguchi, Deep Jariwala, Joan M. Redwing, Yuanxi Wang, Mathieu Kociak, Luiz H. G. Tizei, Nasim Alem

    Abstract: Achieving localized light emission from monolayer two-dimensional (2D) transition metal dichalcogenides (TMDs) embedded in the matrix of another TMD has been theoretically proposed but not experimentally proven. In this study, we used cathodoluminescence performed in a scanning transmission electron microscope to unambiguously resolve localized light emission from 2D monolayer MoSe2 nanodots of va…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 40 pages total, 5 figures in main text and 12 figures in Supporting Information, submitted to Nature Materials as of 06/14/2024

  10. arXiv:2406.03365  [pdf]

    cond-mat.mtrl-sci quant-ph

    Optical read and write of spin states in organic diradicals

    Authors: Rituparno Chowdhury, Petri Murto, Naitik A. Panjwani, Yan Sun, Pratyush Ghosh, Yorrick Boeije, Vadim Derkach, Seung-Je Woo, Oliver Millington, Daniel G. Congrave, Yao Fu, Tarig B. E. Mustafa, Miguel Monteverde, Jesús Cerdá, Jan Behrends, Akshay Rao, David Beljonne, Alexei Chepelianskii, Hugo Bronstein, Richard H. Friend

    Abstract: Optical control and read-out of the ground state spin structure has been demonstrated for defect states in crystalline semiconductors, including the diamond NV- center, and these are promising systems for quantum technologies. Molecular organic semiconductors offer synthetic control of spin placement, in contrast to current limitations in these crystalline systems. Here we report the discovery of…

    Submitted 5 June, 2024; originally announced June 2024.

  11. arXiv:2405.18012  [pdf, other]

    cs.CV eess.IV

    Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

    Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim

    Abstract: Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction…

    Submitted 28 May, 2024; originally announced May 2024.

  12. arXiv:2405.17928  [pdf, other]

    cs.CV

    Relational Self-supervised Distillation with Compact Descriptors for Image Copy Detection

    Authors: Juntae Kim, Sungwon Woo, Jongho Nang

    Abstract: Image copy detection is a task of detecting edited copies from any image within a reference database. While previous approaches have shown remarkable progress, the large size of their networks and descriptors remains disadvantage, complicating their practical application. In this paper, we propose a novel method that achieves a competitive performance by using a lightweight network and compact des…

    Submitted 16 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    ACM Class: I.4.0; I.4.10

  13. arXiv:2405.17825  [pdf, other]

    cs.CV cs.AI

    Diffusion Model Patching via Mixture-of-Prompts

    Authors: Seokil Ham, Sangmin Woo, Jin-Young Kim, Hyojun Go, Byeongjun Park, Changick Kim

    Abstract: We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. The effectiveness of DMP is not merely due to the addition of parameters but stems from…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/DMP/

  14. arXiv:2405.17821  [pdf, other]

    cs.CV cs.AI

    RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs

    Authors: Sangmin Woo, Jaehyuk Jang, Donguk Kim, Yubin Choi, Changick Kim

    Abstract: Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs. Despite their impressive capabilities, they often produce "hallucinatory" outputs that do not accurately reflect the visual information, posing challenges in reliability and trustworthiness. Current methods such as contrastive decoding have…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/RITUAL/

  15. arXiv:2405.17820  [pdf, other]

    cs.CV cs.AI

    Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models

    Authors: Sangmin Woo, Donguk Kim, Jaehyuk Jang, Yubin Choi, Changick Kim

    Abstract: This study addresses the issue observed in Large Vision Language Models (LVLMs), where excessive attention on a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual objects. We found that tokens receiving lower attention weights often hold essential information for identifying nuanced object details -- ranging from…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://sangminwoo.github.io/AvisC/

  16. arXiv:2405.12360  [pdf]

    physics.med-ph

    DCE-Qnet: Deep Network Quantification of Dynamic Contrast Enhanced (DCE) MRI

    Authors: Ouri Cohen, Soudabeh Kargar, Sungmin Woo, Alberto Vargas, Ricardo Otazo

    Abstract: Introduction: Quantification of dynamic contrast-enhanced (DCE)-MRI has the potential to provide valuable clinical information, but robust pharmacokinetic modeling remains a challenge for clinical adoption. Methods: A 7-layer neural network called DCE-Qnet was trained on simulated DCE-MRI signals derived from the Extended Tofts model with the Parker arterial input function. Network training inco…

    Submitted 20 May, 2024; originally announced May 2024.

  17. arXiv:2405.01934  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Impact of Architectural Modifications on Deep Learning Adversarial Robustness

    Authors: Firuz Juraev, Mohammed Abuhamad, Simon S. Woo, George K Thiruvathukal, Tamer Abuhmed

    Abstract: Rapid advancements of deep learning are accelerating adoption in a wide variety of applications, including safety-critical applications such as self-driving vehicles, drones, robots, and surveillance systems. These advancements include applying variations of sophisticated techniques that improve the performance of models. However, such models are not immune to adversarial manipulations, which can…

    Submitted 3 May, 2024; originally announced May 2024.

  18. arXiv:2404.14617  [pdf, other]

    cs.AR

    TDRAM: Tag-enhanced DRAM for Efficient Caching

    Authors: Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael Miller, Taeksang Song, Thomas Vogelsang, Steven Woo, Jason Lowe-Power

    Abstract: As SRAM-based caches are hitting a scaling wall, manufacturers are integrating DRAM-based caches into system designs to continue increasing cache sizes. While DRAM caches can improve the performance of memory systems, existing DRAM cache designs suffer from high miss penalties, wasted data movement, and interference between misses and demand requests. In this paper, we propose TDRAM, a novel DRAM…

    Submitted 22 April, 2024; originally announced April 2024.

  19. arXiv:2403.20225  [pdf, other]

    cs.CV

    MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

    Authors: Sanghyun Woo, Kwanyong Park, Inkyu Shin, Myungchul Kim, In So Kweon

    Abstract: Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. This task has practical applications in various fields, such as visual surveillance, crowd behavior analysis, and anomaly detection. However, due to the difficulty and cost of collecting and labeling data, existing datasets for this task are e…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  20. arXiv:2403.14113  [pdf, other]

    cs.CV

    Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

    Authors: Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim

    Abstract: Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities in crowded panoramic scenes. PAR presents two major challenges: 1) recognizing the nuanced interactions among numerous individuals and 2) understanding multi-granular human activities. To address these, we propose Social Proximity-aw…

    Submitted 20 March, 2024; originally announced March 2024.

  21. arXiv:2403.11582  [pdf, other]

    cs.CV

    OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation

    Authors: Seungbeom Woo, Geonwoo Baek, Taehoon Kim, Jaemin Na, Joong-won Hwang, Wonjun Hwang

    Abstract: Multi-target domain adaptation (MTDA) for semantic segmentation poses a significant challenge, as it involves multiple target domains with varying distributions. The goal of MTDA is to minimize the domain discrepancies among a single source and multi-target domains, aiming to train a single model that excels across all target domains. Previous MTDA approaches typically employ multiple teacher arch…

    Submitted 18 March, 2024; originally announced March 2024.

  22. arXiv:2403.09176  [pdf, other]

    cs.CV

    Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

    Authors: Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim

    Abstract: Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a specific noise level. While these efforts have focused on parameter isolation and task routing, they fall short of capturing detailed inter-task relat…

    Submitted 10 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Project Page: https://byeongjun-park.github.io/Switch-DiT/

  23. arXiv:2403.04981  [pdf, other]

    cs.ET

    Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate

    Authors: Zijian Zhao, Sola Woo, Khandker Akif Aabrar, Sharadindu Gopal Kirtania, Zhouhang Jiang, Shan Deng, Yi Xiao, Halid Mulaosmanovic, Stefan Duenkel, Dominik Kleimaier, Steven Soss, Sven Beyer, Rajiv Joshi, Scott Meninger, Mohamed Mohamed, Kijoon Kim, Jongho Woo, Suhwan Lim, Kwangsoo Kim, Wanki Kim, Daewon Ha, Vijaykrishnan Narayanan, Suman Datta, Shimeng Yu, Kai Ni

    Abstract: In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 29 pages, 7 figures

  24. arXiv:2402.18848  [pdf, other]

    cs.CV

    SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

    Authors: Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, Sanghyun Woo

    Abstract: We introduce a co-designed approach for human portrait relighting that combines a physics-guided architecture with a pre-training framework. Drawing on the Cook-Torrance reflectance model, we have meticulously configured the architecture design to precisely simulate light-surface interactions. Furthermore, to overcome the limitation of scarce high-quality lightstage data, we have developed a self-…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: CVPR2024. Live demos available at https://www.beeble.ai/

  25. arXiv:2402.18817  [pdf, other]

    cs.CV

    Gradient Alignment for Cross-Domain Face Anti-Spoofing

    Authors: Binh M. Le, Simon S. Woo

    Abstract: Recent advancements in domain generalization (DG) for face anti-spoofing (FAS) have garnered considerable attention. Traditional methods have focused on designing learning objectives and additional modules to isolate domain-specific features while retaining domain-invariant characteristics in their representations. However, such approaches often lack guarantees of consistent maintenance of domain-…

    Submitted 11 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  26. arXiv:2402.18293  [pdf, other]

    cs.CV

    Continuous Memory Representation for Anomaly Detection

    Authors: Joo Chan Lee, Taejune Kim, Eunbyung Park, Simon S. Woo, Jong Hwan Ko

    Abstract: There have been significant advancements in anomaly detection in an unsupervised manner, where only normal images are available for training. Several recent methods aim to detect anomalies based on a memory, comparing or reconstructing the input with directly stored normal features (or trained features with normal images). However, such memory-based approaches operate on a discrete feature space i…

    Submitted 10 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Project page: https://tae-mo.github.io/crad/

  27. arXiv:2402.17812  [pdf, other]

    cs.LG cs.CL

    DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation

    Authors: Sunghyeon Woo, Baeseong Park, Byeongwook Kim, Minjung Jo, Sejung Kwon, Dongsuk Jeon, Dongsoo Lee

    Abstract: Training deep neural networks typically involves substantial computational costs during both forward and backward propagation. The conventional layer dropping techniques drop certain layers during training for reducing the computations burden. However, dropping layers during forward propagation adversely affects the training process by degrading accuracy. In this paper, we propose Dropping Backwar…

    Submitted 27 February, 2024; originally announced February 2024.

  28. arXiv:2402.14183  [pdf, other]

    eess.SY

    Parking of Connected Automated Vehicles: Vehicle Control, Parking Assignment, and Multi-agent Simulation

    Authors: Xu Shen, Yongkeun Choi, Alex Wong, Francesco Borrelli, Scott Moura, Soomin Woo

    Abstract: This paper introduces a novel approach to optimize the parking efficiency for fleets of Connected and Automated Vehicles (CAVs). We present a novel multi-vehicle parking simulator, equipped with hierarchical path planning and collision avoidance capabilities for individual CAVs. The simulator is designed to capture the key decision-making processes in parking, from low-level vehicle control to hig…

    Submitted 21 February, 2024; originally announced February 2024.

  29. arXiv:2401.17690  [pdf, other]

    eess.AS cs.AI cs.SD

    EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

    Authors: Jaeyeon Kim, Jaeyoon Jung, Jinjoo Lee, Sang Hoon Woo

    Abstract: We propose EnCLAP, a novel framework for automated audio captioning. EnCLAP employs two acoustic representation models, EnCodec and CLAP, along with a pretrained language model, BART. We also introduce a new training objective called masked codec modeling that improves acoustic awareness of the pretrained language model. Experimental results on AudioCaps and Clotho demonstrate that our model surpa…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  30. arXiv:2401.16189  [pdf, other]

    cs.CV cs.RO

    FIMP: Future Interaction Modeling for Multi-Agent Motion Prediction

    Authors: Sungmin Woo, Minjung Kim, Donghyeong Kim, Sungjun Jang, Sangyoun Lee

    Abstract: Multi-agent motion prediction is a crucial concern in autonomous driving, yet it remains a challenge owing to the ambiguous intentions of dynamic agents and their intricate interactions. Existing studies have attempted to capture interactions between road entities by using the definite data in history timesteps, as future information is not available and involves high uncertainty. However, without…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by ICRA 2024

  31. arXiv:2401.04364  [pdf, other]

    cs.CV cs.CR cs.LG

    SoK: Facial Deepfake Detectors

    Authors: Binh M. Le, Jiwon Kim, Shahroz Tariq, Kristen Moore, Alsharif Abuadbba, Simon S. Woo

    Abstract: Deepfakes have rapidly emerged as a profound and serious threat to society, primarily due to their ease of creation and dissemination. This situation has triggered an accelerated development of deepfake detection technologies. However, many existing detectors rely heavily on lab-generated datasets for validation, which may not effectively prepare them for novel, emerging, and real-world deepfake t…

    Submitted 25 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: 18 pages, 6 figures, 5 tables, under peer-review

  32. arXiv:2401.02113  [pdf, other]

    cs.CV

    Source-Free Online Domain Adaptive Semantic Segmentation of Satellite Images under Image Degradation

    Authors: Fahim Faisal Niloy, Kishor Kumar Bhaumik, Simon S. Woo

    Abstract: Online adaptation to distribution shifts in satellite image segmentation stands as a crucial yet underexplored problem. In this paper, we address source-free and online domain adaptation, i.e., test-time adaptation (TTA), for satellite images, with the focus on mitigating distribution shifts caused by various forms of image degradation. Towards achieving this goal, we propose a novel TTA approach…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  33. arXiv:2312.16823  [pdf, other]

    cs.LG cs.CR

    Layer Attack Unlearning: Fast and Accurate Machine Unlearning via Layer Level Attack and Knowledge Distillation

    Authors: Hyunjune Kim, Sangyong Lee, Simon S. Woo

    Abstract: Recently, serious concerns have been raised about the privacy issues related to training datasets in machine learning algorithms when including personal data. Various regulations in different countries, including the GDPR grant individuals to have personal data erased, known as 'the right to be forgotten' or 'the right to erasure'. However, there has been less research on effectively and practical…

    Submitted 27 December, 2023; originally announced December 2023.

  34. arXiv:2312.15980  [pdf, other]

    cs.CV cs.AI

    HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

    Authors: Sangmin Woo, Byeongjun Park, Hyojun Go, Jin-Young Kim, Changick Kim

    Abstract: Recent progress in single-image 3D generation highlights the importance of multi-view coherency, leveraging 3D priors from large-scale diffusion models pretrained on Internet-scale images. However, the aspect of novel-view diversity remains underexplored within the research landscape due to the ambiguity in converting a 2D image into 3D content, where numerous potential shapes can emerge. Here, we…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Project page: https://byeongjun-park.github.io/HarmonyView/

  35. arXiv:2312.12807  [pdf, other]

    cs.CV cs.AI

    All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models

    Authors: Seunghoo Hong, Juhun Lee, Simon S. Woo

    Abstract: Text-to-Image models such as Stable Diffusion have shown impressive image generation synthesis, thanks to the utilization of large-scale datasets. However, these datasets may contain sexually explicit, copyrighted, or undesirable content, which allows the model to directly generate them. Given that retraining these large models on individual concept deletion requests is infeasible, fine-tuning alg…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Main paper with supplementary materials

  36. Blind-Touch: Homomorphic Encryption-Based Distributed Neural Network Inference for Privacy-Preserving Fingerprint Authentication

    Authors: Hyunmin Choi, Simon Woo, Hyoungshick Kim

    Abstract: Fingerprint authentication is a popular security mechanism for smartphones and laptops. However, its adoption in web and cloud environments has been limited due to privacy concerns over storing and processing biometric data on servers. This paper introduces Blind-Touch, a novel machine learning-based fingerprint authentication system leveraging homomorphic encryption to address these privacy conce…

    Submitted 1 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: The 38th Annual AAAI Conference on Artificial Intelligence (AAAI) 2024

  37. arXiv:2311.12344  [pdf, other]

    cs.CV

    Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

    Authors: Sumin Lee, Sangmin Woo, Muhammad Adi Nugroho, Changick Kim

    Abstract: Due to the distinctive characteristics of sensors, each modality exhibits unique physical properties. For this reason, in the context of multi-modal action recognition, it is important to consider not only the overall action content but also the complementary nature of different modalities. In this paper, we propose a novel network, named Modality Mixer (M-Mixer) network, which effectively leverag…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2208.11314

  38. arXiv:2311.07085  [pdf, other]

    cond-mat.mes-hall

    Engineering 2D material exciton lineshape with graphene/h-BN encapsulation

    Authors: Steffi Y. Woo, Fuhui Shao, Ashish Arora, Robert Schneider, Nianjheng Wu, Andrew J. Mayne, Ching-Hwa Ho, Mauro Och, Cecilia Mattevi, Antoine Reserbat-Plantey, Alvaro Moreno, Hanan Herzig Sheinfux, Kenji Watanabe, Takashi Taniguchi, Steffen Michaelis de Vasconcellos, Frank H. L. Koppens, Zhichuan Niu, Odile Stéphan, Mathieu Kociak, F. Javier García de Abajo, Rudolf Bratschitsch, Andrea Konečná, Luiz H. G. Tizei

    Abstract: Control over the optical properties of atomically thin two-dimensional (2D) layers, including those of transition metal dichalcogenides (TMDs), is needed for future optoelectronic applications. Remarkable advances have been achieved through alloying, chemical and electrical doping, and applied strain. However, the integration of TMDs with other 2D materials in van der Waals heterostructures (vdWHs…

    Submitted 13 November, 2023; originally announced November 2023.

  39. arXiv:2310.16354  [pdf]

    cs.AR

    RAMPART: RowHammer Mitigation and Repair for Server Memory Systems

    Authors: Steven C. Woo, Wendy Elsasser, Mike Hamburg, Eric Linstadt, Michael R. Miller, Taeksang Song, James Tringali

    Abstract: RowHammer attacks are a growing security and reliability concern for DRAMs and computer systems as they can induce many bit errors that overwhelm error detection and correction capabilities. System-level solutions are needed as process technology and circuit improvements alone are unlikely to provide complete protection against RowHammer attacks in the future. This paper introduces RAMPART, a nove…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 16 pages, 13 figures. A version of this paper will appear in the Proceedings of MEMSYS23

    ACM Class: B.3.1; B.3.4

  40. arXiv:2310.07138  [pdf, other]

    cs.CV cs.AI

    Denoising Task Routing for Diffusion Models

    Authors: Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim

    Abstract: Diffusion models generate highly realistic images by learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL). Despite the inherent connection between diffusion models and MTL, there remains an unexplored area in designing neural architectures that explicitly incorporate MTL into the framework of diffusion models. In this paper, we present Denoising…

    Submitted 20 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  41. arXiv:2309.05911  [pdf, other]

    cs.CV cs.AI

    Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning

    Authors: Binh M. Le, Simon S. Woo

    Abstract: Deepfake has recently raised a plethora of societal concerns over its possible security threats and dissemination of fake information. Much research on deepfake detection has been undertaken. However, detecting low quality as well as simultaneously detecting different qualities of deepfakes still remains a grave challenge. Most SOTA approaches are limited by using a single specific model for detec…

    Submitted 11 September, 2023; originally announced September 2023.

    Journal ref: International Conference on Computer Vision 2023

  42. Towards Understanding of Deepfake Videos in the Wild

    Authors: Beomsang Cho, Binh M. Le, Jiwon Kim, Simon Woo, Shahroz Tariq, Alsharif Abuadbba, Kristen Moore

    Abstract: Deepfakes have become a growing concern in recent years, prompting researchers to develop benchmark datasets and detection algorithms to tackle the issue. However, existing datasets suffer from significant drawbacks that hamper their effectiveness. Notably, these datasets fail to encompass the latest deepfake videos produced by state-of-the-art methods that are being shared across various platform…

    Submitted 6 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

    Journal ref: 32nd ACM International Conference on Information & Knowledge Management (CIKM), UK, 2023

  43. arXiv:2308.09322  [pdf, other]

    cs.CV cs.AI cs.MM

    Audio-Visual Glance Network for Efficient Video Recognition

    Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Changick Kim

    Abstract: Deep learning has made significant strides in video understanding tasks, but the computation required to classify lengthy and massive videos using clip-level video classifiers remains impractical and prohibitively expensive. To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temp…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  44. arXiv:2307.11906  [pdf, other]

    cs.CV cs.CR cs.LG

    Unveiling Vulnerabilities in Interpretable Deep Learning Systems with Query-Efficient Black-box Attacks

    Authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed

    Abstract: Deep learning has been rapidly employed in many applications revolutionizing many industries, but it is known to be vulnerable to adversarial attacks. Such attacks pose a serious threat to deep learning-based systems compromising their integrity, reliability, and trust. Interpretable Deep Learning Systems (IDLSes) are designed to make the system more transparent and explainable, but they are also…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: text overlap with arXiv:2307.06496

  45. arXiv:2307.11052  [pdf, other]

    cs.CV

    HRFNet: High-Resolution Forgery Network for Localizing Satellite Image Manipulation

    Authors: Fahim Faisal Niloy, Kishor Kumar Bhaumik, Simon S. Woo

    Abstract: Existing high-resolution satellite image forgery localization methods rely on patch-based or downsampling-based training. Both of these training methods have major drawbacks, such as inaccurate boundaries between pristine and forged regions, the generation of unwanted artifacts, etc. To tackle the aforementioned challenges, inspired by the high-resolution image segmentation literature, we propose…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: ICIP 2023

  46. arXiv:2307.06496  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems

    Authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed

    Abstract: Deep learning models are susceptible to adversarial samples in white and black-box environments. Although previous studies have shown high attack success rates, coupling DNN models with interpretation models could offer a sense of security when a human expert is involved, who can identify whether a given sample is benign or malicious. However, in white-box environments, interpretable deep learning…

    Submitted 12 July, 2023; originally announced July 2023.

  47. arXiv:2307.03558  [pdf, other]

    cs.RO

    We, Vertiport 6, are temporarily closed: Interactional Ontological Methods for Changing the Destination

    Authors: Seungwan Woo, Jeongseok Kim, Kangjin Kim

    Abstract: This paper presents a continuation of the previous research on the interaction between a human traffic manager and the UATMS. In particular, we focus on the automation of the process of handling a vertiport outage, which was partially covered in the previous work. Once the manager reports that a vertiport is out of service, which means landings for all corresponding agents are prohibited, the air…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: 8 pages, 1 figure, submitted to IEEE RO-MAN (RO-MAN 2023) Workshop on Ontologies for Autonomous Robotics (RobOntics)

  48. arXiv:2306.15372  [pdf, other]

    cond-mat.mtrl-sci

    Excitation's lifetime extracted from electron-photon (EELS-CL) nanosecond-scale temporal coincidences

    Authors: Nadezda Varkentina, Yves Auad, Steffi Y. Woo, Florian Castioni, Jean-Denis Blazit, Marcel Tencé, Huan-Cheng Chang, Jeson Chen, Kenji Watanabe, Takashi Taniguchi, Mathieu Kociak, Luiz H. G. Tizei

    Abstract: Electron-photon temporal correlations in electron energy loss (EELS) and cathodoluminescence (CL) spectroscopies have recently been used to measure the relative quantum efficiency of materials. This combined spectroscopy, named Cathodoluminescence excitation spectroscopy (CLE), allows the identification of excitation and decay channels which are hidden in average measurements. Here, we demonstrate…

    Submitted 11 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  49. Integrating Psychometrics and Computing Perspectives on Bias and Fairness in Affective Computing: A Case Study of Automated Video Interviews

    Authors: Brandon M Booth, Louis Hickman, Shree Krishna Subburaj, Louis Tay, Sang Eun Woo, Sidney K. D'Mello

    Abstract: We provide a psychometric-grounded exposition of bias and fairness as applied to a typical machine learning pipeline for affective computing. We expand on an interpersonal communication framework to elucidate how to identify sources of bias that may arise in the process of inferring human emotions and other psychological constructs from observed behavior. Various methods and metrics for measuring…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 21 pages, 4 figures

    Journal ref: IEEE Signal Processing Magazine 38.6 (2021): 84-95

  50. arXiv:2304.00450  [pdf, other]

    cs.CV

    Sketch-based Video Object Localization

    Authors: Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

    Abstract: We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch. We first outline the challenges in the SVOL task and build the Sketch-Video Attention Network (SVANet) with the following design principles: (i) to consider temporal information of video and bridge the domain gap between sketch and video; (ii…

    Submitted 29 November, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: WACV 2024; Code: https://github.com/sangminwoo/SVOL