Showing 1–50 of 66 results for author: Heo, J

  1. arXiv:2407.11859  [pdf, other]

    cs.CV

    Mitigating Background Shift in Class-Incremental Semantic Segmentation

    Authors: Gilhan Park, WonJun Moon, SuBeen Lee, Tae-Young Kim, Jae-Pil Heo

    Abstract: Class-Incremental Semantic Segmentation (CISS) aims to learn new classes without forgetting the old ones, using only the labels of the new classes. To achieve this, two popular strategies are employed: 1) pseudo-labeling and knowledge distillation to preserve prior knowledge; and 2) background weight transfer, which leverages the broad coverage of background in learning new classes by transferring…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Code is available at http://github.com/RoadoneP/ECCV2024_MBS

  2. arXiv:2406.09716  [pdf, ps, other]

    cs.CR cs.AI cs.DC cs.LG

    Speed-up of Data Analysis with Kernel Trick in Encrypted Domain

    Authors: Joon Soo Yoo, Baek Kyung Song, Tae Min Ahn, Ji Won Heo, Ji Won Yoon

    Abstract: Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performanc…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted as a preprint

  3. arXiv:2406.07103  [pdf, other]

    eess.AS cs.AI

    MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

    Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

    Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  4. arXiv:2406.02968  [pdf, other]

    cs.CV

    Adversarial Generation of Hierarchical Gaussians for 3D Generative Model

    Authors: Sangeek Hyun, Jae-Pil Heo

    Abstract: Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussian as a 3D representation for 3D GANs by leveraging its effi…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Project page: https://hse1032.github.io/gsgan

  5. arXiv:2406.00410  [pdf, other]

    cs.LG cs.AI

    Posterior Label Smoothing for Node Classification

    Authors: Jaeseung Heo, Moonjeong Park, Dongwoo Kim

    Abstract: Soft labels can improve the generalization of a neural network classifier in many domains, such as image classification. Despite its success, the current literature has overlooked the efficiency of label smoothing in node classification with graph-structured data. In this work, we propose a simple yet effective label smoothing for the transductive node classification task. We design the soft label…

    Submitted 1 June, 2024; originally announced June 2024.

  6. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  7. arXiv:2403.13548  [pdf, other]

    cs.CV

    Diversity-aware Channel Pruning for StyleGAN Compression

    Authors: Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim, Jae-Pil Heo

    Abstract: StyleGAN has shown remarkable performance in unconditional image generation. However, its high computational cost poses a significant challenge for practical applications. Although recent efforts have been made to compress StyleGAN while preserving its performance, existing compressed models still lag behind the original model, particularly in terms of sample diversity. To overcome this, we propos…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project page: https://jiwoogit.github.io/DCP-GAN_site

  8. arXiv:2403.10543  [pdf, other]

    cs.SI cs.LG

    Mitigating Oversmoothing Through Reverse Process of GNNs for Heterophilic Graphs

    Authors: MoonJeong Park, Jaeseung Heo, Dongwoo Kim

    Abstract: Graph Neural Network (GNN) resembles the diffusion process, leading to the over-smoothing of learned representations when stacking many layers. Hence, the reverse process of message passing can produce the distinguishable node representations by inverting the forward message propagation. The distinguishable representations can help us to better classify neighboring nodes with different labels, suc…

    Submitted 11 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by ICML 2024

  9. arXiv:2403.07907  [pdf]

    cs.CY cs.CR

    Reflection of Federal Data Protection Standards on Cloud Governance

    Authors: Olga Dye, Justin Heo, Ebru Celikel Cankaya

    Abstract: As demand for more storage and processing power increases rapidly, cloud services in general are becoming more ubiquitous and popular. This, in turn, is increasing the need for developing highly sophisticated mechanisms and governance to reduce data breach risks in cloud-based infrastructures. Our research focuses on cloud governance by harmoniously combining multiple data security measures with l…

    Submitted 26 February, 2024; originally announced March 2024.

    ACM Class: F.2.2, I.2.7

  10. arXiv:2401.01259  [pdf, other]

    cs.LG cs.AI

    Do Concept Bottleneck Models Obey Locality?

    Authors: Naveen Raman, Mateo Espinosa Zarlenga, Juyeon Heo, Mateja Jamnik

    Abstract: Concept-based methods explain model predictions using human-understandable concepts. These models require accurate concept predictors, yet the faithfulness of existing concept predictors to their underlying concepts is unclear. In this paper, we investigate the faithfulness of Concept Bottleneck Models (CBMs), a popular family of concept-based architectures, by looking at whether they respect "loc…

    Submitted 28 May, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: Previous Version Accepted at NeurIPS 23 XAI in Action Workshop

  11. arXiv:2312.17526  [pdf, other]

    cs.CV

    Noise-free Optimization in Early Training Steps for Image Super-Resolution

    Authors: MinKyu Lee, Jae-Pil Heo

    Abstract: Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance whereas typical methods train their networks by minimizing the pixel-wise distance with respect to a given high-resolution (HR) image. However, despite the basic training scheme being the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investiga…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024. Codes are available at github.com/2minkyulee/ECO

  12. arXiv:2312.16580  [pdf, other]

    cs.CV

    VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting

    Authors: Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo

    Abstract: Zero-Shot Object Counting (ZSOC) aims to count referred instances of arbitrary classes in a query image without human-annotated exemplars. To deal with ZSOC, preceding studies proposed a two-stage pipeline: discovering exemplars and counting. However, there remains a challenge of vulnerability to error propagation of the sequentially designed two-stage process. In this work, a one-stage baseline,…

    Submitted 30 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024. Code is available at https://github.com/Seunggu0305/VLCounter

  13. arXiv:2312.15894  [pdf, other]

    cs.CV

    Task-Disruptive Background Suppression for Few-Shot Segmentation

    Authors: Suho Park, SuBeen Lee, Sangeek Hyun, Hyun Seok Seong, Jae-Pil Heo

    Abstract: Few-shot segmentation aims to accurately segment novel target objects within query images using only a limited number of annotated support images. The recent works exploit support background as well as its foreground to precisely compute the dense correlations between query and support. However, they overlook the characteristics of the background that generally contains various types of objects. I…

    Submitted 26 December, 2023; originally announced December 2023.

  14. arXiv:2312.15861  [pdf, other]

    cs.CV

    Towards Squeezing-Averse Virtual Try-On via Sequential Deformation

    Authors: Sang-Heon Shim, Jiwoo Chung, Jae-Pil Heo

    Abstract: In this paper, we first investigate a visual quality degradation problem observed in recent high-resolution virtual try-on approaches. The tendency is empirically found that the textures of clothes are squeezed at the sleeve, as visualized in the upper row of Fig.1(a). A main reason for the issue arises from a gradient conflict between two popular losses, the Total Variation (TV) and adversarial los…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  15. arXiv:2312.12100  [pdf, other]

    cs.IR cs.AI

    VITA: 'Carefully Chosen and Weighted Less' Is Better in Medication Recommendation

    Authors: Taeri Kim, Jiho Heo, Hongil Kim, Kijung Shin, Sang-Wook Kim

    Abstract: We address the medication recommendation problem, which aims to recommend effective medications for a patient's current visit by utilizing information (e.g., diagnoses and procedures) given at the patient's current and past visits. While there exist a number of recommender systems designed for this problem, we point out that they are challenged in accurately capturing the relation (spec., the degr…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  16. arXiv:2312.09008  [pdf, other]

    cs.CV

    Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

    Authors: Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo

    Abstract: Despite the impressive generative capabilities of diffusion models, existing diffusion model-based style transfer methods require inference-stage optimization (e.g. fine-tuning or textual inversion of style) which is time-consuming, or fails to leverage the generative ability of large-scale diffusion models. To address these issues, we introduce a novel artistic style transfer method based on a pr…

    Submitted 20 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024. Project page: https://jiwoogit.github.io/StyleID_site

  17. arXiv:2312.08063  [pdf, other]

    cs.LG cs.AI cs.CL

    Estimation of Concept Explanations Should be Uncertainty Aware

    Authors: Vihari Piratla, Juyeon Heo, Katherine M. Collins, Sukriti Singh, Adrian Weller

    Abstract: Model explanations can be valuable for interpreting and debugging predictive models. We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts. Although popular for their easy interpretation, concept explanations are known to be noisy. We begin our work by identifying various sources of uncertainty in the estimation pipeline th…

    Submitted 5 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  18. arXiv:2311.08835  [pdf, other]

    cs.CV

    Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding

    Authors: WonJun Moon, Sangeek Hyun, SuBeen Lee, Jae-Pil Heo

    Abstract: Temporal Grounding is to identify specific moments or highlights from a video corresponding to textual descriptions. Typical approaches in temporal grounding treat all video clips equally during the encoding process regardless of their semantic relevance with the text query. Therefore, we propose Correlation-Guided DEtection TRansformer (CG-DETR), exploring to provide clues for query-associated vi…

    Submitted 3 July, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: 29 pages, 15 figures, 14 tables, Code is available at https://github.com/wjun0830/CGDETR

  19. arXiv:2311.06243  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

    Authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf

    Abstract: Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly larg…

    Submitted 28 April, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: ICLR 2024 (v2: 34 pages, 19 figures)

  20. arXiv:2309.15531  [pdf, other]

    cs.LG

    Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

    Authors: Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee

    Abstract: Large Language Models (LLMs) have recently demonstrated remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to the large memory bottleneck, specifically in small batch inference settings (e.g. mobile devices). Weight-only quantization can be a promising approach, but sub-4 bit quantization remains a challenge due to large-magnitude activation outlier…

    Submitted 24 March, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: ICLR 2024. 19 pages, 11 figures, 10 tables

  21. arXiv:2309.08320  [pdf, other]

    eess.AS cs.SD

    Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

    Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

    Abstract: Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. These challenges can be addressed using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV…

    Submitted 13 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures, accepted for ICASSP 2024

  22. arXiv:2309.08208  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

    Authors: Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu

    Abstract: Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Submitted to 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

  23. arXiv:2309.04549  [pdf, other]

    cs.CV cs.DC cs.MM eess.IV

    Poster: Making Edge-assisted LiDAR Perceptions Robust to Lossy Point Cloud Compression

    Authors: Jin Heo, Gregorie Phillips, Per-Erik Brodin, Ada Gavrilovska

    Abstract: Real-time light detection and ranging (LiDAR) perceptions, e.g., 3D object detection and simultaneous localization and mapping are computationally intensive to mobile devices of limited resources and often offloaded on the edge. Offloading LiDAR perceptions requires compressing the raw sensor data, and lossy compression is used for efficiently reducing the data volume. Lossy compression degrades t…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: extended abstract of 2 pages, 2 figures, 1 table

  24. Poster: Enabling Flexible Edge-assisted XR

    Authors: Jin Heo, Ketan Bhardwaj, Ada Gavrilovska

    Abstract: Extended reality (XR) is touted as the next frontier of the digital future. XR includes all immersive technologies of augmented reality (AR), virtual reality (VR), and mixed reality (MR). XR applications obtain the real-world context of the user from an underlying system, and provide rich, immersive, and interactive virtual experiences based on the user's context in real-time. XR systems process s…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: extended abstract of 2 pages, 1 figure, 2 tables

  25. arXiv:2308.10570  [pdf, other]

    cs.CV

    Self-Feedback DETR for Temporal Action Detection

    Authors: Jihwan Kim, Miso Lee, Jae-Pil Heo

    Abstract: Temporal Action Detection (TAD) is challenging but fundamental for real-world video applications. Recently, DETR-based models have been devised for TAD but have not performed well yet. In this paper, we point out the problem in the self-attention of DETR for TAD; the attention modules focus on a few key elements, called temporal collapse problem. It degrades the capability of the encoder and decod…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  26. arXiv:2308.00093  [pdf, other]

    cs.CV

    Task-Oriented Channel Attention for Fine-Grained Few-Shot Classification

    Authors: SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

    Abstract: The difficulty of the fine-grained image classification mainly comes from a shared overall appearance across classes. Thus, recognizing discriminative details, such as eyes and beaks for birds, is a key in the task. However, this is particularly challenging when training data is limited. To address this, we propose Task Discrepancy Maximization (TDM), a task-oriented channel attention method tailo…

    Submitted 28 July, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2207.01376

  27. FleXR: A System Enabling Flexibly Distributed Extended Reality

    Authors: Jin Heo, Ketan Bhardwaj, Ada Gavrilovska

    Abstract: Extended reality (XR) applications require computationally demanding functionalities with low end-to-end latency and high throughput. To enable XR on commodity devices, a number of distributed systems solutions enable offloading of XR workloads on remote servers. However, they make a priori decisions regarding the offloaded functionalities based on assumptions about operating factors, and their be…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: 11 pages, 11 figures, conference paper

    Journal ref: In Proceedings of the 14th Conference on ACM Multimedia Systems (pp. 1-13) June, 2023

  28. FLiCR: A Fast and Lightweight LiDAR Point Cloud Compression Based on Lossy RI

    Authors: Jin Heo, Christopher Phillips, Ada Gavrilovska

    Abstract: Light detection and ranging (LiDAR) sensors are becoming available on modern mobile devices and provide a 3D sensing capability. This new capability is beneficial for perceptions in various use cases, but it is challenging for resource-constrained mobile devices to use the perceptions in real-time because of their high computational complexity. In this context, edge computing can be used to enable…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: 12 pages, 11 figures, conference paper

    Journal ref: In 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC) (pp. 54-67). IEEE 2022

  29. arXiv:2307.10628  [pdf, other]

    eess.AS cs.SD

    PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

    Authors: Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-Jin Yu

    Abstract: Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation method has been commonly used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environ…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, 1 table, accepted to CKAIA2023 as a conference paper

  30. arXiv:2306.14861  [pdf, other]

    stat.ML cs.LG

    Leveraging Task Structures for Improved Identifiability in Neural Network Representations

    Authors: Wenlin Chen, Julien Horwood, Juyeon Heo, José Miguel Hernández-Lobato

    Abstract: This work extends the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks. In such cases, we show that identifiability is achievable even in the case of regression, extending prior work restricted to linear identifiability in the single-task classification case. Furthermore, we show that the existence of a task distribution w…

    Submitted 29 September, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: 18 pages, 4 figures, 5 tables, 1 algorithm

  31. arXiv:2306.01310  [pdf, other]

    cs.LG cs.AI

    EPIC: Graph Augmentation with Edit Path Interpolation via Learnable Cost

    Authors: Jaeseung Heo, Seungbeom Lee, Sungsoo Ahn, Dongwoo Kim

    Abstract: Data augmentation plays a critical role in improving model performance across various domains, but it becomes challenging with graph data due to their complex and irregular structure. To address this issue, we propose EPIC (Edit Path Interpolation via learnable Cost), a novel interpolation-based method for augmenting graph datasets. To interpolate between two graphs lying in an irregular domain, E…

    Submitted 4 June, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

  32. arXiv:2305.17394  [pdf, other]

    eess.AS cs.SD

    One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

    Authors: Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-Jin Yu

    Abstract: The application of speech self-supervised learning (SSL) models has achieved remarkable performance in speaker verification (SV). However, there is a computational cost hurdle in employing them, which makes development and deployment difficult. Several studies have simply compressed SSL models through knowledge distillation (KD) without considering the target task. Consequently, these methods coul…

    Submitted 7 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: ISCA INTERSPEECH 2023

  33. arXiv:2305.04526  [pdf, other]

    cs.CV

    CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation

    Authors: Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram

    Abstract: Transfer learning has become a popular task adaptation method in the era of foundation models. However, many foundation models require large storage and computing resources, which makes off-the-shelf deployment impractical. Post-training compression techniques such as pruning and quantization can help lower deployment costs. Unfortunately, the resulting performance degradation limits the usability…

    Submitted 8 July, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Preprint

  34. arXiv:2303.15014  [pdf, other]

    cs.CV

    Leveraging Hidden Positives for Unsupervised Semantic Segmentation

    Authors: Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo

    Abstract: Dramatic demand for manpower to label pixel-level annotations triggered the advent of unsupervised semantic segmentation. Although the recent work employing the vision transformer (ViT) backbone shows exceptional performance, there is still a lack of consideration for task-specific training guidance and local semantic consistency. To tackle these issues, we leverage contrastive learning by excavat…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  35. arXiv:2303.13874  [pdf, other]

    cs.CV cs.AI

    Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

    Authors: WonJun Moon, Sangeek Hyun, SangUk Park, Dongchan Park, Jae-Pil Heo

    Abstract: Recently, video moment retrieval and highlight detection (MR/HD) are being spotlighted as the demand for video understanding is drastically increased. The key objective of MR/HD is to localize the moment and estimate clip-wise accordance level, i.e., saliency score, to the given text query. Although the recent transformer-based models brought some advances, we found that these methods do not fully…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023. Code is available at https://github.com/wjun0830/QD-DETR

  36. arXiv:2303.06419  [pdf, other]

    cs.LG

    Use Perturbations when Learning from Explanations

    Authors: Juyeon Heo, Vihari Piratla, Matthew Wicker, Adrian Weller

    Abstract: Machine learning from explanations (MLX) is an approach to learning that uses human-provided explanations of relevant or irrelevant features for each input to ensure that model predictions are right for the right reasons. Existing MLX approaches rely on local model interpretation methods and require strong model smoothing to align model and human explanations, leading to sub-optimal performance. W…

    Submitted 1 December, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

    Comments: NeurIPS 2023; https://github.com/vihari/robust_mlx

  37. arXiv:2303.02331  [pdf, other]

    cs.CV cs.AI cs.LG

    Training-Free Acceleration of ViTs with Delayed Spatial Merging

    Authors: Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram

    Abstract: Token merging has emerged as a new paradigm that can accelerate the inference of Vision Transformers (ViTs) without any retraining or fine-tuning. To push the frontier of training-free acceleration in ViTs, we improve token merging by adding the perspectives of 1) activation outliers and 2) hierarchical representations. Through a careful analysis of the attention behavior in ViTs, we characterize…

    Submitted 1 July, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: ICML 2024 ES-FoMo Workshop

  38. arXiv:2212.08568  [pdf, other]

    cs.CV cs.LG

    Biomedical image analysis competitions: The state of current participation practice

    Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano , et al. (331 additional authors not shown)

    Abstract: The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,…

    Submitted 12 September, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

  39. arXiv:2212.08507  [pdf, other]

    cs.LG

    Robust Explanation Constraints for Neural Networks

    Authors: Matthew Wicker, Juyeon Heo, Luca Costabello, Adrian Weller

    Abstract: Post-hoc explanation methods are used with the intent of providing insights about neural networks and are sometimes said to help engender trust in their outputs. However, popular explanation methods have been found to be fragile to minor perturbations of input features or model parameters. Relying on constraint relaxation techniques from non-convex optimization, we develop a method that upper-bou…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: 23 pages, 12 figures

  40. arXiv:2211.15900  [pdf, other]

    cs.CV

    Towards More Robust Interpretation via Local Gradient Alignment

    Authors: Sunghwan Joo, Seokhyeon Jeong, Juyeon Heo, Adrian Weller, Taesup Moon

    Abstract: Neural network interpretation methods, particularly feature attribution methods, are known to be fragile with respect to adversarial input perturbations. To address this, several methods for enhancing the local smoothness of the gradient while training have been proposed for attaining robust feature attributions. However, the lack of considering the normalization of the attributions, whic…

    Submitted 7 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: 22 pages (9 pages in paper, 13 pages in Appendix), 9 figures, 6 tables. Accepted in AAAI 23 (Association for the Advancement of Artificial Intelligence)

  41. arXiv:2211.13471  [pdf, other]

    cs.CV

    Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition

    Authors: WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

    Abstract: A dramatic increase in real-world video volume with extremely diverse and emerging topics naturally forms a long-tailed video distribution in terms of their categories, and it spotlights the need for Video Long-Tailed Recognition (VLTR). In this work, we summarize the challenges in VLTR and explore how to overcome them. The challenges are: (1) it is impractical to re-train the whole model for high…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: Accepted to AAAI 2023. Code is available at https://github.com/wjun0830/MOVE

  42. arXiv:2211.02227  [pdf, other]

    eess.AS cs.SD

    Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

    Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

    Abstract: The advent of hyper-scale and general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In the field of audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performances through fine-tuning for downstream tasks. Nevertheless, re-training all the parameters of these massive mod…

    Submitted 1 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures

  43. arXiv:2211.01599  [pdf, other]

    eess.AS cs.SD

    Convolution channel separation and frequency sub-bands aggregation for music genre classification

    Authors: Jungwoo Heo, Hyun-seo Shin, Ju-ho Kim, Chan-yeong Lim, Ha-Jin Yu

    Abstract: In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze these features. In this research, we propose a novel framework that can extract and aggregate both short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, where all the layers that…

    Submitted 3 November, 2022; originally announced November 2022.

  44. arXiv:2207.10024  [pdf, other]

    cs.CV

    Difficulty-Aware Simulator for Open Set Recognition

    Authors: WonJun Moon, Junho Park, Hyun Seok Seong, Cheol-Ho Cho, Jae-Pil Heo

    Abstract: Open set recognition (OSR) assumes unknown instances appear out of the blue at inference time. The main challenge of OSR is that the response of models for unknowns is totally unpredictable. Furthermore, the diversity of the open set makes it harder since instances have different difficulty levels. Therefore, we present a novel framework, DIfficulty-Aware Simulator (DIAS), that generates fakes wit…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022. Code is available at github.com/wjun0830/Difficulty-Aware-Simulator

  45. arXiv:2207.10023  [pdf, other]

    cs.CV

    Tailoring Self-Supervision for Supervised Learning

    Authors: WonJun Moon, Ji-Hwan Kim, Jae-Pil Heo

    Abstract: Recently, it has been shown that deploying proper self-supervision is a promising way to enhance the performance of supervised learning. Yet, the benefits of self-supervision are not fully exploited as previous pretext tasks are specialized for unsupervised representation learning. To this end, we begin by presenting three desirable properties for such auxiliary tasks to assist the supervised object…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022. Code is available at github.com/wjun0830/Localizable-Rotation

  46. arXiv:2207.01376  [pdf, other]

    cs.CV cs.LG

    Task Discrepancy Maximization for Fine-grained Few-Shot Classification

    Authors: SuBeen Lee, WonJun Moon, Jae-Pil Heo

    Abstract: Recognizing discriminative details such as eyes and beaks is important for distinguishing fine-grained classes since they have similar overall appearances. In this regard, we introduce Task Discrepancy Maximization (TDM), a simple module for fine-grained few-shot classification. Our objective is to localize the class-wise discriminative regions by highlighting channels encoding distinct informatio…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted to CVPR 2022 as an oral presentation. Code is available at https://github.com/leesb7426/CVPR2022-Task-Discrepancy-Maximization-for-Fine-grained-Few-Shot-Classification

    Journal ref: IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2022

  47. Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

    Authors: Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram

    Abstract: This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state-of-the-art hardware accelerator for supporting lightweight neural networks. Specifically, the SPS dataflow enables a novel hardware design approach unlocked by an emergent pruning scheme, periodic pattern-based sparsity (PPS). By exploiting the regularity of PPS, our sparsity-aware compiler optimally reorde…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: 6 pages, Published in ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED) 2022

  48. Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

    Authors: Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin

    Abstract: The use of deep neural networks (DNN) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. Therefore, the Spoofing-Aware Speaker Verification (SASV) challenge was designed and held to promote the development of systems that can perform ASV considering spoofing attacks by integrating AS…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: 5 pages, 4 figures, 5 tables, accepted to 2022 Interspeech as a conference paper

    Journal ref: Proc. Interspeech 2022

  49. arXiv:2206.13044  [pdf, other]

    eess.AS cs.SD

    Extended U-Net for Speaker Verification in Noisy Environments

    Authors: Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu

    Abstract: Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by blurring speech intelligibility. Various studies have used separate pretrained enhancement models as the front-end module of the SV system in noisy environments, and these methods effectively remove noise. However, the denoising process of independent enhancement models n…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 5 pages, 2 figures, 4 tables, accepted to 2022 Interspeech as a conference paper

  50. arXiv:2206.09178  [pdf, other]

    cs.CV cs.AI

    REVECA -- Rich Encoder-decoder framework for Video Event CAptioner

    Authors: Jaehyuk Heo, YongGi Jeong, Sunwoo Kim, Jaehee Kim, Pilsung Kang

    Abstract: We describe an approach used in the Generic Boundary Event Captioning challenge at the Long-Form Video Understanding Workshop held at CVPR 2022. We designed a Rich Encoder-decoder framework for Video Event CAptioner (REVECA) that utilizes spatial and temporal information from the video to generate a caption for the corresponding event boundary. REVECA uses frame position embedding to incorpora…

    Submitted 18 June, 2022; originally announced June 2022.

    Comments: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR). LOng-form VidEo Understanding (LOVEU) workshop