
Showing 1–50 of 95 results for author: Chun, S

  1. arXiv:2407.05713  [pdf, other]

    cs.CV cs.AI

    Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge

    Authors: Hyunjin Cho, Dong Un Kang, Se Young Chun

    Abstract: Short-term object interaction anticipation is an important task in egocentric video analysis, including precise predictions of future interactions and their timings as well as the categories and positions of the involved active objects. To alleviate the complexity of this task, our proposed method, SOIA-DOD, effectively decomposes it into 1) detecting active object and 2) classifying interaction an…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 4 pages

  2. arXiv:2407.05551  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Read, Watch and Scream! Sound Generation from Text and Video

    Authors: Yujin Jeong, Yunji Kim, Sanghyuk Chun, Jiyoung Lee

    Abstract: Multimodal generative models have shown impressive advances with the help of powerful diffusion models. Despite the progress, generating sound solely from text poses challenges in ensuring comprehensive scene depiction and temporal alignment. Meanwhile, video-to-sound generation limits the flexibility to prioritize sound synthesis for specific objects within the scene. To tackle these challenges,…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Project page: https://naver-ai.github.io/rewas

  3. arXiv:2406.09188  [pdf, ps, other]

    cs.CV cs.IR

    Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval

    Authors: Jaeseok Byun, Seokhyeon Jeong, Wonjae Kim, Sanghyuk Chun, Taesup Moon

    Abstract: Composed Image Retrieval (CIR) aims to retrieve a target image based on a reference image and conditioning text, enabling controllable searches. Due to the expensive dataset construction cost for CIR triplets, a zero-shot (ZS) CIR setting has been actively studied to eliminate the need for human-collected triplet datasets. The mainstream of ZS-CIR employs an efficient projection module that projec…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 17 pages

  4. arXiv:2404.17507  [pdf, other]

    cs.CV

    HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts

    Authors: Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun

    Abstract: In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our appr…

    Submitted 16 July, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: ECCV 2024; 33 pages, 4.5MB

  5. arXiv:2404.04544  [pdf, other]

    cs.CV cs.AI

    BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

    Authors: Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun

    Abstract: Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods attempted to address training size limit only, they often…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Project page: https://janeyeon.github.io/beyond-scene

  6. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  7. arXiv:2403.18260  [pdf, other]

    cs.CV cs.CL

    Toward Interactive Regional Understanding in Vision-Large Language Models

    Authors: Jungbeom Lee, Sanghyuk Chun, Sangdoo Yun

    Abstract: Recent Vision-Language Pre-training (VLP) models have demonstrated significant advancements. Nevertheless, these models heavily rely on image-text pairs that capture only coarse and global information of an image, leading to a limitation in their regional understanding ability. In this work, we introduce RegionVLM, equipped with explicit regional modeling capabilities, allowing them to un…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: NAACL 2024 Main Conference

  8. arXiv:2403.04460  [pdf, other]

    cs.CL

    Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset

    Authors: Minjin Kim, Minju Kim, Hana Kim, Beong-woo Kwak, Soyeon Chun, Hyunseo Kim, SeongKu Kang, Youngjae Yu, Jinyoung Yeo, Dongha Lee

    Abstract: Conversational recommender system is an emerging area that has garnered an increasing interest in the community, especially with the advancements in large language models (LLMs) that enable diverse reasoning over conversational input. Despite the progress, the field has many aspects left to explore. The currently available public datasets for conversational recommendation lack specific user prefer…

    Submitted 8 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Published at ACL 2024 Findings

  9. arXiv:2312.13027  [pdf, other]

    cs.LG cs.CV

    Doubly Perturbed Task Free Continual Learning

    Authors: Byung Hyun Lee, Min-hwan Oh, Se Young Chun

    Abstract: Task Free online continual learning (TF-CL) is a challenging problem where the model incrementally learns tasks without explicit task information. Although training with entire data from the past, present as well as future is considered as the gold standard, naive approaches in TF-CL with the current samples may be conflicted with learning with samples in the future, leading to catastrophic forget…

    Submitted 18 February, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024 (Oral)

  10. arXiv:2312.07425  [pdf, other]

    cs.LG cs.CV eess.IV eess.SP

    Deep Internal Learning: Deep Learning from a Single Input

    Authors: Tom Tirer, Raja Giryes, Se Young Chun, Yonina C. Eldar

    Abstract: Deep learning, in general, focuses on training a neural network from large labeled datasets. Yet, in many cases there is value in training a network just from the input at hand. This is particularly relevant in many signal and image processing problems where training data is scarce and diversity is large on the one hand, and on the other, there is a lot of structure in the data that can be exploit…

    Submitted 8 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to IEEE Signal Processing Magazine

  11. arXiv:2312.01998  [pdf, other]

    cs.CV cs.IR

    Language-only Efficient Training of Zero-shot Composed Image Retrieval

    Authors: Geonmo Gu, Sanghyuk Chun, Wonjae Kim, Yoohoon Kang, Sangdoo Yun

    Abstract: Composed image retrieval (CIR) task takes a composed query of image and text, aiming to search relative images for both conditions. Conventional CIR approaches need a training dataset composed of triplets of query image, query text, and target image, which is very expensive to collect. Several recent works have worked on the zero-shot (ZS) CIR paradigm to tackle the issue without using pre-collect…

    Submitted 31 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 camera-ready; First two authors contributed equally; 17 pages, 3.1MB

  12. arXiv:2312.01689  [pdf, other]

    eess.IV cs.CV

    Fast and accurate sparse-view CBCT reconstruction using meta-learned neural attenuation field and hash-encoding regularization

    Authors: Heejun Shin, Taehee Kim, Jongho Lee, Se Young Chun, Seungryung Cho, Dongmyung Shin

    Abstract: Cone beam computed tomography (CBCT) is an emerging medical imaging technique to visualize the internal anatomical structures of patients. During a CBCT scan, several projection images of different angles or views are collectively utilized to reconstruct a tomographic image. However, reducing the number of projections in a CBCT scan while preserving the quality of a reconstructed image is challeng…

    Submitted 16 January, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  13. arXiv:2311.18654  [pdf, other]

    cs.CV cs.AI

    Detailed Human-Centric Text Description-Driven Large Scene Synthesis

    Authors: Gwanghyun Kim, Dong Un Kang, Hoigi Seo, Hayeon Kim, Se Young Chun

    Abstract: Text-driven large scene image synthesis has made significant progress with diffusion models, but controlling it is challenging. While using additional spatial controls with corresponding texts has improved the controllability of large scene synthesis, it is still challenging to faithfully reflect detailed text descriptions without user-provided controls. Here, we propose DetText2Scene, a novel tex…

    Submitted 30 November, 2023; originally announced November 2023.

  14. arXiv:2311.18387  [pdf, other]

    cs.CV cs.LG

    On Exact Inversion of DPM-Solvers

    Authors: Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon, Hyewon Bae, Se Young Chun

    Abstract: Diffusion probabilistic models (DPMs) are a key component in modern generative models. DPM-solvers have achieved reduced latency and enhanced quality significantly, but have posed challenges to find the exact inverse (i.e., finding the initial noise from the given image). Here we investigate the exact inversions for DPM-solvers and propose algorithms to perform them when samples are generated by t…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 16 pages

  15. arXiv:2311.01001  [pdf, other]

    cs.CV cs.AI

    Fully Quantized Always-on Face Detector Considering Mobile Image Sensors

    Authors: Haechang Lee, Wongi Jeong, Dongil Ryu, Hyunwoo Je, Albert No, Kijeong Kim, Se Young Chun

    Abstract: Despite significant research on lightweight deep neural networks (DNNs) designed for edge devices, the current face detectors do not fully meet the requirements for "intelligent" CMOS image sensors (iCISs) integrated with embedded DNNs. These sensors are essential in various practical applications, such as energy-efficient mobile phones and surveillance systems with always-on capabilities. One not…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to ICCV 2023 Workshop on Low-Bit Quantized Neural Networks (LBQNN), Oral

  16. arXiv:2310.13593  [pdf, other]

    cs.CV

    Learning with Unmasked Tokens Drives Stronger Vision Learners

    Authors: Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han

    Abstract: Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIMs such as Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens for the encoder to process, with the decoder reconstructing the masked tokens to the input. However, MIM pre-trained encoders often exhibit a limited attention span, attributed to MIM's sole focus on regressing maske…

    Submitted 23 April, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

  17. arXiv:2308.14374  [pdf, other]

    cs.LG

    Online Continual Learning on Hierarchical Label Expansion

    Authors: Byung Hyun Lee, Okchul Jung, Jonghyun Choi, Se Young Chun

    Abstract: Continual learning (CL) enables models to adapt to new tasks and environments without forgetting previously learned knowledge. While current CL setups have ignored the relationship between labels in the past task and the new task with or without small task overlaps, real-world scenarios often involve hierarchical relationships between old and new tasks, posing another challenge for traditional CL…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  18. arXiv:2308.13449  [pdf, other]

    cs.CL

    The Poison of Alignment

    Authors: Aibek Bekbayev, Sungbae Chun, Yerzat Dulat, James Yamazaki

    Abstract: From the perspective of content safety issues, alignment has been shown to limit large language models' (LLMs) harmful content generation. This intentional method of reinforcing models to not respond to certain user inputs seems to be present in many modern open-source instruction tuning datasets such as OpenAssistant or Guanaco. We introduce a novel insight to an instruction-tuned model's performance a…

    Submitted 25 August, 2023; originally announced August 2023.

  19. arXiv:2307.10667  [pdf, other]

    eess.IV cs.CV

    Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors

    Authors: Haechang Lee, Dongwon Park, Wongi Jeong, Kijeong Kim, Hyunwoo Je, Dongil Ryu, Se Young Chun

    Abstract: As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions but may introdu…

    Submitted 20 July, 2023; originally announced July 2023.

  20. arXiv:2306.16615  [pdf, other]

    cs.CV

    Representation learning of vertex heatmaps for 3D human mesh reconstruction from multi-view images

    Authors: Sungho Chun, Sungbum Park, Ju Yong Chang

    Abstract: This study addresses the problem of 3D human mesh reconstruction from multi-view images. Recently, approaches that directly estimate the skinned multi-person linear model (SMPL)-based human mesh vertices based on volumetric heatmap representation from input images have shown good performance. We show that representation learning of vertex heatmaps using an autoencoder helps improve the performance…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: ICIP 2023

  21. arXiv:2305.18171  [pdf, other]

    cs.CV cs.LG

    Improved Probabilistic Image-Text Representations

    Authors: Sanghyuk Chun

    Abstract: Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic functions are not sufficiently powerful to capture ambiguity, prompting the exploration of probabilistic embeddings to tackle the challenge. However, the existing probabilistic ITM approach encounters two key shortcomings; t…

    Submitted 9 April, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: ICLR 2024 camera-ready; Code: https://github.com/naver-ai/pcmepp. Project page: https://naver-ai.github.io/pcmepp/. 30 pages, 2.2 MB

  22. arXiv:2305.16057  [pdf]

    cs.LG cs.CL

    Fake News Detection and Behavioral Analysis: Case of COVID-19

    Authors: Chih-Yuan Li, Navya Martin Kollapally, Soon Ae Chun, James Geller

    Abstract: While the world has been combating COVID-19 for over three years, an ongoing "Infodemic" due to the spread of fake news regarding the pandemic has also been a global issue. The existence of fake news impacts different aspects of our daily lives, including politics, public health, economic activities, etc. Readers could mistake fake news for real news, and consequently have less access to authent…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 27 pages, 11 figures, 13 tables

    MSC Class: 68

  23. arXiv:2304.10727  [pdf, other]

    cs.CV cs.AI

    RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models

    Authors: Seulki Park, Daeho Um, Hajung Yoon, Sanghyuk Chun, Sangdoo Yun, Jin Young Choi

    Abstract: In this paper, we propose a robustness benchmark for image-text matching models to assess their vulnerabilities. To this end, we insert adversarial texts and images into the search pool (i.e., gallery set) and evaluate models with the adversarial data. Specifically, we replace a word in the text to change the meaning of the text and mix images with different images to create perceptible changes in…

    Submitted 14 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  24. arXiv:2304.04875  [pdf, other]

    cs.CV

    Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild

    Authors: Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, Sangdoo Yun

    Abstract: Recovering 3D human mesh in the wild is greatly challenging as in-the-wild (ITW) datasets provide only 2D pose ground truths (GTs). Recently, 3D pseudo-GTs have been widely used to train 3D human mesh estimation networks as the 3D pseudo-GTs enable 3D mesh supervision when training the networks on ITW datasets. However, despite the great potential of the 3D pseudo-GTs, there has been no extensive…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Published at CVPRW 2023

  25. arXiv:2304.04555  [pdf, other]

    cs.LG cs.AI

    Neural Diffeomorphic Non-uniform B-spline Flows

    Authors: Seongmin Hong, Se Young Chun

    Abstract: Normalizing flows have been successfully modeling a complex probability distribution as an invertible transformation of a simple base distribution. However, there are often applications that require more than invertibility. For instance, the computation of energies and forces in physics requires the second derivatives of the transformation to be well-defined and continuous. Smooth normalizing flow…

    Submitted 11 April, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: Accepted to AAAI 2023

  26. arXiv:2304.02827  [pdf, other]

    cs.CV cs.AI

    DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

    Authors: Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun

    Abstract: The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or from a text prompt. However, the reconstructed 3D objects using state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are a…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project page: https://janeyeon.github.io/ditto-nerf/

  27. arXiv:2304.01900  [pdf, other]

    cs.CV cs.AI

    PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion

    Authors: Gwanghyun Kim, Ji Ha Jang, Se Young Chun

    Abstract: Recently, significant advancements have been made in 3D generative models; however, training these models across diverse domains is challenging and requires a huge amount of training data and knowledge of pose distribution. Text-guided domain adaptation methods have allowed the generator to be adapted to the target domains using text prompts, thereby obviating the need for assembling numerous data…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: Project page: https://gwang-kim.github.io/podia_3d/

  28. arXiv:2303.17595  [pdf, other]

    cs.CV cs.LG

    Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts

    Authors: Dongyoon Han, Junsuk Choe, Seonghyeok Chun, John Joon Young Chung, Minsuk Chang, Sangdoo Yun, Jean Y. Song, Seong Joon Oh

    Abstract: Supervised learning of image classifiers distills human knowledge into a parametric model through pairs of images and corresponding labels (X,Y). We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure, such as the time-series of mouse traces and clicks left after image selection. Our insight is that such annotat…

    Submitted 26 July, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Code & data at https://github.com/naver-ai/NeglectedFreeLunch. To be presented at ICCV'23

  29. arXiv:2303.11916  [pdf, other]

    cs.CV cs.IR

    CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion

    Authors: Geonmo Gu, Sanghyuk Chun, Wonjae Kim, HeeJae Jun, Yoohoon Kang, Sangdoo Yun

    Abstract: This paper proposes a novel diffusion-based model, CompoDiff, for solving zero-shot Composed Image Retrieval (ZS-CIR) with latent diffusion. This paper also introduces a new synthetic dataset, named SynthTriplets18M, with 18.8 million reference images, conditions, and corresponding target image triplets to train CIR models. CompoDiff and SynthTriplets18M tackle the shortages of the previous CIR ap…

    Submitted 16 July, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: TMLR camera-ready; First two authors contributed equally; TMLR Expert Certification; 30 pages, 5.9MB

  30. arXiv:2303.11114  [pdf, other]

    cs.CV

    SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage

    Authors: Song Park, Sanghyuk Chun, Byeongho Heo, Wonjae Kim, Sangdoo Yun

    Abstract: We need billion-scale images to achieve more generalizable and ground-breaking vision models, as well as massive dataset storage to ship the images (e.g., the LAION-4B dataset needs 240TB storage space). However, it has become challenging to deal with unlimited dataset storage with limited storage infrastructure. A number of storage-efficient training methods have been proposed to tackle the probl…

    Submitted 11 September, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: ICCV 2023; First two authors contributed equally; code url: https://github.com/naver-ai/seit; 17 pages, 1.2MB

  31. arXiv:2303.00442  [pdf, other]

    cs.LG cs.AI cs.CY

    Re-weighting Based Group Fairness Regularization via Classwise Robust Optimization

    Authors: Sangwon Jung, Taeeon Park, Sanghyuk Chun, Taesup Moon

    Abstract: Many existing group fairness-aware training methods aim to achieve the group fairness by either re-weighting underrepresented groups based on certain rules or using weakly approximated surrogates for the fairness metrics in the objective as regularization terms. Although each of the learning schemes has its own strength in terms of applicability or performance, respectively, it is difficult for an…

    Submitted 1 March, 2023; originally announced March 2023.

  32. arXiv:2212.04319  [pdf, other]

    cs.CV cs.AI

    On the Robustness of Normalizing Flows for Inverse Problems in Imaging

    Authors: Seongmin Hong, Inbum Park, Se Young Chun

    Abstract: Conditional normalizing flows can generate diverse image samples for solving inverse problems. Most normalizing flows for inverse problems in imaging employ the conditional affine coupling layer that can generate diverse images quickly. However, unintended severe artifacts are occasionally observed in the output of them. In this work, we address this critical issue by investigating the origins of…

    Submitted 16 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: 16 pages

  33. arXiv:2212.04114  [pdf, other]

    cs.CV

    Group Generalized Mean Pooling for Vision Transformer

    Authors: Byungsoo Ko, Han-Gyu Kim, Byeongho Heo, Sangdoo Yun, Sanghyuk Chun, Geonmo Gu, Wonjae Kim

    Abstract: Vision Transformer (ViT) extracts the final representation from either class token or an average of all patch tokens, following the architecture of Transformer in Natural Language Processing (NLP) or Convolutional Neural Networks (CNNs) in computer vision. However, studies for the best way of aggregating the patch tokens are still limited to average pooling, while widely-used pooling strategies, s…

    Submitted 8 December, 2022; originally announced December 2022.

  34. arXiv:2211.16374  [pdf, other]

    cs.CV cs.AI

    DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model

    Authors: Gwanghyun Kim, Se Young Chun

    Abstract: Recent 3D generative models have achieved remarkable performance in synthesizing high resolution photorealistic images with view consistency and detailed 3D shapes, but training them for diverse domains is challenging since it requires massive training images and their camera distribution information. Text-guided domain adaptation methods have shown impressive performance on converting the 2D gene…

    Submitted 30 March, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted to CVPR 2023, Project page: https://gwang-kim.github.io/datid_3d/

  35. arXiv:2211.05910  [pdf, other]

    eess.IV cs.CV

    Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

    Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

  36. arXiv:2211.04470  [pdf, other]

    cs.CV eess.IV

    Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, et al. (14 additional authors not shown)

    Abstract: Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is very crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth es…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.08630, arXiv:2211.03885; text overlap with arXiv:2105.08819, arXiv:2105.08826, arXiv:2105.08629, arXiv:2105.07809, arXiv:2105.07825

  37. arXiv:2210.11407  [pdf, other]

    cs.LG cs.CV

    Similarity of Neural Architectures using Adversarial Attack Transferability

    Authors: Jaehui Hwang, Dongyoon Han, Byeongho Heo, Song Park, Sanghyuk Chun, Jong-Seok Lee

    Abstract: In recent years, many deep neural architectures have been developed for image classification. Whether they are similar or dissimilar and what factors contribute to their (dis)similarities remains curious. To address this question, we aim to design a quantitative and scalable similarity measure between neural architectures. We propose Similarity by Attack Transferability (SAT) from the observation…

    Submitted 7 December, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 20 pages, 13 figures, 2.3MB

  38. arXiv:2208.11251  [pdf, other]

    cs.CV

    Learnable human mesh triangulation for 3D human pose and shape estimation

    Authors: Sungho Chun, Sungbum Park, Ju Yong Chang

    Abstract: Compared to joint position, the accuracy of joint rotation and shape estimation has received relatively little attention in the skinned multi-person linear model (SMPL)-based human mesh reconstruction from multi-view images. The work in this field is broadly classified into two categories. The first approach performs joint estimation and then produces SMPL parameters by fitting SMPL to resultant j…

    Submitted 23 August, 2022; originally announced August 2022.

  39. arXiv:2208.09913  [pdf, other]

    cs.LG cs.CV

    A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective

    Authors: Chanwoo Park, Sangdoo Yun, Sanghyuk Chun

    Abstract: We propose the first unified theoretical analysis of mixed sample data augmentation (MSDA), such as Mixup and CutMix. Our theoretical results show that regardless of the choice of the mixing strategy, MSDA behaves as a pixel-level regularization of the underlying training loss and a regularization of the first layer parameters. Similarly, our theoretical results support that the MSDA training stra…

    Submitted 21 August, 2022; originally announced August 2022.

    Comments: First two authors contributed equally; 29 pages

  40. arXiv:2208.07552  [pdf]

    eess.IV cs.CV cs.LG

    Coil2Coil: Self-supervised MR image denoising using phased-array coil images

    Authors: Juhyung Park, Dongwon Park, Hyeong-Geol Shin, Eun-Jung Choi, Hongjun An, Minjun Kim, Dongmyung Shin, Se Young Chun, Jongho Lee

    Abstract: Denoising of magnetic resonance images is beneficial in improving the quality of low signal-to-noise ratio images. Recently, denoising using deep neural networks has demonstrated promising results. Most of these networks, however, utilize supervised learning, which requires large training images of noise-corrupted and clean image pairs. Obtaining training images, particularly clean images, is expe…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: 9 pages, 5 figures

  41. arXiv:2207.01520  [pdf, other]

    eess.IV cs.CV

    Adaptive GLCM sampling for transformer-based COVID-19 detection on CT

    Authors: Okchul Jung, Dong Un Kang, Gwanghyun Kim, Se Young Chun

    Abstract: The world has suffered from COVID-19 (SARS-CoV-2) for the last two years, causing much damage and change in people's daily lives. Thus, automated detection of COVID-19 utilizing deep learning on chest computed tomography (CT) scans became promising, which helps correct diagnosis efficiently. Recently, a transformer-based COVID-19 detection method on CT was proposed to utilize 3D information in CT vol…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 6 pages

  42. arXiv:2205.04821  [pdf, other]

    eess.IV cs.CV

    Self-supervised regression learning using domain knowledge: Applications to improving self-supervised denoising in imaging

    Authors: Il Yong Chun, Dongwon Park, Xuehang Zheng, Se Young Chun, Yong Long

    Abstract: Regression that predicts a continuous quantity is a central part of applications using computational imaging and computer vision technologies. Yet, studying and understanding self-supervised learning for regression tasks - except for a particular regression task, image denoising - have lagged behind. This paper proposes a general self-supervised regression learning (SSRL) framework that enables lear…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: 17 pages, 16 figures, 2 tables, submitted to IEEE T-IP

  43. arXiv:2204.07962  [pdf, other]

    cs.CV cs.AI cs.LG

    An Extendable, Efficient and Effective Transformer-based Object Detector

    Authors: Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

    Abstract: Transformers have been widely used in numerous vision problems, especially for visual recognition and detection. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully transformer-based architecture for image classification. In this paper, we integrate Vision and Detection Transformers (ViDT) to construct an effecti…

    Submitted 17 April, 2022; originally announced April 2022.

    Comments: An extension of the ICLR paper, ViDT: An Efficient and Effective Fully Transformer-based Object Detector. arXiv admin note: substantial text overlap with arXiv:2110.03921

  44. arXiv:2204.03359  [pdf, other]

    cs.CV

    ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO

    Authors: Sanghyuk Chun, Wonjae Kim, Song Park, Minsuk Chang, Seong Joon Oh

    Abstract: Image-Text matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models. However, existing ITM benchmarks have a significant limitation. They have many missing correspondences, originating from the data construction process itself. For example, a caption is only matched with one image although the caption can be matched with other similar images and vice versa. To…

    Submitted 3 January, 2024; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Published in ECCV 2022; 32 pages (2.3MB); Code and dataset: https://github.com/naver-ai/eccv-caption; v5 fixes errors in Table 4: the COCO 1K R@1 numbers were incorrect. All other tables and figures are correct. v5 also adds RSUM scores in Tab 4 and 5: RSUM has a high correlation with COCO 1K recalls; v4 fixes errors in v3 -- see the v4 comment for details

  45. arXiv:2203.10789  [pdf, other]

    cs.LG cs.CV

    Domain Generalization by Mutual-Information Regularization with Pre-trained Models

    Authors: Junbum Cha, Kyungjae Lee, Sungrae Park, Sanghyuk Chun

    Abstract: Domain generalization (DG) aims to learn a generalized model to an unseen target domain using only limited source domains. Previous attempts to DG fail to learn domain-invariant representations only from the source domains due to the significant domain shifts between training and test domains. Instead, we re-formulate the DG objective using mutual information with the oracle model, a model general…

    Submitted 22 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: ECCV 2022 camera-ready

  46. arXiv:2202.02916  [pdf, other]

    cs.CV cs.LG

    Dataset Condensation with Contrastive Signals

    Authors: Saehyung Lee, Sanghyuk Chun, Sangwon Jung, Sangdoo Yun, Sungroh Yoon

    Abstract: Recent studies have demonstrated that gradient matching-based dataset synthesis, or dataset condensation (DC), methods can achieve state-of-the-art performance when applied to data-efficient learning tasks. However, in this study, we prove that the existing DC methods can perform worse than the random selection method when task-irrelevant information forms a significant part of the training datase…

    Submitted 16 June, 2022; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: Accepted at ICML 2022

  47. arXiv:2201.07436  [pdf, other]

    cs.CV

    Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth

    Authors: Doyeon Kim, Woonghyun Ka, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim

    Abstract: Depth estimation from a single image is an important task that can be applied to various fields in computer vision, and has grown rapidly with the development of convolutional neural networks. In this paper, we propose a novel structure and training strategy for monocular depth estimation to further improve the prediction accuracy of the network. We deploy a hierarchical transformer encoder to cap…

    Submitted 29 October, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: 11 pages, 5 figures

  48. arXiv:2112.11895  [pdf, other]

    cs.CV

    Few-shot Font Generation with Weakly Supervised Localized Representations

    Authors: Song Park, Sanghyuk Chun, Junbum Cha, Bado Lee, Hyunjung Shim

    Abstract: Automatic few-shot font generation aims to solve a well-defined, real-world problem because manual font designs are expensive and sensitive to the expertise of designers. Existing methods learn to disentangle style and content elements by developing a universal style representation for each font style. However, this approach limits the model in representing diverse local styles, because it is unsu…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: First two authors contributed equally. This is a journal extension of our AAAI 2021 paper arXiv:2009.11042; Code: https://github.com/clovaai/lffont and https://github.com/clovaai/fewshot-font-generation

  49. arXiv:2111.14581  [pdf, other]

    cs.LG cs.CV cs.CY

    Learning Fair Classifiers with Partially Annotated Group Labels

    Authors: Sangwon Jung, Sanghyuk Chun, Taesup Moon

    Abstract: Recently, fairness-aware learning has become increasingly crucial, but most of those methods operate by assuming the availability of fully annotated demographic group labels. We emphasize that such an assumption is unrealistic for real-world applications since group label annotations are expensive and can conflict with privacy issues. In this paper, we consider a more practical scenario, dubbed as A…

    Submitted 31 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Accepted to CVPR 2022; Code is available at https://github.com/naver-ai/cgl_fairness

  50. arXiv:2110.03921  [pdf, other]

    cs.CV cs.LG

    ViDT: An Efficient and Effective Fully Transformer-based Object Detector

    Authors: Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

    Abstract: Transformers are transforming the landscape of computer vision, especially for recognition tasks. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully transformer-based architecture for image classification. In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient o…

    Submitted 29 November, 2021; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: This is a revised version on Nov. 29