
Showing 1–50 of 106 results for author: Kwak, S

  1. arXiv:2406.06496  [pdf, other]

    cs.LG cs.CL cs.CV

    Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation

    Authors: Oishi Banerjee, Hong-Yu Zhou, Subathra Adithan, Stephen Kwak, Kay Wu, Pranav Rajpurkar

    Abstract: Recent advances in generative vision-language models (VLMs) have exciting potential implications for AI in radiology, yet VLMs are also known to produce hallucinations, nonsensical text, and other unwanted behaviors that can waste clinicians' time and cause patient harm. Drawing on recent work on direct preference optimization (DPO), we propose a simple method for modifying the behavior of pretrai…

    Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Added acknowledgements

  2. arXiv:2405.20729  [pdf, other]

    cs.CV

    Extreme Point Supervised Instance Segmentation

    Authors: Hyeonjun Lee, Sehyun Hwang, Suha Kwak

    Abstract: This paper introduces a novel approach to learning instance segmentation using extreme points, i.e., the topmost, leftmost, bottommost, and rightmost points, of each object. These points are readily available in the modern bounding box annotation process while offering strong clues for precise segmentation, and thus allows to improve performance at the same annotation cost with box-supervised meth…

    Submitted 3 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  3. arXiv:2405.05967  [pdf, other]

    cs.CV cs.GR cs.LG

    Distilling Diffusion Models into Conditional GANs

    Authors: Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

    Abstract: We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose…

    Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Project page: https://mingukkang.github.io/Diffusion2GAN/

  4. arXiv:2403.10820  [pdf, other]

    cs.CV

    Active Label Correction for Semantic Segmentation with Foundation Models

    Authors: Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok

    Abstract: Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive. Although useful priors such as foundation models or crowdsourced datasets are available, they are error-prone. We hence propose an effective framework of active label correction (ALC) based on a design of correction query to rectify pseudo labels of pixels,…

    Submitted 4 June, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  5. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  6. arXiv:2403.05139  [pdf, other]

    cs.CV

    Improving Diffusion Models for Virtual Try-on

    Authors: Yisol Choi, Sangkyung Kwak, Kyungmin Lee, Hyungwon Choi, Jinwoo Shin

    Abstract: This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other methods (e.g., GAN-based), but they fail to preserve…

    Submitted 19 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  7. arXiv:2402.12004  [pdf, other]

    cs.CV

    Direct Consistency Optimization for Compositional Text-to-Image Personalization

    Authors: Kyungmin Lee, Sangkyung Kwak, Kihyuk Sohn, Jinwoo Shin

    Abstract: Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency. However, they still lack in synthesizing images of different scenarios or styles that are possible in the original pretrained models. To address this, we propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Preprint. See our project page (https://dco-t2i.github.io/) for more examples and codes

  8. arXiv:2401.14654  [pdf, other]

    cs.CL cs.LG

    A Korean Legal Judgment Prediction Dataset for Insurance Disputes

    Authors: Alice Saebom Kwak, Cheonkam Jeong, Ji Weon Lim, Byeongcheol Min

    Abstract: This paper introduces a Korean legal judgment prediction (LJP) dataset for insurance disputes. Successful LJP models on insurance disputes can benefit insurance companies and their customers. It can save both sides' time and money by allowing them to predict how the result would come out if they proceed to the dispute mediation process. As is often the case with low-resource languages, there is a…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 5 pages, 1 figure

  9. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  10. arXiv:2312.04266  [pdf, other]

    cs.CV

    Activity Grammars for Temporal Action Segmentation

    Authors: Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho

    Abstract: Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed activity video into a sequence of action segments, remains challenging for this reason. This paper addresses the problem by introducing an effective act…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted to NeurIPS 2023

  11. arXiv:2312.02878  [pdf, other]

    cs.CV

    Towards More Practical Group Activity Detection: A New Benchmark and Model

    Authors: Dongkeun Kim, Youngkil Song, Minsu Cho, Suha Kwak

    Abstract: Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video. While GAD has been studied recently, there is still much room for improvement in both dataset and methodology due to their limited capability to address practical GAD scenarios. To resolve these issues, we first present a new dataset, dubbed Café. U…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://cvlab.postech.ac.kr/research/CAFE

  12. arXiv:2310.17811  [pdf, other]

    cs.AI cs.CL

    Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting

    Authors: Benjamin Yan, Ruochen Liu, David E. Kuo, Subathra Adithan, Eduardo Pontes Reis, Stephen Kwak, Vasantha Kumar Venugopal, Chloe P. O'Connell, Agustina Saenz, Pranav Rajpurkar, Michael Moor

    Abstract: Automatically generated reports from medical images promise to improve the workflow of radiologists. Existing methods consider an image-to-report modeling task by directly generating a fully-fledged report from an image. However, this conflates the content of the report (e.g., findings and their attributes) with its style (e.g., format and choice of words), which can lead to clinically inaccurate…

    Submitted 31 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  13. arXiv:2309.09319  [pdf, other]

    cs.CV cs.AI cs.LG

    Active Learning for Semantic Segmentation with Multi-class Label Query

    Authors: Sehyun Hwang, Sohyun Lee, Hoyoung Kim, Minhyeon Oh, Jungseul Ok, Suha Kwak

    Abstract: This paper proposes a new active learning method for semantic segmentation. The core of our method lies in a new annotation query design. It samples informative local image regions (e.g., superpixels), and for each of such regions, asks an oracle for a multi-hot vector indicating all classes existing in the region. This multi-class labeling strategy is substantially more efficient than existing on…

    Submitted 6 November, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023 accepted

    MSC Class: 68T07 ACM Class: I.2.10

  14. arXiv:2309.08944  [pdf, other]

    cs.CV cs.AI cs.LG

    Universal Metric Learning with Parameter-Efficient Transfer Learning

    Authors: Sungyeon Kim, Donghyun Kim, Suha Kwak

    Abstract: A common practice in metric learning is to train and test an embedding model for each dataset. This dataset-specific approach fails to simulate real-world scenarios that involve multiple heterogeneous distributions of data. In this regard, we introduce a novel metric learning paradigm, called Universal Metric Learning (UML), which learns a unified distance metric capable of capturing relations acr…

    Submitted 16 September, 2023; originally announced September 2023.

  15. arXiv:2308.15512  [pdf, other]

    cs.CV

    Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

    Authors: Dongwon Kim, Namyup Kim, Cuiling Lan, Suha Kwak

    Abstract: Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications. However, manual labeling of training data for this task is prohibitively costly, leading to lack of labeled data for training. We address this issue by a weakly supervised learning approach using text descriptions of training images as the only source…

    Submitted 24 October, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023, Project page: https://southflame.github.io/sag/

  16. arXiv:2308.00994  [pdf, other]

    cs.CV cs.LG

    SYNAuG: Exploiting Synthetic Data for Data Imbalance Problems

    Authors: Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh

    Abstract: Data imbalance in training data often leads to biased predictions from trained models, which in turn causes ethical and social issues. A straightforward solution is to carefully curate training data, but given the enormous scale of modern neural networks, this is prohibitively labor-intensive and thus impractical. Inspired by recent developments in generative models, this paper explores the potent…

    Submitted 25 April, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: The paper is under consideration at Pattern Recognition Letters

  17. arXiv:2307.15199  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

    Authors: Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak

    Abstract: In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos). Also, a recent study has demonstrated the cross-modal transferability phenomenon of this joint space. From these observations, we propose PromptStyler which simulates various distribution shifts in the joint space by synthesizing diverse…

    Submitted 15 August, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023, Project Page: https://promptstyler.github.io/

  18. arXiv:2306.12978  [pdf, other]

    cs.IT eess.SP

    Rate-Splitting Multiple Access for 6G Networks: Ten Promising Scenarios and Applications

    Authors: Jeonghun Park, Byungju Lee, Jinseok Choi, Hoon Lee, Namyoon Lee, Seok-Hwan Park, Kyoung-Jae Lee, Junil Choi, Sung Ho Chae, Sang-Woon Jeon, Kyung Sup Kwak, Bruno Clerckx, Wonjae Shin

    Abstract: In the upcoming 6G era, multiple access (MA) will play an essential role in achieving high throughput performances required in a wide range of wireless applications. Since MA and interference management are closely related issues, the conventional MA techniques are limited in that they cannot provide near-optimal performance in universal interference regimes. Recently, rate-splitting multiple acce…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 17 pages, 6 figures, submitted to IEEE Network Magazine

  19. arXiv:2306.08498  [pdf, other]

    cs.CV

    Extending CLIP's Image-Text Alignment to Referring Image Segmentation

    Authors: Seoyeon Kim, Minguk Kang, Dongwon Kim, Jaesik Park, Suha Kwak

    Abstract: Referring Image Segmentation (RIS) is a cross-modal task that aims to segment an instance described by a natural language expression. Recent methods leverage large-scale pretrained unimodal models as backbones along with fusion techniques for joint reasoning across modalities. However, the inherent cross-modal nature of RIS raises questions about the effectiveness of unimodal backbones. We propose…

    Submitted 7 April, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: NAACL 2024

  20. arXiv:2303.16817  [pdf, other]

    cs.CV

    Adaptive Superpixel for Active Learning in Semantic Segmentation

    Authors: Hoyoung Kim, Minhyeon Oh, Sehyun Hwang, Suha Kwak, Jungseul Ok

    Abstract: Learning semantic segmentation requires pixel-wise annotations, which can be time-consuming and expensive. To reduce the annotation cost, we propose a superpixel-based active learning (AL) framework, which collects a dominant label per superpixel instead. To be specific, it consists of adaptive superpixel and sieving mechanisms, fully dedicated to AL. At each round of AL, we adaptively merge neigh…

    Submitted 20 August, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

  21. arXiv:2303.15410  [pdf, other]

    cs.CV

    Human Pose Estimation in Extremely Low-Light Conditions

    Authors: Sohyun Lee, Jaesung Rim, Boseung Jeong, Geonu Kim, Byungju Woo, Haechan Lee, Sunghyun Cho, Suha Kwak

    Abstract: We study human pose estimation in extremely low-light images. This task is challenging due to the difficulty of collecting real low-light images with accurate labels, and severely corrupted inputs that degrade prediction quality significantly. To address the first issue, we develop a dedicated camera system and build a new dataset of real low-light images with accurate pose labels. Thanks to our c…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  22. arXiv:2303.04841  [pdf]

    cs.CY

    The dynamic nature of trust: Trust in Human-Robot Interaction revisited

    Authors: Jimin Rhim, Sonya S. Kwak, Angelica Lim, Jason Millar

    Abstract: The role of robots is expanding from tool to collaborator. Socially assistive robots (SARs) are an example of collaborative robots that assist humans in the real world. As robots enter our social sphere, unforeseen risks occur during human-robot interaction (HRI), as everyday human space is full of uncertainties. Risk introduces an element of trust, so understanding human trust in the robot is imp…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Accepted at ACM CHI Conference workshop, WS32: Socially Assistive Robots as Decision Makers: Transparency, Motivations, and Intentions

    Report number: SARTMI/2023/7

  23. arXiv:2212.14258  [pdf, other]

    cs.CV cs.AI

    HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization

    Authors: Sungyeon Kim, Boseung Jeong, Suha Kwak

    Abstract: Supervision for metric learning has long been given in the form of equivalence between human-labeled classes. Although this type of supervision has been a basis of metric learning for decades, we argue that it hinders further advances in the field. In this regard, we propose a new regularization method, dubbed HIER, to discover the latent semantic hierarchy of training data, and to deploy the hier…

    Submitted 10 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: Accepted to CVPR 2023

  24. arXiv:2212.07579  [pdf, other]

    cs.CV cs.AI

    Learning to Detect Semantic Boundaries with Image-level Class Labels

    Authors: Namyup Kim, Sehyun Hwang, Suha Kwak

    Abstract: This paper presents the first attempt to learn semantic boundary detection using image-level class labels as supervision. Our method starts by estimating coarse areas of object classes through attentions drawn by an image classification network. Since boundaries will locate somewhere between such areas of different classes, our task is formulated as a multiple instance learning (MIL) problem, wher…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: International Journal of Computer Vision (IJCV), 2022

  25. arXiv:2211.16761  [pdf, other]

    cs.CV

    Improving Cross-Modal Retrieval with Set of Diverse Embeddings

    Authors: Dongwon Kim, Namyup Kim, Suha Kwak

    Abstract: Cross-modal retrieval across image and text modalities is a challenging task due to its inherent ambiguity: An image often exhibits various situations, and a caption can be coupled with diverse images. Set-based embedding has been studied as a solution to this problem. It seeks to encode a sample into a set of different embedding vectors that capture different semantics of the sample. In this pape…

    Submitted 24 July, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Accepted to CVPR 2023 (Highlight)

  26. arXiv:2211.14058  [pdf, other]

    cs.CV cs.AI cs.LG

    Cross-Domain Ensemble Distillation for Domain Generalization

    Authors: Kyungmoon Lee, Sungyeon Kim, Suha Kwak

    Abstract: Domain generalization is the task of learning models that generalize to unseen target domains. We propose a simple yet effective method for domain generalization, named cross-domain ensemble distillation (XDED), that learns domain-invariant features while encouraging the model to converge to flat minima, which recently turned out to be a sufficient condition for domain generalization. To this end,…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted to ECCV 2022. Code is available at http://github.com/leekyungmoon/XDED

  27. arXiv:2211.07116  [pdf, other]

    cs.CV

    Few-shot Metric Learning: Online Adaptation of Embedding for Retrieval

    Authors: Deunsol Jung, Dahyun Kang, Suha Kwak, Minsu Cho

    Abstract: Metric learning aims to build a distance metric typically by learning an effective embedding function that maps similar objects into nearby points in its embedding space. Despite recent advances in deep metric learning, it remains challenging for the learned metric to generalize to unseen classes with a substantial domain gap. To tackle the issue, we explore a new problem of few-shot metric learni…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted at ACCV 2022

  28. arXiv:2210.16989  [pdf, other]

    cs.CL

    Validity Assessment of Legal Will Statements as Natural Language Inference

    Authors: Alice Saebom Kwak, Jacob O. Israelsen, Clayton T. Morrison, Derek E. Bambauer, Mihai Surdeanu

    Abstract: This work introduces a natural language inference (NLI) dataset that focuses on the validity of statements in legal wills. This dataset is unique because: (a) each entailment decision requires three inputs: the statement from the will, the law, and the conditions that hold at the time of the testator's death; and (b) the included texts are longer than the ones in current NLI datasets. We trained e…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 10 pages, 4 figures; To be published in the Findings of the Association for Computational Linguistics: EMNLP 2022

  29. arXiv:2208.07155  [pdf, ps, other]

    cs.IT

    Cognitive Radio-Inspired Rate-Splitting Multiple Access for Semi-Grant-Free Transmissions

    Authors: Hongwu Liu, Kyeong Jin Kim, Theodoros A. Tsiftsis, Bruno Clerckx, Kyung Sup Kwak, H. Vincent Poor

    Abstract: In this paper, we propose a cognitive radio-inspired rate-splitting multiple access (CR-RSMA) scheme to assist semi-grant-free (SGF) transmissions in which a grant-based user (GBU) and multiple grant-free users (GFUs) access the base-station (BS) by sharing the same resource block. Using the cognitive radio principle, the GBU and admitted GFU are treated as the primary and secondary users, respect…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: 30 pages, 7 figures

  30. arXiv:2208.06604  [pdf, other]

    cs.LG cs.AI cs.CV

    Combating Label Distribution Shift for Active Domain Adaptation

    Authors: Sehyun Hwang, Sohyun Lee, Sungyeon Kim, Jungseul Ok, Suha Kwak

    Abstract: We consider the problem of active domain adaptation (ADA) to unlabeled target data, of which subset is actively selected and labeled given a budget constraint. Inspired by recent analysis on a critical issue from label distribution mismatch between source and target in domain adaptation, we devise a method that addresses the issue for the first time in ADA. At its heart lies a novel sampling strat…

    Submitted 13 August, 2022; originally announced August 2022.

    Comments: ECCV 2022 accepted

    ACM Class: I.2.10

  31. arXiv:2206.10843  [pdf, other]

    cs.LG cs.CV

    Learning Debiased Classifier with Biased Committee

    Authors: Nayeong Kim, Sehyun Hwang, Sungsoo Ahn, Jaesik Park, Suha Kwak

    Abstract: Neural networks are prone to be biased towards spurious correlations between classes and latent attributes exhibited in a major portion of training data, which ruins their generalization capability. We propose a new method for training debiased classifiers with no spurious attribute label. The key idea is to employ a committee of classifiers as an auxiliary module that identifies bias-conflicting…

    Submitted 1 May, 2023; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: Conference on Neural Information Processing Systems (NeurIPS), New Orleans, 2022

  32. arXiv:2205.01903  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Taught Metric Learning without Labels

    Authors: Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak

    Abstract: We present a novel self-taught framework for unsupervised metric learning, which alternates between predicting class-equivalence relations between data through a moving average of an embedding model and learning the model with the predicted relations as pseudo labels. At the heart of our framework lies an algorithm that investigates contexts of data on the embedding space to predict their class-eq…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR 2022

  33. arXiv:2204.02139  [pdf, other]

    cs.CV

    Detector-Free Weakly Supervised Group Activity Recognition

    Authors: Dongkeun Kim, Jinsung Lee, Minsu Cho, Suha Kwak

    Abstract: Group activity recognition is the task of understanding the activity conducted by a group of people as a whole in a multi-person video. Existing models for this task are often impractical in that they demand ground-truth bounding box labels of actors even in testing or rely on off-the-shelf object detectors. Motivated by this, we propose a novel model for group activity recognition that depends ne…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022

  34. arXiv:2204.02078  [pdf, other]

    cs.CV

    Semi-supervised Semantic Segmentation with Error Localization Network

    Authors: Donghyeon Kwon, Suha Kwak

    Abstract: This paper studies semi-supervised learning of semantic segmentation, which assumes that only a small portion of training images are labeled and the others remain unlabeled. The unlabeled images are usually assigned pseudo labels to be used in training, which however often causes the risk of performance degradation due to the confirmation bias towards errors on the pseudo labels. We present a nove…

    Submitted 31 May, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  35. arXiv:2204.01587  [pdf, other]

    cs.CV

    FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation

    Authors: Sohyun Lee, Taeyoung Son, Suha Kwak

    Abstract: Robust visual recognition under adverse weather conditions is of great importance in real-world applications. In this context, we propose a new method for learning semantic segmentation models robust against fog. Its key idea is to consider the fog condition of an image as its style and close the gap between images with different fog conditions in neural style spaces of a segmentation model. In pa…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022 (Oral)

  36. arXiv:2203.16787  [pdf, other]

    cs.CV

    Reflection and Rotation Symmetry Detection via Equivariant Learning

    Authors: Ahyun Seo, Byungjin Kim, Suha Kwak, Minsu Cho

    Abstract: The inherent challenge of detecting symmetries stems from arbitrary orientations of symmetry patterns; a reflection symmetry mirrors itself against an axis with a specific orientation while a rotation symmetry matches its rotated copy with a specific orientation. Discovering such symmetry patterns from an image thus benefits from an equivariant feature representation, which varies consistently wit…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: To appear at CVPR 2022

  37. arXiv:2203.16768  [pdf, other]

    cs.CV cs.AI

    ReSTR: Convolution-free Referring Image Segmentation Using Transformers

    Authors: Namyup Kim, Dongwon Kim, Cuiling Lan, Wenjun Zeng, Suha Kwak

    Abstract: Referring image segmentation is an advanced semantic segmentation task where target is not a predefined class but is described in natural language. Most of existing methods for this task rely heavily on convolutional neural networks, which however have trouble capturing long-range dependencies between entities in the language expression and are not flexible enough for modeling interactions between…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 accepted

  38. arXiv:2203.16518  [pdf, other]

    cs.CV cs.AI cs.LG

    Collaborative Transformers for Grounded Situation Recognition

    Authors: Junhyeong Cho, Youngseok Yoon, Suha Kwak

    Abstract: Grounded situation recognition is the task of predicting the main activity, entities playing certain roles within the activity, and bounding-box groundings of the entities in the given image. To effectively deal with this challenging task, we introduce a novel approach where the two processes for activity classification and entity estimation are interactive and complementary. To implement this ide…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022, Code: https://github.com/jhcho99/CoFormer

  39. arXiv:2201.05721  [pdf, other]

    cs.CL

    Extracting Space Situational Awareness Events from News Text

    Authors: Zhengnan Xie, Alice Saebom Kwak, Enfa George, Laura W. Dozal, Hoang Van, Moriba Jah, Roberto Furfaro, Peter Jansen

    Abstract: Space situational awareness typically makes use of physical measurements from radar, telescopes, and other assets to monitor satellites and other spacecraft for operational, navigational, and defense purposes. In this work we explore using textual input for the space situational awareness task. We construct a corpus of 48.5k news articles spanning all known active satellites between 2009 and 2020.…

    Submitted 14 January, 2022; originally announced January 2022.

    Comments: Submitted to LREC 2022

  40. arXiv:2201.01008  [pdf, other]

    cs.CV cs.LG

    Learning to Generate Novel Classes for Deep Metric Learning

    Authors: Kyungmoon Lee, Sungyeon Kim, Seunghoon Hong, Suha Kwak

    Abstract: Deep metric learning aims to learn an embedding space where the distance between data reflects their class equivalence, even when their classes are unseen during training. However, the limited number of classes available in training precludes generalization of the learned embedding space. Motivated by this, we introduce a new data augmentation approach that synthesizes novel classes and their embe…

    Submitted 4 January, 2022; originally announced January 2022.

    Comments: Accepted to BMVC 2021

  41. arXiv:2111.10135  [pdf, other]

    cs.CV cs.AI cs.LG

    Grounded Situation Recognition with Transformers

    Authors: Junhyeong Cho, Youngseok Yoon, Hyeonjun Lee, Suha Kwak

    Abstract: Grounded Situation Recognition (GSR) is the task that not only classifies a salient action (verb), but also predicts entities (nouns) associated with semantic roles and their locations in the given image. Inspired by the remarkable success of Transformers in vision tasks, we propose a GSR model based on a Transformer encoder-decoder architecture. The attention mechanism of our model enables accura…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: Accepted to BMVC 2021, Code: https://github.com/jhcho99/gsrtr

  42. arXiv:2111.01673  [pdf, other]

    cs.CV

    Relational Self-Attention: What's Missing in Attention for Video Understanding

    Authors: Manjin Kim, Heeseung Kwon, Chunyu Wang, Suha Kwak, Minsu Cho

    Abstract: Convolution has been arguably the most important feature transform for modern neural networks, leading to the advance of deep learning. Recent emergence of Transformer networks, which replace convolution layers with self-attention blocks, has revealed the limitation of stationary convolution kernels and opened the door to the era of dynamic feature transforms. The existing dynamic transforms, incl…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: Accepted to NeurIPS 2021

  43. arXiv:2110.02127  [pdf, ps, other]

    cs.IT

    Rate Splitting Multiple Access for Semi-Grant-Free Transmissions

    Authors: Hongwu Liu, Theodoros A. Tsiftsis, Bruno Clerckx, Kyeong Jin Kim, Kyung Sup Kwak, H. Vincent Poor

    Abstract: Enabled by hybrid grant-based (GB) and grant-free (GF) transmission techniques, GF users of internet of things (IoT) devices and massive machine-type communications (mMTC) meet opportunities to share wireless resources with GB users. In this paper, we propose a rate splitting multiple access (RSMA) strategy for an emerging semi-grant-free (SGF) transmission system to increase connectivity and reli…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: 29 pages, 12 figures, submitted to IEEE Journal

  44. arXiv:2109.14196  [pdf, other]

    cs.CV cs.AI

    WEDGE: Web-Image Assisted Domain Generalization for Semantic Segmentation

    Authors: Namyup Kim, Taeyoung Son, Jaehyun Pahk, Cuiling Lan, Wenjun Zeng, Suha Kwak

    Abstract: Domain generalization for semantic segmentation is highly demanded in real applications, where a trained model is expected to work well in previously unseen domains. One challenge lies in the lack of data which could cover the diverse distributions of the possible unseen domains for training. In this paper, we propose a WEb-image assisted Domain GEneralization (WEDGE) scheme, which is the first to…

    Submitted 2 May, 2023; v1 submitted 29 September, 2021; originally announced September 2021.

  45. arXiv:2108.04533  [pdf, other]

    cs.CV cs.AI cs.LG

    ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer

    Authors: Boseung Jeong, Jicheol Park, Suha Kwak

    Abstract: Attribute-based person search is the task of finding person images that are best matched with a set of text attributes given as query. The main challenge of this task is the large modality gap between attributes and images. To reduce the gap, we present a new loss for learning cross-modal embeddings in the context of attribute-based person search. We regard a set of attributes as a category of peo…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 accepted

  46. arXiv:2107.01900  [pdf, other]

    cs.LG cs.CV

    On The Distribution of Penultimate Activations of Classification Networks

    Authors: Minkyo Seo, Yoonho Lee, Suha Kwak

    Abstract: This paper studies probability distributions of penultimate activations of classification networks. We show that, when a classification network is trained with the cross-entropy loss, its final classification layer forms a Generative-Discriminative pair with a generative classifier based on a specific distribution of penultimate activations. More importantly, the distribution is parameterized by t…

    Submitted 5 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: 8 pages, UAI 2021, The first two authors equally contributed

  47. Traffic signal prediction on transportation networks using spatio-temporal correlations on graphs

    Authors: Semin Kwak, Nikolas Geroliminis, Pascal Frossard

    Abstract: Multivariate time series forecasting poses challenges as the variables are intertwined in time and space, like in the case of traffic signals. Defining signals on graphs relaxes such complexities by representing the evolution of signals over a space using relevant graph kernels such as the heat diffusion kernel. However, this kernel alone does not fully capture the actual dynamics of the data as i…

    Submitted 5 October, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

  48. arXiv:2103.14908  [pdf, other]

    cs.CV cs.AI cs.LG

    Embedding Transfer with Label Relaxation for Improved Metric Learning

    Authors: Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak

    Abstract: This paper presents a novel method for embedding transfer, a task of transferring knowledge of a learned embedding model to another. Our method exploits pairwise similarities between samples in the source embedding space as the knowledge, and transfers them through a loss used for learning target embedding models. To this end, we design a new loss called relaxed contrastive loss, which employs the…

    Submitted 27 March, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021

  49. arXiv:2102.07092  [pdf, other]

    cs.CV

    Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition

    Authors: Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho

    Abstract: Spatio-temporal convolution often fails to learn motion dynamics in videos and thus an effective motion representation is required for video understanding in the wild. In this paper, we propose a rich and robust motion representation based on spatio-temporal self-similarity (STSS). Given a sequence of frames, STSS represents each local region as similarities to its neighbors in space and time. By…

    Submitted 2 November, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

    Comments: Accepted to ICCV 2021

  50. A Comprehensive Utility Function for Resource Allocation in Mobile Edge Computing

    Authors: Zaiwar Ali, Sadia Khaf, Ziaul Haq Abba, Ghulam Abbas, Lei Jiao, Amna Irshad, Kyung Sup Kwak, Muhammad Bilal

    Abstract: In mobile edge computing (MEC), one of the important challenges is how much resources of which mobile edge server (MES) should be allocated to which user equipment (UE). The existing resource allocation schemes only consider CPU as the requested resource and assume utility for MESs as either a random variable or dependent on the requested CPU only. This paper presents a novel comprehensive utility…

    Submitted 18 December, 2020; originally announced December 2020.

    Comments: 17 pages, 3 Figures, Published in Computers, Materials & Continua

    MSC Class: 46Fxx ACM Class: G.3; C.2.3; C.2.1

    Journal ref: Computers, Materials & Continua, Vol.66, No.2, 2021, pp.1461-1477