Showing 1–16 of 16 results for author: Roh, B

  1. arXiv:2401.11505  [pdf, other]

    cs.CL cs.IR

    CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling

    Authors: Jawook Gu, Han-Cheol Cho, Jiho Kim, Kihyun You, Eun Kyoung Hong, Byungseok Roh

    Abstract: Free-text radiology reports present a rich data source for various medical tasks, but effectively labeling these texts remains challenging. Traditional rule-based labeling methods fall short of capturing the nuances of diverse free-text patterns. Moreover, models using expert-annotated data are limited by data scarcity and pre-defined classes, impacting their performance, flexibility and scalabili…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 16 pages, 3 figures

  2. arXiv:2312.06742  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Honeybee: Locality-enhanced Projector for Multimodal LLM

    Authors: Junbum Cha, Wooyoung Kang, Jonghwan Mun, Byungseok Roh

    Abstract: In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in man…

    Submitted 31 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 camera-ready

  3. arXiv:2312.02103  [pdf, other]

    cs.CV

    Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

    Authors: Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

    Abstract: Open-vocabulary object detection (OVOD) has recently gained significant attention as a crucial step toward achieving human-like visual intelligence. Existing OVOD methods extend target vocabulary from pre-defined categories to open-world by transferring knowledge of arbitrary concepts from vision-language pre-training models to the detectors. While previous methods have shown remarkable successes,…

    Submitted 4 December, 2023; originally announced December 2023.

  4. arXiv:2310.15747  [pdf, other]

    cs.CV

    Large Language Models are Temporal and Causal Reasoners for Video Question Answering

    Authors: Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim

    Abstract: Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rel…

    Submitted 6 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted paper at EMNLP 2023 Main

  5. CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

    Authors: Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyunk Baek, Byungseok Roh

    Abstract: A large-scale image-text pair dataset has greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. However, in the medical domain, the scarcity of data remains a significant challenge for developing a powerful VLP model. In this paper, we tackle the lack of image-text data in chest X-ray by exp…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by MICCAI 2023

  6. arXiv:2309.01961  [pdf, other]

    cs.CV

    NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

    Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, et al. (17 additional authors not shown)

    Abstract: In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested…

    Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Tech report, project page https://nice.lgresearch.ai/

  7. arXiv:2303.13040  [pdf, other]

    cs.CV cs.AI

    Open-Vocabulary Object Detection using Pseudo Caption Labels

    Authors: Han-Cheol Cho, Won Young Jhoo, Wooyoung Kang, Byungseok Roh

    Abstract: Recent open-vocabulary detection methods aim to detect novel objects by distilling knowledge from vision-language models (VLMs) trained on a vast amount of image-text pairs. To improve the effectiveness of these methods, researchers have utilized datasets with a large vocabulary that contains a large number of object classes, under the assumption that such data will enable models to extract compre…

    Submitted 23 March, 2023; originally announced March 2023.

  8. arXiv:2303.13009  [pdf, other]

    cs.CV

    MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

    Authors: Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim

    Abstract: Foundation models have shown outstanding performance and generalization capabilities across domains. Since most studies on foundation models mainly focus on the pretraining phase, a naive strategy to minimize a single task-specific loss is adopted for fine-tuning. However, such fine-tuning methods do not fully leverage other losses that are potentially beneficial for the target task. Therefore, we…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted paper at CVPR 2023

  9. arXiv:2212.13563  [pdf, other]

    cs.CV cs.AI

    Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

    Authors: Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh

    Abstract: Image captioning is one of the straightforward tasks that can take advantage of large-scale web-crawled data which provides rich knowledge about the visual world for a captioning model. However, since web-crawled data contains image-text pairs that are aligned at different levels, the inherent noises (e.g., misaligned pairs) make it difficult to learn a precise captioning model. While the filterin…

    Submitted 27 September, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

  10. arXiv:2212.00785  [pdf, other]

    cs.CV

    Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

    Authors: Junbum Cha, Jonghwan Mun, Byungseok Roh

    Abstract: We tackle open-world semantic segmentation, which aims at learning to segment arbitrary visual concepts in images, by using only image-text pairs without dense annotations. Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task. Ho…

    Submitted 26 March, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: CVPR 2023 camera-ready

  11. arXiv:2111.14330  [pdf, other]

    cs.CV cs.LG

    Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

    Authors: Byungseok Roh, JaeWoong Shin, Wuhyun Shin, Saehoon Kim

    Abstract: DETR is the first end-to-end object detector using a transformer encoder-decoder architecture and demonstrates competitive performance but low computational efficiency on high resolution feature maps. The subsequent work, Deformable DETR, enhances the efficiency of DETR by replacing dense attention with deformable attention, which achieves 10x faster convergence and improved performance. Deformabl…

    Submitted 4 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: ICLR 2022. Code is available at https://github.com/kakaobrain/sparse-detr

  12. arXiv:2104.02471  [pdf]

    cs.CV eess.IV

    A Facial Feature Discovery Framework for Race Classification Using Deep Learning

    Authors: Khalil Khan, Jehad Ali, Irfan Uddin, Sahib Khan, Byeong-hee Roh

    Abstract: Race classification is a long-standing challenge in the field of face image analysis. The investigation of salient facial features is an important task to avoid processing all face parts. Face segmentation strongly benefits several face analysis tasks, including ethnicity and race classification. We propose a race classification algorithm using a prior face segmentation framework. A deep convolutio…

    Submitted 29 March, 2021; originally announced April 2021.

    Comments: Number of pages in the paper are 15

    Journal ref: Under review in Computer, Material, and Continua, 2021

  13. arXiv:2103.06122  [pdf, other]

    cs.CV cs.LG

    Spatially Consistent Representation Learning

    Authors: Byungseok Roh, Wuhyun Shin, Ildoo Kim, Sungwoong Kim

    Abstract: Self-supervised learning has been widely used to obtain transferrable representations from unlabeled images. Especially, recent contrastive learning methods have shown impressive performances on downstream image classification tasks. While these contrastive methods mainly focus on generating invariant global representations at the image-level under semantic-preserving transformations, they are pro…

    Submitted 28 April, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR 2021

  14. arXiv:2002.01609  [pdf, other]

    cs.CV

    BABO: Background Activation Black-Out for Efficient Object Detection

    Authors: Byungseok Roh, Han-Cheol Cho, Myung-Ho Ju, Soon Hyung Pyo

    Abstract: Recent advances in deep learning have enabled complex real-world use cases comprised of multiple vision tasks and detection tasks are being shifted to the edge side as a pre-processing step of the entire workload. Since running a deep model on resource-constraint devices is challenging, techniques for efficient inference methods are demanded. In this paper, we present an objectness-aware object de…

    Submitted 23 March, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

    Comments: 14 pages, 6 figures, 7 tables

  15. arXiv:1611.08588  [pdf, other]

    cs.CV

    PVANet: Lightweight Deep Neural Networks for Real-time Object Detection

    Authors: Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park

    Abstract: In object detection, reducing computational cost is as important as improving accuracy for most practical usages. This paper proposes a novel network structure, which is an order of magnitude lighter than other state-of-the-art networks while maintaining the accuracy. Based on the basic principle of more layers with less channels, this new deep neural network minimizes its redundancy by adopting r…

    Submitted 9 December, 2016; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Efficient Methods for Deep Neural Networks (EMDNN). Continuation of arXiv:1608.08021. The affiliation has been corrected

  16. arXiv:1608.08021  [pdf, other]

    cs.CV

    PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

    Authors: Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park

    Abstract: This paper presents how we can achieve the state-of-the-art accuracy in multi-category object detection task while minimizing the computational cost by adapting and combining recent technical innovations. Following the common pipeline of "CNN feature extraction + region proposal + RoI classification", we mainly redesign the feature extraction part, since region proposal part is not computationally…

    Submitted 30 September, 2016; v1 submitted 29 August, 2016; originally announced August 2016.

    Comments: Full details about "PVANet 9.0" in the VOC2012 leaderboard (https://goo.gl/DuQBku). The test codes are available at https://github.com/sanghoon/pva-faster-rcnn