Showing 1–30 of 30 results for author: Mun, J

  1. arXiv:2406.02989  [pdf, other]

    cs.RO cs.AI

    Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy

    Authors: Yunho Kim, Jeong Hyun Lee, Choongin Lee, Juhyeok Mun, Donghoon Youm, Jeongsoo Park, Jemin Hwangbo

    Abstract: For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves m…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Robotics and Automation Letters (RA-L), First two authors contributed equally

  2. arXiv:2404.13808  [pdf, other]

    cs.IR cs.LG cs.MM

    General Item Representation Learning for Cold-start Content Recommendations

    Authors: Jooeun Kim, Jinri Kim, Kwangeun Yeo, Eungi Kim, Kyoung-Woon On, Jonghwan Mun, Joonseok Lee

    Abstract: Cold-start item recommendation is a long-standing challenge in recommendation systems. A common remedy is to use a content-based approach, but rich information from raw contents in various forms has not been fully utilized. In this paper, we propose a domain/data-agnostic item representation learning framework for cold-start recommendations, naturally equipped with multimodal alignment among vario…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 14 pages

  3. arXiv:2403.14791  [pdf, other]

    cs.CY cs.AI

    Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits

    Authors: Jimin Mun, Liwei Jiang, Jenny Liang, Inyoung Cheong, Nicole DeCario, Yejin Choi, Tadayoshi Kohno, Maarten Sap

    Abstract: General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power. However, the governance and development of AI still remain in the hands of a few, and the pace of development is accelerating without proper assessment of risks. As a first step towards democratic governance and risk assessment of AI, we introduce Particip-AI, a framework to gath…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 35 pages, 4 figures, 23 tables

  4. arXiv:2403.00179  [pdf, other]

    cs.HC

    Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate

    Authors: Jimin Mun, Cathy Buerger, Jenny T. Liang, Joshua Garland, Maarten Sap

    Abstract: Counterspeech, i.e., direct responses against hate speech, has become an important tool to address the increasing amount of hate online while avoiding censorship. Although AI has been proposed to help scale up counterspeech efforts, this raises questions of how exactly AI could assist in this process, since counterspeech is a deeply empathetic and agentic process for those involved. In this work,…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: To appear in CHI 2024. 22 pages, 3 figures, 7 tables

  5. arXiv:2402.15539  [pdf, ps, other]

    eess.AS cs.CL

    Speech Corpus for Korean Children with Autism Spectrum Disorder: Towards Automatic Assessment Systems

    Authors: Seonwoo Lee, Jihyun Mun, Sunhee Kim, Minhwa Chung

    Abstract: Despite the growing demand for digital therapeutics for children with Autism Spectrum Disorder (ASD), there is currently no speech corpus available for Korean children with ASD. This paper introduces a speech corpus specifically designed for Korean children with ASD, aiming to advance speech technologies such as pronunciation and severity evaluation. Speech recordings from speech and language eval…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 11 pages, Accepted for LREC-COLING 2024

  6. Riveter: Measuring Power and Social Dynamics Between Entities

    Authors: Maria Antoniak, Anjalie Field, Jimin Mun, Melanie Walsh, Lauren F. Klein, Maarten Sap

    Abstract: Riveter provides a complete easy-to-use pipeline for analyzing verb connotations associated with entities in text corpora. We prepopulate the package with connotation frames of sentiment, power, and agency, which have demonstrated usefulness for capturing social phenomena, such as gender bias, in a broad range of corpora. For decades, lexical frameworks have been foundational tools in computationa…

    Submitted 15 December, 2023; originally announced December 2023.

    Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 3: System Demonstrations, 2023, pages 377-388

  7. arXiv:2312.06742  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Honeybee: Locality-enhanced Projector for Multimodal LLM

    Authors: Junbum Cha, Wooyoung Kang, Jonghwan Mun, Byungseok Roh

    Abstract: In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in man…

    Submitted 31 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 camera-ready

  8. arXiv:2312.02103  [pdf, other]

    cs.CV

    Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

    Authors: Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

    Abstract: Open-vocabulary object detection (OVOD) has recently gained significant attention as a crucial step toward achieving human-like visual intelligence. Existing OVOD methods extend target vocabulary from pre-defined categories to open-world by transferring knowledge of arbitrary concepts from vision-language pre-training models to the detectors. While previous methods have shown remarkable successes,…

    Submitted 4 December, 2023; originally announced December 2023.

  9. arXiv:2311.00161  [pdf, other]

    cs.CL cs.AI

    Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language

    Authors: Jimin Mun, Emily Allaway, Akhila Yerukola, Laura Vianna, Sarah-Jane Leslie, Maarten Sap

    Abstract: Counterspeech, i.e., responses to counteract potential harms of hateful speech, has become an increasingly popular solution to address online hate speech without censorship. However, properly countering hateful language requires countering and dispelling the underlying inaccurate stereotypes implied by such language. In this work, we draw from psychology and philosophy literature to craft six psyc…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Findings, 19 pages

  10. arXiv:2309.01961  [pdf, other]

    cs.CV

    NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

    Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, et al. (17 additional authors not shown)

    Abstract: In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested…

    Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Tech report, project page https://nice.lgresearch.ai/

  11. arXiv:2307.12644  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG eess.SP

    Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG

    Authors: Dae-Yeol Kim, Eunsu Goh, KwangKee Lee, JongEui Chae, JongHyeon Mun, Junyeong Na, Chae-bong Sohn, Do-Yup Kim

    Abstract: rPPG (Remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) by using the light absorption characteristics of hemoglobin captured through a camera. Analyzing the measured BVP can derive various physiological signals such as heart rate, stress level, and blood pressure, which can be applied to various applications such as telemedicine, remote patient monito…

    Submitted 18 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 20 pages, 10 figures

    MSC Class: 68T45; 68T07 ACM Class: I.4.9; I.5.4; I.2

  12. arXiv:2306.14601  [pdf, other]

    cs.RO cs.CV cs.LG

    Safe Navigation in Unstructured Environments by Minimizing Uncertainty in Control and Perception

    Authors: Junwon Seo, Jungwi Mun, Taekyung Kim

    Abstract: Uncertainty in control and perception poses challenges for autonomous vehicle navigation in unstructured environments, leading to navigation failures and potential vehicle damage. This paper introduces a framework that minimizes control and perception uncertainty to ensure safe and reliable navigation. The framework consists of two uncertainty-aware models: a learning-based vehicle dynamics model…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: RSS 2023 Workshop on Inference and Decision Making for Autonomous Vehicles (IDMAV)

  13. arXiv:2305.12240  [pdf, other]

    cs.RO cs.AI

    Bridging Active Exploration and Uncertainty-Aware Deployment Using Probabilistic Ensemble Neural Network Dynamics

    Authors: Taekyung Kim, Jungwi Mun, Junwon Seo, Beomsu Kim, Seongil Hong

    Abstract: In recent years, learning-based control in robotics has gained significant attention due to its capability to address complex tasks in real-world environments. With the advances in machine learning algorithms and computational capabilities, this approach is becoming increasingly important for solving challenging control problems in robotics by learning unknown or partially known robot dynamics. Ac…

    Submitted 28 May, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: 2023 Robotics: Science and Systems (RSS). Project page: https://taekyung.me/rss2023-bridging

  14. arXiv:2305.00676  [pdf, other]

    cs.RO cs.AI eess.SY

    Learning Terrain-Aware Kinodynamic Model for Autonomous Off-Road Rally Driving With Model Predictive Path Integral Control

    Authors: Hojin Lee, Taekyung Kim, Jungwi Mun, Wonsuk Lee

    Abstract: High-speed autonomous driving in off-road environments has immense potential for various applications, but it also presents challenges due to the complexity of vehicle-terrain interactions. In such environments, it is crucial for the vehicle to predict its motion and adjust its controls proactively in response to environmental changes, such as variations in terrain elevation. To this end, we propo…

    Submitted 22 September, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE Robotics and Automation Letters (and ICRA 2024). Our video can be found at https://youtu.be/VXf_prNQnJo Project page: https://sites.google.com/view/terrainawarekinodyn

    Journal ref: IEEE Robotics and Automation Letters, 2023

  15. arXiv:2302.12952  [pdf]

    cs.CL

    Robust language-based mental health assessments in time and space through social media

    Authors: Siddharth Mangalik, Johannes C. Eichstaedt, Salvatore Giorgi, Jihu Mun, Farhan Ahmed, Gilvir Gill, Adithya V. Ganesan, Shashanka Subrahmanya, Nikita Soni, Sean A. P. Clouston, H. Andrew Schwartz

    Abstract: Compared to physical health, population mental health measurement in the U.S. is very coarse-grained. Currently, in the largest population surveys, such as those carried out by the Centers for Disease Control or Gallup, mental health is only broadly captured through "mentally unhealthy days" or "sadness", and limited to relatively infrequent state or metropolitan estimates. Through the large scale…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: 9 pages, 7 figures, pre-print

    ACM Class: J.4; I.2.7

  16. arXiv:2212.13563  [pdf, other]

    cs.CV cs.AI

    Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

    Authors: Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh

    Abstract: Image captioning is one of the straightforward tasks that can take advantage of large-scale web-crawled data which provides rich knowledge about the visual world for a captioning model. However, since web-crawled data contains image-text pairs that are aligned at different levels, the inherent noises (e.g., misaligned pairs) make it difficult to learn a precise captioning model. While the filterin…

    Submitted 27 September, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

  17. arXiv:2212.00785  [pdf, other]

    cs.CV

    Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

    Authors: Junbum Cha, Jonghwan Mun, Byungseok Roh

    Abstract: We tackle open-world semantic segmentation, which aims at learning to segment arbitrary visual concepts in images, by using only image-text pairs without dense annotations. Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task. Ho…

    Submitted 26 March, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: CVPR 2023 camera-ready

  18. arXiv:2211.01705  [pdf]

    cs.CL

    A speech corpus for chronic kidney disease

    Authors: Jihyun Mun, Sunhee Kim, Myeong Ju Kim, Jiwon Ryu, Sejoong Kim, Minhwa Chung

    Abstract: In this study, we present a speech corpus of patients with chronic kidney disease (CKD) that will be used for research on pathological voice analysis, automatic illness identification, and severity prediction. This paper introduces the steps involved in creating this corpus, including the choice of speech-related parameters and speech lists as well as the recording technique. The speakers in this…

    Submitted 3 November, 2022; originally announced November 2022.

  19. arXiv:2203.14709  [pdf, other]

    cs.CV

    MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

    Authors: Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim

    Abstract: Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image. Recent work proposed transformer encoder-decoder architectures that successfully eliminated the need for many hand-designed components in HOI detection through end-to-end training. However, they are limited to single-scale feature resolution, providing suboptimal perfor…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  20. arXiv:2202.05481  [pdf, ps, other]

    cs.RO cs.LG eess.SY

    Concurrent Training of a Control Policy and a State Estimator for Dynamic and Robust Legged Locomotion

    Authors: Gwanghyeon Ji, Juhyeok Mun, Hyeongjun Kim, Jemin Hwangbo

    Abstract: In this paper, we propose a locomotion training framework where a control policy and a state estimator are trained concurrently. The framework consists of a policy network which outputs the desired joint positions and a state estimation network which outputs estimates of the robot's states such as the base linear velocity, foot height, and contact probability. We exploit a fast simulation environm…

    Submitted 2 March, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: Accepted for IEEE Robotics and Automation Letters (RA-L) and ICRA 2022

    Journal ref: IEEE Robotics and Automation Letters (Volume: 7, Issue: 2, April 2022)

  21. arXiv:2201.05277  [pdf, other]

    cs.CV

    Boundary-aware Self-supervised Learning for Video Scene Segmentation

    Authors: Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

    Abstract: Self-supervised learning has drawn attention through its effectiveness in learning in-domain representations with no ground-truth annotations; in particular, it is shown that properly designed pretext tasks (e.g., contrastive prediction task) bring significant performance gains for downstream tasks (e.g., classification task). Inspired from this, we tackle video scene segmentation, which is a task…

    Submitted 13 January, 2022; originally announced January 2022.

    Comments: The code is available at https://github.com/kakaobrain/bassl

  22. arXiv:2110.06476  [pdf, other]

    cs.CV

    Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts

    Authors: Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim

    Abstract: The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning. The main objective of the VALUE challenge is to train a task-agnostic model that is simultaneously applicable for various tasks with different characteristics. This technical re…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: CLVL workshop at ICCV 2021

  23. arXiv:2109.14349  [pdf, other]

    cs.DB cs.AR

    Relational Memory: Native In-Memory Accesses on Rows and Columns

    Authors: Shahin Roozkhosh, Denis Hoornaert, Ju Hyoung Mun, Tarikul Islam Papon, Ahmed Sanaullah, Ulrich Drepper, Renato Mancuso, Manos Athanassoulis

    Abstract: Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works great for accessing, inserting, or updating entire rows. Transforming rows to columns at runtime is expensive, hence, many analytical systems ingest data in row-first form and transform it in the background to columns to facilitate…

    Submitted 6 February, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

  24. arXiv:2004.07514  [pdf, other]

    cs.CV

    Local-Global Video-Text Interactions for Temporal Grounding

    Authors: Jonghwan Mun, Minsu Cho, Bohyung Han

    Abstract: This paper addresses the problem of text-to-video temporal grounding, which aims to identify the time interval in a video semantically relevant to a text query. We tackle this problem using a novel regression-based model that learns to extract a collection of mid-level features for semantic phrases in a text query, which corresponds to important semantic entities described in the query (e.g., acto…

    Submitted 16 April, 2020; originally announced April 2020.

    Comments: CVPR 2020; code available in https://github.com/JonghwanMun/LGI4temporalgrounding

  25. arXiv:1911.13019  [pdf, other]

    cs.LG stat.ML

    Towards Oracle Knowledge Distillation with Neural Architecture Search

    Authors: Minsoo Kang, Jonghwan Mun, Bohyung Han

    Abstract: We present a novel framework of knowledge distillation that is capable of learning powerful and efficient student models from ensemble teacher networks. Our approach addresses the inherent model capacity issue between teacher and student and aims to maximize benefit from teacher models during distillation by reducing their capacity gap. Specifically, we employ a neural architecture search techniqu…

    Submitted 29 November, 2019; originally announced November 2019.

    Comments: accepted by AAAI-20

  26. arXiv:1904.03870  [pdf, other]

    cs.CV

    Streamlined Dense Video Captioning

    Authors: Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han

    Abstract: Dense video captioning is an extremely challenging task since accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning of individual events. Most existing approaches handle this problem by first detecting event proposals from a video and then captioning on a subset of the proposals. As a result, the generated sentences…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: CVPR 2019

  27. arXiv:1810.02358  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Transfer Learning via Unsupervised Task Discovery for Visual Question Answering

    Authors: Hyeonwoo Noh, Taehoon Kim, Jonghwan Mun, Bohyung Han

    Abstract: We study how to leverage off-the-shelf visual and linguistic data to cope with out-of-vocabulary answers in visual question answering task. Existing large-scale visual datasets with annotations such as image class labels, bounding boxes and region descriptions are good sources for learning rich and diverse visual concepts. However, it is not straightforward how the visual concepts can be captured…

    Submitted 7 April, 2019; v1 submitted 3 October, 2018; originally announced October 2018.

    Comments: CVPR 2019

  28. arXiv:1710.05179  [pdf, other]

    cs.LG cs.CV

    Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

    Authors: Hyeonwoo Noh, Tackgeun You, Jonghwan Mun, Bohyung Han

    Abstract: Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noises to hidden units during training, e.g., dropout, is known as a successful regularizer, but it is still not clear enough why such training techniques work well in practice and how we can maximize their benefit in the…

    Submitted 9 November, 2017; v1 submitted 14 October, 2017; originally announced October 2017.

    Comments: NIPS 2017 camera ready

  29. arXiv:1612.03557  [pdf, other]

    cs.CV

    Text-guided Attention Model for Image Captioning

    Authors: Jonghwan Mun, Minsu Cho, Bohyung Han

    Abstract: Visual attention plays an important role to understand images and demonstrates its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns to d…

    Submitted 12 December, 2016; originally announced December 2016.

  30. arXiv:1612.01669  [pdf, other]

    cs.CV

    MarioQA: Answering Questions by Watching Gameplay Videos

    Authors: Jonghwan Mun, Paul Hongsuck Seo, Ilchae Jung, Bohyung Han

    Abstract: We present a framework to analyze various aspects of models for video question answering (VideoQA) using customizable synthetic datasets, which are constructed automatically from gameplay videos. Our work is motivated by the fact that existing models are often tested only on datasets that require excessively high-level reasoning or mostly contain instances accessible through single frame inference…

    Submitted 13 August, 2017; v1 submitted 6 December, 2016; originally announced December 2016.