Showing 1–17 of 17 results for author: On, K

  1. arXiv:2404.13808

    cs.IR cs.LG cs.MM

    General Item Representation Learning for Cold-start Content Recommendations

    Authors: Jooeun Kim, Jinri Kim, Kwangeun Yeo, Eungi Kim, Kyoung-Woon On, Jonghwan Mun, Joonseok Lee

    Abstract: Cold-start item recommendation is a long-standing challenge in recommendation systems. A common remedy is to use a content-based approach, but rich information from raw contents in various forms has not been fully utilized. In this paper, we propose a domain/data-agnostic item representation learning framework for cold-start recommendations, naturally equipped with multimodal alignment among vario…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 14 pages

  2. arXiv:2404.04656

    cs.LG cs.AI cs.CL

    Binary Classifier Optimization for Large Language Model Alignment

    Authors: Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On

    Abstract: Aligning Large Language Models (LLMs) to human preferences through preference optimization has been crucial but labor-intensive, necessitating for each prompt a comparison of both a chosen and a rejected text completion by evaluators. Recently, Kahneman-Tversky Optimization (KTO) has demonstrated that LLMs can be aligned using merely binary "thumbs-up" or "thumbs-down" signals on each prompt-compl…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 18 pages, 9 figures

  3. arXiv:2403.09024

    cs.CL cs.AI

    Semiparametric Token-Sequence Co-Supervision

    Authors: Hyunji Lee, Doyoung Kim, Jihoon Jun, Sejune Joo, Joel Jang, Kyoung-Woon On, Minjoon Seo

    Abstract: In this work, we introduce a semiparametric token-sequence co-supervision training method. It trains a language model by simultaneously leveraging supervision from the traditional next token prediction loss which is calculated over the parametric token embedding space and the next sequence prediction loss which is calculated over the nonparametric sequence embedding space. The nonparametric sequen…

    Submitted 13 March, 2024; originally announced March 2024.

  4. arXiv:2311.09069

    cs.CL cs.AI

    How Well Do Large Language Models Truly Ground?

    Authors: Hyunji Lee, Sejune Joo, Chaeeun Kim, Joel Jang, Doyoung Kim, Kyoung-Woon On, Minjoon Seo

    Abstract: To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models. However, previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response. To overcome this, we pr…

    Submitted 29 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: published at NAACL 2024

  5. arXiv:2310.06404

    cs.CL cs.AI cs.LG

    Hexa: Self-Improving for Knowledge-Grounded Dialogue System

    Authors: Daejin Jo, Daniel Wontae Nam, Gunsoo Han, Kyoung-Woon On, Taehwan Kwon, Seungeun Rho, Sungwoong Kim

    Abstract: A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the gene…

    Submitted 2 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2307.14856

    cs.CL cs.AI

    Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

    Authors: Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On

    Abstract: In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architectur…

    Submitted 27 July, 2023; originally announced July 2023.

  7. arXiv:2305.13973

    cs.CL

    Effortless Integration of Memory Management into Open-Domain Conversation Systems

    Authors: Eunbi Choi, Kyoung-Woon On, Gunsoo Han, Sungwoong Kim, Daniel Wontae Nam, Daejin Jo, Seung Eun Rho, Taehwan Kwon, Minjoon Seo

    Abstract: Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One of the limitations of the system, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propo…

    Submitted 23 May, 2023; originally announced May 2023.

  8. arXiv:2303.13009

    cs.CV

    MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

    Authors: Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim

    Abstract: Foundation models have shown outstanding performance and generalization capabilities across domains. Since most studies on foundation models mainly focus on the pretraining phase, a naive strategy to minimize a single task-specific loss is adopted for fine-tuning. However, such fine-tuning methods do not fully leverage other losses that are potentially beneficial for the target task. Therefore, we…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted paper at CVPR 2023

  9. arXiv:2203.16784

    cs.CV

    Video-Text Representation Learning via Differentiable Weak Temporal Alignment

    Authors: Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim

    Abstract: Learning generic joint representations for video and text by a supervised method requires a prohibitively substantial amount of manually annotated video datasets. As a practical alternative, a large-scale but uncurated and narrated video dataset, HowTo100M, has recently been introduced. But it is still challenging to learn joint embeddings of video and text in a self-supervised manner, due to its…

    Submitted 31 March, 2022; originally announced March 2022.

  10. arXiv:2203.14709

    cs.CV

    MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

    Authors: Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim

    Abstract: Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image. Recent work proposed transformer encoder-decoder architectures that successfully eliminated the need for many hand-designed components in HOI detection through end-to-end training. However, they are limited to single-scale feature resolution, providing suboptimal perfor…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  11. arXiv:2110.06476

    cs.CV

    Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts

    Authors: Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim

    Abstract: The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning. The main objective of the VALUE challenge is to train a task-agnostic model that is simultaneously applicable for various tasks with different characteristics. This technical re…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: CLVL workshop at ICCV 2021

  12. arXiv:2005.03356

    cs.CL cs.AI cs.CV

    DramaQA: Character-Centered Video Story Understanding with Hierarchical QA

    Authors: Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang

    Abstract: Despite recent progress on computer vision and natural language processing, developing a machine that can understand video story is still hard to achieve due to the intrinsic difficulty of video story. Moreover, researches on how to evaluate the degree of video understanding based on human cognitive process have not progressed as yet. In this paper, we propose a novel video question answering (Vid…

    Submitted 16 December, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: 15 pages, 11 figures, accepted to AAAI 2021

  13. arXiv:2001.07613

    cs.LG cs.CV eess.IV stat.ML

    Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

    Authors: Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

    Abstract: Conventional sequential learning methods such as Recurrent Neural Networks (RNNs) focus on interactions between consecutive inputs, i.e. first-order Markovian dependency. However, most of sequential data, as seen with videos, have complex dependency structures that imply variable-length semantic flows and their compositions, and those are hard to be captured by conventional methods. Here, we propo…

    Submitted 17 January, 2020; originally announced January 2020.

    Comments: 8 pages, 3 figures, Association for the Advancement of Artificial Intelligence (AAAI2020). arXiv admin note: substantial text overlap with arXiv:1907.01709

  14. arXiv:1907.01709

    cs.LG cs.CV eess.IV stat.ML

    Compositional Structure Learning for Sequential Video Data

    Authors: Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

    Abstract: Conventional sequential learning methods such as Recurrent Neural Networks (RNNs) focus on interactions between consecutive inputs, i.e. first-order Markovian dependency. However, most of sequential data, as seen with videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, and those are hard to be captured by conventional methods. Here, we propo…

    Submitted 2 July, 2019; originally announced July 2019.

  15. arXiv:1904.00623

    cs.AI cs.CV cs.LG cs.MM

    Constructing Hierarchical Q&A Datasets for Video Story Understanding

    Authors: Yu-Jung Heo, Kyoung-Woon On, Seongho Choi, Jaeseo Lim, Jinah Kim, Jeh-Kwang Ryu, Byung-Chull Bae, Byoung-Tak Zhang

    Abstract: Video understanding is emerging as a new paradigm for studying human-like AI. Question-and-Answering (Q&A) is used as a general benchmark to measure the level of intelligence for video understanding. While several previous studies have suggested datasets for video Q&A tasks, they did not really incorporate story-level understanding, resulting in highly-biased and lack of variance in degree of ques…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: Accepted to AAAI 2019 Spring Symposium Series: Story-Enabled Intelligence

  16. arXiv:1901.09066

    cs.LG cs.CL

    Visualizing Semantic Structures of Sequential Data by Learning Temporal Dependencies

    Authors: Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

    Abstract: While conventional methods for sequential learning focus on interaction between consecutive inputs, we suggest a new method which captures composite semantic flows with variable-length dependencies. In addition, the semantic structures within given sequential data can be interpreted by visualizing temporal dependencies learned from the method. The proposed method, called Temporal Dependency Networ…

    Submitted 20 January, 2019; originally announced January 2019.

    Comments: In AAAI-19 Workshop on Network Interpretability for Deep Learning

  17. arXiv:1610.04325

    cs.CV cs.AI cs.NE

    Hadamard Product for Low-rank Bilinear Pooling

    Authors: Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

    Abstract: Bilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, to get state-of-the-art performances taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting the applicability to computationally complex t…

    Submitted 26 March, 2017; v1 submitted 14 October, 2016; originally announced October 2016.

    Comments: 13 pages, 1 figure, & appendix. ICLR 2017 accepted