Showing 1–50 of 53 results for author: Kwon, T

  1. arXiv:2406.10996  [pdf, other]

    cs.CL

    THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

    Authors: Seo Hyun Kim, Kai Tzu-iunn Ong, Taeyoon Kwon, Namyoung Kim, Keummin Ka, SeongHyeon Bae, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo

    Abstract: Large language models (LLMs) are capable of processing lengthy dialogue histories during prolonged interaction with users without additional memory modules; however, their responses tend to overlook or incorrectly recall information from the past. In this paper, we revisit memory-augmented response generation in the era of LLMs. While prior work focuses on getting rid of outdated memories, we argu…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Under Review

  2. arXiv:2404.06818  [pdf, other]

    eess.AS cs.LG cs.SD

    Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models

    Authors: Taegyun Kwon, Dasaem Jeong, Juhan Nam

    Abstract: In recent years, advancements in neural network designs and the availability of large-scale labeled datasets have led to significant improvements in the accuracy of piano transcription models. However, most previous work focused on high-performance offline transcription, neglecting deliberate consideration of model size. The goal of this work is to implement real-time inference for piano transcrip…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 11 pages, 8 figures, preprint

  3. arXiv:2404.02575  [pdf, other]

    cs.CL

    Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

    Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

    Abstract: Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance in other reasoning tasks. Within this context, some recent studies use progra…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 38 pages, 4 figures

  4. arXiv:2403.15902  [pdf, other]

    cs.GR

    Utilizing Motion Matching with Deep Reinforcement Learning for Target Location Tasks

    Authors: Jeongmin Lee, Taesoo Kwon, Hyunju Shin, Yoonsang Lee

    Abstract: We present an approach using deep reinforcement learning (DRL) to directly generate motion matching queries for long-term tasks, particularly targeting the reaching of specific locations. By integrating motion matching and DRL, our method demonstrates the rapid learning of policies for target location tasks within minutes on a standard desktop, employing a simple reward design. Additionally, we pr…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Eurographics 2024 Short Papers

  5. arXiv:2402.13211  [pdf, other]

    cs.CL

    Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

    Authors: Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Emotional Support Conversation (ESC) is a task aimed at alleviating individuals' emotional distress through daily conversation. Given its inherent complexity and non-intuitive nature, the ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Recently, despite the remarkable conversational ability of large language models (LLMs), previous studies have sug…

    Submitted 5 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  6. arXiv:2402.12222  [pdf, other]

    cs.CR cs.CL cs.LG cs.SE

    CovRL: Fuzzing JavaScript Engines with Coverage-Guided Reinforcement Learning for LLM-based Mutation

    Authors: Jueon Eom, Seyeon Jeong, Taekyoung Kwon

    Abstract: Fuzzing is an effective bug-finding technique but it struggles with complex systems like JavaScript engines that demand precise grammatical input. Recently, researchers have adopted language models for context-aware mutation in fuzzing to address this problem. However, existing techniques are limited in utilizing coverage guidance for fuzzing, which is rather performed in a black-box manner. This…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 14 pages, 4 figures, 9 tables, 2 listings

    ACM Class: D.4.6; I.2.5; D.2.4

  7. arXiv:2402.12189  [pdf, other]

    cs.CL cs.CR cs.LG

    Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships

    Authors: Myung Gyo Oh, Hong Eun Ahn, Leo Hyun Park, Taekyoung Kwon

    Abstract: Neural language models (LMs) are vulnerable to training data extraction attacks due to data memorization. This paper introduces a novel attack scenario wherein an attacker adversarially fine-tunes pre-trained LMs to amplify the exposure of the original training data. This strategy differs from prior studies by aiming to intensify the LM's retention of its pre-training dataset. To achieve this, the…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 20 pages, 6 figures, 15 tables

    ACM Class: I.2.7; K.6.5

  8. arXiv:2402.12187  [pdf, other]

    cs.CV cs.CR cs.LG

    Adversarial Feature Alignment: Balancing Robustness and Accuracy in Deep Learning via Adversarial Training

    Authors: Leo Hyun Park, Jaeuk Kim, Myung Gyo Oh, Jaewoo Park, Taekyoung Kwon

    Abstract: Deep learning models continue to advance in accuracy, yet they remain vulnerable to adversarial attacks, which often lead to the misclassification of adversarial examples. Adversarial training is used to mitigate this problem by increasing robustness against these attacks. However, this approach typically reduces a model's standard accuracy on clean, non-adversarial samples. The necessity for deep…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 19 pages, 5 figures, 16 tables, 2 algorithms

    ACM Class: I.4.0; K.6.5; D.2.7

  9. Performance-Based Biped Control using a Consumer Depth Camera

    Authors: Yoonsang Lee, Taesoo Kwon

    Abstract: We present a technique for controlling physically simulated characters using user inputs from an off-the-shelf depth camera. Our controller takes a real-time stream of user poses as input, and simulates a stream of target poses of a biped based on it. The simulated biped mimics the user's actions while moving forward at a modest speed and maintaining balance. The controller is parameterized over a…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: Eurographics 2017

    Journal ref: Computer Graphics Forum (Eurographics 2017), Volume 36 Issue 2, 387-395, May 2017

  10. arXiv:2401.09200  [pdf, other]

    cs.SD cs.LG eess.AS

    A Real-Time Lyrics Alignment System Using Chroma And Phonetic Features For Classical Vocal Performance

    Authors: Jiyun Park, Sangeon Yong, Taegyun Kwon, Juhan Nam

    Abstract: The goal of real-time lyrics alignment is to take live singing audio as input and to pinpoint the exact position within given lyrics on the fly. The task can benefit real-world applications such as the automatic subtitling of live concerts or operas. However, designing a real-time model poses a great challenge due to the constraints of only using past input and operating within a minimal latency.…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: To Appear IEEE ICASSP 2024

  11. arXiv:2401.02974  [pdf, other]

    cs.CL cs.AI cs.IR

    Efficacy of Utilizing Large Language Models to Detect Public Threat Posted Online

    Authors: Taeksoo Kwon, Connor Kim

    Abstract: This paper examines the efficacy of utilizing large language models (LLMs) to detect public threats posted online. Amid rising concerns over the spread of threatening rhetoric and advance notices of violence, automated content analysis techniques may aid in early identification and moderation. Custom data collection tools were developed to amass post titles from a popular Korean online community,…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 10 pages, 4 figures (1 image figure saved in PNG)

  12. arXiv:2312.07399  [pdf, other]

    cs.CL cs.AI

    Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

    Authors: Taeyoon Kwon, Kai Tzu-iunn Ong, Dongjin Kang, Seungjun Moon, Jeong Ryong Lee, Dosik Hwang, Yongsik Sim, Beomseok Sohn, Dongha Lee, Jinyoung Yeo

    Abstract: Machine reasoning has made great progress in recent years owing to large language models (LLMs). In the clinical domain, however, most NLP-driven projects mainly focus on clinical classification or reading comprehension, and under-explore clinical reasoning for disease diagnosis due to the expensive rationale annotation with clinicians. In this work, we present a "reasoning-aware" diagnosis framew…

    Submitted 10 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  13. arXiv:2311.07215  [pdf, other]

    cs.CL cs.SE

    Coffee: Boost Your Code LLMs by Fixing Bugs with Feedback

    Authors: Seungjun Moon, Hyungjoo Chae, Yongho Song, Taeyoon Kwon, Dongjin Kang, Kai Tzu-iunn Ong, Seung-won Hwang, Jinyoung Yeo

    Abstract: Code editing is an essential step towards reliable program synthesis to automatically correct critical errors generated from code LLMs. Recent studies have demonstrated that closed-source LLMs (i.e., ChatGPT and GPT-4) are capable of generating corrective feedback to edit erroneous inputs. However, it remains challenging for open-source code LLMs to generate feedback for code editing, since these…

    Submitted 23 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Work in progress

  14. arXiv:2310.11604  [pdf, other]

    cs.RO cs.AI cs.CL cs.HC cs.LG

    Language Models as Zero-Shot Trajectory Generators

    Authors: Teyun Kwon, Norman Di Palo, Edward Johns

    Abstract: Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate if an LLM (GPT-4) can directly predict a dense sequence o…

    Submitted 17 June, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Published in IEEE Robotics and Automation Letters (Volume: 9, Issue: 7, July 2024, Pages: 6728-6735); 10 pages, 12 figures

    Journal ref: IEEE Robotics and Automation Letters (Volume: 9, Issue: 7, July 2024, Pages: 6728-6735)

  15. arXiv:2310.09343  [pdf, other]

    cs.CL cs.AI

    Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

    Authors: Hyungjoo Chae, Yongho Song, Kai Tzu-iunn Ong, Taeyoon Kwon, Minjin Kim, Youngjae Yu, Dongha Lee, Dongyeop Kang, Jinyoung Yeo

    Abstract: Human-like chatbots necessitate the use of commonsense reasoning in order to effectively comprehend and respond to implicit information present within conversations. Achieving such coherence and informativeness in responses, however, is a non-trivial task. Even for large language models (LLMs), the task of identifying and aggregating key evidence within a single hop presents a substantial challeng…

    Submitted 22 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: 25 pages, 8 figures, Accepted to EMNLP 2023

  16. arXiv:2310.06404  [pdf, other]

    cs.CL cs.AI cs.LG

    Hexa: Self-Improving for Knowledge-Grounded Dialogue System

    Authors: Daejin Jo, Daniel Wontae Nam, Gunsoo Han, Kyoung-Woon On, Taehwan Kwon, Seungeun Rho, Sungwoong Kim

    Abstract: A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the gene…

    Submitted 2 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  17. arXiv:2309.17024  [pdf, other]

    cs.CV

    HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

    Authors: Xin Wang, Taein Kwon, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri, Neel Joshi, Marc Pollefeys

    Abstract: Building an interactive AI assistant that can perceive, reason, and collaborate with humans in the real world has been a long-standing pursuit in the AI community. This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world. As a first step in this direction, we introduce HoloAssist, a large-scale e…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  18. arXiv:2309.10310  [pdf, other]

    cs.LG

    TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions

    Authors: Taehyung Kwon, Jihoon Ko, Jinhong Jung, Kijung Shin

    Abstract: Many real-world datasets are represented as tensors, i.e., multi-dimensional arrays of numerical values. Storing them without compression often requires substantial space, which grows exponentially with the order. While many tensor compression algorithms are available, many of them rely on strong data assumptions regarding its order, sparsity, rank, and smoothness. In this work, we propose TENSORC…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted to ICDM 2023 - IEEE International Conference on Data Mining 2023

  19. arXiv:2309.10001  [pdf, other]

    cs.CV

    CaSAR: Contact-aware Skeletal Action Recognition

    Authors: Junan Lin, Zhichao Sun, Enjie Cao, Taein Kwon, Mahdi Rad, Marc Pollefeys

    Abstract: Skeletal Action recognition from an egocentric view is important for applications such as interfaces in AR/VR glasses and human-robot interaction, where the device has limited resources. Most of the existing skeletal action recognition approaches use 3D coordinates of hand joints and 8-corner rectangular bounding boxes of objects as inputs, but they do not capture how the hands and objects interac…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures

  20. arXiv:2308.07491  [pdf, other]

    cs.RO cs.GR cs.LG

    Adaptive Tracking of a Single-Rigid-Body Character in Various Environments

    Authors: Taesoo Kwon, Taehong Gu, Jaewon Ahn, Yoonsang Lee

    Abstract: Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach for this goal, a deep reinforcement learning method based on the simulation of a single-rigid-body character. Using the centroidal dynamics model (CDM) to express the full-body character…

    Submitted 28 January, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: SIGGRAPH Asia 2023 Conference Papers

    Journal ref: SA '23: SIGGRAPH Asia 2023 Conference Papers, December 2023, Article No.: 118, Pages 1-11

  21. arXiv:2305.13973  [pdf, other]

    cs.CL

    Effortless Integration of Memory Management into Open-Domain Conversation Systems

    Authors: Eunbi Choi, Kyoung-Woon On, Gunsoo Han, Sungwoong Kim, Daniel Wontae Nam, Daejin Jo, Seung Eun Rho, Taehwan Kwon, Minjoon Seo

    Abstract: Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One of the limitations of the system, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propo…

    Submitted 23 May, 2023; originally announced May 2023.

  22. arXiv:2305.13758  [pdf, other]

    cs.SD eess.AS

    A study of audio mixing methods for piano transcription in violin-piano ensembles

    Authors: Hyemi Kim, Jiyun Park, Taegyun Kwon, Dasaem Jeong, Juhan Nam

    Abstract: While piano music transcription models have shown high performance for solo piano recordings, their performance degrades when applied to ensemble recordings. This study aims to analyze the impact of different data augmentation methods on piano transcription performance, specifically focusing on mixing techniques applied to violin-piano ensembles. We apply mixing methods that consider both harmonic…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: To Appear IEEE ICASSP 2023

  23. arXiv:2303.08767  [pdf, other]

    cs.CV cs.AI

    Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion

    Authors: Inhwa Han, Serin Yang, Taesung Kwon, Jong Chul Ye

    Abstract: Diffusion models have shown superior performance in image generation and manipulation, but the inherent stochasticity presents challenges in preserving and manipulating image content and identity. While previous approaches like DreamBooth and Textual Inversion have proposed model or latent representation personalization to maintain the content, their reliance on multiple reference images and compl…

    Submitted 19 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  24. arXiv:2302.04570  [pdf, other]

    cs.LG cs.SI

    NeuKron: Constant-Size Lossy Compression of Sparse Reorderable Matrices and Tensors

    Authors: Taehyung Kwon, Jihoon Ko, Jinhong Jung, Kijung Shin

    Abstract: Many real-world data are naturally represented as a sparse reorderable matrix, whose rows and columns can be arbitrarily ordered (e.g., the adjacency matrix of a bipartite graph). Storing a sparse matrix in conventional ways requires an amount of space linear in the number of non-zeros, and lossy compression of sparse matrices (e.g., Truncated SVD) typically requires an amount of space linear in t…

    Submitted 30 March, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to WWW 2023 - The Web Conference 2023

    ACM Class: H.4.m

  25. arXiv:2211.14568  [pdf, other]

    cs.LG cs.AI

    BeGin: Extensive Benchmark Scenarios and An Easy-to-use Framework for Graph Continual Learning

    Authors: Jihoon Ko, Shinhwan Kang, Taehyung Kwon, Heechan Moon, Kijung Shin

    Abstract: Continual Learning (CL) is the process of learning ceaselessly a sequence of tasks. Most existing CL methods deal with independent data (e.g., images and text) for which many benchmark frameworks and results under standard experimental settings are available. Compared to them, however, CL methods for graph data (graph CL) are relatively underexplored because of (a) the lack of standard experimenta…

    Submitted 22 February, 2024; v1 submitted 26 November, 2022; originally announced November 2022.

  26. arXiv:2211.07131  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    YM2413-MDB: A Multi-Instrumental FM Video Game Music Dataset with Emotion Annotations

    Authors: Eunjin Choi, Yoonjin Chung, Seolhee Lee, JongIk Jeon, Taegyun Kwon, Juhan Nam

    Abstract: Existing multi-instrumental datasets tend to be biased toward pop and classical music. In addition, they generally lack high-level annotations such as emotion tags. In this paper, we propose YM2413-MDB, an 80s FM video game music dataset with multi-label emotion annotations. It includes 669 audio and MIDI files of music from Sega and MSX PC games in the 80s using YM2413, a programmable sound gener…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: The paper has been accepted for publication at ISMIR 2022

    ACM Class: I.2.1; I.2.7

  27. arXiv:2210.05409  [pdf, other]

    cs.LG cs.AI

    LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward

    Authors: Daejin Jo, Sungwoong Kim, Daniel Wontae Nam, Taehwan Kwon, Seungeun Rho, Jongmin Kim, Donghoon Lee

    Abstract: Episodic count has been widely used to design a simple yet effective intrinsic motivation for reinforcement learning with a sparse reward. However, the use of episodic count in a high-dimensional state space as well as over a long episode time requires a thorough state compression and fast hashing, which hinders rigorous exploitation of it in such hard and complex exploration environments. Moreove…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  28. arXiv:2209.08206  [pdf, other]

    cs.CL cs.LG

    Selective Token Generation for Few-shot Natural Language Generation

    Authors: Daejin Jo, Taehwan Kwon, Eun-Sol Kim, Sungwoong Kim

    Abstract: Natural language modeling with limited training data is a challenging problem, and many algorithms make use of large-scale pretrained language models (PLMs) for this due to their great generalization ability. Among them, additive learning that incorporates a task-specific adapter on top of the fixed large-scale PLM has been popularly used in the few-shot setting. However, this added adapter is still…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: COLING 2022

  29. arXiv:2204.12223  [pdf, other]

    cs.CV

    Context-Aware Sequence Alignment using 4D Skeletal Augmentation

    Authors: Taein Kwon, Bugra Tekin, Siyu Tang, Marc Pollefeys

    Abstract: Temporal alignment of fine-grained human actions in videos is important for numerous applications in computer vision, robotics, and mixed reality. State-of-the-art methods directly learn image-based embedding space by leveraging powerful deep convolutional neural networks. While being straightforward, their results are far from satisfactory: the aligned videos exhibit severe temporal discontinuity…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: Project page: http://www.taeinkwon.com/projects/casa. Accepted to CVPR 2022 Oral

  30. arXiv:2203.11889  [pdf, other]

    cs.LG cs.AI cs.NE cs.SC stat.ML

    Insights From the NeurIPS 2021 NetHack Challenge

    Authors: Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, Daejin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, et al. (4 additional authors not shown)

    Abstract: In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challeng…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: Under review at PMLR for the NeurIPS 2021 Competition Workshop Track, 10 pages + 10 in appendices

  31. Captivate! Contextual Language Guidance for Parent-Child Interaction

    Authors: Taeahn Kwon, Minkyung Jeong, Eon-Suk Ko, Youngki Lee

    Abstract: To acquire language, children need rich language input. However, many parents find it difficult to provide children with sufficient language input, which risks delaying their language development. To aid these parents, we design Captivate!, the first system that provides contextual language guidance to parents during play. Our system tracks both visual and spoken language cues to infer targets of…

    Submitted 14 February, 2022; originally announced February 2022.

    Comments: Published as a conference paper at CHI 2022

  32. arXiv:2112.07642  [pdf, other]

    cs.CV cs.AI

    EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices

    Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo, Siyu Tang

    Abstract: Understanding social interactions from egocentric views is crucial for many applications, ranging from assistive robotics to AR/VR. Key to reasoning about interactions is to understand the body pose and motion of the interaction partner from the egocentric view. However, research in this area is severely hindered by the lack of datasets. Existing datasets are limited in terms of either size, captu…

    Submitted 16 August, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: Camera ready version for ECCV 2022, appendix included

  33. arXiv:2112.03696  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Noise Distribution Adaptive Self-Supervised Image Denoising using Tweedie Distribution and Score Matching

    Authors: Kwanyoung Kim, Taesung Kwon, Jong Chul Ye

    Abstract: Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in the modern deep learning era, leading to a distribution independent self-supervised image denoising formula without clean reference images. Specifically, by combining wit…

    Submitted 4 December, 2021; originally announced December 2021.

  34. arXiv:2112.00819  [pdf, other]

    cs.CL

    CO-STAR: Conceptualisation of Stereotypes for Analysis and Reasoning

    Authors: Teyun Kwon, Anandha Gopalan

    Abstract: Warning: this paper contains material which may be offensive or upsetting. While much of recent work has focused on the detection of hate speech and overtly offensive content, very little research has explored the more subtle but equally harmful language in the form of implied stereotypes. This is a challenging domain, made even more so by the fact that humans often struggle to understand and re…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: 12 pages, 1 figure

  35. arXiv:2111.04920  [pdf, other]

    cs.HC

    PopBlends: Strategies for Conceptual Blending with Large Language Models

    Authors: Sitong Wang, Savvas Petridis, Taeahn Kwon, Xiaojuan Ma, Lydia B. Chilton

    Abstract: Pop culture is an important aspect of communication. On social media people often post pop culture reference images that connect an event, product or other entity to a pop culture domain. Creating these images is a creative challenge that requires finding a conceptual connection between the users' topic and a pop culture domain. In cognitive theory, this task is called conceptual blending. We pres…

    Submitted 19 February, 2023; v1 submitted 8 November, 2021; originally announced November 2021.

  36. arXiv:2110.14875  [pdf, other]

    cs.SI cs.DB

    Finding a Concise, Precise, and Exhaustive Set of Near Bi-Cliques in Dynamic Graphs

    Authors: Hyeonjeong Shin, Taehyung Kwon, Neil Shah, Kijung Shin

    Abstract: A variety of tasks on dynamic graphs, including anomaly detection, community detection, compression, and graph understanding, have been formulated as problems of identifying constituent (near) bi-cliques (i.e., complete bipartite graphs). Even when we restrict our attention to maximal ones, there can be exponentially many near bi-cliques, and thus finding all of them is practically impossible for…

    Submitted 12 January, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: To be published in WSDM 2022

  37. arXiv:2110.02711  [pdf, other]

    cs.CV cs.AI cs.LG

    DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation

    Authors: Gwanghyun Kim, Taesung Kwon, Jong Chul Ye

    Abstract: Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining (CLIP) enable zero-shot image manipulation guided by text prompts. However, their applications to diverse real images are still difficult due to the limited GAN inversion capability. Specifically, these approaches often have difficulties in reconstructing images with novel poses, views, and highly variable conten…

    Submitted 11 August, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: Accepted to CVPR 2022

  38. arXiv:2106.06210  [pdf, other]

    cs.LG

    Learning to Pool in Graph Neural Networks for Extrapolation

    Authors: Jihoon Ko, Taehyung Kwon, Kijung Shin, Juho Lee

    Abstract: Graph neural networks (GNNs) are one of the most popular approaches to using deep learning on graph-structured data, and they have shown state-of-the-art performances on a variety of tasks. However, according to a recent study, a careful choice of pooling functions, which are used for the aggregation and readout operations in GNNs, is crucial for enabling GNNs to extrapolate. Without proper choice…

    Submitted 6 October, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

  39. arXiv:2104.11181  [pdf, other]

    cs.CV

    H2O: Two Hands Manipulating Objects for First Person Interaction Recognition

    Authors: Taein Kwon, Bugra Tekin, Jan Stuhmer, Federica Bogo, Marc Pollefeys

    Abstract: We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects. To this end, we propose a method to create a unified dataset for egocentric 3D interaction recognition. Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each fram…

    Submitted 24 August, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: Accepted to ICCV 2021

  40. arXiv:2104.08538  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Cycle-free CycleGAN using Invertible Generator for Unsupervised Low-Dose CT Denoising

    Authors: Taesung Kwon, Jong Chul Ye

    Abstract: Recently, CycleGAN was shown to provide high-performance, ultra-fast denoising for low-dose X-ray computed tomography (CT) without the need for a paired training dataset. Although this was possible thanks to cycle consistency, CycleGAN requires two generators and two discriminators to enforce cycle consistency, demanding significant GPU resources and technical skills for training. A recent proposa…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 12 pages, 12 figures

  41. arXiv:2102.11517  [pdf, other]

    cs.LG cs.DB cs.SI

    SliceNStitch: Continuous CP Decomposition of Sparse Tensor Streams

    Authors: Taehyung Kwon, Inkyu Park, Dongjin Lee, Kijung Shin

    Abstract: Consider traffic data (i.e., triplets in the form of source-destination-timestamp) that grow over time. Tensors (i.e., multi-dimensional arrays) with a time mode are widely used for modeling and analyzing such multi-aspect data streams. In such tensors, however, new entries are added only once per period, which is often an hour, a day, or even a year. This discreteness of tensors has limited their…

    Submitted 2 March, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: Updated Figures 4, 5, 6, 7, and 8 after fixing a bug in preprocessing the Divvy dataset. To appear at the 37th IEEE International Conference on Data Engineering (ICDE '21)

    ACM Class: H.2.8

  42. arXiv:2102.04680  [pdf, other]

    cs.SD cs.LG eess.AS

    TräumerAI: Dreaming Music with StyleGAN

    Authors: Dasaem Jeong, Seungheon Doh, Taegyun Kwon

    Abstract: The goal of this paper is to generate a visually appealing video that responds to music with a neural network so that each frame of the video reflects the musical characteristics of the corresponding audio clip. To achieve the goal, we propose a neural music visualizer directly mapping deep music embeddings to style embeddings of StyleGAN, named TräumerAI, which consists of a music auto-tagging model…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: presented in NeurIPS Workshop 2020: Machine Learning for Creativity and Design

  43. arXiv:2010.03281  [pdf, other]

    cs.LG stat.ML

    Variational Intrinsic Control Revisited

    Authors: Taehwan Kwon

    Abstract: In this paper, we revisit variational intrinsic control (VIC), an unsupervised reinforcement learning method for finding the largest set of intrinsic options available to an agent. In the original work by Gregor et al. (2016), two VIC algorithms were proposed: one that represents the options explicitly, and the other that does it implicitly. We show that the intrinsic reward used in the latter is…

    Submitted 17 March, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

  44. arXiv:2010.01104  [pdf, other]

    eess.AS cs.LG cs.SD

    Polyphonic Piano Transcription Using Autoregressive Multi-State Note Model

    Authors: Taegyun Kwon, Dasaem Jeong, Juhan Nam

    Abstract: Recent advances in polyphonic piano transcription have been made primarily by a deliberate design of neural network architectures that detect different note states such as onset or sustain and model the temporal evolution of the states. The majority of them, however, use separate neural networks for each note state, thereby optimizing multiple loss functions, and also they handle the temporal evol…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: 6+2 pages, 5 figures, Camera-ready version. To be published in ISMIR 2020. Project page is available at https://TaegyunKwon.github.io/ar_multi_transcription

  45. arXiv:1905.12204  [pdf, other]

    cs.LG cs.AI cs.MA cs.RO stat.ML

    Learning NP-Hard Multi-Agent Assignment Planning using GNN: Inference on a Random Graph and Provable Auction-Fitted Q-learning

    Authors: Hyunwook Kang, Taehwan Kwon, Jinkyoo Park, James R. Morrison

    Abstract: This paper explores the possibility of near-optimally solving multi-agent, multi-task NP-hard planning problems with time-dependent rewards using a learning-based algorithm. In particular, we consider a class of robot/machine scheduling problems called the multi-robot reward collection problem (MRRC). Such MRRC problems well model ride-sharing, pickup-and-delivery, and a variety of related problem…

    Submitted 13 August, 2023; v1 submitted 29 May, 2019; originally announced May 2019.

    Journal ref: Neural Information Processing Systems (NeurIPS) 2022

  46. arXiv:1711.04480  [pdf, other]

    cs.SD eess.AS

    Audio-to-score alignment of piano music using RNN-based automatic music transcription

    Authors: Taegyun Kwon, Dasaem Jeong, Juhan Nam

    Abstract: We propose a framework for audio-to-score alignment on piano performance that employs automatic music transcription (AMT) using neural networks. Even though the AMT result may contain some errors, the note prediction output can be regarded as a learned feature representation that is directly comparable to MIDI note or chroma representation. To this end, we employ two recurrent neural networks that…

    Submitted 13 November, 2017; originally announced November 2017.

    Comments: 6 pages, 5 figures, The paper was published in SMC 2017 proceedings, Proceedings of 14th Sound and Music Computing Conference (SMC). 2017

  47. AMUSE: Empowering Users for Cost-Aware Offloading with Throughput-Delay Tradeoffs

    Authors: Youngbin Im, Carlee Joe-Wong, Sangtae Ha, Soumya Sen, Ted Taekyoung Kwon, Mung Chiang

    Abstract: To cope with recent exponential increases in demand for mobile data, wireless Internet service providers (ISPs) are increasingly changing their pricing plans and deploying WiFi hotspots to offload their mobile traffic. However, these ISP-centric approaches for traffic management do not always match the interests of mobile users. Users face a complex, multi-dimensional tradeoff between cost, throug…

    Submitted 17 February, 2017; originally announced February 2017.

    Comments: 15 pages, 16 figures, IEEE Transactions on Mobile Computing, Vol. 15, No. 5, May 2016

    Journal ref: IEEE Transactions on Mobile Computing, Vol. 15, No. 5, May 2016

  48. RF Lens-Embedded Massive MIMO Systems: Fabrication Issues and Codebook Design

    Authors: Taehoon Kwon, Yeon-Geun Lim, Byung-Wook Min, Chan-Byoung Chae

    Abstract: In this paper, we investigate a radio frequency (RF) lens-embedded massive multiple-input multiple-output (MIMO) system and evaluate the system performance of limited feedback by utilizing a technique for generating a suitable codebook for the system. We fabricate an RF lens that operates on a 77 GHz (mmWave) band. Experimental results show a proper value of amplitude gain and an appropriate focus…

    Submitted 15 April, 2016; v1 submitted 1 October, 2015; originally announced October 2015.

  49. arXiv:1403.4346  [pdf, ps, other]

    cs.IT

    Spatial Topology Adjustment for Minimizing Multicell Network Power Consumption

    Authors: Taesoo Kwon

    Abstract: While the deployment of base stations (BSs) becomes increasingly dense in order to accommodate the growth in traffic demand, these BSs may be under-utilized during most hours except peak hours. Accordingly, the deactivation of these under-utilized BSs is regarded as the key to reducing network power consumption; however, the remaining active BSs should increase their transmit power in order to fil…

    Submitted 18 March, 2014; originally announced March 2014.

    Comments: 30 pages, 6 figures, 2 tables

  50. arXiv:1403.4342  [pdf, ps, other]

    cs.IT cs.NI

    Spatial Performance Analysis and Design Principles for Wireless Peer Discovery

    Authors: Taesoo Kwon, Ji-Woong Choi

    Abstract: In wireless peer-to-peer networks that serve various proximity-based applications, peer discovery is the key to identifying other peers with which a peer can communicate and an understanding of its performance is fundamental to the design of an efficient discovery operation. This paper analyzes the performance of wireless peer discovery through comprehensively considering the wireless channel, spa…

    Submitted 11 May, 2014; v1 submitted 18 March, 2014; originally announced March 2014.

    Comments: 12 pages (double columns), 10 figures, 1 table, to appear in the IEEE Transactions on Wireless Communications