
Showing 1–50 of 50 results for author: Kim, Y J

  1. arXiv:2405.00229  [pdf, other]

    cs.HC cs.AI cs.PL

    Aptly: Making Mobile Apps from Natural Language

    Authors: Evan W. Patton, David Y. J. Kim, Ashley Granquist, Robin Liu, Arianna Scott, Jennet Zamanova, Harold Abelson

    Abstract: We present Aptly, an extension of the MIT App Inventor platform enabling mobile app development via natural language powered by code-generating large language models (LLMs). Aptly complements App Inventor's block language with a text language designed to allow visual code generation via text-based LLMs. We detail the technical aspects of how the Aptly server integrates LLMs with a realtime collabo…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures, 2 tables

  2. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  3. arXiv:2403.13513  [pdf, other]

    cs.CV cs.AI cs.CL

    What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models

    Authors: Junho Kim, Yeon Ju Kim, Yong Man Ro

    Abstract: This paper presents a way of enhancing the reliability of Large Multi-modal Models (LMMs) in addressing hallucination, where the models generate cross-modal inconsistent responses. Without additional training, we propose Counterfactual Inception, a novel method that implants counterfactual thinking into LMMs using self-generated counterfactual keywords. Our method is grounded in the concept of cou…

    Submitted 21 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Project page: https://ivy-lvlm.github.io/Counterfactual-Inception/

  4. arXiv:2401.08417  [pdf, other]

    cs.CL

    Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

    Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim

    Abstract: Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We…

    Submitted 2 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICML 2024

  5. arXiv:2312.01253  [pdf]

    cs.IT

    On Merits of Faster-than-Nyquist Signaling in the Finite Blocklength Regime

    Authors: Yong Jin Daniel Kim

    Abstract: We identify potential merits of faster-than-Nyquist (FTN) signaling in the finite blocklength (FBL) regime. A unique aspect of FTN signaling is that it can increase the blocklength by packing more data symbols within the same time and frequency to yield a strictly higher number of independent signaling dimensions than that of Nyquist rate signaling. Using the finite-blocklength information theory, w…

    Submitted 25 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

  6. arXiv:2311.08590  [pdf, other]

    cs.CL

    PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language Models

    Authors: HyunJin Kim, Young Jin Kim, JinYeong Bak

    Abstract: Pre-trained language models (PLMs) show impressive performance in various downstream NLP tasks. However, pre-training large language models demands substantial memory and training compute. Furthermore, due to the substantial resources required, many PLM weights are confidential. Consequently, users are compelled to share their data with model owners for fine-tuning specific tasks. To overcome the…

    Submitted 29 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024

  7. arXiv:2310.02410  [pdf, other]

    cs.LG cs.CL

    Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness

    Authors: Young Jin Kim, Raffy Fahim, Hany Hassan Awadalla

    Abstract: Large Mixture of Experts (MoE) models could achieve state-of-the-art quality on various language tasks, including the machine translation task, thanks to the efficient model scaling capability with expert parallelism. However, it has brought a fundamental issue of larger memory consumption and increased memory bandwidth bottleneck at deployment time. In this paper, we propose Mixture of Quantized Expe…

    Submitted 3 October, 2023; originally announced October 2023.

  8. arXiv:2309.14741  [pdf, other]

    eess.AS cs.SD

    Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification

    Authors: Hee-Soo Heo, KiHyun Nam, Bong-Jin Lee, Youngki Kwon, Minjae Lee, You Jin Kim, Joon Son Chung

    Abstract: In the field of speaker verification, session or channel variability poses a significant challenge. While many contemporary methods aim to disentangle session information from speaker embeddings, we introduce a novel approach using an additional embedding to represent the session information. This is achieved by training an auxiliary network appended to the speaker embedding extractor which remain…

    Submitted 26 September, 2023; originally announced September 2023.

  9. arXiv:2309.12306  [pdf, other]

    cs.CV cs.SD eess.AS

    TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

    Authors: Chaeyoung Jung, Suyeon Lee, Kihyun Nam, Kyeongha Rho, You Jin Kim, Youngjoon Jang, Joon Son Chung

    Abstract: The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is only applied to part of the full se…

    Submitted 21 September, 2023; originally announced September 2023.

  10. arXiv:2309.11674  [pdf, other]

    cs.CL

    A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

    Authors: Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla

    Abstract: Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities…

    Submitted 6 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at ICLR 2024

  11. arXiv:2308.15772  [pdf, other]

    cs.CL

    Task-Based MoE for Multitask Multilingual Machine Translation

    Authors: Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

    Abstract: Mixture-of-experts (MoE) architecture has proven to be a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic…

    Submitted 24 October, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  12. arXiv:2308.13539  [pdf, other]

    cs.HC cs.CY

    Redefining Computer Science Education: Code-Centric to Natural Language Programming with AI-Based No-Code Platforms

    Authors: David Y. J. Kim

    Abstract: This paper delves into the evolving relationship between humans and computers in the realm of programming. Historically, programming has been a dialogue where humans meticulously crafted communication to suit machine understanding, shaping the trajectory of computer science education. However, the advent of AI-based no-code platforms is revolutionizing this dynamic. Now, humans can converse in the…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 7 pages, 1 figure

  13. arXiv:2308.09723  [pdf, other]

    cs.LG cs.CL

    FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

    Authors: Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

    Abstract: Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements. Furthermore, the latest generative models suffer from high inference costs caused by the memory bandwidth bottleneck in the auto-regressive decoding process. To address these issues, we propose an efficient…

    Submitted 16 August, 2023; originally announced August 2023.

  14. arXiv:2306.00680  [pdf, other]

    cs.SD cs.AI eess.AS

    Encoder-decoder multimodal speaker change detection

    Authors: Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Young-ki Kwon, Minjae Lee, Bong-Jin Lee

    Abstract: The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies solved the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise text modality in addition to audio, have shown improved performance. In this study, the proposed model is bui…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted for presentation at INTERSPEECH 2023

  15. Multicast and Unicast Superposition Transmission in MIMO OFDMA Systems with Statistical CSIT

    Authors: Yong Jin Daniel Kim, David Vargas

    Abstract: We consider a downlink multicast and unicast superposition transmission in multi-layer Multiple-Input Multiple-Output (MIMO) Orthogonal Frequency Division Multiple Access (OFDMA) systems when only the statistical channel state information is available at the transmitter (CSIT). Multiple users can be scheduled by using the time/frequency resources in OFDMA, while for each scheduled user MIMO spatia…

    Submitted 29 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 17 pages, 10 figures, 2 tables

  16. arXiv:2302.09210  [pdf, other]

    cs.CL

    How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

    Authors: Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, Hany Hassan Awadalla

    Abstract: Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for natural language generation, but their performance for machine translation has not been thoroughly investigated. In this paper, we present a comprehensive evaluation of GPT models for machine translation, covering various aspects such as quality of different GPT models in comparison with state-of-the-art researc…

    Submitted 17 February, 2023; originally announced February 2023.

  17. arXiv:2211.10017  [pdf, other]

    cs.CL cs.AI cs.LG

    Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production

    Authors: Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

    Abstract: Mixture of Experts (MoE) models with conditional execution of sparsely activated layers have enabled training models with a much larger number of parameters. As a result, these models have achieved significantly better quality on various natural language processing tasks including machine translation. However, it remains challenging to deploy such models in real-life scenarios due to the large mem…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted to SustaiNLP 2022 (EMNLP 2022)

  18. arXiv:2211.04768  [pdf, other]

    eess.AS cs.SD

    Absolute decision corrupts absolutely: conservative online speaker diarisation

    Authors: Youngki Kwon, Hee-Soo Heo, Bong-Jin Lee, You Jin Kim, Jee-weon Jung

    Abstract: Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains. In online speaker diarisation, outputs generated in real-time are irreversible, and a few misjudgements in the early phase of an input session can lead to catastrophic results. We hypothesise that cautiously increasing the number of estimated speakers is of paramount i…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: 5 pages, 2 figures, 4 tables, submitted to ICASSP

  19. arXiv:2211.04060  [pdf, other]

    cs.SD cs.CL eess.AS

    High-resolution embedding extractor for speaker diarisation

    Authors: Hee-Soo Heo, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Jee-weon Jung

    Abstract: Speaker embedding extractors significantly influence the performance of clustering-based speaker diarisation systems. Conventionally, only one embedding is extracted from each speech segment. However, because of the sliding window approach, a segment easily includes two or more speakers owing to speaker change points. This study proposes a novel embedding extractor architecture, referred to as a h…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 5 pages, 2 figures, 3 tables, submitted to ICASSP

  20. arXiv:2210.07592  [pdf, other]

    cs.RO cs.GR

    TSP-Bot: Robotic TSP Pen Art using High-DoF Manipulators

    Authors: Daeun Song, Eunjung Lim, Jiyoon Park, Minjung Jung, Young J. Kim

    Abstract: TSP art is an art form for drawing an image using piecewise-continuous line segments. We present TSP-Bot, a robotic pen drawing system capable of creating complicated TSP pen art on a planar surface using multiple colors. The system begins by converting a colored raster image into a set of points that represent the image's tone, which can be controlled by adjusting the point density. Next, the sys…

    Submitted 10 April, 2024; v1 submitted 14 October, 2022; originally announced October 2022.

  21. arXiv:2210.07590  [pdf, other]

    cs.RO cs.GR

    Stroke-based Rendering and Planning for Robotic Performance of Artistic Drawing

    Authors: Ivaylo Ilinkin, Daeun Song, Young J. Kim

    Abstract: We present a new robotic drawing system based on stroke-based rendering (SBR). Our motivation is the artistic quality of the whole performance. Not only should the generated strokes in the final drawing resemble the input image, but the stroke sequence should also exhibit a human artist's planning process. Thus, when a robot executes the drawing task, both the drawing results and the way the robot…

    Submitted 3 March, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Submitted to IEEE IROS 2023

  22. arXiv:2210.07535  [pdf, other]

    cs.CL cs.LG

    AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation

    Authors: Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah, Sebastien Bubeck, Jianfeng Gao

    Abstract: Mixture-of-Expert (MoE) models have obtained state-of-the-art performance in Neural Machine Translation (NMT) tasks. Existing works in MoE mostly consider a homogeneous design where the same number of experts of the same size are placed uniformly throughout the network. Furthermore, existing MoE works do not consider computational constraints (e.g., FLOPs, latency) to guide their design. To this e…

    Submitted 7 June, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: ACL 2023 Findings

  23. arXiv:2209.02966  [pdf, other]

    cs.HC

    ExpTrialMng: A Universal Experiment Trial Manager for AR/VR/MR Experiments based on Unity

    Authors: Jinwook Kim, Yee Joon Kim, Jeongmi Lee

    Abstract: Following recent improvements in virtual and augmented reality (VR and AR) Head Mounted Displays (HMDs), there have been attempts to adopt VR and AR in various fields. Since VR and AR could provide more immersive experimental environments and stimuli than 2D settings in a cost-efficient way, psychological and cognitive researchers are particularly interested in using these platforms. However, ther…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: 5 pages, 3 figures, https://github.com/jinwook31/Unity-Experiment-Trial-Manager

  24. arXiv:2208.06874  [pdf, other]

    cs.CL cs.LG

    Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU

    Authors: Hossam Amer, Young Jin Kim, Mohamed Afify, Hitokazu Matsushita, Hany Hassan Awadallah

    Abstract: Multilingual Neural Machine Translation has been showing great success using transformer models. Deploying these models is challenging because they usually require large vocabulary (vocab) sizes for various languages. This limits the speed of predicting the output tokens in the last vocab projection layer. To alleviate these challenges, this paper proposes a fast vocabulary projection method via c…

    Submitted 14 August, 2022; originally announced August 2022.

    Comments: 12 pages, accepted at AMTA-2022 (Association for Machine Translation in the Americas Conference)

  25. arXiv:2206.03715  [pdf, other]

    cs.AI cs.CL cs.LG

    Modularized Transfer Learning with Multiple Knowledge Graphs for Zero-shot Commonsense Reasoning

    Authors: Yu Jin Kim, Beong-woo Kwak, Youngwook Kim, Reinald Kim Amplayo, Seung-won Hwang, Jinyoung Yeo

    Abstract: Commonsense reasoning systems should be able to generalize to diverse reasoning cases. However, most state-of-the-art approaches depend on expensive data annotations and overfit to a specific benchmark without learning how to perform general semantic reasoning. To overcome these drawbacks, zero-shot QA systems have shown promise as a robust learning scheme by transforming a commonsense knowledge g…

    Submitted 22 June, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted to NAACL 2022

  26. arXiv:2205.14336  [pdf, other]

    cs.LG

    Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

    Authors: Rui Liu, Young Jin Kim, Alexandre Muzio, Hany Hassan Awadalla

    Abstract: Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their outrageous scaling capability which enables dramatic increases in model size without significant increases in computational cost. To achieve this, MoE models replace the feedforward sub-layer with a Mixture-of-Experts sub-layer in transformers and use a gating network to route each token to…

    Submitted 4 July, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

    Comments: Accepted to ICML 2022

  27. arXiv:2203.08488  [pdf, other]

    eess.AS cs.AI

    Pushing the limits of raw waveform speaker recognition

    Authors: Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

    Abstract: In recent years, speaker recognition systems based on raw waveform inputs have received increasing attention. However, the performance of such systems is typically inferior to the state-of-the-art handcrafted feature-based counterparts, which demonstrate equal error rates under 1% on the popular VoxCeleb1 test set. This paper proposes a novel speaker recognition model based on raw waveform inputs…

    Submitted 28 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: submitted to INTERSPEECH 2022 as a conference paper. 5 pages, 2 figures, 5 tables

  28. arXiv:2201.11661  [pdf, other]

    cs.LG cs.AI

    TrustAL: Trustworthy Active Learning using Knowledge Distillation

    Authors: Beong-woo Kwak, Youngwook Kim, Yu Jin Kim, Seung-won Hwang, Jinyoung Yeo

    Abstract: Active learning can be defined as iterations of data labeling, model training, and data acquisition, until sufficient labels are acquired. A traditional view of data acquisition is that, through iterations, knowledge from human labels and models is implicitly distilled to monotonically increase the accuracy and label consistency. Under this assumption, the most recently trained model is a good sur…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: Accepted to AAAI 2022

  29. arXiv:2110.04260  [pdf, other]

    cs.CL cs.LG

    Taming Sparsely Activated Transformer with Stochastic Experts

    Authors: Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao

    Abstract: Sparsely activated models (SAMs), such as Mixture-of-Experts (MoE), can easily scale to have outrageously large numbers of parameters without a significant increase in computational cost. However, SAMs are reported to be parameter inefficient such that larger models do not always lead to better performance. While most on-going research focuses on improving SAMs by exploring methods of routing…

    Submitted 3 February, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: ICLR 2022

  30. arXiv:2110.03380  [pdf, other]

    cs.SD cs.CL

    Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity

    Authors: You Jin Kim, Hee-Soo Heo, Jee-weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung

    Abstract: The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation. Speaker embeddings play a crucial role in the performance of diarisation systems, but they often capture spurious information such as noise, adversely affecting performance. Our previous work has proposed an auto-encoder-based dimensionality reduction module to help remove the redundant informat…

    Submitted 3 November, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: This paper was submitted to ICASSP 2023

  31. arXiv:2110.03361  [pdf, other]

    eess.AS cs.AI

    Multi-scale speaker embedding-based graph attention networks for speaker diarisation

    Authors: Youngki Kwon, Hee-Soo Heo, Jee-weon Jung, You Jin Kim, Bong-Jin Lee, Joon Son Chung

    Abstract: The objective of this work is effective speaker diarisation using multi-scale speaker embeddings. Typically, there is a trade-off between the ability to recognise short speaker segments and the discriminative power of the embedding, according to the segment length used for embedding extraction. To this end, recent works have proposed the use of multi-scale embeddings where segments with varying le…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 5 pages, 2 figures, submitted to ICASSP as a conference paper

  32. arXiv:2109.10465  [pdf, other]

    cs.CL cs.AI cs.LG

    Scalable and Efficient MoE Training for Multitask Multilingual Models

    Authors: Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla

    Abstract: The Mixture of Experts (MoE) models are an emerging class of sparsely activated deep learning models that have sublinear compute costs with respect to their parameters. In contrast with dense models, the sparse architecture of MoE offers opportunities for drastically growing model size with significant accuracy gain while consuming much lower compute budget. However, supporting large scale MoE tra…

    Submitted 21 September, 2021; originally announced September 2021.

  33. arXiv:2108.07640  [pdf, other]

    cs.CV cs.SD eess.AS eess.IV

    Look Who's Talking: Active Speaker Detection in the Wild

    Authors: You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

    Abstract: In this work, we present a novel audio-visual dataset for active speaker detection in the wild. A speaker is considered active when his or her face is visible and the voice is audible simultaneously. Although active speaker detection is a crucial pre-processing step for many audio-visual tasks, there is no existing dataset of natural human speech to evaluate the performance of active speaker detec…

    Submitted 17 August, 2021; originally announced August 2021.

    Comments: To appear in Interspeech 2021. Data will be available from https://github.com/clovaai/lookwhostalking

  34. arXiv:2107.08737  [pdf, other]

    cs.GR cs.CV

    Synthesizing Human Faces using Latent Space Factorization and Local Weights (Extended Version)

    Authors: Minyoung Kim, Young J. Kim

    Abstract: We propose a 3D face generative model with local weights to increase the model's variations and expressiveness. The proposed model allows partial manipulation of the face while still learning the whole face mesh. For this purpose, we address an effective way to extract local facial features from the entire data and explore a way to manipulate them during a holistic generation. First, we factorize…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: Extended version of the paper to be published in Computer Graphics International 2021 (LNCS Proceedings Papers)

  35. arXiv:2105.02580  [pdf, other]

    cs.LG

    Time-Aware Q-Networks: Resolving Temporal Irregularity for Deep Reinforcement Learning

    Authors: Yeo Jin Kim, Min Chi

    Abstract: Deep Reinforcement Learning (DRL) has shown outstanding performance on inducing effective action policies that maximize expected long-term return on many complex tasks. Much of DRL work has been focused on sequences of events with discrete time steps and ignores the irregular time intervals between consecutive events. Given that in many real-world domains, data often consists of temporal sequences…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: 36 pages, 27 figures

  36. arXiv:2105.00568  [pdf, other]

    cs.LG

    InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem

    Authors: Markel Sanz Ausin, Hamoon Azizsoltani, Song Ju, Yeo Jin Kim, Min Chi

    Abstract: The temporal Credit Assignment Problem (CAP) is a well-known and challenging task in AI. While Reinforcement Learning (RL), especially Deep RL, works well when immediate rewards are available, it can fail when only delayed rewards are available or when the reward function is noisy. In this work, we propose delegating the CAP to a Neural Network-based algorithm named InferNet that explicitly learns…

    Submitted 2 May, 2021; originally announced May 2021.

  37. arXiv:2104.02879  [pdf, other]

    eess.AS cs.LG cs.SD

    Adapting Speaker Embeddings for Speaker Diarisation

    Authors: Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung

    Abstract: The goal of this paper is to adapt speaker embeddings for solving the problem of speaker diarisation. The quality of speaker embeddings is paramount to the performance of speaker diarisation systems. Despite this, prior works in the field have directly used embeddings designed only to be effective on the speaker verification task. In this paper, we propose three techniques that can be used to bett…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures, 3 tables, submitted to Interspeech as a conference paper

  38. arXiv:2011.10348  [pdf, other]

    cs.RO cs.GR

    Accelerating Probabilistic Volumetric Mapping using Ray-Tracing Graphics Hardware

    Authors: Heajung Min, Kyung Min Han, Young J. Kim

    Abstract: Probabilistic volumetric mapping (PVM) represents a 3D environmental map for an autonomous robotic navigational task. A popular implementation such as Octomap is widely used in the robotics community for such a purpose. The Octomap relies on octree to represent a PVM and its main bottleneck lies in massive ray-shooting to determine the occupancy of the underlying volumetric voxel grids. In this pa…

    Submitted 2 December, 2020; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: Submitted to the IEEE International Conference on Robotics and Automation

  39. Solving Footstep Planning as a Feasibility Problem using L1-norm Minimization (Extended Version)

    Authors: Daeun Song, Pierre Fernbach, Thomas Flayols, Andrea Del Prete, Nicolas Mansard, Steve Tonneau, Young J. Kim

    Abstract: One challenge of legged locomotion on uneven terrains is to deal with both the discrete problem of selecting a contact surface for each footstep and the continuous problem of placing each footstep on the selected surface. Consequently, footstep planning can be addressed with a Mixed Integer Program (MIP), an elegant but computationally-demanding method, which can make it unsuitable for online plan…

    Submitted 16 May, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

    Comments: Extended version of the paper to be published in IEEE Robotics and Automation Letters

    Journal ref: IEEE Robotics and Automation Letters, Volume 6, Issue 3, July 2021

  40. arXiv:2010.13382  [pdf, other]

    cs.CL

    FastFormers: Highly Efficient Transformer Models for Natural Language Understanding

    Authors: Young Jin Kim, Hany Hassan Awadalla

    Abstract: Transformer-based models are the state-of-the-art for Natural Language Understanding (NLU) applications. Models are getting bigger and better on various tasks. However, Transformer models remain computationally challenging since they are not efficient at inference-time compared to traditional approaches. In this paper, we present FastFormers, a set of recipes to achieve efficient inference-time pe…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: Accepted to SustaiNLP 2020 at EMNLP 2020

  41. arXiv:2005.08606  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    End-to-End Lip Synchronisation Based on Pattern Classification

    Authors: You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee

    Abstract: The goal of this work is to synchronise audio and video of a talking face using deep neural network models. Existing works have trained networks on proxy tasks such as cross-modal similarity learning, and then computed similarities between audio and video frames using a sliding window approach. While these methods demonstrate satisfactory performance, the networks are not trained directly on the t…

    Submitted 19 March, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted at SLT 2021

  42. arXiv:1911.06144  [pdf, other]

    cs.CG cs.GR cs.RO

    A Penetration Metric for Deforming Tetrahedra using Object Norm

    Authors: Jisu Kim, Young J. Kim

    Abstract: In this paper, we propose a novel penetration metric, called deformable penetration depth PDd, to define a measure of inter-penetration between two linearly deforming tetrahedra using the object norm. First of all, we show that a distance metric for a tetrahedron deforming between two configurations can be found in closed form based on object norm. Then, we show that the PDd between an intersectin…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: Published in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019

  43. arXiv:1901.09351  [pdf, other]

    cs.CV

    Automated Quality Control in Image Segmentation: Application to the UK Biobank Cardiac MR Imaging Study

    Authors: Robert Robinson, Vanya V. Valindria, Wenjia Bai, Ozan Oktay, Bernhard Kainz, Hideaki Suzuki, Mihir M. Sanghvi, Nay Aung, José Miguel Paiva, Filip Zemrak, Kenneth Fung, Elena Lukaschuk, Aaron M. Lee, Valentina Carapella, Young Jin Kim, Stefan K. Piechnik, Stefan Neubauer, Steffen E. Petersen, Chris Page, Paul M. Matthews, Daniel Rueckert, Ben Glocker

    Abstract: Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools, e.g. image segmentation methods, are employed to derive quantitative measures or biomarkers for later analyses. Manual inspection and visual QC of each segmentation isn't feasible at large scale. However, i…

    Submitted 27 January, 2019; originally announced January 2019.

    Comments: 14 pages, 7 figures, Journal of Cardiovascular Magnetic Resonance

  44. arXiv:1811.00912  [pdf]

    cs.IT

    Two-Layered Superposition of Broadcast/Multicast and Unicast Signals in Multiuser OFDMA Systems

    Authors: David Vargas, Yong Jin Daniel Kim

    Abstract: We study optimal delivery strategies of one common and $K$ independent messages from a source to multiple users in wireless environments. In particular, two-layered superposition of broadcast/multicast and unicast signals is considered in a downlink multiuser OFDMA system. In the literature and industry, the two-layer superposition is often considered as a pragmatic approach to make a compromise b…

    Submitted 4 December, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

  45. arXiv:1806.06244  [pdf, other]

    cs.CV

    Real-time Prediction of Segmentation Quality

    Authors: Robert Robinson, Ozan Oktay, Wenjia Bai, Vanya Valindria, Mihir Sanghvi, Nay Aung, José Paiva, Filip Zemrak, Kenneth Fung, Elena Lukaschuk, Aaron Lee, Valentina Carapella, Young Jin Kim, Bernhard Kainz, Stefan Piechnik, Stefan Neubauer, Steffen Petersen, Chris Page, Daniel Rueckert, Ben Glocker

    Abstract: Recent advances in deep learning based image segmentation methods have enabled real-time performance with human-level accuracy. However, occasionally even the best method fails due to low image quality, artifacts or unexpected behaviour of black box algorithms. Being able to predict segmentation quality in the absence of ground truth is of paramount importance in clinical practice, but also in lar…

    Submitted 16 June, 2018; originally announced June 2018.

    Comments: Accepted at MICCAI 2018

  46. arXiv:1712.00010  [pdf, ps, other]

    cs.LG stat.ML

    Highrisk Prediction from Electronic Medical Records via Deep Attention Networks

    Authors: You Jin Kim, Yun-Geun Lee, Jeong Whun Kim, Jin Joo Park, Borim Ryu, Jung-Woo Ha

    Abstract: Predicting high-risk vascular diseases is a significant issue in the medical domain. Most prediction methods predict the prognosis of patients from pathological and radiological measurements, which are expensive and require much time to be analyzed. Here we propose deep attention models that predict the onset of high-risk vascular disease from sequences of symbolic medical histories of hypertensio…

    Submitted 30 November, 2017; originally announced December 2017.

    Comments: Accepted poster at NIPS 2017 Workshop on Machine Learning for Health (https://ml4health.github.io/2017/)

  47. arXiv:1710.09289  [pdf, other]

    cs.CV

    Automated cardiovascular magnetic resonance image analysis with fully convolutional networks

    Authors: Wenjia Bai, Matthew Sinclair, Giacomo Tarroni, Ozan Oktay, Martin Rajchl, Ghislain Vaillant, Aaron M. Lee, Nay Aung, Elena Lukaschuk, Mihir M. Sanghvi, Filip Zemrak, Kenneth Fung, Jose Miguel Paiva, Valentina Carapella, Young Jin Kim, Hideaki Suzuki, Bernhard Kainz, Paul M. Matthews, Steffen E. Petersen, Stefan K. Piechnik, Stefan Neubauer, Ben Glocker, Daniel Rueckert

    Abstract: Cardiovascular magnetic resonance (CMR) imaging is a standard imaging modality for assessing cardiovascular diseases (CVDs), the leading cause of death globally. CMR enables accurate quantification of the cardiac chamber volume, ejection fraction and myocardial mass, providing information for diagnosis and monitoring of CVDs. However, for years, clinicians have been relying on manual approaches fo…

    Submitted 22 May, 2018; v1 submitted 25 October, 2017; originally announced October 2017.

    Comments: Accepted for publication by Journal of Cardiovascular Magnetic Resonance

  48. arXiv:1704.02724  [pdf, other]

    cs.GR

    CanvoX: High-resolution VR Painting in Large Volumetric Canvas

    Authors: Yeojin Kim, Byungmoon Kim, Jiyang Kim, Young J. Kim

    Abstract: With virtual reality, digital painting on 2D canvases is now being extended to 3D spaces. Tilt Brush and Oculus Quill are widely accepted among artists as tools that pave the way to a new form of art: 3D immersive painting. Current 3D painting systems are only a start, emitting textured triangular geometries. In this paper, we advance this new art of 3D painting to 3D volumetric painting that ena…

    Submitted 10 April, 2017; originally announced April 2017.

  49. arXiv:1508.06181  [pdf, other]

    cs.GR cs.CG cs.RO

    PolyDepth: Real-time Penetration Depth Computation using Iterative Contact-Space Projection

    Authors: Changsoo Je, Min Tang, Youngeun Lee, Minkyoung Lee, Young J. Kim

    Abstract: We present a real-time algorithm that finds the Penetration Depth (PD) between general polygonal models based on iterative and local optimization techniques. Given an in-collision configuration of an object in configuration space, we find an initial collision-free configuration using several methods such as centroid difference, maximally clear configuration, motion coherence, random configuration,…

    Submitted 25 August, 2015; originally announced August 2015.

    Comments: Presented at ACM SIGGRAPH 2012. 15 pages, 23 figures

    ACM Class: I.2.9; I.3.5; I.3.7; I.6.8

    Journal ref: ACM Transactions on Graphics (ToG 2012), Volume 31, Issue 1, Article 5, pp. 1-14, January 1, 2012

  50. arXiv:1403.1048  [pdf]

    physics.soc-ph cs.GT

    Network Structures between Strategies in Iterated Prisoners' Dilemma Games

    Authors: Young Jin Kim, Myungkyoon Roh, Seung-Woo Son

    Abstract: We use replicator dynamics to study an iterated prisoners' dilemma game with memory. In this study, we investigate the characteristics of all 32 possible strategies with a single-step memory by observing the results when each strategy encounters another one. Based on these results, we define similarity measures between the 32 strategies and perform a network analysis of the relationship between th…

    Submitted 5 March, 2014; originally announced March 2014.

    Journal ref: The Korean Physical Society February 2014, Volume 64, Issue 3, pp 341-345