Showing 1–49 of 49 results for author: Koh, J

  1. arXiv:2407.06537  [pdf, other]

    cs.CL cs.AI

    Efficient and Accurate Memorable Conversation Model using DPO based on sLLM

    Authors: Youngkyung Seo, Yoonseok Heo, Jun-Seok Koh, Du-Seong Chang

    Abstract: In a multi-session dialog system, it is essential to continuously update the memory as the session progresses. Simply accumulating memory can make it difficult to focus on the content of the conversation for inference due to the limited input sentence size. Therefore, an efficient and accurate conversation model that is capable of managing memory to reflect the conversation history continuously is nece…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2407.01476  [pdf, other]

    cs.AI cs.CL cs.LG

    Tree Search for Language Model Agents

    Authors: Jing Yu Koh, Stephen McAleer, Daniel Fried, Ruslan Salakhutdinov

    Abstract: Autonomous agents powered by language models (LMs) have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily optimized for natural language understanding and generation, struggle with multi-step reasoning, planning, and using environmental feedback when attempting to solve realistic computer tasks. Towards…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 11 pages. Models and code available at https://jykoh.com/search-agents

  3. arXiv:2406.12814  [pdf, other]

    cs.LG cs.CL cs.CR cs.CV

    Adversarial Attacks on Multimodal Agents

    Authors: Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan

    Abstract: Vision-enabled language models (VLMs) are now used to build autonomous multimodal agents capable of taking actions in real environments. In this paper, we show that multimodal agents raise new safety risks, even though attacking agents is more challenging than prior attacks due to limited access to and knowledge about the environment. Our attacks use adversarial text strings to guide gradient-base…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 19 pages

  4. arXiv:2406.08718  [pdf, other]

    cs.CL

    Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations

    Authors: Jun-Woo Kim, Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang

    Abstract: We introduce a pipeline that leverages Large Language Models (LLMs) to transform single-turn psychotherapy counseling sessions into multi-turn interactions. While AI-supported online counseling services for individuals with mental disorders exist, they are often constrained by the limited availability of multi-turn training datasets and frequently fail to fully utilize therapists' expertise. Our p…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: IJCAI 2024 AI4Research workshop

  5. arXiv:2406.00505  [pdf, other]

    cs.CV

    Improving Text Generation on Images with Synthetic Captions

    Authors: Jun Young Koh, Sang Hyun Park, Joy Song

    Abstract: The recent emergence of latent diffusion models such as SDXL and SD 1.5 has shown significant capability in generating highly detailed and realistic images. Despite their remarkable ability to produce images, generating accurate text within images still remains a challenging task. In this paper, we examine the validity of fine-tuning approaches in generating legible text within the image. We propo…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 9 pages, 12 figures

  6. arXiv:2405.18623  [pdf]

    cs.HC

    I See You: Teacher Analytics with GPT-4 Vision-Powered Observational Assessment

    Authors: Unggi Lee, Yeil Jeong, Junbo Koh, Gyuri Byun, Yunseo Lee, Hyunwoong Lee, Seunmin Eun, Jewoong Moon, Cheolil Lim, Hyeoncheol Kim

    Abstract: This preliminary study explores the integration of GPT-4 Vision (GPT-4V) technology into teacher analytics, focusing on its applicability in observational assessment to enhance reflective teaching practice. This research is grounded in developing a Video-based Automatic Assessment System (VidAAS) empowered by GPT-4V. Our approach aims to revolutionize teachers' assessment of students' practices by…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 27 pages, 5 figures, 4 tables

  7. arXiv:2404.07554  [pdf, other]

    cs.CV cs.AI

    CAT: Contrastive Adapter Training for Personalized Image Generation

    Authors: Jae Wan Park, Sang Hyun Park, Jun Young Koh, Junha Lee, Min Song

    Abstract: The emergence of various adapters, including Low-Rank Adaptation (LoRA) applied from the field of natural language processing, has allowed diffusion models to personalize image generation at a low cost. However, due to the various challenges including limited datasets and shortage of regularization and computation resources, adapter training often results in unsatisfactory outcomes, leading to the…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPRW 2024

  8. arXiv:2404.03984  [pdf, other]

    cs.MA cs.LG eess.SY

    ROMA-iQSS: An Objective Alignment Approach via State-Based Value Learning and ROund-Robin Multi-Agent Scheduling

    Authors: Chi-Hui Lin, Joewie J. Koh, Alessandro Roncone, Lijun Chen

    Abstract: Effective multi-agent collaboration is imperative for solving complex, distributed problems. In this context, two key challenges must be addressed: first, autonomously identifying optimal objectives for collective outcomes; second, aligning these objectives among agents. Traditional frameworks, often reliant on centralized learning, struggle with scalability and efficiency in large multi-agent sys…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 10 pages, 3 figures, extended version of our 2024 American Control Conference publication

    Journal ref: Proceedings of the 2024 American Control Conference (ACC), 2024

  9. arXiv:2404.00930  [pdf, other]

    cs.CL

    PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models

    Authors: Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang, Kyung-Ah Sohn

    Abstract: We present a novel end-to-end personality-based synthetic dialogue data generation pipeline, specifically designed to elicit responses from large language models via prompting. We design the prompts to generate more human-like dialogues considering real-world scenarios when users engage with chatbots. We introduce PSYDIAL, the first Korean dialogue dataset focused on personality-based dialogues, c…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: LREC-COLING 2024 Main

  10. arXiv:2403.09969  [pdf, other]

    cs.LG

    Prediction of Vessel Arrival Time to Pilotage Area Using Multi-Data Fusion and Deep Learning

    Authors: Xiaocai Zhang, Xiuju Fu, Zhe Xiao, Haiyan Xu, Xiaoyang Wei, Jimmy Koh, Daichi Ogawa, Zheng Qin

    Abstract: This paper investigates the prediction of vessels' arrival time to the pilotage area using multi-data fusion and deep learning approaches. Firstly, the vessel arrival contour is extracted based on Multivariate Kernel Density Estimation (MKDE) and clustering. Secondly, multiple data sources, including Automatic Identification System (AIS), pilotage booking information, and meteorological data, are…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: The 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

  11. arXiv:2403.06433  [pdf, other]

    cs.CV cs.AI

    Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection

    Authors: Konyul Park, Yecheol Kim, Junho Koh, Byungwoo Park, Jun Won Choi

    Abstract: Developing high-performance, real-time architectures for LiDAR-based 3D object detectors is essential for the successful commercialization of autonomous vehicles. Pillar-based methods stand out as a practical choice for onboard deployment due to their computational efficiency. However, despite their efficiency, these methods can sometimes underperform compared to alternative point encoding techniq…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: ICRA 2024

  12. arXiv:2402.17553  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC

    OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

    Authors: Raghav Kapoor, Yash Parag Butala, Melisa Russak, Jing Yu Koh, Kiran Kamble, Waseem Alshikh, Ruslan Salakhutdinov

    Abstract: For decades, human-computer interaction has fundamentally been manual. Even today, almost all productive work done on the computer necessitates human input at every step. Autonomous virtual agents represent an exciting step in automating many of these menial tasks. Virtual agents would empower users with limited technical proficiency to harness the full possibilities of computer systems. They coul…

    Submitted 28 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  13. arXiv:2401.13649  [pdf, other]

    cs.LG cs.CL cs.CV

    VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

    Authors: Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

    Abstract: Autonomous agents capable of planning, reasoning, and executing actions on the web offer a promising avenue for automating computer tasks. However, the majority of existing benchmarks primarily focus on text-based agents, neglecting many natural tasks that require visual information to effectively solve. Given that most computer interfaces cater to human perception, visual information often augmen…

    Submitted 5 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024. 24 pages. Project page: https://jykoh.com/vwa

  14. arXiv:2311.13691  [pdf]

    physics.ao-ph cs.AI physics.comp-ph

    Next-Generation Earth System Models: Towards Reliable Hybrid Models for Weather and Climate Applications

    Authors: Tom Beucler, Erwan Koch, Sven Kotlarski, David Leutwyler, Adrien Michel, Jonathan Koh

    Abstract: We review how machine learning has transformed our ability to model the Earth system, and how we expect recent breakthroughs to benefit end-users in Switzerland in the near future. Drawing from our review, we identify three recommendations. Recommendation 1: Develop Hybrid AI-Physical Models: Emphasize the integration of AI and physical modeling for improved reliability, especially for longer pr…

    Submitted 26 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 12 pages, 1 figure, submitted as part of the Swiss Academy of Engineering Sciences' 2024 whitepaper on "Artificial Intelligence for Climate Change Mitigation"

  15. arXiv:2310.07478  [pdf, other]

    cs.AI

    Multimodal Graph Learning for Generative Tasks

    Authors: Minji Yoon, Jing Yu Koh, Bryan Hooi, Ruslan Salakhutdinov

    Abstract: Multimodal learning combines multiple data modalities, broadening the types and complexity of data our models can utilize: for example, from plain text to image-caption pairs. Most multimodal learning algorithms focus on modeling simple one-to-one pairs of data from two modalities, such as image-caption pairs, or audio-text pairs. However, in most real-world settings, entities of different modalit…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  16. arXiv:2309.05032  [pdf, other]

    cs.CV

    Unified Contrastive Fusion Transformer for Multimodal Human Action Recognition

    Authors: Kyoung Ok Yang, Junho Koh, Jun Won Choi

    Abstract: Various types of sensors have been considered to develop human action recognition (HAR) models. Robust HAR performance can be achieved by fusing multimodal data acquired by different sensors. In this paper, we introduce a new multimodal fusion architecture, referred to as Unified Contrastive Fusion Transformer (UCFFormer) designed to integrate data with diverse distributions to enhance HAR perform…

    Submitted 10 September, 2023; originally announced September 2023.

  17. arXiv:2305.17216  [pdf, other]

    cs.CL cs.CV cs.LG

    Generating Images with Multimodal Language Models

    Authors: Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

    Abstract: We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue. Ours is the first approach capable of conditioning on arbitrarily interleaved image and text inputs to…

    Submitted 13 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023. Project page: http://jykoh.com/gill

  18. arXiv:2302.06833  [pdf, other]

    cs.CV

    VQ3D: Learning a 3D-Aware Generative Model on ImageNet

    Authors: Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun

    Abstract: Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars. However, these models struggle on larger, more complex datasets. To model diverse and unconstrained image collections such as ImageNet, we present VQ3D, which introduces a NeRF-based decoder…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 15 pages. For visual results, please visit the project webpage at http://kylesargent.github.io/vq3d

  19. arXiv:2301.13823  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Grounding Language Models to Images for Multimodal Inputs and Outputs

    Authors: Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

    Abstract: We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images. Our method leverages the abilities of language models learnt from large scale text-only pretraining, such as in-context learning and free-form text generation. We keep the langu…

    Submitted 13 June, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Published in ICML 2023. Project page: https://jykoh.com/fromage

  20. arXiv:2212.00442  [pdf, other]

    cs.CV

    MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term Motion-Guided Temporal Attention for 3D Object Detection

    Authors: Junho Koh, Junhyung Lee, Youngwoo Lee, Jaekyum Kim, Jun Won Choi

    Abstract: Most scanning LiDAR sensors generate a sequence of point clouds in real-time. While conventional 3D object detectors use a set of unordered LiDAR points acquired over a fixed time interval, recent studies have revealed that substantial performance improvement can be achieved by exploiting the spatio-temporal context present in a sequence of LiDAR point sets. In this paper, we propose a novel 3D ob…

    Submitted 21 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI'23)

  21. arXiv:2210.03112  [pdf, other]

    cs.LG cs.CL cs.CV cs.RO

    A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

    Authors: Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh

    Abstract: Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions. However, given the scarcity of human instruction data and limited diversity in the training environments, these agents still struggle with complex language grounding and spatial langua…

    Submitted 17 April, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: CVPR 2023

  22. arXiv:2210.00087  [pdf, other]

    cs.CV

    D-Align: Dual Query Co-attention Network for 3D Object Detection Based on Multi-frame Point Cloud Sequence

    Authors: Junhyung Lee, Junho Koh, Youngwoo Lee, Jun Won Choi

    Abstract: LiDAR sensors are widely used for 3D object detection in various mobile robotics applications. LiDAR sensors continuously generate point cloud data in real-time. Conventional 3D object detectors detect objects using a set of points acquired over a fixed duration. However, recent studies have shown that the performance of object detection can be further enhanced by utilizing spatio-temporal informa…

    Submitted 30 September, 2022; originally announced October 2022.

  23. arXiv:2207.00117  [pdf, other]

    cs.CR

    WAKU-RLN-RELAY: Privacy-Preserving Peer-to-Peer Economic Spam Protection

    Authors: Sanaz Taheri-Boshrooyeh, Oskar Thorén, Barry Whitehat, Wei Jie Koh, Onur Kilic, Kobi Gurkan

    Abstract: In this paper, we propose WAKU-RLN-RELAY as a spam-protected gossip-based routing protocol that can run in heterogeneous networks. It features a privacy-preserving peer-to-peer (p2p) economic spam protection mechanism. WAKU-RLN-RELAY addresses the performance and privacy issues of the state-of-the-art p2p spam prevention techniques including peer scoring utilized by libp2p, and proof-of-work used…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: IEEE ICDCSW 2022

  24. arXiv:2207.00116  [pdf, other]

    cs.CR

    Privacy-Preserving Spam-Protected Gossip-Based Routing

    Authors: Sanaz Taheri-Boshrooyeh, Oskar Thorén, Barry Whitehat, Wei Jie Koh, Onur Kilic, Kobi Gurkan

    Abstract: WAKU-RLN-RELAY is an anonymous peer-to-peer gossip-based routing protocol that features a privacy-preserving spam-protection with cryptographically guaranteed economic incentives. While being an anonymous routing protocol where routed messages are not attributable to their origin, it allows global identification and removal of spammers. It addresses the performance and privacy issues of its counte…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: IEEE ICDCS 2022

  25. arXiv:2206.10789  [pdf, other]

    cs.CV cs.LG

    Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

    Authors: Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

    Abstract: We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in a…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Preprint

  26. arXiv:2204.02960  [pdf, other]

    cs.CV cs.AI cs.LG

    Simple and Effective Synthesis of Indoor 3D Scenes

    Authors: Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

    Abstract: We study the problem of synthesizing immersive 3D indoor scenes from one or more images. Our aim is to generate high-resolution images and videos from novel viewpoints, including viewpoints that extrapolate far beyond the input images while maintaining 3D consistency. Existing approaches are highly complex, with many separately trained stages and components. We propose a simple alternative: an ima…

    Submitted 1 December, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: AAAI 2023

  27. Federated Active Learning (F-AL): an Efficient Annotation Strategy for Federated Learning

    Authors: Jin-Hyun Ahn, Kyungsang Kim, Jeongwan Koh, Quanzheng Li

    Abstract: Federated learning (FL) has been intensively investigated in terms of communication efficiency, privacy, and fairness. However, efficient annotation, which is a pain point in real-world FL applications, is less studied. In this project, we propose to apply active learning (AL) and sampling strategy into the FL framework to reduce the annotation workload. We expect that the AL and FL can improve th…

    Submitted 7 February, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: 13 pages, 9 figures, submitted for conference publication

  28. arXiv:2112.07116  [pdf, other]

    cs.CV cs.LG

    Joint 3D Object Detection and Tracking Using Spatio-Temporal Representation of Camera Image and LiDAR Point Clouds

    Authors: Junho Koh, Jaekyum Kim, Jinhyuk Yoo, Yecheol Kim, Dongsuk Kum, Jun Won Choi

    Abstract: In this paper, we propose a new joint object detection and tracking (JoDT) framework for 3D object detection and tracking based on camera and LiDAR sensors. The proposed method, referred to as 3D DetecTrack, enables the detector and tracker to cooperate to generate a spatio-temporal representation of the camera and LiDAR data, with which 3D object detection and tracking are then performed. The det…

    Submitted 15 December, 2021; v1 submitted 13 December, 2021; originally announced December 2021.

  29. arXiv:2110.04627  [pdf, other]

    cs.CV cs.LG

    Vector-quantized Image Modeling with Improved VQGAN

    Authors: Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

    Abstract: Pretraining language models with next-token prediction on massive text corpora has delivered phenomenal zero-shot, few-shot, transfer learning and multi-tasking capabilities on both generative and discriminative language tasks. Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregres…

    Submitted 4 June, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Accepted in ICLR 2022

  30. arXiv:2105.08756  [pdf, other]

    cs.CV cs.LG

    Pathdreamer: A World Model for Indoor Navigation

    Authors: Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

    Abstract: People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals. Towards equipping computational agents with similar capabilities, we introduce Pathdreamer, a visual world model for agents navigating in novel indoor environments. Given one or more previous visual observations, Pathdreamer generates plausible high-re…

    Submitted 16 August, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: In ICCV 2021

  31. arXiv:2105.06887  [pdf]

    eess.IV cs.CV cs.LG

    A Frequency Domain Constraint for Synthetic and Real X-ray Image Super Resolution

    Authors: Qing Ma, Jae Chul Koh, WonSook Lee

    Abstract: Synthetic X-ray images are simulated X-ray images projected from CT data. High-quality synthetic X-ray images can facilitate various applications such as surgical image guidance systems and VR training simulations. However, it is difficult to produce high-quality arbitrary view synthetic X-ray images in real-time due to different CT slice thickness, high computational cost, and the complexity of a…

    Submitted 10 August, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

  32. arXiv:2104.10386  [pdf, other]

    cs.CV

    Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps

    Authors: Yuk Heo, Yeong Jun Koh, Chang-Su Kim

    Abstract: We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time. First, we design the reliability-based attention module to analyze the reliability of multiple annotated frames. Second, we develop the intersection-aware propagation module to propagate segmentation results to neighboring frames. Third, we intr…

    Submitted 21 April, 2021; originally announced April 2021.

    Comments: accepted to CVPR2021 (oral)

  33. arXiv:2104.06697  [pdf, other]

    cs.CV

    Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

    Authors: Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee, Seunghoon Hong

    Abstract: Learning to predict the long-term future of video frames is notoriously challenging due to inherent ambiguities in the distant future and dramatic amplifications of prediction error through time. Despite the recent advances in the literature, existing approaches are limited to moderately short-term prediction (less than a few seconds), while extrapolating it to a longer future quickly leads to des…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: Accepted as a conference paper at ICLR 2021

  34. arXiv:2101.04702  [pdf, other]

    cs.CV

    Cross-Modal Contrastive Learning for Text-to-Image Generation

    Authors: Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang

    Abstract: The output of text-to-image synthesis systems should be coherent, clear, photo-realistic scenes with high semantic fidelity to their conditioned text descriptions. Our Cross-Modal Contrastive Generative Adversarial Network (XMC-GAN) addresses this challenge by maximizing the mutual information between image and text. It does this via multiple contrastive losses which capture inter-modality and int…

    Submitted 14 April, 2022; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: CVPR 2021

  35. arXiv:2011.10278  [pdf, other]

    cs.CV

    Joint Representation of Temporal Image Sequences and Object Motion for Video Object Detection

    Authors: Junho Koh, Jaekyum Kim, Younji Shin, Byeongwon Lee, Seungji Yang, Jun Won Choi

    Abstract: In this paper, we propose a new video object detector (VoD) method referred to as temporal feature aggregation and motion-aware VoD (TM-VoD), which produces a joint representation of temporal image sequences and object motion. The proposed TM-VoD aggregates visual feature maps extracted by convolutional neural networks applying the temporal attention gating and spatial feature alignment. This temp…

    Submitted 20 November, 2020; originally announced November 2020.

  36. arXiv:2011.03775  [pdf, other]

    cs.CV cs.AI

    Text-to-Image Generation Grounded by Fine-Grained User Attention

    Authors: Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang

    Abstract: Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases. We propose TReCS, a sequential model that exploits this grounding to generate images. TReCS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are used t…

    Submitted 30 March, 2021; v1 submitted 7 November, 2020; originally announced November 2020.

    Comments: To appear in WACV 2021

  37. arXiv:2010.11457  [pdf, other]

    eess.AS cs.SD

    Momentum Contrast Speaker Representation Learning

    Authors: Jangho Lee, Jaihyun Koh, Sungroh Yoon

    Abstract: Unsupervised representation learning has shown remarkable achievement by reducing the performance gap with supervised feature learning, especially in the image domain. In this study, to extend the technique of unsupervised learning to the speech domain, we propose the Momentum Contrast for VoxCeleb (MoCoVox) as a form of learning mechanism. We pre-trained the MoCoVox on the VoxCeleb1 by implementi…

    Submitted 22 October, 2020; originally announced October 2020.

  38. arXiv:2008.00679  [pdf, other]

    cs.RO cs.GT cs.LG cs.MA

    Cooperative Control of Mobile Robots with Stackelberg Learning

    Authors: Joewie J. Koh, Guohui Ding, Christoffer Heckman, Lijun Chen, Alessandro Roncone

    Abstract: Multi-robot cooperation requires agents to make decisions that are consistent with the shared goal without disregarding action-specific preferences that might arise from asymmetry in capabilities and individual objectives. To accomplish this goal, we propose a method named SLiCC: Stackelberg Learning in Cooperative Control. SLiCC models the problem as a partially observable stochastic game compose…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: 8 pages, 7 figures

    ACM Class: I.2.9; I.2.6; I.2.11

    Journal ref: Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 7985-7992

  39. arXiv:2007.08139  [pdf, other]

    cs.CV

    Interactive Video Object Segmentation Using Global and Local Transfer Modules

    Authors: Yuk Heo, Yeong Jun Koh, Chang-Su Kim

    Abstract: An interactive video object segmentation algorithm, which takes scribble annotations on query objects as input, is proposed in this paper. We develop a deep neural network, which consists of the annotation network (A-Net) and the transfer network (T-Net). First, given user scribbles on a frame, A-Net yields a segmentation result based on the encoder-decoder architecture. Second, T-Net transfers th…

    Submitted 16 July, 2020; originally announced July 2020.

  40. arXiv:2003.09540  [pdf]

    cs.RO cs.GT cs.LG cs.MA

    Distributed Reinforcement Learning for Cooperative Multi-Robot Object Manipulation

    Authors: Guohui Ding, Joewie J. Koh, Kelly Merckaert, Bram Vanderborght, Marco M. Nicotra, Christoffer Heckman, Alessandro Roncone, Lijun Chen

    Abstract: We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL). We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL), where each agent applies Q-learning with individual reward functions; and game-theoretic RL (GT-RL), where the agents update their Q-values based on the Nash equilibrium of a bimatrix Q-value game. We val…

    Submitted 20 March, 2020; originally announced March 2020.

    Comments: 3 pages, 3 figures

    ACM Class: I.2.9; I.2.6; I.2.11

    Journal ref: Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2020, pp. 1831-1833

  41. arXiv:2002.04455  [pdf]

    eess.IV cs.CV

    HRINet: Alternative Supervision Network for High-resolution CT image Interpolation

    Authors: Jiawei Li, Jae Chul Koh, Won-Sook Lee

    Abstract: Image interpolation in the medical area is of high importance, as most 3D biomedical volume images are sampled where the distance between consecutive slices is significantly greater than the in-plane pixel size due to radiation dose or scanning time. Image interpolation creates a number of new slices between known slices in order to obtain an isotropic volume image. The results can be used for the higher…

    Submitted 7 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

  42. arXiv:2002.02634  [pdf, other]

    cs.CV

    SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information

    Authors: Jing Yu Koh, Duc Thanh Nguyen, Quang-Trung Truong, Sai-Kit Yeung, Alexander Binder

    Abstract: Fully-automatic execution is the ultimate goal for many Computer Vision applications. However, this objective is not always realistic in tasks associated with high failure costs, such as medical applications. For these tasks, semi-automatic methods allowing minimal effort from users to guide computer algorithms are often preferred due to desirable accuracy and performance. Inspired by the practica…

    Submitted 17 July, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: ECCV 2020

  43. arXiv:1910.04446  [pdf, other]

    physics.soc-ph cond-mat.stat-mech cs.GT

    Passive network evolution promotes group welfare in complex networks

    Authors: Ye Ye, Xiao Rong Hang, Jin Ming Koh, Jarosław Adam Miszczak, Kang Hao Cheong, Neng-gang Xie

    Abstract: Parrondo's paradox is a counterintuitive phenomenon in which individually losing strategies, canonically termed game A and game B, are combined to produce winning outcomes. In this paper, a co-evolution of game dynamics and network structure is adopted to study adaptability and survivability in multi-agent dynamics. The model includes action A, representing a rewiring process on the network, a…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: 15 pages, 9 figures

    Journal ref: Chaos, Solitons & Fractals, Vol. 130, pp. 109464 (2020)

  44. arXiv:1811.08705  [pdf, other]

    cs.CR cs.CL cs.LG cs.NI

    Inline Detection of Domain Generation Algorithms with Context-Sensitive Word Embeddings

    Authors: Joewie J. Koh, Barton Rhodes

    Abstract: Domain generation algorithms (DGAs) are frequently employed by malware to generate domains used for connecting to command-and-control (C2) servers. Recent work in DGA detection leveraged deep learning architectures like convolutional neural networks (CNNs) and character-level long short-term memory networks (LSTMs) to classify domains. However, these classifiers perform poorly with wordlist-based…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: 6 pages, 5 figures, 2 tables

    ACM Class: K.6.5; C.2.0; I.2.7; I.2.6

    Journal ref: Proceedings of the 2018 IEEE International Conference on Big Data, 2018, pp. 2966-2971

  45. arXiv:1807.06233  [pdf, other]

    cs.CV

    Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

    Authors: Jaekyum Kim, Junho Koh, Yecheol Kim, Jaehyung Choi, Youngbae Hwang, Jun Won Choi

    Abstract: The goal of multi-modal learning is to use complementary information on the relevant task provided by the multiple modalities to achieve reliable and robust performance. Recently, deep learning has led to significant improvement in multi-modal learning by allowing for information fusion at the intermediate feature levels. This paper addresses the problem of designing robust deep multi-modal learnin…

    Submitted 2 November, 2018; v1 submitted 17 July, 2018; originally announced July 2018.

    Comments: 2018 Asian Conference on Computer Vision (ACCV)

  46. arXiv:1711.08142  [pdf, other]

    cs.IT

    On the Feasibility of Full-duplex Large-scale MIMO Cellular Systems

    Authors: Jeongwan Koh, Yeon-Geun Lim, Chan-Byoung Chae, Joonhyuk Kang

    Abstract: This paper concerns the feasibility of full-duplex large-scale multiple-input-multiple-output (MIMO) cellular systems. We first propose a pilot transmission scheme and assess its performance, specifically the ergodic sum-rate. The proposed scheme -- the simultaneous pilot transmission (SPT) -- makes it possible to reduce pilot overhead, where the pilot overhead depends on the number of antennas at the base…

    Submitted 22 November, 2017; originally announced November 2017.

    Comments: 29 pages, 8 figures, submitted to transaction on wireless communications (TWC)

  47. arXiv:1606.09187  [pdf, other]

    cs.CV cs.NE stat.ML

    Object Boundary Detection and Classification with Image-level Labels

    Authors: Jing Yu Koh, Wojciech Samek, Klaus-Robert Müller, Alexander Binder

    Abstract: Semantic boundary and edge detection aims at simultaneously detecting object edge pixels in images and assigning class labels to them. Systematic training of predictors for this task requires the labeling of edges in images which is a particularly tedious task. We propose a novel strategy for solving this task, when pixel-level annotations are not available, performing it in an almost zero-shot ma…

    Submitted 25 June, 2017; v1 submitted 29 June, 2016; originally announced June 2016.

    Comments: 12 pages, 2 figures, accepted for GCPR 2017 - 39th German Conference on Pattern Recognition

  48. Geo-spatial Location Spoofing Detection for Internet of Things

    Authors: Jing Yang Koh, Ido Nevat, Derek Leong, Wai-Choong Wong

    Abstract: We develop a new location spoofing detection algorithm for geo-spatial tagging and location-based services in the Internet of Things (IoT), called Enhanced Location Spoofing Detection using Audibility (ELSA), which can be implemented at the backend server without modifying existing legacy IoT systems. ELSA is based on a statistical decision theory framework and uses two-way time-of-arrival (TW-TOA)…

    Submitted 28 March, 2017; v1 submitted 17 February, 2016; originally announced February 2016.

    Comments: A shortened version of this work was accepted to the IEEE IoT Journal (IoT-J) on 08-Feb-2016

    Journal ref: IEEE Internet of Things Journal, vol. 3, no. 6, pp. 971-978, Dec. 2016

  49. arXiv:1601.07229  [pdf, other]

    cs.HC eess.SY

    Genie: A Longitudinal Study Comparing Physical and Software-augmented Thermostats in Office Buildings

    Authors: Bharathan Balaji, Jason Koh, Nadir Weibel, Yuvraj Agarwal

    Abstract: Thermostats are primary interfaces for occupants of office buildings to express their comfort preferences. However, standard thermostats are often ineffective due to inaccessibility, lack of information, or limited responsiveness, leading to occupant discomfort. Software thermostats based on web or smartphone applications provide alternative interfaces to occupants with minimal deployment cost. Ho…

    Submitted 26 January, 2016; originally announced January 2016.

    Comments: 12 pages

    ACM Class: H.5.3