
Showing 1–4 of 4 results for author: Ryoo, W

  1. arXiv:2406.09396  [pdf, other]

    cs.CV

    Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA

    Authors: Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo

    Abstract: Long-form videos that span wide temporal intervals are highly information-redundant and contain multiple distinct events or entities that are often loosely related. Therefore, when performing long-form video question answering (LVQA), all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature explores the use of large lan…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2309.04509  [pdf, other]

    cs.SD cs.CV cs.GR eess.AS

    The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

    Authors: Yujin Jeong, Wonjeong Ryoo, Seunghyun Lee, Dabin Seo, Wonmin Byeon, Sangpil Kim, Jinkyu Kim

    Abstract: In recent years, video generation has become a prominent generative tool and has drawn significant attention. However, there is little consideration in audio-to-video generation, though audio contains unique qualities like temporal semantics and magnitude. Hence, we propose The Power of Sound (TPoS) model to incorporate audio input that includes both changeable temporal semantics and magnitude. To…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: ICCV2023

  3. Event Fusion Photometric Stereo Network

    Authors: Wonjeong Ryoo, Giljoo Nam, Jae-Sang Hyun, Sangpil Kim

    Abstract: We present a novel method to estimate the surface normal of an object in an ambient light environment using RGB and event cameras. Modern photometric stereo methods rely on an RGB camera, mainly in a dark room, to avoid ambient illumination. To alleviate the limitations of the darkroom environment and to use essential light information, we employ an event camera with a high dynamic range and low l…

    Submitted 11 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: 33 pages, 11 figures

  4. arXiv:2204.09273  [pdf, other]

    cs.CV cs.AI

    Sound-Guided Semantic Video Generation

    Authors: Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Hyunjun Cho, Jihyun Bae, Jinkyu Kim, Sangpil Kim

    Abstract: The recent success in StyleGAN demonstrates that pre-trained StyleGAN latent space is useful for realistic video generation. However, the generated motion in the video is usually not semantically meaningful due to the difficulty of determining the direction and magnitude in the StyleGAN latent space. In this paper, we propose a framework to generate realistic videos by leveraging multimodal (sound…

    Submitted 21 October, 2022; v1 submitted 20 April, 2022; originally announced April 2022.