Showing 1–50 of 171 results for author: Ye, J C

  1. arXiv:2407.11555  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Guided Generation of Minority Samples Using Diffusion Models

    Authors: Soobin Um, Jong Chul Ye

    Abstract: We present a novel approach for generating minority samples that live on low-density regions of a data manifold. Our framework is built upon diffusion models, leveraging the principle of guided sampling that incorporates an arbitrary energy-based guidance during inference time. The key defining feature of our sampler lies in its self-contained nature, i.e., implementable solely with a pretra…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  2. arXiv:2407.11244  [pdf, other]

    cs.LG

    (Deep) Generative Geodesics

    Authors: Beomsu Kim, Michael Puthawala, Jong Chul Ye, Emanuele Sansone

    Abstract: In this work, we propose to study the global geometrical properties of generative models. We introduce a new Riemannian metric to assess the similarity between any two data points. Importantly, our metric is agnostic to the parametrization of the generative model and requires only the evaluation of its data likelihood. Moreover, the metric leads to the conceptual definition of generative distances…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 10 pages, 9 figures

  3. arXiv:2407.10641  [pdf, other]

    cs.CV cs.LG

    Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems

    Authors: Hyungjin Chung, Jong Chul Ye

    Abstract: Recent inverse problem solvers that leverage generative diffusion priors have garnered significant attention due to their exceptional quality. However, adaptation of the prior is necessary when there exists a discrepancy between the training and testing distributions. In this work, we propose deep diffusion image prior (DDIP), which generalizes the recent adaptation method of SCD by introducing a…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, 25 pages, 8 figures

  4. arXiv:2406.08070  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

    Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye

    Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are…

    Submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2405.17829  [pdf, other]

    cs.LG cs.AI

    LDMol: Text-Conditioned Molecule Diffusion Model Leveraging Chemically Informative Latent Space

    Authors: Jinho Chang, Jong Chul Ye

    Abstract: With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques using conditional diffusion models. However, due to the fundamental nature of a molecule, which carries highly entangled correlations within a small number of atoms and bonds, it becomes difficult for a model to connect raw data with the conditions when the co…

    Submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2405.17720  [pdf, other]

    cs.CV cs.AI cs.LG

    MindFormer: A Transformer Architecture for Multi-Subject Brain Decoding via fMRI

    Authors: Inhwa Han, Jaayeon Lee, Jong Chul Ye

    Abstract: Research efforts to understand neural signals have been ongoing for many years, with visual decoding from fMRI signals attracting considerable attention. Particularly, the advent of image diffusion models has advanced the reconstruction of images from fMRI data significantly. However, existing approaches often introduce inter- and intra- subject variations in the reconstructed images, which can co…

    Submitted 27 May, 2024; originally announced May 2024.

  7. arXiv:2405.16823  [pdf, other]

    cs.CV cs.AI

    Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection

    Authors: Gihyun Kwon, Jangho Park, Jong Chul Ye

    Abstract: While text-to-image models have achieved impressive capabilities in image generation and editing, their application across various modalities often necessitates training separate models. Inspired by existing method of single image editing with self attention injection and video editing with shared attention, we propose a novel unified editing framework that combines the strengths of both approache…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://unifyediting.github.io/

  8. arXiv:2404.03913  [pdf, other]

    cs.CV cs.AI cs.LG

    Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

    Authors: Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

    Abstract: While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with t…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  9. arXiv:2403.15249  [pdf, other]

    cs.CV cs.AI cs.LG

    Spectral Motion Alignment for Video Motion Transfer using Diffusion Models

    Authors: Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee, Jong Chul Ye

    Abstract: The evolution of diffusion models has greatly impacted video generation and understanding. Particularly, text-to-video diffusion models (VDMs) have significantly facilitated the customization of input video with target appearance, motion, etc. Despite these advances, challenges persist in accurately distilling motion information from video frames. While existing works leverage the consecutive fram…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://geonyeong-park.github.io/spectral-motion-alignment/

  10. arXiv:2403.14183  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation

    Authors: Kwanyoung Kim, Yujin Oh, Jong Chul Ye

    Abstract: The recent success of CLIP has demonstrated promising results in zero-shot semantic segmentation by transferring multimodal knowledge to pixel-level classification. However, leveraging pre-trained CLIP knowledge to closely align text embeddings with pixel embeddings still has limitations in existing approaches. To address this issue, we propose OTSeg, a novel multimodal attention mechanism aimed…

    Submitted 11 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: ECCV 2024; 23 pages, 8 tables, 8 figures; Project Page: https://cubeyoung.github.io/OTSeg_project/

  11. arXiv:2403.13551  [pdf, other]

    cs.CV cs.LG

    Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing

    Authors: Hangeol Chang, Jinho Chang, Jong Chul Ye

    Abstract: Despite recent advancements in text-to-image diffusion models facilitating various image editing techniques, complex text prompts often lead to an oversight of some requests due to a bottleneck in processing text information. To tackle this challenge, we present Ground-A-Score, a simple yet powerful model-agnostic image editing method by incorporating grounding during score distillation. This appr…

    Submitted 20 March, 2024; originally announced March 2024.

  12. arXiv:2403.12510  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Consistency Trajectory Models for Image Manipulation

    Authors: Beomsu Kim, Jaemin Kim, Jeongsol Kim, Jong Chul Ye

    Abstract: Diffusion-based generative models excel in unconditional generation, as well as on applied tasks such as image editing and restoration. The success of diffusion models lies in the iterative nature of diffusion: diffusion breaks down the complex process of mapping noise to data into a sequence of simple denoising tasks. Moreover, we are able to exert fine-grained control over the generation process…

    Submitted 19 March, 2024; originally announced March 2024.

  13. arXiv:2403.12002  [pdf, other]

    cs.CV cs.AI

    DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

    Authors: Hyeonho Jeong, Jinho Chang, Geon Yeong Park, Jong Chul Ye

    Abstract: Text-driven diffusion-based video editing presents a unique challenge not encountered in image editing literature: establishing real-world motion. Unlike existing video editing approaches, here we focus on score distillation sampling to circumvent the standard reverse diffusion process and initiate optimization from videos that already exhibit natural motion. Our analysis reveals that while video…

    Submitted 15 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024, Project page: https://hyeonho99.github.io/dreammotion/

  14. arXiv:2403.11415  [pdf, other]

    cs.CV cs.AI cs.LG

    DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation

    Authors: Jeongsol Kim, Geon Yeong Park, Jong Chul Ye

    Abstract: Reverse sampling and score-distillation have emerged as main workhorses in recent years for image manipulation using latent diffusion models (LDMs). While reverse diffusion sampling often requires adjustments of LDM architecture or feature engineering, score distillation offers a simple yet powerful model-agnostic approach, but it is often prone to mode-collapsing. To address these limitations and…

    Submitted 17 March, 2024; originally announced March 2024.

  15. arXiv:2403.06275  [pdf, other]

    cs.CV cs.AI cs.LG physics.med-ph

    UNICORN: Ultrasound Nakagami Imaging via Score Matching and Adaptation

    Authors: Kwanyoung Kim, Jaa-Yeon Lee, Jong Chul Ye

    Abstract: Nakagami imaging holds promise for visualizing and quantifying tissue scattering in ultrasound waves, with potential applications in tumor diagnosis and fat fraction estimation which are challenging to discern by conventional ultrasound B-mode images. Existing methods struggle with optimal window size selection and suffer from estimator instability, leading to degraded resolution images. To addres…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  16. arXiv:2402.08601  [pdf, other]

    cs.CV

    Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing

    Authors: Yunji Jung, Seokju Lee, Tair Djanibekov, Hyunjung Shim, Jong Chul Ye

    Abstract: Text-guided non-rigid editing involves complex edits for input images, such as changing motion or compositions within their surroundings. Since it requires manipulating the input structure, existing methods often struggle with preserving object identity and background, particularly when combined with Stable Diffusion. In this work, we propose a training-free approach for non-rigid editing with Sta…

    Submitted 14 February, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  17. arXiv:2402.02407  [pdf, other]

    cs.LG cs.CV cs.NE

    Defining Neural Network Architecture through Polytope Structures of Dataset

    Authors: Sangmin Lee, Abbas Mammadov, Jong Chul Ye

    Abstract: Current theoretical and empirical research in neural networks suggests that complex datasets require large network architectures for thorough classification, yet the precise nature of this relationship remains unclear. This paper tackles this issue by defining upper and lower bounds for neural network widths, which are informed by the polytope structure of the dataset in question. We also delve in…

    Submitted 30 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  18. arXiv:2312.08223  [pdf, other]

    cs.CV

    Patch-wise Graph Contrastive Learning for Image Translation

    Authors: Chanyong Jung, Gihyun Kwon, Jong Chul Ye

    Abstract: Recently, patch-wise contrastive learning is drawing attention for the image translation by exploring the semantic correspondence between the input and output images. To further explore the patch-wise topology for high-level semantic understanding, here we exploit the graph neural network to capture the topology-aware features. Specifically, we construct the graph based on the patch-wise similarit…

    Submitted 19 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  19. arXiv:2312.03013  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Breast Ultrasound Report Generation using LangChain

    Authors: Jaeyoung Huh, Hyun Jeong Park, Jong Chul Ye

    Abstract: Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multipl…

    Submitted 4 December, 2023; originally announced December 2023.

  20. arXiv:2312.00845  [pdf, other]

    cs.CV cs.AI cs.LG

    VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

    Authors: Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye

    Abstract: Text-to-video diffusion models have advanced video generation significantly. However, customizing these models to generate videos with tailored motions presents a substantial challenge. In specific, they encounter hurdles in (a) accurately reproducing motion from a target video, and (b) creating diverse visual variations. For example, straightforward extensions of static image customization method…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Project page: https://video-motion-customization.github.io

  21. arXiv:2311.18608  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

    Authors: Hyelin Nam, Gihyun Kwon, Geon Yeong Park, Jong Chul Ye

    Abstract: With the remarkable advent of text-to-image diffusion models, image editing methods have become more diverse and continue to evolve. A promising recent approach in this realm is Delta Denoising Score (DDS) - an image editing technique based on Score Distillation Sampling (SDS) framework that leverages the rich generative prior of text-to-image diffusion models. However, relying solely on the diffe…

    Submitted 1 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: CVPR 2024 (poster); Project page: https://hyelinnam.github.io/CDS/

  22. arXiv:2311.15876  [pdf, other]

    cs.CV cs.AI cs.LG

    End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding

    Authors: Kwanyoung Kim, Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Joongyo Lee, Jin Sung Kim, Yong Bae Kim, Jong Chul Ye

    Abstract: Recent advances in AI foundation models have significant potential for lightening the clinical workload by mimicking the comprehensive and multi-faceted approaches used by medical professionals. In the field of radiation oncology, the integration of multiple modalities holds great importance, so the opportunity of foundational model is abundant. Inspired by this, here we present RO-LMM, a multi-pu…

    Submitted 1 July, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 10 pages, 4 figures, 11 tables

  23. arXiv:2311.15658  [pdf, other]

    cs.CV cs.AI cs.LG

    Regularization by Texts for Latent Diffusion Inverse Solvers

    Authors: Jeongsol Kim, Geon Yeong Park, Hyungjin Chung, Jong Chul Ye

    Abstract: The recent advent of diffusion models has led to significant progress in solving inverse problems, leveraging these models as effective generative priors. Nonetheless, there remain challenges related to the ill-posed nature of such problems, often due to inherent ambiguities in measurements or intrinsic system symmetries. To address this, drawing inspiration from the human ability to resolve visua…

    Submitted 16 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  24. arXiv:2311.01908  [pdf, other]

    eess.IV cs.CV

    LLM-driven Multimodal Target Volume Contouring in Radiation Oncology

    Authors: Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Yeona Cho, Ik Jae Lee, Jin Sung Kim, Jong Chul Ye

    Abstract: Target volume contouring for radiation therapy is considered significantly more challenging than the normal organ segmentation tasks as it necessitates the utilization of both image and text-based clinical information. Inspired by the recent advancement of large language models (LLMs) that can facilitate the integration of the textual information and images, here we present a novel LLM-driven mul…

    Submitted 15 April, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

  25. arXiv:2310.02713  [pdf, other]

    cs.LG cs.AI q-bio.GN q-bio.QM

    scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain

    Authors: Gyutaek Oh, Baekgyu Choi, Inkyung Jung, Jong Chul Ye

    Abstract: Single-cell RNA sequencing (scRNA-seq) has made significant strides in unraveling the intricate cellular diversity within complex tissues. This is particularly critical in the brain, presenting a greater diversity of cell types than other tissue types, to gain a deeper understanding of brain function within various cellular contexts. However, analyzing scRNA-seq data remains a challenge due to inh…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 21 pages, 16 figures

  26. arXiv:2310.02712  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    ED-NeRF: Efficient Text-Guided Editing of 3D Scene with Latent Space NeRF

    Authors: Jangho Park, Gihyun Kwon, Jong Chul Ye

    Abstract: Recently, there has been a significant advancement in text-to-image diffusion models, leading to groundbreaking performance in 2D image generation. These advancements have been extended to 3D models, enabling the generation of novel 3D objects from textual descriptions. This has evolved into NeRF editing methods, which allow the manipulation of existing 3D objects through textual conditioning. How…

    Submitted 21 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; Project Page: https://jhq1234.github.io/ed-nerf.github.io/

  27. arXiv:2310.01110  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Prompt-tuning latent diffusion models for inverse problems

    Authors: Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, Mauricio Delbracio

    Abstract: We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To address this limitation, we introduce a method for prompt tuning, which jointly optimizes the text embedding on-the-fly…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 22 pages, 10 figures

  28. arXiv:2310.01107  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models

    Authors: Hyeonho Jeong, Jong Chul Ye

    Abstract: Recent endeavors in video editing have showcased promising results in single-attribute editing or style transfer tasks, either by training text-to-video (T2V) models on text-video data or adopting training-free methods. However, when confronted with the complexities of multi-attribute editing scenarios, they exhibit shortcomings such as omitting or overlooking intended attribute changes, modifying…

    Submitted 24 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024, Project Page: http://ground-a-video.github.io

  29. arXiv:2308.14409  [pdf, other]

    cs.CV cs.LG

    Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Imaging Inverse Problems

    Authors: Riccardo Barbano, Alexander Denker, Hyungjin Chung, Tae Hoon Roh, Simon Arrdige, Peter Maass, Bangti Jin, Jong Chul Ye

    Abstract: Denoising diffusion models have emerged as the go-to framework for solving inverse problems in imaging. A critical concern regarding these models is their performance on out-of-distribution (OOD) tasks, which remains an under-explored challenge. Realistic reconstructions inconsistent with the measured data can be generated, hallucinating image features that are uniquely present in the training dat…

    Submitted 28 August, 2023; originally announced August 2023.

  30. arXiv:2308.00193  [pdf, other]

    eess.IV cs.CV cs.LG

    C-DARL: Contrastive diffusion adversarial representation learning for label-free blood vessel segmentation

    Authors: Boah Kim, Yujin Oh, Bradford J. Wood, Ronald M. Summers, Jong Chul Ye

    Abstract: Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this pape…

    Submitted 31 July, 2023; originally announced August 2023.

  31. arXiv:2307.15208  [pdf, other]

    eess.IV cs.CV

    Generative AI for Medical Imaging: extending the MONAI Framework

    Authors: Walter H. L. Pinaya, Mark S. Graham, Eric Kerfoot, Petru-Daniel Tudosiu, Jessica Dafflon, Virginia Fernandez, Pedro Sanchez, Julia Wolleb, Pedro F. da Costa, Ashay Patel, Hyungjin Chung, Can Zhao, Wei Peng, Zelong Liu, Xueyan Mei, Oeslle Lucena, Jong Chul Ye, Sotirios A. Tsaftaris, Prerna Dogra, Andrew Feng, Marc Modat, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso

    Abstract: Recent advances in generative AI have brought incredible breakthroughs in several areas, including medical imaging. These generative models have tremendous potential not only to help safely share medical data via synthetic datasets but also to perform an array of diverse applications, such as anomaly detection, image-to-image translation, denoising, and MRI reconstruction. However, due to the comp…

    Submitted 27 July, 2023; originally announced July 2023.

  32. arXiv:2306.09869  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models

    Authors: Geon Yeong Park, Jeongsol Kim, Beomsu Kim, Sang Wan Lee, Jong Chul Ye

    Abstract: Despite the remarkable performance of text-to-image diffusion models in image generation tasks, recent studies have raised the issue that generated images sometimes cannot capture the intended semantic contents of the text prompts, which phenomenon is often called semantic misalignment. To address this, here we present a novel energy-based model (EBM) framework for adaptive context control by mode…

    Submitted 4 November, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  33. arXiv:2306.04396  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance

    Authors: Gihyun Kwon, Jong Chul Ye

    Abstract: Diffusion models have shown significant progress in image translation tasks recently. However, due to their stochastic nature, there's often a trade-off between style transformation and content preservation. Current strategies aim to disentangle style and content, preserving the source image's structure while successfully transitioning from a source to a target domain under text or one-shot image…

    Submitted 7 June, 2023; originally announced June 2023.

  34. arXiv:2306.04339  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG physics.med-ph

    Unpaired Deep Learning for Pharmacokinetic Parameter Estimation from Dynamic Contrast-Enhanced MRI

    Authors: Gyutaek Oh, Won-Jin Moon, Jong Chul Ye

    Abstract: DCE-MRI provides information about vascular permeability and tissue perfusion through the acquisition of pharmacokinetic parameters. However, traditional methods for estimating these pharmacokinetic parameters involve fitting tracer kinetic models, which often suffer from computational complexity and low accuracy due to noisy arterial input function (AIF) measurements. Although some deep learning…

    Submitted 7 June, 2023; originally announced June 2023.

  35. arXiv:2305.19809  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Direct Diffusion Bridge using Data Consistency for Inverse Problems

    Authors: Hyungjin Chung, Jeongsol Kim, Jong Chul Ye

    Abstract: Diffusion model-based inverse problem solvers have shown impressive performance, but are limited in speed, mostly as they require reverse diffusion sampling starting from noise. Several recent works have tried to alleviate this problem by building a diffusion process, directly bridging the clean and the corrupted for specific inverse problems. In this paper, we first unify these existing works und…

    Submitted 24 October, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 camera-ready. 16 pages, 6 figures

  36. arXiv:2305.16375  [pdf, other]

    cs.LG cs.AI stat.ML

    Data Topology-Dependent Upper Bounds of Neural Network Widths

    Authors: Sangmin Lee, Jong Chul Ye

    Abstract: This paper investigates the relationship between the universal approximation property of deep neural networks and topological characteristics of datasets. Our primary contribution is to introduce data topology-dependent upper bounds on the network width. Specifically, we first show that a three-layer neural network, applying a ReLU activation function and max pooling, can be designed to approximat…

    Submitted 25 May, 2023; originally announced May 2023.

  37. arXiv:2305.15086  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Unpaired Image-to-Image Translation via Neural Schrödinger Bridge

    Authors: Beomsu Kim, Gihyun Kwon, Kwanyoung Kim, Jong Chul Ye

    Abstract: Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. While diffusion models have achieved remarkable progress, they have limitations in unpaired image-to-image (I2I) translation tasks due to the Gaussian prior assumption. Schrödinger Bridge (SB), which learns an SDE to translate between two arbitrary distrib…

    Submitted 2 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ICLR 2024

  38. arXiv:2305.11490  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation

    Authors: Suhyeon Lee, Won Jun Kim, Jinho Chang, Jong Chul Ye

    Abstract: Following the impressive development of LLMs, vision-language alignment in LLMs is actively being researched to enable multimodal reasoning and visual IO. This direction of research is particularly relevant to medical imaging because medical image analysis and generation consist of reasoning based on a combination of visual features and prior knowledge. Many recent works have focused on training a…

    Submitted 17 March, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 21 pages, 8 figures; ICLR 2024 (poster)

  39. arXiv:2303.08767  [pdf, other]

    cs.CV cs.AI

    Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion

    Authors: Inhwa Han, Serin Yang, Taesung Kwon, Jong Chul Ye

    Abstract: Diffusion models have shown superior performance in image generation and manipulation, but the inherent stochasticity presents challenges in preserving and manipulating image content and identity. While previous approaches like DreamBooth and Textual Inversion have proposed model or latent representation personalization to maintain the content, their reliance on multiple reference images and compl…

    Submitted 19 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  40. arXiv:2303.08622  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer

    Authors: Serin Yang, Hyunmin Hwang, Jong Chul Ye

    Abstract: Diffusion models have shown great promise in text-guided image style transfer, but there is a trade-off between style transformation and content preservation due to their stochastic nature. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural network. To address this, here we propose a zero-shot contrastive loss for diffusion models that doesn't r…

    Submitted 12 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  41. arXiv:2303.08440  [pdf, other]

    eess.IV cs.CV cs.LG

    Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models

    Authors: Suhyeon Lee, Hyungjin Chung, Minyoung Park, Jonghyuk Park, Wi-Sun Ryu, Jong Chul Ye

    Abstract: Diffusion models have become a popular approach for image generation and reconstruction due to their numerous advantages. However, most diffusion-based inverse problem-solving methods only deal with 2D images, and even recently published 3D methods do not fully exploit the 3D distribution prior. To address this, we propose a novel approach using two perpendicular pre-trained 2D diffusion models to…

    Submitted 1 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: ICCV23 poster. 15 pages, 9 figures

  42. arXiv:2303.05754  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems

    Authors: Hyungjin Chung, Suhyeon Lee, Jong Chul Ye

    Abstract: Krylov subspace, which is generated by multiplying a given vector by the matrix of a linear transformation and its successive powers, has been extensively studied in classical optimization literature to design algorithms that converge quickly for large linear inverse problems. For example, the conjugate gradient method (CG), one of the most popular Krylov subspace methods, is based on the idea of…

    Submitted 19 February, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: ICLR 2024; 28 pages, 9 figures

  43. arXiv:2303.00091  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.SD eess.IV

    Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model

    Authors: Jaeyoung Huh, Sangjoon Park, Jeong Eun Lee, Jong Chul Ye

    Abstract: Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications of ASR is Speech-To-Text (STT) technology, which simplifies user workflows by transcribing spoken words into text. In the medical field, STT has the potential to significantly reduce the workload of clinicians who rely on…

    Submitted 27 February, 2023; originally announced March 2023.

  44. arXiv:2302.03900  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models

    Authors: Hyeonho Jeong, Gihyun Kwon, Jong Chul Ye

    Abstract: Recent advancements in large scale text-to-image models have opened new possibilities for guiding the creation of images through human-devised natural language. However, while prior literature has primarily focused on the generation of individual images, it is essential to consider the capability of these models to ensure coherency within a sequence of images to fulfill the demands of real-world a…

    Submitted 8 February, 2023; originally announced February 2023.

  45. arXiv:2301.12334  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Don't Play Favorites: Minority Guidance for Diffusion Models

    Authors: Soobin Um, Suhyeon Lee, Jong Chul Ye

    Abstract: We explore the problem of generating minority samples using diffusion models. The minority samples are instances that lie on low-density regions of a data manifold. Generating a sufficient number of such minority instances is important, since they often contain some unique attributes of the data. However, the conventional generation process of the diffusion models mostly yields majority samples (t…

    Submitted 26 February, 2024; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: ICLR 2024

  46. arXiv:2301.12171  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    ZegOT: Zero-shot Segmentation Through Optimal Transport of Text Prompts

    Authors: Kwanyoung Kim, Yujin Oh, Jong Chul Ye

    Abstract: Recent success of large-scale Contrastive Language-Image Pre-training (CLIP) has led to great promise in zero-shot semantic segmentation by transferring image-text aligned knowledge to pixel-level classification. However, existing methods usually require an additional image encoder or retraining/tuning the CLIP module. Here, we propose a novel Zero-shot segmentation with Optimal Transport (ZegOT)…

    Submitted 30 May, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: 18 pages, 8 figures

  47. arXiv:2301.12003  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Minimizing Trajectory Curvature of ODE-based Generative Models

    Authors: Sangyun Lee, Beomsu Kim, Jong Chul Ye

    Abstract: Recent ODE/SDE-based generative models, such as diffusion models, rectified flows, and flow matching, define a generative process as a time reversal of a fixed forward process. Even though these models show impressive performance on large-scale datasets, numerical simulation requires multiple evaluations of a neural network, leading to a slow sampling speed. We attribute the reason to the high cur…

    Submitted 25 May, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: ICML 2023

  48. arXiv:2301.03027  [pdf, other]

    eess.IV cs.CV cs.LG

    Annealed Score-Based Diffusion Model for MR Motion Artifact Reduction

    Authors: Gyutaek Oh, Jeong Eun Lee, Jong Chul Ye

    Abstract: Motion artifact reduction is one of the important research topics in MR imaging, as the motion artifact degrades image quality and makes diagnosis difficult. Recently, many deep learning approaches have been studied for motion artifact reduction. Unfortunately, most existing models are trained in a supervised manner, requiring paired motion-corrupted and motion-free images, or are based on a stric…

    Submitted 8 January, 2023; originally announced January 2023.

  49. arXiv:2301.02064  [pdf, other]

    cs.CV cs.AI

    Single-round Self-supervised Distributed Learning using Vision Transformer

    Authors: Sangjoon Park, Ik-Jae Lee, Jun Won Kim, Jong Chul Ye

    Abstract: Despite the recent success of deep learning in the field of medicine, the issue of data scarcity is exacerbated by concerns about privacy and data ownership. Distributed learning approaches, including federated learning, have been investigated to address these issues. However, they are hindered by the need for cumbersome communication overheads and weaknesses in privacy protection. To tackle these…

    Submitted 15 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  50. arXiv:2211.10656  [pdf, other]

    cs.CV cs.LG stat.ML

    Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

    Authors: Hyungjin Chung, Jeongsol Kim, Sehui Kim, Jong Chul Ye

    Abstract: Diffusion model-based inverse problem solvers have demonstrated state-of-the-art performance in cases where the forward operator is known (i.e. non-blind). However, the applicability of the method to blind inverse problems has yet to be explored. In this work, we show that we can indeed solve a family of blind inverse problems by constructing another diffusion prior for the forward operator. Speci…

    Submitted 19 November, 2022; originally announced November 2022.

    Comments: 25 pages, 13 figures