Skip to main content

Showing 1–16 of 16 results for author: Saharia, C

  1. arXiv:2306.08276  [pdf, other

    cs.CV cs.GR

    TryOnDiffusion: A Tale of Two UNets

    Authors: Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman

    Abstract: Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic detail-preserving visualization of the garment, while warping the garment to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on g… ▽ More

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: CVPR 2023. Project page: https://tryondiffusion.github.io/

  2. arXiv:2304.08466  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Synthetic Data from Diffusion Models Improves ImageNet Classification

    Authors: Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, David J. Fleet

    Abstract: Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts. Have they reached the point where models of natural images can be used for generative data augmentation, helping to improve challenging discriminative tasks? We show that large-scale text-to image diffusion models can be fine-tuned to produce class conditional… ▽ More

    Submitted 17 April, 2023; originally announced April 2023.

  3. arXiv:2302.07864  [pdf, other

    cs.CV eess.IV

    Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild

    Authors: Hshmat Sahak, Daniel Watson, Chitwan Saharia, David Fleet

    Abstract: Diffusion models have shown promising results on single-image super-resolution and other image- to-image translation tasks. Despite this success, they have not outperformed state-of-the-art GAN models on the more challenging blind super-resolution task, where the input images are out of distribution, with unknown degradations. This paper introduces SR3+, a diffusion-based model for blind super-res… ▽ More

    Submitted 15 February, 2023; originally announced February 2023.

  4. arXiv:2212.10562  [pdf, other

    cs.CL cs.CV

    Character-Aware Models Improve Visual Text Rendering

    Authors: Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

    Abstract: Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text e… ▽ More

    Submitted 3 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  5. arXiv:2212.06909  [pdf, other

    cs.CV cs.AI

    Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

    Authors: Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

    Abstract: Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built, by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplish… ▽ More

    Submitted 12 April, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: CVPR 2023 Camera Ready

  6. arXiv:2210.02303  [pdf, other

    cs.CV cs.LG

    Imagen Video: High Definition Video Generation with Diffusion Models

    Authors: Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans

    Abstract: We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design deci… ▽ More

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: See accompanying website: https://imagen.research.google/video/

  7. arXiv:2209.14491  [pdf, other

    cs.CV cs.AI cs.LG

    Re-Imagen: Retrieval-Augmented Text-to-Image Generator

    Authors: Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen

    Abstract: Research on text-to-image generation has witnessed significant progress in generating diverse and photo-realistic images, driven by diffusion and auto-regressive models trained on large-scale image-text data. Though state-of-the-art models can generate high-quality images of common entities, they often have difficulty generating images of uncommon entities, such as `Chortai (dog)' or `Picarones (f… ▽ More

    Submitted 21 November, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: 9 pages

  8. arXiv:2205.11487  [pdf, other

    cs.CV cs.LG

    Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

    Authors: Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

    Abstract: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only c… ▽ More

    Submitted 23 May, 2022; originally announced May 2022.

  9. arXiv:2112.02475  [pdf, other

    cs.CV eess.IV

    Deblurring via Stochastic Refinement

    Authors: Jay Whang, Mauricio Delbracio, Hossein Talebi, Chitwan Saharia, Alexandros G. Dimakis, Peyman Milanfar

    Abstract: Image deblurring is an ill-posed problem with multiple plausible solutions for a given input image. However, most existing methods produce a deterministic estimate of the clean image and are trained to minimize pixel-level distortion. These metrics are known to be poorly correlated with human perception, and often lead to unrealistic reconstructions. We present an alternative framework for blind d… ▽ More

    Submitted 28 December, 2021; v1 submitted 4 December, 2021; originally announced December 2021.

  10. arXiv:2111.05826  [pdf, other

    cs.CV cs.LG

    Palette: Image-to-Image Diffusion Models

    Authors: Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi

    Abstract: This paper develops a unified framework for image-to-image translation based on conditional diffusion models and evaluates this framework on four challenging image-to-image translation tasks, namely colorization, inpainting, uncropping, and JPEG restoration. Our simple implementation of image-to-image diffusion models outperforms strong GAN and regression baselines on all tasks, without task-speci… ▽ More

    Submitted 3 May, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

  11. arXiv:2106.15282  [pdf, other

    cs.CV cs.AI cs.LG

    Cascaded Diffusion Models for High Fidelity Image Generation

    Authors: Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans

    Abstract: We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowe… ▽ More

    Submitted 17 December, 2021; v1 submitted 30 May, 2021; originally announced June 2021.

  12. arXiv:2104.07636  [pdf, other

    eess.IV cs.CV cs.LG

    Image Super-Resolution via Iterative Refinement

    Authors: Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi

    Abstract: We present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models to conditional image generation and performs super-resolution through a stochastic denoising process. Inference starts with pure Gaussian noise and iteratively refines the noisy output using a U-Net model trained on denoising at various noise levels. SR3 exhibits stron… ▽ More

    Submitted 30 June, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

  13. arXiv:2004.07437  [pdf, ps, other

    cs.CL cs.LG

    Non-Autoregressive Machine Translation with Latent Alignments

    Authors: Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi

    Abstract: This paper presents two strong methods, CTC and Imputer, for non-autoregressive machine translation that model latent alignments with dynamic programming. We revisit CTC for machine translation and demonstrate that a simple CTC model can achieve state-of-the-art for single-step non-autoregressive machine translation, contrary to what prior work indicates. In addition, we adapt the Imputer model fo… ▽ More

    Submitted 16 November, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  14. arXiv:2002.08926  [pdf, ps, other

    eess.AS cs.CL cs.LG cs.SD

    Imputer: Sequence Modelling via Imputation and Dynamic Programming

    Authors: William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly

    Abstract: This paper presents the Imputer, a neural sequence model that generates output sequences iteratively via imputations. The Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the number of input or output tokens. The Imputer can be trained to approximately marginalize over all possible alignments between the input and output sequences, and a… ▽ More

    Submitted 22 April, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

  15. arXiv:2002.00412  [pdf, other

    cs.LG cs.AI stat.ML

    Combating False Negatives in Adversarial Imitation Learning

    Authors: Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

    Abstract: In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's t… ▽ More

    Submitted 2 February, 2020; originally announced February 2020.

    Comments: This is an extended version of the student abstract published at 34th AAAI Conference on Artificial Intelligence

  16. arXiv:1810.08272  [pdf, other

    cs.AI cs.CL

    BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

    Authors: Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

    Abstract: Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded languag… ▽ More

    Submitted 19 December, 2019; v1 submitted 18 October, 2018; originally announced October 2018.

    Comments: Accepted at ICLR 2019