Showing 1–21 of 21 results for author: Aittala, M

  1. arXiv:2406.02507

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Guiding a Diffusion Model with a Bad Version of Itself

    Authors: Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, Samuli Laine

    Abstract: The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2404.07724

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

    Authors: Tuomas Kynkäänniemi, Miika Aittala, Tero Karras, Samuli Laine, Timo Aila, Jaakko Lehtinen

    Abstract: Guidance is a crucial technique for extracting the best performance out of image-generating diffusion models. Traditionally, a constant guidance weight has been applied throughout the sampling chain of an image. We show that guidance is clearly harmful toward the beginning of the chain (high noise levels), largely unnecessary toward the end (low noise levels), and only beneficial in the middle. We…

    Submitted 11 April, 2024; originally announced April 2024.

  3. arXiv:2312.02696

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Analyzing and Improving the Training Dynamics of Diffusion Models

    Authors: Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, Samuli Laine

    Abstract: Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture, without altering its high-level structure. Observing uncontrolled magnitude changes and imbalances in both the network activations an…

    Submitted 20 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  4. arXiv:2304.02602

    cs.CV cs.AI cs.GR

    Generative Novel View Synthesis with 3D-Aware Diffusion Models

    Authors: Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein

    Abstract: We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, incorp…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project page: https://nvlabs.github.io/genvs

  5. arXiv:2212.07431

    eess.IV cs.CV cs.LG cs.NE

    Simulator-Based Self-Supervision for Learned 3D Tomography Reconstruction

    Authors: Onni Kosomaa, Samuli Laine, Tero Karras, Miika Aittala, Jaakko Lehtinen

    Abstract: We propose a deep learning method for 3D volumetric reconstruction in low-dose helical cone-beam computed tomography. Prior machine learning approaches require reference reconstructions computed by another algorithm for training. In contrast, we train our model in a fully self-supervised manner using only noisy 2D X-ray data. This is enabled by incorporating a fast differentiable CT simulator in t…

    Submitted 26 May, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

  6. arXiv:2211.01324

    cs.CV cs.LG

    eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

    Authors: Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

    Abstract: Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly…

    Submitted 13 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  7. arXiv:2207.01413

    cs.CV cs.GR

    Disentangling Random and Cyclic Effects in Time-Lapse Sequences

    Authors: Erik Härkönen, Miika Aittala, Tuomas Kynkäänniemi, Samuli Laine, Timo Aila, Jaakko Lehtinen

    Abstract: Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that al…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted to SIGGRAPH 2022. Code: https://github.com/harskish/tlgan

  8. arXiv:2206.03429

    cs.CV cs.AI cs.LG cs.NE

    Generating Long Videos of Dynamic Scenes

    Authors: Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

    Abstract: We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never chan…

    Submitted 9 June, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

  9. arXiv:2206.00364

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Elucidating the Design Space of Diffusion-Based Generative Models

    Authors: Tero Karras, Miika Aittala, Timo Aila, Samuli Laine

    Abstract: We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new st…

    Submitted 11 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  10. arXiv:2203.06026

    cs.CV cs.AI cs.LG cs.NE stat.ML

    The Role of ImageNet Classes in Fréchet Inception Distance

    Authors: Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, Jaakko Lehtinen

    Abstract: Fréchet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID "looks at" in generated images. We show that the feature space that FID is (typically) computed in is so close to the Imag…

    Submitted 14 February, 2023; v1 submitted 11 March, 2022; originally announced March 2022.

    Comments: ICLR 2023 camera ready. Code: https://github.com/kynkaat/role-of-imagenet-classes-in-fid

  11. arXiv:2108.13027

    cs.CV

    What You Can Learn by Staring at a Blank Wall

    Authors: Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba, Gregory W. Wornell, William T. Freeman, Fredo Durand

    Abstract: We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two m…

    Submitted 30 August, 2021; originally announced August 2021.

  12. arXiv:2106.12423

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Alias-Free Generative Adversarial Networks

    Authors: Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila

    Abstract: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the…

    Submitted 18 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

  13. arXiv:2104.03989

    cs.GR

    Appearance-Driven Automatic 3D Model Simplification

    Authors: Jon Hasselgren, Jacob Munkberg, Jaakko Lehtinen, Miika Aittala, Samuli Laine

    Abstract: We present a suite of techniques for jointly optimizing triangle meshes and shading models to match the appearance of reference scenes. This capability has a number of uses, including appearance-preserving simplification of extremely complex assets, conversion between rendering systems, and even conversion between geometric scene representations. We follow and extend the classic analysis-by-synt…

    Submitted 8 April, 2021; originally announced April 2021.

    ACM Class: I.3.7

  14. arXiv:2006.06676

    cs.CV cs.LG cs.NE stat.ML

    Training Generative Adversarial Networks with Limited Data

    Authors: Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila

    Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch…

    Submitted 7 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

  15. arXiv:1912.04958

    cs.CV cs.LG cs.NE eess.IV stat.ML

    Analyzing and Improving the Image Quality of StyleGAN

    Authors: Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila

    Abstract: The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to enc…

    Submitted 23 March, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

  16. arXiv:1912.02314

    cs.CV cs.LG

    Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization

    Authors: Miika Aittala, Prafull Sharma, Lukas Murmann, Adam B. Yedidia, Gregory W. Wornell, William T. Freeman, Fredo Durand

    Abstract: We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed, as any non-negative factorization will satisfy the data. Insp…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: 14 pages, 5 figures, Advances in Neural Information Processing Systems 2019

    Journal ref: Aittala, Miika, et al. "Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization." Advances in Neural Information Processing Systems. 2019

  17. arXiv:1910.08131

    cs.CV

    A Dataset of Multi-Illumination Images in the Wild

    Authors: Lukas Murmann, Michael Gharbi, Miika Aittala, Fredo Durand

    Abstract: Collections of images under a single, uncontrolled illumination have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. To fill this gap, we introduce a…

    Submitted 17 October, 2019; originally announced October 2019.

    Comments: ICCV 2019

  18. arXiv:1906.11557

    cs.GR

    Flexible SVBRDF Capture with a Multi-Image Deep Network

    Authors: Valentin Deschaintre, Miika Aittala, Fredo Durand, George Drettakis, Adrien Bousseau

    Abstract: Empowered by deep learning, recent methods for material capture can estimate a spatially-varying reflectance from a single photograph. Such lightweight capture is in stark contrast with the tens or hundreds of pictures required by traditional optimization-based approaches. However, a single image is often simply not enough to observe the rich appearance of real-world materials. We present a deep-l…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: Accepted to EGSR 2019 in the CGF track

    ACM Class: I.3

    Journal ref: Computer Graphics Forum (EGSR Conference Proceedings), 38, 4 (July 2019), 13 pages

  19. arXiv:1904.08825

    cs.CV

    Generating Training Data for Denoising Real RGB Images via Camera Pipeline Simulation

    Authors: Ronnachai Jaroensri, Camille Biscarrat, Miika Aittala, Frédo Durand

    Abstract: Image reconstruction techniques such as denoising often need to be applied to the RGB output of cameras and cellphones. Unfortunately, the commonly used additive white noise (AWGN) models do not accurately reproduce the noise and the degradation encountered on these inputs. This is particularly important for learning-based techniques, because the mismatch between training and real world data will…

    Submitted 18 April, 2019; originally announced April 2019.

  20. Single-Image SVBRDF Capture with a Rendering-Aware Deep Network

    Authors: Valentin Deschaintre, Miika Aittala, Fredo Durand, George Drettakis, Adrien Bousseau

    Abstract: Texture, highlights, and shading are some of many visual cues that allow humans to perceive material appearance in single pictures. Yet, recovering spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a single image based on such cues has challenged researchers in computer graphics for decades. We tackle lightweight appearance capture by training a deep neural network…

    Submitted 23 October, 2018; originally announced October 2018.

    Comments: 15 pages, presented at SIGGRAPH 2018

    ACM Class: I.3

    Journal ref: ACM Trans. Graph. 37, 4, Article 128 (August 2018), 15 pages

  21. arXiv:1803.04189

    cs.CV cs.LG stat.ML

    Noise2Noise: Learning Image Restoration without Clean Data

    Authors: Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, Timo Aila

    Abstract: We apply basic statistical reasoning to signal reconstruction by machine learning -- learning to map corrupted observations to clean signals -- with a simple and powerful conclusion: it is possible to learn to restore images by only looking at corrupted examples, at performance at and sometimes exceeding training using clean data, without explicit image priors or likelihood models of the corruptio…

    Submitted 29 October, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: Added link to official implementation and updated MRI results to match it