Showing 1–23 of 23 results for author: Aila, T

  1. arXiv:2406.02507

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Guiding a Diffusion Model with a Bad Version of Itself

    Authors: Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, Samuli Laine

    Abstract: The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2404.07724

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

    Authors: Tuomas Kynkäänniemi, Miika Aittala, Tero Karras, Samuli Laine, Timo Aila, Jaakko Lehtinen

    Abstract: Guidance is a crucial technique for extracting the best performance out of image-generating diffusion models. Traditionally, a constant guidance weight has been applied throughout the sampling chain of an image. We show that guidance is clearly harmful toward the beginning of the chain (high noise levels), largely unnecessary toward the end (low noise levels), and only beneficial in the middle. We…

    Submitted 11 April, 2024; originally announced April 2024.

  3. arXiv:2312.02696

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Analyzing and Improving the Training Dynamics of Diffusion Models

    Authors: Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, Samuli Laine

    Abstract: Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture, without altering its high-level structure. Observing uncontrolled magnitude changes and imbalances in both the network activations an…

    Submitted 20 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  4. arXiv:2301.09515

    cs.LG cs.CV

    StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

    Authors: Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila

    Abstract: Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) only need a single forward…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: Project page: https://sites.google.com/view/stylegan-t/

  5. arXiv:2211.01324

    cs.CV cs.LG

    eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

    Authors: Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

    Abstract: Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly…

    Submitted 13 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  6. arXiv:2207.01413

    cs.CV cs.GR

    Disentangling Random and Cyclic Effects in Time-Lapse Sequences

    Authors: Erik Härkönen, Miika Aittala, Tuomas Kynkäänniemi, Samuli Laine, Timo Aila, Jaakko Lehtinen

    Abstract: Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing a long time-lapse sequence back as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that al…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted to SIGGRAPH 2022. Code: https://github.com/harskish/tlgan

  7. arXiv:2206.03429

    cs.CV cs.AI cs.LG cs.NE

    Generating Long Videos of Dynamic Scenes

    Authors: Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

    Abstract: We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never chan…

    Submitted 9 June, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

  8. arXiv:2206.00364

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Elucidating the Design Space of Diffusion-Based Generative Models

    Authors: Tero Karras, Miika Aittala, Timo Aila, Samuli Laine

    Abstract: We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new st…

    Submitted 11 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  9. arXiv:2203.06026

    cs.CV cs.AI cs.LG cs.NE stat.ML

    The Role of ImageNet Classes in Fréchet Inception Distance

    Authors: Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, Jaakko Lehtinen

    Abstract: Fréchet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID "looks at" in generated images. We show that the feature space that FID is (typically) computed in is so close to the Imag…

    Submitted 14 February, 2023; v1 submitted 11 March, 2022; originally announced March 2022.

    Comments: ICLR 2023 camera ready. Code: https://github.com/kynkaat/role-of-imagenet-classes-in-fid

  10. arXiv:2106.12423

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Alias-Free Generative Adversarial Networks

    Authors: Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila

    Abstract: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the…

    Submitted 18 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

  11. arXiv:2011.03277

    cs.GR cs.CV cs.LG

    Modular Primitives for High-Performance Differentiable Rendering

    Authors: Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, Timo Aila

    Abstract: We present a modular differentiable renderer design that yields performance superior to previous methods by leveraging existing, highly optimized hardware graphics pipelines. Our design supports all crucial operations in a modern graphics pipeline: rasterizing large numbers of triangles, attribute interpolation, filtered texture lookups, as well as user-programmable shading and geometry processing…

    Submitted 6 November, 2020; originally announced November 2020.

  12. arXiv:2006.06676

    cs.CV cs.LG cs.NE stat.ML

    Training Generative Adversarial Networks with Limited Data

    Authors: Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila

    Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch…

    Submitted 7 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

  13. arXiv:1912.04958

    cs.CV cs.LG cs.NE eess.IV stat.ML

    Analyzing and Improving the Image Quality of StyleGAN

    Authors: Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila

    Abstract: The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to enc…

    Submitted 23 March, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

  14. arXiv:1906.01916

    cs.CV

    Semi-supervised semantic segmentation needs strong, varied perturbations

    Authors: Geoff French, Samuli Laine, Timo Aila, Michal Mackiewicz, Graham Finlayson

    Abstract: Consistency regularization describes a class of approaches that have yielded ground breaking results in semi-supervised classification problems. Prior work has established the cluster assumption - under which the data distribution consists of uniform class clusters of samples separated by low density regions - as important to its success. We analyze the problem of semantic segmentation and find th…

    Submitted 11 August, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: 21 pages, 7 figures, accepted to BMVC 2020

  15. arXiv:1905.01723

    cs.CV cs.AI cs.GR cs.MM stat.ML

    Few-Shot Unsupervised Image-to-Image Translation

    Authors: Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz

    Abstract: Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human cap…

    Submitted 9 September, 2019; v1 submitted 5 May, 2019; originally announced May 2019.

    Comments: The paper will be presented at the International Conference on Computer Vision (ICCV) 2019

    Journal ref: ICCV 2019

  16. arXiv:1904.06991

    stat.ML cs.LG cs.NE

    Improved Precision and Recall Metric for Assessing Generative Models

    Authors: Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, Timo Aila

    Abstract: The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research. We present an evaluation metric that can separately and reliably measure both of these aspects in image generation tasks by forming explicit, non-parametric representations of the manifolds of real and generated data. We demonstrate the…

    Submitted 30 October, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

    Comments: NeurIPS 2019 final version

  17. arXiv:1901.10277

    cs.LG cs.CV cs.NE stat.ML

    High-Quality Self-Supervised Deep Image Denoising

    Authors: Samuli Laine, Tero Karras, Jaakko Lehtinen, Timo Aila

    Abstract: We describe a novel method for training high-quality image denoising models based on unorganized collections of corrupted images. The training does not need access to clean reference images, or explicit pairs of corrupted images, and can thus be applied in situations where such data is unacceptably expensive or impossible to acquire. We build on a recent technique that removes the need for referen…

    Submitted 28 October, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: NeurIPS 2019 final version

  18. arXiv:1812.04948

    cs.NE cs.LG stat.ML

    A Style-Based Generator Architecture for Generative Adversarial Networks

    Authors: Tero Karras, Samuli Laine, Timo Aila

    Abstract: We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific…

    Submitted 29 March, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: CVPR 2019 final version

  19. arXiv:1803.04189

    cs.CV cs.LG stat.ML

    Noise2Noise: Learning Image Restoration without Clean Data

    Authors: Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, Timo Aila

    Abstract: We apply basic statistical reasoning to signal reconstruction by machine learning -- learning to map corrupted observations to clean signals -- with a simple and powerful conclusion: it is possible to learn to restore images by only looking at corrupted examples, at performance at and sometimes exceeding training using clean data, without explicit image priors or likelihood models of the corruptio…

    Submitted 29 October, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: Added link to official implementation and updated MRI results to match it

  20. arXiv:1710.10196

    cs.NE cs.LG stat.ML

    Progressive Growing of GANs for Improved Quality, Stability, and Variation

    Authors: Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen

    Abstract: We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images a…

    Submitted 26 February, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

    Comments: Final ICLR 2018 version

  21. arXiv:1611.06440

    cs.LG stat.ML

    Pruning Convolutional Neural Networks for Resource Efficient Inference

    Authors: Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz

    Abstract: We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation - a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induce…

    Submitted 8 June, 2017; v1 submitted 19 November, 2016; originally announced November 2016.

    Comments: 17 pages, 14 figures, ICLR 2017 paper

  22. arXiv:1610.02242

    cs.NE cs.LG

    Temporal Ensembling for Semi-Supervised Learning

    Authors: Samuli Laine, Timo Aila

    Abstract: In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and most importantly, under different regularization and input augment…

    Submitted 15 March, 2017; v1 submitted 7 October, 2016; originally announced October 2016.

    Comments: Final ICLR 2017 version. Includes new results for CIFAR-100 with additional unlabeled data from Tiny Images dataset

  23. arXiv:1609.06536

    cs.CV cs.GR

    Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks

    Authors: Samuli Laine, Tero Karras, Timo Aila, Antti Herva, Shunsuke Saito, Ronald Yu, Hao Li, Jaakko Lehtinen

    Abstract: We present a real-time deep learning framework for video-based facial performance capture -- the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5-10 minutes of captured footage, we train a convolutional n…

    Submitted 2 June, 2017; v1 submitted 21 September, 2016; originally announced September 2016.

    Comments: Final SCA 2017 version