Showing 1–50 of 58 results for author: Lucchi, A

  1. arXiv:2406.16666

    cs.LG math.NA math.OC

    Cubic regularized subspace Newton for non-convex optimization

    Authors: Jim Zhao, Aurelien Lucchi, Nikita Doikov

    Abstract: This paper addresses the optimization problem of minimizing non-convex continuous functions, which is relevant in the context of high-dimensional machine learning applications characterized by over-parametrization. We analyze a randomized coordinate second-order method named SSCN which can be interpreted as applying cubic regularization in random subspaces. This approach effectively reduces the co…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2402.12508

    cs.LG math.OC

    SDEs for Minimax Optimization

    Authors: Enea Monzio Compagnoni, Antonio Orvieto, Hans Kersting, Frank Norbert Proske, Aurelien Lucchi

    Abstract: Minimax optimization problems have attracted a lot of attention over the past few years, with applications ranging from economics to machine learning. While advanced optimization methods exist for such problems, characterizing their dynamics in stochastic scenarios remains notably challenging. In this paper, we pioneer the use of stochastic differential equations (SDEs) to analyze and compare Mini…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted at AISTATS 2024 (Poster)

  3. arXiv:2402.01297

    cs.LG stat.ML

    Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

    Authors: Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius

    Abstract: We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is…

    Submitted 29 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  4. arXiv:2310.00987

    cs.LG stat.ML

    A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression

    Authors: Tin Sum Cheng, Aurelien Lucchi, Ivan Dokmanić, Anastasis Kratsios, David Belius

    Abstract: Existing statistical learning guarantees for general kernel regressors often yield loose bounds when used with finite-rank kernels. Yet, finite-rank kernels naturally appear in several machine learning problems, e.g., when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task when performing transfer learning. We address this gap for finite-rank kernel ridge regres…

    Submitted 3 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  5. arXiv:2309.04557

    cs.LG math.DS math.OC q-fin.CP

    Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing

    Authors: Xuwei Yang, Anastasis Kratsios, Florian Krach, Matheus Grasselli, Aurelien Lucchi

    Abstract: We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets ${\cal D}_1,\dots,{\cal D}_N$ for the same learning model $f_θ$. Our objective is to minimize the cumulative deviation of the generated parameters $\{θ_i(t)\}_{t=0}^T$ across all $T$ iterations from the specialized parameters $θ^\star_{1},\ldots,θ^\star_N$ obtained for each datase…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 54 pages, 3 figures

  6. arXiv:2306.00809

    cs.LG cond-mat.dis-nn stat.ML

    Initial Guessing Bias: How Untrained Networks Favor Some Classes

    Authors: Emanuele Francazi, Aurelien Lucchi, Marco Baity-Jesi

    Abstract: Understanding and controlling biasing effects in neural networks is crucial for ensuring accurate and fair model performance. In the context of classification problems, we provide a theoretical analysis demonstrating that the structure of a deep neural network (DNN) can condition the model to assign all predictions to the same class, even before the beginning of training, and in the absence of exp…

    Submitted 13 June, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: We have added experiments on pre-trained models and various new results, including analysis in the limit of an infinite number of classes and an extension of the analysis to non-identically distributed classes. Additionally, we have slightly restructured the main paper to include more discussion on the practical implications of the phenomenon

  7. arXiv:2305.15805

    cs.CL cs.LG

    Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

    Authors: Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann

    Abstract: Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the…

    Submitted 31 May, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  8. arXiv:2301.08203

    cs.LG math.OC

    An SDE for Modeling SAM: Theory and Insights

    Authors: Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien Lucchi

    Abstract: We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. We demonstrate that these SDEs…

    Submitted 4 June, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: Accepted at ICML 2023 (Poster)

  9. arXiv:2210.00828

    cs.CV

    Mastering Spatial Graph Prediction of Road Networks

    Authors: Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann

    Abstract: Accurately predicting road networks from satellite images requires a global understanding of the network topology. We propose to capture such high-level information by introducing a graph-based framework that simulates the addition of sequences of graph edges using a reinforcement learning (RL) approach. In particular, given a partially generated graph associated with a satellite image, an RL agen…

    Submitted 3 October, 2022; originally announced October 2022.

  10. arXiv:2209.09162

    math.OC cs.LG

    On the Theoretical Properties of Noise Correlation in Stochastic Optimization

    Authors: Aurelien Lucchi, Frank Proske, Antonio Orvieto, Francis Bach, Hans Kersting

    Abstract: Studying the properties of stochastic noise to optimize complex non-convex functions has been an active area of research in the field of machine learning. Prior work has shown that the noise of stochastic gradient descent improves optimization by overcoming undesirable obstacles in the landscape. Moreover, injecting artificial Gaussian noise has become a popular idea to quickly escape saddle point…

    Submitted 19 September, 2022; originally announced September 2022.

    Journal ref: NeurIPS 2022

  11. arXiv:2207.00391

    stat.ML cs.LG

    A Theoretical Analysis of the Learning Dynamics under Class Imbalance

    Authors: Emanuele Francazi, Marco Baity-Jesi, Aurelien Lucchi

    Abstract: Data imbalance is a common problem in machine learning that can have a critical effect on the performance of a model. Various solutions exist but their impact on the convergence of the learning dynamics is not understood. Here, we elucidate the significant negative impact of data imbalance on learning, showing that the learning curves for minority and majority classes follow sub-optimal trajectori…

    Submitted 19 February, 2024; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: In the latest update of our paper, we've refined the formulations of the theorems and their proofs in the appendix to improve clarity

    Journal ref: International Conference on Machine Learning 2023, (PMLR) 10285-10322

  12. arXiv:2206.03126

    cs.LG

    Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

    Authors: Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

    Abstract: Transformers have achieved remarkable success in several domains, ranging from natural language processing to computer vision. Nevertheless, it has been recently shown that stacking self-attention layers - the distinctive architectural component of Transformers - can result in rank collapse of the tokens' representations at initialization. The question of if and how rank collapse affects training…

    Submitted 7 June, 2022; originally announced June 2022.

  13. arXiv:2203.07337

    stat.ML cs.LG

    Phenomenology of Double Descent in Finite-Width Neural Networks

    Authors: Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

    Abstract: 'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on linear and kernel regression models, with informal parallels to neural networks via the Neural Tangent Kernel. Therefore such analyses do not adequately capture…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: Published at ICLR 2022

  14. arXiv:2203.03443

    cs.LG

    Generalization Through The Lens Of Leave-One-Out Error

    Authors: Gregor Bachmann, Thomas Hofmann, Aurélien Lucchi

    Abstract: Despite the tremendous empirical success of deep learning models in solving various learning tasks, our theoretical understanding of their generalization ability is very limited. Classical generalization bounds based on tools such as the VC dimension or Rademacher complexity are so far unsuitable for deep models and it is doubtful that these techniques can yield tight bounds even in the most ideali…

    Submitted 7 March, 2022; originally announced March 2022.

  15. arXiv:2202.10464

    cs.NE cs.LG math.OC stat.ML

    A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning

    Authors: Youssef Diouane, Aurelien Lucchi, Vihang Patil

    Abstract: Evolutionary strategies have recently been shown to achieve competitive levels of performance for complex optimization problems in reinforcement learning. In such problems, one often needs to optimize an objective function subject to a set of constraints, including for instance constraints on the entropy of a policy or to restrict the possible set of actions or states accessible to an agent. Converg…

    Submitted 21 February, 2022; originally announced February 2022.

    Journal ref: AISTATS 2022

  16. arXiv:2202.02831

    stat.ML cs.LG math.OC

    Anticorrelated Noise Injection for Improved Generalization

    Authors: Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi

    Abstract: Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models. Usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance. In this paper, we zoom in on the problem of correlating th…

    Submitted 19 May, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: 22 pages, 16 figures

  17. arXiv:2112.05604

    cs.LG math.OC stat.ML

    Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity

    Authors: Junchi Yang, Antonio Orvieto, Aurelien Lucchi, Niao He

    Abstract: Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training. Despite its desirable simplicity, recent work shows inferior convergence rates of GDA in theory even assuming strong concavity of the objective on one side. This paper establishes new c…

    Submitted 10 December, 2021; originally announced December 2021.

  18. arXiv:2110.13265

    math.OC cs.LG

    On the Second-order Convergence Properties of Random Search Methods

    Authors: Aurelien Lucchi, Antonio Orvieto, Adamos Solomou

    Abstract: We study the theoretical convergence properties of random-search methods when optimizing non-convex objective functions without having access to derivatives. We prove that standard random-search methods that do not rely on second-order information converge to a second-order stationary point. However, they suffer from an exponential complexity in terms of the input dimension of the problem. In orde…

    Submitted 25 October, 2021; originally announced October 2021.

    Journal ref: NeurIPS 2021

  19. arXiv:2106.06427

    cs.LG

    Neural Symbolic Regression that Scales

    Authors: Luca Biggio, Tommaso Bendinelli, Alexander Neitz, Aurelien Lucchi, Giambattista Parascandolo

    Abstract: Symbolic equations are at the core of scientific discovery. The task of discovering the underlying equation from a set of input-output pairs is called symbolic regression. Traditionally, symbolic regression methods use hand-designed strategies that do not improve with experience. In this paper, we introduce the first symbolic regression method that leverages large scale pre-training. We procedural…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted at the 38th International Conference on Machine Learning (ICML) 2021

  20. arXiv:2106.03763

    cs.LG

    Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks

    Authors: Antonio Orvieto, Jonas Kohler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi

    Abstract: This paper revisits the so-called vanishing gradient phenomenon, which commonly occurs in deep randomly initialized neural networks. Leveraging an in-depth analysis of neural chains, we first show that vanishing gradients cannot be circumvented when the network width scales with less than O(depth), even when initialized with the popular Xavier and He initializations. Second, we extend the analysis…

    Submitted 7 June, 2021; originally announced June 2021.

  21. arXiv:2103.15627

    cs.CV cs.GR cs.LG

    Learning Generative Models of Textured 3D Meshes from Real-World Images

    Authors: Dario Pavllo, Jonas Kohler, Thomas Hofmann, Aurelien Lucchi

    Abstract: Recent advances in differentiable rendering have sparked an interest in learning generative models of textured 3D meshes from image collections. These models natively disentangle pose and appearance, enable downstream applications in computer graphics, and improve the ability of generative models to understand the concept of image formation. Although there has been prior work on learning such mode…

    Submitted 17 August, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: ICCV 2021

  22. arXiv:2103.12685

    cs.LG cs.AI

    Generative Minimization Networks: Training GANs Without Competition

    Authors: Paulina Grnarova, Yannic Kilcher, Kfir Y. Levy, Aurelien Lucchi, Thomas Hofmann

    Abstract: Many applications in machine learning can be framed as minimization problems and solved efficiently using gradient-based techniques. However, recent applications of generative models, particularly GANs, have triggered interest in solving min-max games for which standard optimization techniques are often not suitable. Among known problems experienced by practitioners is the lack of convergence guar…

    Submitted 23 March, 2021; originally announced March 2021.

  23. arXiv:2102.11386

    math.OC cs.LG

    Direct-Search for a Class of Stochastic Min-Max Problems

    Authors: Sotiris Anagnostidis, Aurelien Lucchi, Youssef Diouane

    Abstract: Recent applications in machine learning have renewed the interest of the community in min-max optimization problems. While gradient-based optimization methods are widely used to solve such problems, there are however many scenarios where these techniques are not well-suited, or even not applicable when the gradient is not accessible. We investigate the use of direct-search methods that belong to a…

    Submitted 14 April, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

  24. The power of quantum neural networks

    Authors: Amira Abbas, David Sutter, Christa Zoufal, Aurélien Lucchi, Alessio Figalli, Stefan Woerner

    Abstract: Fault-tolerant quantum computers offer the promise of dramatically improving machine learning through speed-ups in computation or improved model scalability. In the near term, however, the benefits of quantum machine learning are not so clear. Understanding expressibility and trainability of quantum models, and quantum neural networks in particular, requires further investigation. In this work, we u…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: 25 pages, 10 figures

    Journal ref: Nat Comput Sci 1, 403-409 (2021)

  25. arXiv:2010.06948

    cs.LG astro-ph.IM physics.comp-ph stat.ML

    Scalable Graph Networks for Particle Simulations

    Authors: Karolis Martinkus, Aurelien Lucchi, Nathanaël Perraudin

    Abstract: Learning system dynamics directly from observations is a promising direction in machine learning due to its potential to significantly enhance our ability to understand physical systems. However, the dynamics of many real-world systems are challenging to learn due to the presence of nonlinear potentials and a number of interactions that scales quadratically with the number of particles $N$, as in…

    Submitted 20 March, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: 19 pages, 20 figures, AAAI 2021

  26. arXiv:2007.03311

    math.OC cs.LG

    An Accelerated DFO Algorithm for Finite-sum Convex Functions

    Authors: Yuwen Chen, Antonio Orvieto, Aurelien Lucchi

    Abstract: Derivative-free optimization (DFO) has recently gained a lot of momentum in machine learning, spawning interest in the community to design faster methods for problems where gradients are not accessible. While some attention has been given to the concept of acceleration in the DFO literature, existing stochastic algorithms for objective functions with a finite-sum structure have not been shown theo…

    Submitted 2 August, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: 48 pages, 44 figures; Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

  27. arXiv:2006.13591

    cs.LG cs.DC stat.ML

    Randomized Block-Diagonal Preconditioning for Parallel Learning

    Authors: Celestine Mendler-Dünner, Aurelien Lucchi

    Abstract: We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form. Such a structural constraint comes with the advantage that the update computation is block-separable and can be parallelized across multiple independent tasks. Our main contribution is to demonstrate that the convergence of these methods can significantly be improved by a randomiza…

    Submitted 7 December, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: improvement in Theorem 3 compared to ICML 2020 version

    Journal ref: PMLR 119:6841-6851 (2020)

  28. arXiv:2006.07660

    cs.CV cs.GR cs.LG eess.IV

    Convolutional Generation of Textured 3D Meshes

    Authors: Dario Pavllo, Graham Spinks, Thomas Hofmann, Marie-Francine Moens, Aurelien Lucchi

    Abstract: While recent generative models for 2D images achieve impressive visual results, they clearly lack the ability to perform 3D reasoning. This heavily restricts the degree of control over generated objects as well as the possible applications of such models. In this work, we bridge this gap by leveraging recent advances in differentiable rendering. We design a framework that can generate triangle mes…

    Submitted 23 October, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020, Oral presentation. Code at https://github.com/dariopavllo/convmesh

  29. arXiv:2004.08139

    astro-ph.CO cs.LG eess.IV

    Emulation of cosmological mass maps with conditional generative adversarial networks

    Authors: Nathanaël Perraudin, Sandro Marcon, Aurelien Lucchi, Tomasz Kacprzak

    Abstract: Weak gravitational lensing mass maps play a crucial role in understanding the evolution of structures in the universe and our ability to constrain cosmological models. The prediction of these mass maps is based on expensive N-body simulations, which can create a computational bottleneck for cosmological analyses. Modern deep generative models, such as Generative Adversarial Networks (GAN), have de…

    Submitted 6 May, 2021; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: Accepted at the Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS), December 14, 2019, https://ml4physicalsciences.github.io/files/NeurIPS_ML4PS_2019_97.pdf ; accepted in Frontiers in Artificial Intelligence in May 2021

  30. arXiv:2003.01652

    stat.ML cs.LG

    Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

    Authors: Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

    Abstract: Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random mat…

    Submitted 11 June, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

  31. Controlling Style and Semantics in Weakly-Supervised Image Generation

    Authors: Dario Pavllo, Aurelien Lucchi, Thomas Hofmann

    Abstract: We propose a weakly-supervised approach for conditional image generation of complex scenes where a user has fine control over objects appearing in the scene. We exploit sparse semantic maps to control object shapes and classes, as well as textual descriptions or attributes to control both local and global style. In order to condition our model on textual descriptions, we introduce a semantic atten…

    Submitted 21 July, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

    Comments: European Conference on Computer Vision (ECCV) 2020, Spotlight. Code at https://github.com/dariopavllo/style-semantics

  32. arXiv:1911.10367

    math.OC cs.LG

    A Sub-sampled Tensor Method for Non-convex Optimization

    Authors: Aurelien Lucchi, Jonas Kohler

    Abstract: We present a stochastic optimization method that uses a fourth-order regularized model to find local minima of smooth and potentially non-convex objective functions with a finite-sum structure. This algorithm uses sub-sampled derivatives instead of exact quantities. The proposed approach is shown to find an $(ε_1,ε_2,ε_3)$-third-order critical point in at most…

    Submitted 15 July, 2023; v1 submitted 23 November, 2019; originally announced November 2019.

    Comments: Initial title: A Stochastic Tensor Method for Non-convex Optimization

  33. arXiv:1911.05206

    math.OC cs.LG

    Shadowing Properties of Optimization Algorithms

    Authors: Antonio Orvieto, Aurelien Lucchi

    Abstract: Ordinary differential equation (ODE) models of gradient-based optimization methods can provide insights into the dynamics of learning and inspire the design of new algorithms. Unfortunately, this thought-provoking perspective is weakened by the fact that, in the worst case, the error between the algorithm steps and its ODE approximation grows exponentially with the number of iterations. In an atte…

    Submitted 12 November, 2019; originally announced November 2019.

    Journal ref: Advances in Neural Information Processing Systems, 2019

  34. arXiv:1908.05519

    physics.comp-ph astro-ph.CO cs.LG eess.IV

    Cosmological N-body simulations: a challenge for scalable generative models

    Authors: Nathanaël Perraudin, Ankit Srivastava, Aurelien Lucchi, Tomasz Kacprzak, Thomas Hofmann, Alexandre Réfrégier

    Abstract: Deep generative models, such as Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs), have been demonstrated to produce images of high visual quality. However, the existing hardware severely limits the size of the images that can be generated. The rapid growth of high dimensional data in many fields of science therefore poses a significant challenge for generative models. In cos…

    Submitted 18 December, 2019; v1 submitted 15 August, 2019; originally announced August 2019.

  35. arXiv:1907.01678

    cs.LG math.OC stat.ML

    The Role of Memory in Stochastic Optimization

    Authors: Antonio Orvieto, Jonas Kohler, Aurelien Lucchi

    Abstract: The choice of how to retain information about past gradients dramatically affects the convergence properties of state-of-the-art stochastic optimization methods, such as Heavy-ball, Nesterov's momentum, RMSprop and Adam. Building on this observation, we use stochastic differential equations (SDEs) to explicitly study the role of memory in gradient-based algorithms. We first derive a general contin…

    Submitted 11 March, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

    Comments: Accepted paper at the 35th Conference on Uncertainty in Artificial Intelligence (UAI), Tel Aviv, 2019

  36. arXiv:1905.09201

    cs.LG stat.ML

    Adaptive norms for deep learning with regularized Newton methods

    Authors: Jonas Kohler, Leonard Adolphs, Aurelien Lucchi

    Abstract: We investigate the use of regularized Newton methods with adaptive norms for optimizing neural networks. This approach can be seen as a second-order counterpart of adaptive gradient methods, which we here show to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we prove that the preconditioning matrix used in RMSProp and Adam satisfies the necessary…

    Submitted 28 September, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

  37. arXiv:1812.01497

    cs.CV

    Topological Map Extraction from Overhead Images

    Authors: Zuoyue Li, Jan Dirk Wegner, Aurélien Lucchi

    Abstract: We propose a new approach, named PolyMapper, to circumvent the conventional pixel-wise segmentation of (aerial) images and predict objects in a vector representation directly. PolyMapper directly extracts the topological map of a city from overhead images as collections of building footprints and road networks. In order to unify the shape representation for different types of objects, we also prop…

    Submitted 29 November, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: ICCV 2019

  38. arXiv:1811.05512

    cs.LG stat.ML

    A domain agnostic measure for monitoring and evaluating GANs

    Authors: Paulina Grnarova, Kfir Y Levy, Aurelien Lucchi, Nathanael Perraudin, Ian Goodfellow, Thomas Hofmann, Andreas Krause

    Abstract: Generative Adversarial Networks (GANs) have shown remarkable results in modeling complex distributions, but their evaluation remains an unsettled issue. Evaluations are essential for: (i) relative assessment of different models and (ii) monitoring the progress of a single model throughout training. The latter cannot be determined by simply inspecting the generator and discriminator loss curves as…

    Submitted 15 July, 2020; v1 submitted 13 November, 2018; originally announced November 2018.

  39. arXiv:1810.02565

    math.OC cs.LG

    Continuous-time Models for Stochastic Optimization Algorithms

    Authors: Antonio Orvieto, Aurelien Lucchi

    Abstract: We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show th…

    Submitted 10 March, 2020; v1 submitted 5 October, 2018; originally announced October 2018.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  40. arXiv:1806.07569

    cs.LG stat.ML

    A Distributed Second-Order Algorithm You Can Trust

    Authors: Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

    Abstract: Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years. While first-order methods seem to dominate the field, second-order methods are nevertheless attractive as they potentially require fewer communication rounds to converge. However, there are significant drawbacks that impede their wide adoption, such as the compu…

    Submitted 20 June, 2018; originally announced June 2018.

    Comments: appearing at ICML 2018 - Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018

  41. arXiv:1805.10694

    stat.ML cs.LG

    Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization

    Authors: Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Ming Zhou, Klaus Neymeyr, Thomas Hofmann

    Abstract: Normalization techniques such as Batch Normalization have been applied successfully for training deep neural networks. Yet, despite its apparent empirical benefits, the reasons behind the success of Batch Normalization are mostly hypothetical. We here aim to provide a more thorough theoretical understanding from a classical optimization perspective. Our main contribution towards this goal is the i…

    Submitted 6 October, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

  42. arXiv:1805.08736

    stat.ML cs.LG

    Adversarially Robust Training through Structured Gradient Regularization

    Authors: Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

    Abstract: We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations. Our regularizer can be derived as a controlled approximation from first principles, leveraging the fundamental link between training with noise and regularization. It adds very little computational overhead during learning and is simple to implement g…

    Submitted 22 May, 2018; originally announced May 2018.

  43. arXiv:1805.05751  [pdf, other]

    cs.LG math.OC stat.ML

    Local Saddle Point Optimization: A Curvature Exploitation Approach

    Authors: Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

    Abstract: Gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems. Here, we highlight a systemic issue of gradient dynamics that arise for saddle point problems, namely the presence of undesired stable stationary points that are no local optima. We propose a novel optimization approach that exploits curvature information i…

    Submitted 14 February, 2019; v1 submitted 15 May, 2018; originally announced May 2018.

  44. arXiv:1803.05999  [pdf, other]

    cs.LG math.OC stat.ML

    Escaping Saddles with Stochastic Gradients

    Authors: Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann

    Abstract: We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that - contrary to the case of isotropic noise - this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensio…

    Submitted 16 September, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

  45. arXiv:1710.11383  [pdf, other]

    cs.LG stat.ML

    Flexible Prior Distributions for Deep Generative Models

    Authors: Yannic Kilcher, Aurelien Lucchi, Thomas Hofmann

    Abstract: We consider the problem of training generative models with deep neural networks as generators, i.e. to map latent codes to data points. Whereas the dominant paradigm combines simple priors over codes with complex deterministic models, we argue that it might be advantageous to use more flexible code distributions. We demonstrate how these distributions can be induced directly from the data. The ben…

    Submitted 7 January, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

    Comments: arXiv admin note: text overlap with arXiv:1707.09241

  46. arXiv:1710.11381  [pdf, other]

    cs.LG stat.ML

    Semantic Interpolation in Implicit Models

    Authors: Yannic Kilcher, Aurelien Lucchi, Thomas Hofmann

    Abstract: In implicit models, one often interpolates between sampled points in latent space. As we show in this paper, care needs to be taken to match-up the distributional assumptions on code vectors with the geometry of the interpolating paths. Otherwise, typical assumptions about the quality and semantics of in-between points may not be justified. Based on our analysis we propose to modify the prior code…

    Submitted 2 February, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

  47. arXiv:1707.09241  [pdf, other]

    stat.ML cs.LG

    Generator Reversal

    Authors: Yannic Kilcher, Aurélien Lucchi, Thomas Hofmann

    Abstract: We consider the problem of training generative models with deep neural networks as generators, i.e. to map latent codes to data points. Whereas the dominant paradigm combines simple priors over codes with complex deterministic models, we propose instead to use more flexible code distributions. These distributions are estimated non-parametrically by reversing the generator map during training. The…

    Submitted 28 July, 2017; originally announced July 2017.

  48. Learning Aerial Image Segmentation from Online Maps

    Authors: Pascal Kaiser, Jan Dirk Wegner, Aurelien Lucchi, Martin Jaggi, Thomas Hofmann, Konrad Schindler

    Abstract: This study deals with semantic segmentation of high-resolution (aerial) images where a semantic class label is assigned to each pixel via supervised classification as a basis for automatic map generation. Recently, deep convolutional neural networks (CNNs) have shown impressive performance and have quickly become the de-facto standard for semantic segmentation, with the added benefit that task-spe…

    Submitted 21 July, 2017; originally announced July 2017.

    Comments: Published in IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

  49. arXiv:1706.03269  [pdf, other]

    cs.LG stat.ML

    An Online Learning Approach to Generative Adversarial Networks

    Authors: Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Thomas Hofmann, Andreas Krause

    Abstract: We consider the problem of training generative models with a Generative Adversarial Network (GAN). Although GANs can accurately model complex distributions, they are known to be difficult to train due to instabilities caused by a difficult minimax optimization problem. In this paper, we view the problem of training GANs as finding a mixed strategy in a zero-sum game. Building on ideas from online…

    Submitted 10 June, 2017; originally announced June 2017.

  50. arXiv:1705.09367  [pdf, other]

    cs.LG stat.ML

    Stabilizing Training of Generative Adversarial Networks through Regularization

    Authors: Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

    Abstract: Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality but in order to work they require a careful choice of architecture, parameter initialization, and selection of hyper-parameters. This fragility is in part due to a dimensional mismatch or non-overlapping support between the model distribution and the data distribution, causing their d…

    Submitted 7 November, 2017; v1 submitted 25 May, 2017; originally announced May 2017.