
Showing 1–18 of 18 results for author: Wojtowytsch, S

  1. arXiv:2403.19507  [pdf, other]

    cs.LG

    SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

    Authors: Xuan Zhang, Jacob Helwig, Yuchao Lin, Yaochen Xie, Cong Fu, Stephan Wojtowytsch, Shuiwang Ji

    Abstract: We consider using deep neural networks to solve time-dependent partial differential equations (PDEs), where multi-scale processing is crucial for modeling complex, time-evolving dynamics. While the U-Net architecture with skip connections is commonly used by prior studies to enable multi-scale processing, our analysis shows that the need for features to evolve across layers results in temporally m…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: The Twelfth International Conference on Learning Representations

  2. arXiv:2311.06138  [pdf, other]

    stat.ML cs.LG math.OC

    Minimum norm interpolation by perceptra: Explicit regularization and implicit bias

    Authors: Jiyoung Park, Ian Pelakh, Stephan Wojtowytsch

    Abstract: We investigate how shallow ReLU networks interpolate between known regions. Our analysis shows that empirical risk minimizers converge to a minimum norm interpolant as the number of data points and parameters tends to infinity when a weight decay regularizer is penalized with a coefficient which vanishes at a precise rate as the network width and the number of data points grow. With and without ex…

    Submitted 10 November, 2023; originally announced November 2023.

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems 2023

  3. arXiv:2310.17610  [pdf, other]

    math.OC cs.LG math.CA math.NA stat.ML

    A qualitative difference between gradient flows of convex functions in finite- and infinite-dimensional Hilbert spaces

    Authors: Jonathan W. Siegel, Stephan Wojtowytsch

    Abstract: We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions. In the gradient flow case, we prove the following: 1. If $f$ does not have a minimizer, the convergence $f(x_t)\to \inf f$ can be arbitrarily slow. 2. If $f$ does have a minimizer, the excess energy $f(x_t) - \inf f$ is integrable/summable in time. In particular,…

    Submitted 26 October, 2023; originally announced October 2023.

    MSC Class: 26A51; 34A34

  4. arXiv:2306.05697  [pdf, other]

    cs.LG math.NA

    Group Equivariant Fourier Neural Operators for Partial Differential Equations

    Authors: Jacob Helwig, Xuan Zhang, Cong Fu, Jerry Kurtin, Stephan Wojtowytsch, Shuiwang Ji

    Abstract: We consider solving partial differential equations (PDEs) with Fourier neural operators (FNOs), which operate in the frequency domain. Since the laws of physics do not depend on the coordinate system used to describe them, it is desirable to encode such symmetries in the neural operator architecture for better performance and easier learning. While encoding symmetries in the physical domain using…

    Submitted 27 July, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: Proceedings of the 40th International Conference on Machine Learning https://icml.cc/virtual/2023/poster/23875

  5. arXiv:2302.05515  [pdf, other]

    stat.ML cs.LG math.OC

    Achieving acceleration despite very noisy gradients

    Authors: Kanan Gupta, Jonathan Siegel, Stephan Wojtowytsch

    Abstract: We present a generalization of Nesterov's accelerated gradient descent algorithm. Our algorithm (AGNES) provably achieves acceleration for smooth convex minimization tasks with noisy gradient estimates if the noise intensity is proportional to the magnitude of the gradient. Nesterov's accelerated gradient descent does not converge under this noise model if the constant of proportionality exceeds o…

    Submitted 25 May, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

    MSC Class: 68T07

  6. arXiv:2209.01173  [pdf, other]

    stat.ML cs.LG

    Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

    Authors: Stephan Wojtowytsch

    Abstract: In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite neuron, infinite data limit, we prove that a unique radially symmetric minimizer exist…

    Submitted 2 September, 2022; originally announced September 2022.

    Comments: Main text 24 pages

    MSC Class: 68T07; 65D40; 41A30

  7. arXiv:2203.13410  [pdf, ps, other]

    cs.LG math.FA stat.ML

    Qualitative neural network approximation over R and C: Elementary proofs for analytic and polynomial activation

    Authors: Josiah Park, Stephan Wojtowytsch

    Abstract: In this article, we prove approximation theorems in classes of deep and shallow neural networks with analytic activation functions by elementary arguments. We prove for both real and complex networks with non-polynomial activation that the closure of the class of neural networks coincides with the closure of the space of polynomials. The closure can further be characterized by the Stone-Weierstras…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: 26 pages

    MSC Class: 68T07; 41A30; 41A10; 32A05; 32A15; 31B05

  8. arXiv:2106.02588  [pdf, other]

    cs.LG math.AP stat.ML

    Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis

    Authors: Stephan Wojtowytsch

    Abstract: The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable parameters of these are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or an advanced SGD-based algorithm. In a continuous time model for SGD with noise that follows the 'machine learning scaling', we show that in a certa…

    Submitted 14 September, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    MSC Class: Primary: 90C26; Secondary: 68T07; 35K65; 60H30

  9. arXiv:2105.01650  [pdf, other]

    stat.ML cs.LG math.OC

    Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis

    Authors: Stephan Wojtowytsch

    Abstract: Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine learning. The noise encountered in these applications is different from that in many theoretical analyses of stochastic gradient algorithms. In this article, we discuss some of the common properties of energy landscapes and stochastic noise encountered in machine learning problems, and how they affect SGD-bas…

    Submitted 14 September, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    MSC Class: Primary: 90C26; 90C15. Secondary: 68T07; 90C30

  10. arXiv:2012.05420  [pdf, ps, other]

    cs.LG stat.ML

    On the emergence of simplex symmetry in the final and penultimate layers of neural network classifiers

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: A recent numerical study observed that neural network classifiers enjoy a large degree of symmetry in the penultimate layer. Namely, if $h(x) = Af(x) +b$ where $A$ is a linear map and $f$ is the output of the penultimate layer of the network (after activation), then all data points $x_{i, 1}, \dots, x_{i, N_i}$ in a class $C_i$ are mapped to a single point $y_i$ by $f$ and the points $y_i$ are loc…

    Submitted 4 June, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    MSC Class: 68T07; 62H30

  11. arXiv:2012.01484  [pdf, ps, other]

    math.AP cs.LG

    Some observations on high-dimensional partial differential equations with Barron data

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We use explicit representation formulas to show that solutions to certain partial differential equations lie in Barron spaces or multilayer spaces if the PDE data lie in such function spaces. Consequently, these solutions can be represented efficiently using artificial neural networks, even in high dimension. Conversely, we present examples in which the solution fails to lie in the function space…

    Submitted 4 June, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    MSC Class: 68T07; 35C15; 65M80

  12. arXiv:2009.13500  [pdf, ps, other]

    stat.ML cs.LG math.NA

    A priori estimates for classification problems using neural networks

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We consider binary and multi-class classification problems using hypothesis classes of neural networks. For a given hypothesis class, we use Rademacher complexity estimates and direct approximation theorems to obtain a priori error estimates for regularized loss functionals.

    Submitted 28 September, 2020; originally announced September 2020.

    MSC Class: 68T07; 60-08

  13. arXiv:2009.10713  [pdf, other]

    cs.LG math.NA stat.ML

    Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't

    Authors: Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu

    Abstract: The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning. In the tradition of good old applied mathematics, we will not only give attention to rigorous mathematical results, but also the insight we have gained from careful numerical experiments as well as…

    Submitted 7 December, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: Review article. Feedback welcome

    MSC Class: 68T07 (primary); 26B40; 41A30; 35Q68

  14. arXiv:2007.15623  [pdf, ps, other]

    stat.ML cs.LG math.FA

    On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width. The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable generalization properties. Functions in these spaces can…

    Submitted 30 July, 2020; originally announced July 2020.

    MSC Class: 68T07; 46E15; 26B35; 26B40

  15. arXiv:2006.05982  [pdf, ps, other]

    stat.ML cs.LG math.AP math.FA

    Representation formulas and pointwise properties for Barron functions

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space) and establish different representation formulae. In two cases, we describe the space explicitly up to isomorphism. Using a convenient representation, we study the pointwise properties of two-layer networks and show that functions whose singular set is fractal or curved (for examp…

    Submitted 4 June, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    MSC Class: 68T07; 46E15; 26B35; 26B40

  16. arXiv:2005.13530  [pdf, ps, other]

    math.AP cs.LG stat.ML

    On the Convergence of Gradient Descent Training for Two-layer ReLU-networks in the Mean Field Regime

    Authors: Stephan Wojtowytsch

    Abstract: We describe a necessary and sufficient condition for the convergence to minimum Bayes risk when training two-layer ReLU-networks by gradient descent in the mean field regime with omni-directional initial parameter distribution. This article extends recent results of Chizat and Bach to ReLU-activated networks and to the situation in which there are no parameters which exactly achieve MBR. The condi…

    Submitted 27 May, 2020; originally announced May 2020.

    MSC Class: 35Q68; 68T07; 49Q22; 35F20

  17. arXiv:2005.10815  [pdf, other]

    cs.LG math.AP stat.ML

    Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

    Authors: Stephan Wojtowytsch, Weinan E

    Abstract: We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent tra…

    Submitted 21 May, 2020; originally announced May 2020.

    Comments: 5 figures

    MSC Class: 68T07; 49Q22; 68W25

  18. arXiv:2005.10807  [pdf, ps, other]

    math.FA cs.LG stat.ML

    Kolmogorov Width Decay and Poor Approximators in Machine Learning: Shallow Neural Networks, Random Feature Models and Neural Tangent Kernels

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces. The general technique is then applied to show that reproducing kernel Hilbert spaces are poor $L^2$-approximators for the class of two-layer neural networks in high dimension, and that multi-layer networ…

    Submitted 2 October, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    MSC Class: 68T07; 41A30; 41A65; 46E15; 46E22