Showing 1–50 of 67 results for author: Pehlevan, C

  1. arXiv:2405.17198  [pdf, other]

    cs.LG math.OC

    Convex Relaxation for Solving Large-Margin Classifiers in Hyperbolic Space

    Authors: Sheng Yang, Peihan Liu, Cengiz Pehlevan

    Abstract: Hyperbolic spaces have increasingly been recognized for their outstanding performance in handling data with inherent hierarchical structures compared to their Euclidean counterparts. However, learning in hyperbolic spaces poses significant challenges. In particular, extending support vector machines to hyperbolic spaces is in general a constrained non-convex optimization problem. Previous and popu…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.17181  [pdf, other]

    cs.LG cs.CV

    Spectral regularization for adversarially-robust representation learning

    Authors: Sheng Yang, Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: The vulnerability of neural network classifiers to adversarial attacks is a major obstacle to their deployment in safety-critical applications. Regularization of network parameters during training can be used to improve adversarial robustness and generalization performance. Usually, the network is regularized end-to-end, with parameters at all layers affected by regularization. However, in setting…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 15 + 15 pages, 8 + 11 figures

  3. arXiv:2405.15712  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Infinite Limits of Multi-head Transformer Dynamics

    Authors: Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan

    Abstract: In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime. We identify the set of parameterizations that admit well-defined infinite width and depth limits, allowing the attention layers to update throughout training--a relevant notion of feature learning in these models. We then use tools from dynamical mean field theory (DMFT) t…

    Submitted 24 May, 2024; originally announced May 2024.

  4. arXiv:2405.15618  [pdf, other]

    cs.LG cs.NE

    MLPs Learn In-Context

    Authors: William L. Tong, Cengiz Pehlevan

    Abstract: In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, has commonly been assumed to be a unique hallmark of Transformer models. In this study, we demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Moreover, we find that MLPs, and the closely related MLP-Mixer models, learn in-context competitively with Transformers given the same comput…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 29 pages, 9 figures, code available at https://github.com/wtong98/mlp-icl

  5. arXiv:2405.11751  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotic theory of in-context learning by linear attention

    Authors: Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan

    Abstract: Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unr…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 20 pages, 5 figures, and supplementary information

  6. arXiv:2405.00592  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Scaling and renormalization in high-dimensional regression

    Authors: Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generaliza…

    Submitted 26 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 68 pages, 17 figures

  7. arXiv:2402.01092  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    A Dynamical Model of Neural Scaling Laws

    Authors: Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

    Abstract: On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude. This phenomenon is known as a neural scaling law. Of fundamental importance is the compute-optimal scaling law, which reports the performance as a function of units of compute when choosing model sizes optimally. We analyze a random feature…

    Submitted 23 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: ICML Camera Ready. Included online SGD section with additional simulations and its connection to large sample limit of our gradient flow theory. Fixed typo in Appendix eq 112

  8. arXiv:2310.06110  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Grokking as the Transition from Lazy to Rich Training Dynamics

    Authors: Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

    Abstract: We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To illustrate this mechanism, we study the simple setting of vanilla gradient descent on a polynomial regression problem with a two layer neural network which exhi…

    Submitted 11 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Adding new experiments on higher degree Hermite polynomials, multi-index targets, removed DMFT analysis from this version

  9. arXiv:2309.16620  [pdf, other]

    stat.ML cond-mat.dis-nn cs.AI cs.LG

    Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

    Authors: Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan

    Abstract: The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $μ$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across dep…

    Submitted 8 December, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

  10. arXiv:2307.04841  [pdf, other]

    stat.ML cond-mat.dis-nn cs.AI cs.LG

    Loss Dynamics of Temporal Difference Reinforcement Learning

    Authors: Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan

    Abstract: Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use…

    Submitted 7 November, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: Advances in Neural Information Processing Systems 36 (2023) Camera Ready

  11. arXiv:2307.03176  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG q-bio.NC

    Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles

    Authors: Benjamin S. Ruben, Cengiz Pehlevan

    Abstract: Feature bagging is a well-established ensembling method which aims to reduce prediction variance by combining predictions of many estimators trained on subsets or projections of features. Here, we develop a theory of feature-bagging in noisy least-squares ridge ensembles and simplify the resulting learning curves in the special case of equicorrelated data. Using analytical learning curves, we demo…

    Submitted 9 January, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 Camera-Ready. Contains significant updates from the original submission

    Journal ref: Advances in Neural Information Processing Systems 36 (2023)

  12. arXiv:2306.04810  [pdf, other]

    cs.NE cs.IT cs.LG q-bio.NC

    Correlative Information Maximization: A Biologically Plausible Approach to Supervised Deep Neural Networks without Weight Symmetry

    Authors: Bariscan Bozkurt, Cengiz Pehlevan, Alper T Erdogan

    Abstract: The backpropagation algorithm has experienced remarkable success in training large-scale artificial neural networks; however, its biological plausibility has been strongly criticized, and it remains an open question whether the brain employs supervised learning mechanisms akin to it. Here, we propose correlative information maximization between layer activations as an alternative normative approac…

    Submitted 17 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Preprint, 38 pages

  13. arXiv:2306.04532  [pdf, other]

    cs.NE cond-mat.dis-nn cs.LG q-bio.NC stat.ML

    Long Sequence Hopfield Memory

    Authors: Hamza Tahir Chaudhry, Jacob A. Zavatone-Veth, Dmitry Krotov, Cengiz Pehlevan

    Abstract: Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions. Computational models of sequence memory have been proposed where recurrent Hopfield-like neural networks are trained with temporally asymmetric Hebbian rules. However, these networks suffer from limited sequence capacity (maxi…

    Submitted 2 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Camera-Ready, 41 pages

    Journal ref: Advances in Neural Information Processing Systems 36 (2023)

  14. arXiv:2305.18411  [pdf, other]

    cs.LG

    Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

    Authors: Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan

    Abstract: We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data have not only identical loss curves but also agree in their point-wise test predictions throughout training. For simple tasks such as CIFAR-5m this holds throughout training for networks of realistic widths.…

    Submitted 5 December, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: 24 pages, 19 figures. NeurIPS 2023. Revised based on reviewer feedback

  15. arXiv:2304.03408  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

    Authors: Blake Bordelon, Cengiz Pehlevan

    Abstract: We analyze the dynamics of finite width effects in wide but finite feature learning neural networks. Starting from a dynamical mean field theory description of infinite width deep neural network kernel and prediction dynamics, we provide a characterization of the $O(1/\sqrt{\text{width}})$ fluctuations of the DMFT order parameters over random initializations of the network weights. Our results, wh…

    Submitted 7 November, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: Advances in Neural Information Processing Systems 36 (2023) Camera Ready

  16. arXiv:2303.00564  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning curves for deep structured Gaussian feature models

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: In recent years, significant attention in deep learning theory has been devoted to analyzing when models that interpolate their training data can still generalize well to unseen examples. Many insights have been gained from studying models with multiple layers of Gaussian random features, for which one can compute precise generalization asymptotics. However, few works have considered the effect of…

    Submitted 23 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: 14+18 pages, 2+1 figures. NeurIPS 2023 Camera Ready

    Journal ref: Advances in Neural Information Processing Systems 36 (2023)

  17. arXiv:2301.11375  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Neural networks learn to magnify areas near decision boundaries

    Authors: Jacob A. Zavatone-Veth, Sheng Yang, Julian A. Rubinfien, Cengiz Pehlevan

    Abstract: In machine learning, there is a long history of trying to build neural networks that can learn from fewer example data by baking in strong geometric priors. However, it is not always clear a priori what geometric constraints are appropriate for a given task. Here, we consider the possibility that one can uncover useful geometric inductive biases by studying how training molds the Riemannian geomet…

    Submitted 14 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 93 pages, 48 figures

  18. arXiv:2212.12147  [pdf, other]

    stat.ML cs.LG

    The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes

    Authors: Alexander Atanasov, Blake Bordelon, Sabarish Sainathan, Cengiz Pehlevan

    Abstract: For small training set sizes $P$, the generalization error of wide neural networks is well-approximated by the error of an infinite width neural network (NN), either in the kernel or mean-field/feature-learning regime. However, after a critical sample size $P^*$, we empirically find the finite-width network generalization becomes worse than that of the infinite width network. In this work, we empi…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: 34 pages, 19 figures

  19. arXiv:2210.04222  [pdf, other]

    eess.SP cs.LG

    Correlative Information Maximization Based Biologically Plausible Neural Networks for Correlated Source Separation

    Authors: Bariscan Bozkurt, Ates Isfendiyaroglu, Cengiz Pehlevan, Alper T. Erdogan

    Abstract: The brain effortlessly extracts latent causes of stimuli, but how it does this at the network level remains unknown. Most prior attempts at this problem proposed neural networks that implement independent component analysis which works under the limitation that latent causes are mutually independent. Here, we relax this limitation and propose a biologically plausible neural network that extracts c…

    Submitted 8 April, 2023; v1 submitted 9 October, 2022; originally announced October 2022.

    Comments: ICLR Accepted, 34 pages

  20. arXiv:2210.02157  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

    Authors: Blake Bordelon, Cengiz Pehlevan

    Abstract: It is unclear how changing the learning rule of a deep neural network alters its learning dynamics and representations. To gain insight into the relationship between learned features, function approximation, and the learning rule, we analyze infinite-width deep networks trained with gradient descent (GD) and biologically-plausible alternatives including feedback alignment (FA), direct feedback ali…

    Submitted 25 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: ICLR 2023 Camera Ready

  21. arXiv:2209.12894  [pdf, other]

    eess.SP cs.LG

    Biologically-Plausible Determinant Maximization Neural Networks for Blind Separation of Correlated Sources

    Authors: Bariscan Bozkurt, Cengiz Pehlevan, Alper T. Erdogan

    Abstract: Extraction of latent sources of complex stimuli is critical for making sense of the world. While the brain solves this blind source separation (BSS) problem continuously, its algorithms remain unknown. Previous work on biologically-plausible BSS algorithms assumed that observed signals are linear mixtures of statistically independent or uncorrelated sources, limiting the domain of applicability of…

    Submitted 25 November, 2022; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022, 37 pages

  22. arXiv:2209.10634  [pdf, other]

    q-bio.NC cs.LG cs.NE stat.ML

    Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation

    Authors: David Lipshutz, Cengiz Pehlevan, Dmitri B. Chklovskii

    Abstract: Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end,…

    Submitted 24 August, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 16 pages, 7 figures

  23. arXiv:2209.10499  [pdf, other]

    cond-mat.dis-nn math.PR

    Replica method for eigenvalues of real Wishart product matrices

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: We show how the replica method can be used to compute the asymptotic eigenvalue spectrum of a real Wishart product matrix. For unstructured factors, this provides a compact, elementary derivation of a polynomial condition on the Stieltjes transform first proved by Müller [IEEE Trans. Inf. Theory. 48, 2086-2091 (2002)]. We then show how this computation can be extended to ensembles where the factor…

    Submitted 20 January, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 50 pages, 5 figures

  24. arXiv:2206.06686  [pdf, other]

    quant-ph cs.LG

    Bandwidth Enables Generalization in Quantum Kernel Models

    Authors: Abdulkadir Canatar, Evan Peters, Cengiz Pehlevan, Stefan M. Wild, Ruslan Shaydulin

    Abstract: Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical int…

    Submitted 18 June, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted version

  25. arXiv:2205.09653  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks

    Authors: Blake Bordelon, Cengiz Pehlevan

    Abstract: We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training. These ke…

    Submitted 4 October, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022 Camera Ready. Fixed Appendix typos. 55 pages

  26. arXiv:2203.00573  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Contrasting random and learned features in deep Bayesian linear regression

    Authors: Jacob A. Zavatone-Veth, William L. Tong, Cengiz Pehlevan

    Abstract: Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are t…

    Submitted 16 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: 35 pages, 7 figures. v2: minor typos corrected and references added; published in PRE

    Journal ref: Physical Review E 105, 064118 (2022)

  27. arXiv:2201.04669  [pdf, ps, other]

    cond-mat.dis-nn cs.LG

    On neural network kernels and the storage capacity problem

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: In this short note, we reify the connection between work on the storage capacity problem in wide two-layer treelike neural networks and the rapidly-growing body of literature on kernel limits of wide neural networks. Concretely, we observe that the "effective order parameter" studied in the statistical mechanics literature is exactly equivalent to the infinite-width Neural Network Gaussian Process…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 5 pages, no figures

    Journal ref: Neural Computation (2022) 34 (5): 1136-1142

  28. Depth induces scale-averaging in overparameterized linear Bayesian neural networks

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: Inference in deep Bayesian neural networks is only fully understood in the infinite-width limit, where the posterior flexibility afforded by increased depth washes out and the posterior predictive collapses to a shallow Gaussian process. Here, we interpret finite deep linear Bayesian neural networks as data-dependent scale mixtures of Gaussian process predictors across output channels. We leverage…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: 8 pages, no figures

    Journal ref: 55th Asilomar Conference on Signals, Systems, and Computers, 2021

  29. arXiv:2111.05498  [pdf, other]

    cs.LG cs.AI

    Attention Approximates Sparse Distributed Memory

    Authors: Trenton Bricken, Cengiz Pehlevan

    Abstract: While Attention has come to be an important mechanism in deep learning, there remains limited intuition for why it works so well. Here, we show that Transformer Attention can be closely related under certain data conditions to Kanerva's Sparse Distributed Memory (SDM), a biologically plausible associative memory model. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer…

    Submitted 17 January, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  30. arXiv:2111.00034  [pdf, other]

    stat.ML cs.LG

    Neural Networks as Kernel Learners: The Silent Alignment Effect

    Authors: Alexander Atanasov, Blake Bordelon, Cengiz Pehlevan

    Abstract: Neural networks in the lazy training regime converge to kernel machines. Can neural networks in the rich feature learning regime learn a kernel machine with a data-dependent kernel? We demonstrate that this can indeed happen due to a phenomenon we term silent alignment, which requires that the tangent kernel of a network evolves in eigenstructure while small and before the loss appreciably decreas…

    Submitted 2 December, 2021; v1 submitted 29 October, 2021; originally announced November 2021.

    Comments: 29 pages, 15 figures. Added additional experiments and expanded the derivations in the appendix

    Journal ref: ICLR 2022

  31. arXiv:2110.07472  [pdf, other]

    cs.LG cs.CV stat.ML

    Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Linearly Classified Under All Possible Views?

    Authors: Matthew Farrell, Blake Bordelon, Shubhendu Trivedi, Cengiz Pehlevan

    Abstract: Equivariance has emerged as a desirable property of representations of objects subject to identity-preserving transformations that constitute a group, such as translations and rotations. However, the expressivity of a representation constrained by group equivariance is still not fully understood. We address this gap by providing a generalization of Cover's Function Counting Theorem that quantifies…

    Submitted 5 February, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Version accepted to ICLR 2022

  32. arXiv:2106.02713  [pdf, other]

    stat.ML cs.LG

    Learning Curves for SGD on Structured Features

    Authors: Blake Bordelon, Cengiz Pehlevan

    Abstract: The generalization performance of a machine learning algorithm such as a neural network depends in a non-trivial way on the structure of the data distribution. To analyze the influence of data structure on test loss dynamics, we study an exactly solvable model of stochastic gradient descent (SGD) on mean square loss which predicts test loss when training on features with arbitrary covariance stru…

    Submitted 14 March, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Camera Ready for ICLR 2022: https://openreview.net/forum?id=WPI2vbkAl3Q

  33. arXiv:2106.02261  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Out-of-Distribution Generalization in Kernel Regression

    Authors: Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

    Abstract: In real-world applications, the data-generating process for training a machine learning model often differs from what the model encounters in the test stage. Understanding how and whether machine learning models generalize under such distributional shifts has been a theoretical challenge. Here, we study generalization in kernel regression when the training and test distributions are different using me…

    Submitted 4 February, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Eq. (SI.1.59) corrected

    Journal ref: Neural Information Processing Systems (NeurIPS), 2021

  34. arXiv:2106.00651  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Asymptotics of representation learning in finite Bayesian neural networks

    Authors: Jacob A. Zavatone-Veth, Abdulkadir Canatar, Benjamin S. Ruben, Cengiz Pehlevan

    Abstract: Recent works have suggested that finite Bayesian neural networks may sometimes outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width c…

    Submitted 8 February, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: 13+28 pages, 4 figures; v3: extensive revision with improved exposition and new section on CNNs, accepted to NeurIPS 2021; v4: minor updates to supplement; v5: post-NeurIPS update, minor typos fixed

    Journal ref: Advances in Neural Information Processing Systems 34 (2021); JSTAT 114008 (2022)

  35. arXiv:2104.11734  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Exact marginal prior distributions of finite Bayesian neural networks

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: Bayesian neural networks are theoretically well-understood only in the infinite-width limit, where Gaussian priors over network weights yield Gaussian priors over network outputs. Recent work has suggested that finite Bayesian networks may outperform their infinite counterparts, but their non-Gaussian function space priors have been characterized only through perturbative approaches. Here, we deriv…

    Submitted 18 October, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: 12+9 pages, 4 figures; v3: Accepted as NeurIPS 2021 Spotlight

    Journal ref: Advances in Neural Information Processing Systems 34 (2021)

  36. arXiv:2010.12632  [pdf, other]

    eess.SP cs.NE q-bio.NC

    Biologically plausible single-layer networks for nonnegative independent component analysis

    Authors: David Lipshutz, Cengiz Pehlevan, Dmitri B. Chklovskii

    Abstract: An important problem in neuroscience is to understand how brains extract relevant signals from mixtures of unknown sources, i.e., perform blind source separation. To model how the brain performs this task, we seek a biologically plausible single-layer neural network implementation of a blind source separation algorithm. For biological plausibility, we require the network to satisfy the following t…

    Submitted 4 March, 2022; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Updated version includes a second single-layer network with indirect lateral connections for solving NICA

  37. arXiv:2007.11136  [pdf, other]

    cond-mat.dis-nn cs.LG stat.ML

    Activation function dependence of the storage capacity of treelike neural networks

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: The expressive power of artificial neural networks crucially depends on the nonlinearity of their activation functions. Though a wide variety of nonlinear activation functions have been proposed for use in artificial neural networks, a detailed understanding of their role in determining the expressive power of a network has not emerged. Here, we study how activation functions affect the storage ca…

    Submitted 4 February, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

    Comments: 5+23 pages, 2+4 figures. v3: accepted for publication as a Letter in Physical Review E

    Journal ref: Phys. Rev. E 103, 020301 (2021)

  38. arXiv:2006.16540  [pdf, other]

    cs.LG stat.ML

    Associative Memory in Iterated Overparameterized Sigmoid Autoencoders

    Authors: Yibo Jiang, Cengiz Pehlevan

    Abstract: Recent work showed that overparameterized autoencoders can be trained to implement associative memory via iterative maps, when the trained input-output Jacobian of the network has all of its eigenvalue norms strictly below one. Here, we theoretically analyze this phenomenon for sigmoid networks by leveraging recent developments in deep learning theory, especially the correspondence between trainin…

    Submitted 13 August, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

    Journal ref: ICML 2020

  39. arXiv:2006.13198  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks

    Authors: Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

    Abstract: Generalization beyond a training dataset is a main goal of machine learning, but theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep neural networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. In this paper, we investi…

    Submitted 4 February, 2022; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: Accepted for publication in Nature Communications. SI Eq.71 is corrected

  40. arXiv:2006.08115  [pdf, other]

    q-bio.NC

    Minimax Dynamics of Optimally Balanced Spiking Networks of Excitatory and Inhibitory Neurons

    Authors: Qianyi Li, Cengiz Pehlevan

    Abstract: Excitation-inhibition (E-I) balance is ubiquitously observed in the cortex. Recent studies suggest an intriguing link between balance on fast timescales, tight balance, and efficient information coding with spikes. We further this connection by taking a principled approach to optimal balanced networks of excitatory (E) and inhibitory (I) neurons. By deriving E-I spiking neural networks from greedy…

    Submitted 30 April, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: There was a typo in Eq. 3 for the definition of firing rates, where we had e^{-(t-t')/τ_E} in the integrand, which should be e^{-t'/τ_E}; it is corrected in this version

    Journal ref: NeurIPS 2020

  41. arXiv:2004.05479  [pdf, other]

    eess.SP cs.NE q-bio.NC

    Blind Bounded Source Separation Using Neural Networks with Local Learning Rules

    Authors: Alper T. Erdogan, Cengiz Pehlevan

    Abstract: An important problem encountered by both natural and engineered signal processing systems is blind source separation. In many instances of the problem, the sources are bounded by their nature and known to be so, even though the particular bound may not be known. To separate such bounded sources from their mixtures, we propose a new optimization problem, Bounded Similarity Matching (BSM). A princip…

    Submitted 11 April, 2020; originally announced April 2020.

    Comments: ICASSP 2020

  42. arXiv:2002.10378  [pdf, other]

    cs.LG q-bio.NC stat.ML

    Contrastive Similarity Matching for Supervised Learning

    Authors: Shanshan Qin, Nayantara Mudur, Cengiz Pehlevan

    Abstract: We propose a novel biologically-plausible solution to the credit assignment problem motivated by observations in the ventral visual pathway and trained deep neural networks. In both, representations of objects in the same category become progressively more similar, while objects belonging to different categories become less similar. We use this observation to motivate a layer-specific learning goa…

    Submitted 5 December, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

  43. arXiv:2002.02561  [pdf, other]

    cs.LG stat.ML

    Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

    Authors: Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

    Abstract: We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the…

    Submitted 25 February, 2021; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: ICML 2020 Update: Updated section on asymptotics generalization error for power law spectra, finding agreement with Spigler, Geiger, Wyart 2019 arXiv:1905.10843. Added a section on Discrete measures and an MNIST Experiment. Eigenvalue problem can be approximated by Kernel PCA. Typo fixed on 2/25/2021

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1024-1034, 2020

  44. arXiv:1912.05127  [pdf, other]

    stat.ML cs.LG

    A Closer Look at Disentangling in $β$-VAE

    Authors: Harshvardhan Sikka, Weishun Zhong, Jun Yin, Cengiz Pehlevan

    Abstract: In many data analysis tasks, it is beneficial to learn representations where each dimension is statistically independent and thus disentangled from the others. If data generating factors are also statistically independent, disentangled representations can be formed by Bayesian inference of latent variables. We examine a generalization of the Variational Autoencoder (VAE), $β$-VAE, for learning suc…

    Submitted 11 December, 2019; originally announced December 2019.

    Comments: Presented at the 53rd Asilomar Conference on Signals, Systems, and Computers

  45. arXiv:1910.04958  [pdf, other]

    cs.NE cs.LG

    Structured and Deep Similarity Matching via Structured and Deep Hebbian Networks

    Authors: Dina Obeid, Hugo Ramambason, Cengiz Pehlevan

    Abstract: Synaptic plasticity is widely accepted to be the mechanism behind learning in the brain's neural networks. A central question is how synapses, with access to only local information about the network, can still organize collectively and perform circuit-wide learning in an efficient manner. In single-layered and all-to-all connected neural networks, local plasticity has been shown to implement gradi…

    Submitted 4 December, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: Accepted for publication in NeurIPS 2019; Minor typos fixed

  46. arXiv:1908.01867  [pdf, other]

    q-bio.NC cs.NE

    Neuroscience-inspired online unsupervised learning algorithms

    Authors: Cengiz Pehlevan, Dmitri B. Chklovskii

    Abstract: Although the currently popular deep learning networks achieve unprecedented performance on some tasks, the human brain still has a monopoly on general intelligence. Motivated by this and biological implausibility of deep learning networks, we developed a family of biologically plausible artificial neural networks (NNs) for unsupervised learning. Our approach is based on optimizing principled objec…

    Submitted 6 September, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: Accepted for publication in IEEE Signal Processing Magazine

  47. arXiv:1902.01429  [pdf, other]

    cs.NE q-bio.NC

    A Spiking Neural Network with Local Learning Rules Derived From Nonnegative Similarity Matching

    Authors: Cengiz Pehlevan

    Abstract: The design and analysis of spiking neural network algorithms will be accelerated by the advent of new theoretical approaches. In an attempt at such an approach, we provide a principled derivation of a spiking algorithm for unsupervised learning, starting from the nonnegative similarity matching cost function. The resulting network consists of integrate-and-fire units and exhibits local learning rules…

    Submitted 18 February, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: ICASSP 2019

  48. arXiv:1810.06966  [pdf, other]

    stat.CO math.DS q-bio.NC

    Biologically Plausible Online Principal Component Analysis Without Recurrent Neural Dynamics

    Authors: Victor Minden, Cengiz Pehlevan, Dmitri B. Chklovskii

    Abstract: Artificial neural networks that learn to perform Principal Component Analysis (PCA) and related tasks using strictly local learning rules have been previously derived based on the principle of similarity matching: similar pairs of inputs should map to similar pairs of outputs. However, the operation of these networks (and of similar networks) requires a fixed-point iteration to determine the outpu…

    Submitted 2 November, 2018; v1 submitted 16 October, 2018; originally announced October 2018.

    Comments: 8 pages, 2 figures

  49. arXiv:1808.02083  [pdf, other]

    stat.CO cs.LG

    Efficient Principal Subspace Projection of Streaming Data Through Fast Similarity Matching

    Authors: Andrea Giovannucci, Victor Minden, Cengiz Pehlevan, Dmitri B. Chklovskii

    Abstract: Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a time and must be processed online. Here, we introduce a computationally efficient version of similarity matching, a framework for online dimensionality reduction…

    Submitted 6 August, 2018; originally announced August 2018.

    Comments: 9 pages, 4 figures

  50. arXiv:1802.05362  [pdf, other]

    hep-th

    Holography, Fractals and the Weyl Anomaly

    Authors: Gerald Guralnik, Zachary Guralnik, Cengiz Pehlevan

    Abstract: We study the large source asymptotics of the generating functional in quantum field theory using the holographic renormalization group, and draw comparisons with the asymptotics of the Hopf characteristic function in fractal geometry. Based on the asymptotic behavior, we find a correspondence relating the Weyl anomaly and the fractal dimension of the Euclidean path integral measure. We are led to…

    Submitted 7 March, 2019; v1 submitted 14 February, 2018; originally announced February 2018.

    Comments: 24 pages, 2 figures, factor of two error corrected, minor edits

    Report number: Brown-HET-1726