Showing 1–13 of 13 results for author: Bahri, Y

  1. arXiv:2403.03154  [pdf, other]

    physics.comp-ph cond-mat.other cs.AI

    Quantum Many-Body Physics Calculations with Large Language Models

    Authors: Haining Pan, Nayantara Mudur, Will Taranto, Maria Tikhanovskaya, Subhashini Venugopalan, Yasaman Bahri, Michael P. Brenner, Eun-Ah Kim

    Abstract: Large language models (LLMs) have demonstrated an unprecedented ability to perform complex tasks in multiple domains, including mathematical and scientific reasoning. We demonstrate that with carefully designed prompts, LLMs can accurately carry out key calculations in research papers in theoretical physics. We focus on a broadly used approximation method in quantum physics: the Hartree-Fock metho…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures. Supplemental material in the source file

  2. arXiv:2309.01592  [pdf, other]

    stat.ML cs.AI cs.LG hep-th math.PR

    Les Houches Lectures on Deep Learning at Large & Infinite Width

    Authors: Yasaman Bahri, Boris Hanin, Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon

    Abstract: These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks. Topics covered include various statistical and dynamical properties of these networks. In particular, the lecturers discuss properties of random deep neural networks; connections between trained deep neural ne…

    Submitted 12 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: These are notes from lectures delivered by Yasaman Bahri and Boris Hanin at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning; a first version of them was transcribed by Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, and James B. Simon

  3. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  4. arXiv:2106.15831  [pdf, other]

    cs.LG cs.AI cs.CV

    The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning

    Authors: Anders Andreassen, Yasaman Bahri, Behnam Neyshabur, Rebecca Roelofs

    Abstract: Although machine learning models typically experience a drop in performance on out-of-distribution data, accuracies on in- versus out-of-distribution data are widely observed to follow a single linear trend when evaluated across a testbed of models. Models that are more accurate on the out-of-distribution data relative to this baseline exhibit "effective robustness" and are exceedingly rare. Ident…

    Submitted 30 June, 2021; originally announced June 2021.

    Comments: 27 pages, 25 figures

  5. arXiv:2102.06701  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Explaining Neural Scaling Laws

    Authors: Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma

    Abstract: The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scali…

    Submitted 28 April, 2024; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: 11 pages, 3 figures + Supplement (expanded). This version to appear in PNAS

    Journal ref: PNAS 121 (27) e2311878121 (2024)

  6. arXiv:2006.10541  [pdf, other]

    stat.ML cs.LG

    Exact posterior distributions of wide Bayesian neural networks

    Authors: Jiri Hron, Yasaman Bahri, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: Recent work has shown that the prior over functions induced by a deep Bayesian neural network (BNN) behaves as a Gaussian process (GP) as the width of all layers becomes large. However, many BNN applications are concerned with the BNN function space posterior. While some empirical evidence of the posterior convergence was provided in the original works of Neal (1996) and Matthews et al. (2018), it…

    Submitted 26 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  7. arXiv:2006.10540  [pdf, other]

    stat.ML cs.LG

    Infinite attention: NNGP and NTK for deep attention networks

    Authors: Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak

    Abstract: There is a growing amount of literature on the relationship between wide neural networks (NNs) and Gaussian processes (GPs), identifying an equivalence between the two for a variety of NN architectures. This equivalence enables, for instance, accurate approximation of the behaviour of wide Bayesian NNs without MCMC or variational approximations, or characterisation of the distribution of randomly…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  8. arXiv:2003.02218  [pdf, other]

    stat.ML cs.LG

    The large learning rate phase of deep learning: the catapult mechanism

    Authors: Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

    Abstract: The choice of initial learning rate can have a profound effect on the performance of deep networks. We present a class of neural networks with solvable training dynamics, and confirm their predictions empirically in practical deep learning settings. The networks exhibit sharply distinct behaviors at small and large learning rates. The two regimes are separated by a phase transition. In the small l…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: 25 pages, 19 figures

  9. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

    Authors: Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington

    Abstract: A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained…

    Submitted 8 December, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: 12+16 pages; open-source code available at https://github.com/google/neural-tangents; accepted to NeurIPS 2019

  10. arXiv:1810.05148  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

    Authors: Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous…

    Submitted 21 August, 2020; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ICLR 2019

  11. arXiv:1806.05393  [pdf, other]

    stat.ML cs.LG

    Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

    Authors: Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, Jeffrey Pennington

    Abstract: In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make training such deep networks challenging. While residual connections and batch normalization do enabl…

    Submitted 10 July, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: ICML 2018 Conference Proceedings

  12. arXiv:1802.08760  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Sensitivity and Generalization in Neural Networks: an Empirical Study

    Authors: Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: In practice it is often found that large over-parameterized neural networks generalize better than their smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models. In this work, we investigate this tension between complexity and generalization through an extensive empirical exploration of two natural metrics of…

    Submitted 18 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: Published as a conference paper at ICLR 2018

  13. arXiv:1711.00165  [pdf, other]

    stat.ML cs.LG

    Deep Neural Networks as Gaussian Processes

    Authors: Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer…

    Submitted 2 March, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: Published version in ICLR 2018. 10 pages + appendix