
Showing 1–7 of 7 results for author: Nagle, A

  1. arXiv:2406.03072  [pdf, other]

    cs.LG cs.IT stat.ML

    Local to Global: Learning Dynamics and Effect of Initialization for Transformers

    Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Chanakya Ekbote, Adway Girish, Alliot Nagle, Hyeji Kim, Michael Gastpar

    Abstract: In recent years, transformer-based models have revolutionized deep learning, particularly in sequence modeling. To better understand this phenomenon, there is a growing interest in using Markov input processes to study transformers. However, our current understanding in this regard remains limited with many fundamental questions about how transformers learn Markov chains still unanswered. In this…

    Submitted 27 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  2. arXiv:2402.04161  [pdf, other]

    cs.LG cs.CL cs.IT stat.ML

    Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains

    Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar

    Abstract: In recent years, attention-based transformers have achieved tremendous success across a variety of disciplines including natural languages. A key ingredient behind their success is the generative pretraining procedure, during which these models are trained on a large text corpus in an auto-regressive manner. To shed light on this phenomenon, we propose a new framework that allows both theory and s…

    Submitted 6 February, 2024; originally announced February 2024.

  3. arXiv:2310.13844  [pdf]

    cs.ET cond-mat.dis-nn cs.NE

    Multi-level, Forming Free, Bulk Switching Trilayer RRAM for Neuromorphic Computing at the Edge

    Authors: Jaeseoung Park, Ashwani Kumar, Yucheng Zhou, Sangheon Oh, Jeong-Hoon Kim, Yuhan Shi, Soumil Jain, Gopabandhu Hota, Amelie L. Nagle, Catherine D. Schuman, Gert Cauwenberghs, Duygu Kuzum

    Abstract: Resistive memory-based reconfigurable systems constructed by CMOS-RRAM integration hold great promise for low energy and high throughput neuromorphic computing. However, most RRAM technologies relying on filamentary switching suffer from variations and noise leading to computational accuracy loss, increased energy consumption, and overhead by expensive program and verify schemes. Low ON-state resi…

    Submitted 20 October, 2023; originally announced October 2023.

  4. arXiv:2307.07085  [pdf, ps, other]

    physics.chem-ph cs.AI

    Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond

    Authors: Kenichiro Takaba, Iván Pulido, Pavan Kumar Behara, Chapin E. Cavender, Anika J. Friedman, Michael M. Henry, Hugo MacDermott Opeskin, Christopher R. Iacovella, Arnav M. Nagle, Alexander Matthew Payne, Michael R. Shirts, David L. Mobley, John D. Chodera, Yuanqing Wang

    Abstract: The development of reliable and extensible molecular mechanics (MM) force fields -- fast, empirical models characterizing the potential energy surface of molecular systems -- is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework us…

    Submitted 8 December, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

  5. arXiv:2202.12002  [pdf, other]

    cs.LG cs.AI cs.CV

    Rare Gems: Finding Lottery Tickets at Initialization

    Authors: Kartik Sreenivasan, Jy-yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Eric Xing, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work by Frankle e…

    Submitted 2 June, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  6. arXiv:2106.02797  [pdf, other]

    cs.IT cs.LG

    Neural Distributed Source Coding

    Authors: Jay Whang, Alliot Nagle, Anish Acharya, Hyeji Kim, Alexandros G. Dimakis

    Abstract: Distributed source coding (DSC) is the task of encoding an input in the absence of correlated side information that is only available to the decoder. Remarkably, Slepian and Wolf showed in 1973 that an encoder without access to the side information can asymptotically achieve the same compression rate as when the side information is available to it. While there is vast prior work on this topic, pra…

    Submitted 1 July, 2024; v1 submitted 5 June, 2021; originally announced June 2021.

    Comments: To be published in JSAIT

  7. arXiv:2006.07990  [pdf, other]

    cs.LG cs.IT stat.ML

    Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient

    Authors: Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitris Papailiopoulos

    Abstract: The strong lottery ticket hypothesis (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width $d$ and depth $l$, by pruning a random…

    Submitted 11 March, 2021; v1 submitted 14 June, 2020; originally announced June 2020.