
Showing 1–13 of 13 results for author: Makkuva, A V

  1. arXiv:2406.03072  [pdf, other]

    cs.LG cs.IT stat.ML

    Local to Global: Learning Dynamics and Effect of Initialization for Transformers

    Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Chanakya Ekbote, Adway Girish, Alliot Nagle, Hyeji Kim, Michael Gastpar

    Abstract: In recent years, transformer-based models have revolutionized deep learning, particularly in sequence modeling. To better understand this phenomenon, there is a growing interest in using Markov input processes to study transformers. However, our current understanding in this regard remains limited with many fundamental questions about how transformers learn Markov chains still unanswered. In this…

    Submitted 27 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  2. arXiv:2402.04161  [pdf, other]

    cs.LG cs.CL cs.IT stat.ML

    Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains

    Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar

    Abstract: In recent years, attention-based transformers have achieved tremendous success across a variety of disciplines including natural languages. A key ingredient behind their success is the generative pretraining procedure, during which these models are trained on a large text corpus in an auto-regressive manner. To shed light on this phenomenon, we propose a new framework that allows both theory and s…

    Submitted 6 February, 2024; originally announced February 2024.

  3. arXiv:2310.13033  [pdf, other]

    cs.NE cs.AI cs.IT cs.LG

    LASER: Linear Compression in Wireless Distributed Optimization

    Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar

    Abstract: Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineA…

    Submitted 6 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  4. arXiv:2301.06251  [pdf, other]

    cs.IT cs.AI cs.LG

    Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes

    Authors: Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath

    Abstract: Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels and are conjectured to have a comparable performance to that of random codes in terms of scaling laws. However, such results are established assuming maximum-likelihood decoders for general code parameters. Also, RM codes only admit limited sets of rates. Efficient decoders such as successive cancella…

    Submitted 31 July, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

    Comments: Accepted for publication in the Journal on Selected Areas in Information Theory. arXiv admin note: substantial text overlap with arXiv:2102.01671

  5. arXiv:2210.00313  [pdf, other]

    cs.IT cs.AI cs.LG

    CRISP: Curriculum based Sequential Neural Decoders for Polar Code Family

    Authors: S Ashwin Hebbar, Viraj Nadkarni, Ashok Vardhan Makkuva, Suma Bhat, Sewoong Oh, Pramod Viswanath

    Abstract: Polar codes are widely used state-of-the-art codes for reliable communication that have recently been included in the 5th generation wireless standards (5G). However, there remains room for the design of polar decoders that are both efficient and reliable in the short blocklength regime. Motivated by recent successes of data-driven channel decoders, we introduce a novel $\textbf{C}$ur…

    Submitted 29 May, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

    Comments: 23 pages, 23 figures. ICML 2023

  6. TinyTurbo: Efficient Turbo Decoders on Edge

    Authors: S Ashwin Hebbar, Rajesh K Mishra, Sravan Kumar Ankireddy, Ashok V Makkuva, Hyeji Kim, Pramod Viswanath

    Abstract: In this paper, we introduce a neural-augmented decoder for Turbo codes called TINYTURBO. TINYTURBO has complexity comparable to the classical max-log-MAP algorithm but has much better reliability than the max-log-MAP baseline and performs close to the MAP algorithm. We show that TINYTURBO exhibits strong robustness on a variety of practical channels of interest, such as EPA and EVA channels, whic…

    Submitted 30 September, 2022; originally announced September 2022.

    Comments: 10 pages, 6 figures. Published at the 2022 IEEE International Symposium on Information Theory (ISIT)

    Journal ref: "TinyTurbo: Efficient Turbo Decoders on Edge," 2022 IEEE International Symposium on Information Theory (ISIT), 2022, pp. 2797-2802

  7. arXiv:2108.12920  [pdf, other]

    cs.IT cs.AI

    KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning

    Authors: Ashok Vardhan Makkuva, Xiyang Liu, Mohammad Vahid Jamali, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath

    Abstract: Landmark codes underpin reliable physical layer communication, e.g., Reed-Muller, BCH, Convolution, Turbo, LDPC and Polar codes: each is a linear code and represents a mathematical breakthrough. The impact on humanity is huge: each of these codes has been used in global wireless communication standards (satellite, WiFi, cellular). Reliability of communication over the classical additive white Gaus…

    Submitted 29 August, 2021; originally announced August 2021.

  8. arXiv:2102.01671  [pdf, ps, other]

    cs.IT

    Reed-Muller Subcodes: Machine Learning-Aided Design of Efficient Soft Recursive Decoding

    Authors: Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath

    Abstract: Reed-Muller (RM) codes are conjectured to achieve the capacity of any binary-input memoryless symmetric (BMS) channel, and are observed to have a comparable performance to that of random codes in terms of scaling laws. On the negative side, RM codes lack efficient decoders with performance close to that of a maximum likelihood decoder for general parameters. Also, they only admit certain discrete…

    Submitted 2 February, 2021; originally announced February 2021.

  9. arXiv:1908.10962  [pdf, other]

    cs.LG stat.ML

    Optimal transport mapping via input convex neural networks

    Authors: Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh, Jason D. Lee

    Abstract: In this paper, we present a novel and principled approach to learn the optimal transport between two distributions, from samples. Guided by the optimal transport theory, we learn the optimal Kantorovich potential which induces the optimal transport map. This involves learning two convex functions, by solving a novel minimax optimization. Building upon recent advances in the field of input convex n…

    Submitted 17 June, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

  10. arXiv:1906.02777  [pdf, other]

    cs.LG stat.ML

    Learning in Gated Neural Networks

    Authors: Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath

    Abstract: Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, where several experts make regression decisions and gating controls how to weigh the decisions in an input-dependent manner. Despite having such a prominent role in both modern and classical machine learning, very little…

    Submitted 17 June, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

  11. arXiv:1810.04133  [pdf, other]

    cs.LG stat.ML

    Learning One-hidden-layer Neural Networks under General Input Distributions

    Authors: Weihao Gao, Ashok Vardhan Makkuva, Sewoong Oh, Pramod Viswanath

    Abstract: Significant advances have been made recently on training neural networks, where the main challenge is in solving an optimization problem with abundant critical points. However, existing approaches to address this issue crucially rely on a restrictive assumption: the training data is drawn from a Gaussian distribution. In this paper, we provide a novel unified framework to design loss functions wit…

    Submitted 26 February, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

    Comments: 19 pages, 4 figures

  12. arXiv:1802.07417  [pdf, other]

    cs.LG

    Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms

    Authors: Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath

    Abstract: Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE including the EM algorithm, and gradient descent are known to get stuck in local optima. From a theoretical viewpoint, finding an…

    Submitted 6 June, 2019; v1 submitted 20 February, 2018; originally announced February 2018.

  13. arXiv:1601.07498  [pdf, ps, other]

    cs.IT

    Equivalence of additive-combinatorial linear inequalities for Shannon entropy and differential entropy

    Authors: Ashok Vardhan Makkuva, Yihong Wu

    Abstract: This paper addresses the correspondence between linear inequalities of Shannon entropy and differential entropy for sums of independent group-valued random variables. We show that any balanced (with the sum of coefficients being zero) linear inequality of Shannon entropy holds if and only if its differential entropy counterpart also holds; moreover, any linear inequality for differential entropy m…

    Submitted 8 September, 2016; v1 submitted 27 January, 2016; originally announced January 2016.