Showing 1–50 of 88 results for author: Rawat, A

  1. arXiv:2407.10005  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond

    Authors: Yingcong Li, Ankit Singh Rawat, Samet Oymak

    Abstract: Recent research has shown that Transformers with linear attention are capable of in-context learning (ICL) by implementing a linear estimator through gradient descent steps. However, the existing results on the optimization landscape apply under stylized settings where task and feature vectors are assumed to be IID and the attention weights are fully parameterized. In this work, we develop a stron…

    Submitted 13 July, 2024; originally announced July 2024.

  2. arXiv:2406.17968  [pdf, other]

    cs.IR cs.AI cs.LG stat.ML

    Efficient Document Ranking with Learnable Late Interactions

    Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been p…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.07840  [pdf, other]

    cs.CV

    SynthForge: Synthesizing High-Quality Face Dataset with Controllable 3D Generative Models

    Authors: Abhay Rawat, Shubham Dokania, Astitva Srivastava, Shuaib Ahmed, Haiwen Feng, Rahul Tallamraju

    Abstract: Recent advancements in generative models have unlocked the capabilities to render photo-realistic data in a controllable fashion. Trained on the real data, these generative models are capable of producing realistic samples with minimal to no domain gap, as compared to the traditional graphics rendering. However, using the data generated using such models for training downstream tasks remains under…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 11 pages, 4 figures, 3 tables. Under Review

  4. arXiv:2406.00060  [pdf, other]

    cs.CL cs.LG

    Cascade-Aware Training of Language Models

    Authors: Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

    Abstract: Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employ smaller models for simpler queries. Cascaded systems are typically built with independently trained models, neglecting the advantages of considering inference-time interactions of the…

    Submitted 29 May, 2024; originally announced June 2024.

    Comments: 22 pages, 13 figures

  5. arXiv:2405.19261  [pdf, other]

    cs.CL cs.AI cs.LG

    Faster Cascades via Speculative Decoding

    Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in p…

    Submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2404.10136  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Model Cascades: Token-level uncertainty and beyond

    Authors: Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning c…

    Submitted 15 April, 2024; originally announced April 2024.

  7. arXiv:2403.08081  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    Mechanics of Next Token Prediction with Self-Attention

    Authors: Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak

    Abstract: Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success is the self-attention mechanism. In this work, we ask: $\textit{What does a single self-attention}$…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted to AISTATS 2024

  8. arXiv:2403.06009  [pdf, other]

    cs.LG

    Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

    Authors: Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Kirushikesh DB, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Nishtha Madaan, Sameep Mehta, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we presen…

    Submitted 13 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  9. arXiv:2402.13512  [pdf, other]

    cs.LG cs.AI cs.CL

    From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

    Authors: M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak

    Abstract: Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data sampled from the model. We first establish a precise mapping between the self-attention mechanism and Markov models: Inputting a prompt to the model…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 30 pages

  10. arXiv:2401.06524  [pdf, ps, other]

    cs.LG

    Domain Adaptation for Time series Transformers using One-step fine-tuning

    Authors: Subina Khanal, Seshu Tirupathi, Giulio Zizzo, Ambrish Rawat, Torben Bach Pedersen

    Abstract: The recent breakthrough of Transformers in deep learning has drawn significant attention of the time series community due to their ability to capture long-range dependencies. However, like other deep learning models, Transformers face limitations in time series prediction, including insufficient temporal understanding, generalization challenges, and data shift issues for the domains with limited d…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted at the Fourth Workshop of Artificial Intelligence for Time Series Analysis (AI4TS): Theory, Algorithms, and Applications, AAAI 2024, Vancouver, Canada

  11. arXiv:2312.07420  [pdf, other]

    cs.LG cs.CY

    FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs

    Authors: Swanand Ravindra Kadhe, Anisa Halimi, Ambrish Rawat, Nathalie Baracaldo

    Abstract: Training large language models (LLMs) is a costly endeavour in terms of time and computational resources. The large amount of training data used during the unsupervised pre-training phase makes it difficult to verify all data and, unfortunately, undesirable data may be ingested during training. Re-training from scratch is impractical and has led to the creation of the 'unlearning' discipline where…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted in NeurIPS 2023 Workshop on Socially Responsible Language Modelling Research (SoLaR)

  12. arXiv:2310.19304  [pdf, other]

    cs.CR cs.LG

    Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection

    Authors: Swanand Ravindra Kadhe, Heiko Ludwig, Nathalie Baracaldo, Alan King, Yi Zhou, Keith Houck, Ambrish Rawat, Mark Purcell, Naoise Holohan, Mikio Takeuchi, Ryo Kawahara, Nir Drucker, Hayim Shaul, Eyal Kushnir, Omri Soceanu

    Abstract: The effective detection of evidence of financial anomalies requires collaboration among multiple entities who own a diverse set of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontal…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Prize Winner in the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge

  13. arXiv:2310.10636  [pdf, other]

    cs.LG

    Dual-Encoders for Extreme Multi-Label Classification

    Authors: Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit Dhillon

    Abstract: Dual-encoder (DE) models are widely used in retrieval tasks, most commonly studied on open QA benchmarks that are often characterized by multi-class and limited training data. In contrast, their performance in multi-label and data-rich retrieval settings like extreme multi-label classification (XMC), remains under-explored. Current empirical evidence indicates that DE models fall significantly sho…

    Submitted 17 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 27 pages, 8 figures

    Journal ref: ICLR 2024 camera-ready publication

  14. arXiv:2310.08461  [pdf, other]

    cs.CL cs.AI cs.LG

    DistillSpec: Improving Speculative Decoding via Knowledge Distillation

    Authors: Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal

    Abstract: Speculative decoding (SD) accelerates large language model inference by employing a faster draft model for generating multiple tokens, which are then verified in parallel by the larger target model, resulting in the text generated according to the target model distribution. However, identifying a compact draft model that is well-aligned with the target model is challenging. To tackle this issue, w…

    Submitted 30 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  15. arXiv:2310.05337  [pdf, other]

    cs.LG cs.CV

    What do larger image classifiers memorise?

    Authors: Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels. To carefully study this issue, Feldman proposed a metric to quantify the degree of memorisation of individual training examples, and empirically computed the correspondi…

    Submitted 8 October, 2023; originally announced October 2023.

    MSC Class: Machine Learning (cs.LG); Artificial Intelligence (cs.AI) Machine Learning (stat.ML)

  16. arXiv:2310.02226  [pdf, other]

    cs.CL cs.AI cs.LG

    Think before you speak: Training Language Models With Pause Tokens

    Authors: Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

    Abstract: Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing training and inference on lan…

    Submitted 20 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024

  17. arXiv:2307.02764  [pdf, other]

    cs.LG stat.ML

    When Does Confidence-Based Cascade Deferral Suffice?

    Authors: Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite…

    Submitted 23 January, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  18. arXiv:2306.09308  [pdf, other]

    cs.CL cs.AI cs.CR

    Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models

    Authors: Myles Foley, Ambrish Rawat, Taesung Lee, Yufang Hou, Gabriele Picco, Giulio Zizzo

    Abstract: The wide applicability and adaptability of generative large language models (LLMs) has enabled their rapid adoption. While the pre-trained models can perform many tasks, such models are often fine-tuned to improve their performance on various downstream applications. However, this leads to issues over violation of model licenses, model theft, and copyright infringement. Moreover, recent advances s…

    Submitted 15 June, 2023; originally announced June 2023.

  19. arXiv:2306.03435  [pdf, other]

    cs.LG cs.CL stat.ML

    On the Role of Attention in Prompt-tuning

    Authors: Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis

    Abstract: Prompt-tuning is an emerging strategy to adapt large language models (LLM) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mi…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Published at ICML 2023

  20. arXiv:2302.01576  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    ResMem: Learn what you can and memorize the rest

    Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g. a ne…

    Submitted 20 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  21. arXiv:2301.12245  [pdf, other]

    cs.LG

    Supervision Complexity and its Role in Knowledge Distillation

    Authors: Hrayr Harutyunyan, Ankit Singh Rawat, Aditya Krishna Menon, Seungyeon Kim, Sanjiv Kumar

    Abstract: Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps. In order to study the generalization behavior of a distilled student, we propose a new theoretical framework that leverages supervision complexity: a measure of alignment between teacher-provided supervision and the student's neural tangent kernel. The framework highlights a delicate inte…

    Submitted 28 January, 2023; originally announced January 2023.

    Comments: Published at ICLR 2023

  22. arXiv:2301.12005  [pdf, other]

    cs.LG

    EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

    Authors: Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Sadeep Jayasumana, Veeranjaneyulu Sadhanala, Wittawat Jitkrittum, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar

    Abstract: Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). In this paper, we aim to improve distillation methods that pave the way for the resource-efficient deployment of such models in practice. Inspired by our theoretical analysis of the teacher-student generalization gap for IR models, we propose a novel distillation approach that leverages…

    Submitted 3 July, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  23. arXiv:2212.08290  [pdf, other]

    cs.LG cs.CV

    Robust Learning Protocol for Federated Tumor Segmentation Challenge

    Authors: Ambrish Rawat, Giulio Zizzo, Swanand Kadhe, Jonathan P. Epperlein, Stefano Braghin

    Abstract: In this work, we devise robust and efficient learning protocols for orchestrating a Federated Learning (FL) process for the Federated Tumor Segmentation Challenge (FeTS 2022). Enabling FL for FeTS setup is challenging mainly due to data heterogeneity among collaborators and communication cost of training. To tackle these challenges, we propose Robust Learning Protocol (RoLePRO) which is a combinat…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: 14 pages, 2 figures, 3 tables

  24. arXiv:2211.05110  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models with Controllable Working Memory

    Authors: Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

    Abstract: Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. While many downstream applications provide the model with an informational context to aid its performa…

    Submitted 9 November, 2022; originally announced November 2022.

  25. arXiv:2210.06313  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

    Authors: Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

    Abstract: This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse. By activation map we refer to the intermediate output of the multi-layer perceptrons (MLPs) after a ReLU activation function, and by sparse we mean that on average very few entries (e.g., 3.0% for T5-Base and 6.3% for ViT-B16) are nonzero for each input to MLP…

    Submitted 9 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: A short version was presented at ICLR 2023. Previous title: Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

  26. arXiv:2210.02617  [pdf, other]

    cs.LG

    Generalization Properties of Retrieval-based Models

    Authors: Soumya Basu, Ankit Singh Rawat, Manzil Zaheer

    Abstract: Many modern high-performing machine learning models such as GPT-3 primarily rely on scaling up models, e.g., transformer networks. Simultaneously, a parallel line of work aims to improve the model performance by augmenting an input instance with other (labeled) instances during inference. Examples of such augmentations include task-specific prompts and similar examples retrieved from the training…

    Submitted 5 October, 2022; originally announced October 2022.

  27. arXiv:2210.02415  [pdf, other]

    cs.LG cs.DS stat.ML

    A Fourier Approach to Mixture Learning

    Authors: Mingda Qiao, Guru Guruganesh, Ankit Singh Rawat, Avinava Dubey, Manzil Zaheer

    Abstract: We revisit the problem of learning mixtures of spherical Gaussians. Given samples from mixture $\frac{1}{k}\sum_{j=1}^{k}\mathcal{N}(μ_j, I_d)$, the goal is to estimate the means $μ_1, μ_2, \ldots, μ_k \in \mathbb{R}^d$ up to a small error. The hardness of this learning problem can be measured by the separation $Δ$ defined as the minimum distance between all pairs of means. Regev and Vijayaraghava…

    Submitted 5 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: To appear at NeurIPS 2022; v2 corrected author information

  28. arXiv:2209.01881  [pdf, other]

    cs.CV

    Semi-Supervised Domain Adaptation by Similarity based Pseudo-label Injection

    Authors: Abhay Rawat, Isha Dua, Saurav Gupta, Rahul Tallamraju

    Abstract: One of the primary challenges in Semi-supervised Domain Adaptation (SSDA) is the skewed ratio between the number of labeled source and target samples, causing the model to be biased towards the source domain. Recent works in SSDA show that aligning only the labeled target samples with the source samples potentially leads to incomplete domain alignment of the target domain to the source domain. In…

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: ECCV 2022, L2ID Workshop

  29. arXiv:2208.06825  [pdf, other]

    cs.LG

    Teacher Guided Training: An Efficient Framework for Knowledge Transfer

    Authors: Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

    Abstract: The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployment also necessitates a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a…

    Submitted 14 August, 2022; originally announced August 2022.

  30. arXiv:2207.05521  [pdf, other]

    cs.LG cs.CR

    Federated Unlearning: How to Efficiently Erase a Client in FL?

    Authors: Anisa Halimi, Swanand Kadhe, Ambrish Rawat, Nathalie Baracaldo

    Abstract: With privacy legislation empowering the users with the right to be forgotten, it has become essential to make a model amenable for forgetting some of its training data. However, existing unlearning methods in the machine learning context can not be directly applied in the context of distributed settings like federated learning due to the differences in learning protocol and the presence of multipl…

    Submitted 20 October, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

  31. arXiv:2207.03227  [pdf, other]

    cs.LG cs.AI stat.ML

    Challenges and Pitfalls of Bayesian Unlearning

    Authors: Ambrish Rawat, James Requeima, Wessel Bruinsma, Richard Turner

    Abstract: Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model. Approximate unlearning are one class of methods for this task which avoid the need to retrain the model from scratch on the retained data. Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated poste…

    Submitted 13 September, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: 5 pages, 3 figures, Updatable ML (UpML) Workshop, International Conference on Machine Learning (ICML) 2022

  32. arXiv:2204.13208  [pdf, other]

    cs.LG stat.ML

    ELM: Embedding and Logit Margins for Long-Tail Learning

    Authors: Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural m…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: 24 pages

  33. arXiv:2204.06772  [pdf, other]

    cs.CV

    ViTOL: Vision Transformer for Weakly Supervised Object Localization

    Authors: Saurav Gupta, Sourav Lakhotia, Abhay Rawat, Rahul Tallamraju

    Abstract: Weakly supervised object localization (WSOL) aims at predicting object locations in an image using only image-level category labels. Common challenges that image classification models encounter when localizing objects are, (a) they tend to look at the most discriminative features in an image that confines the localization map to a very small region, (b) the localization maps are class agnostic, an…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: Accepted: 2022 IEEE CVPR Workshop on Learning with Limited Labelled Data for Image and Video Understanding (L3D-IVU)

  34. arXiv:2202.12443  [pdf, other]

    cs.AI cs.LG

    Towards an Accountable and Reproducible Federated Learning: A FactSheets Approach

    Authors: Nathalie Baracaldo, Ali Anwar, Mark Purcell, Ambrish Rawat, Mathieu Sinn, Bashar Altakrouri, Dian Balta, Mahdi Sellami, Peter Kuhn, Ulrich Schopp, Matthias Buchinger

    Abstract: Federated Learning (FL) is a novel paradigm for the shared training of models based on decentralized and private data. With respect to ethical guidelines, FL is promising regarding privacy, but needs to excel vis-à-vis transparency and trustworthiness. In particular, FL has to address the accountability of the parties involved and their adherence to rules, law and principles. We introduce AF^2 Fra…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: 16 pages, 4 figures, 2 tables

  35. arXiv:2201.11865  [pdf, other]

    cs.LG cs.DC

    FedLite: A Scalable Approach for Federated Learning on Resource-constrained Clients

    Authors: Jianyu Wang, Hang Qi, Ankit Singh Rawat, Sashank Reddi, Sagar Waghmare, Felix X. Yu, Gauri Joshi

    Abstract: In classical federated learning, the clients contribute to the overall training by communicating local updates for the underlying model on their private data to a coordinating server. However, updating and communicating the entire model becomes prohibitively expensive when resource-constrained clients collectively aim to train a large machine learning model. Split learning provides a natural solut…

    Submitted 16 February, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  36. arXiv:2112.10525  [pdf, other]

    cs.LG cs.CR

    Certified Federated Adversarial Training

    Authors: Giulio Zizzo, Ambrish Rawat, Mathieu Sinn, Sergio Maffeis, Chris Hankin

    Abstract: In federated learning (FL), robust aggregation schemes have been developed to protect against malicious clients. Many robust aggregation schemes rely on certain numbers of benign clients being present in a quorum of workers. This can be hard to guarantee when clients can join at will, or join based on factors such as idle system status, and connected to power and WiFi. We tackle the scenario of se…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: First presented at the 1st NeurIPS Workshop on New Frontiers in Federated Learning (NFFL 2021)

  37. arXiv:2110.10305  [pdf, other]

    cs.LG

    When in Doubt, Summon the Titans: Efficient Inference with Large Models

    Authors: Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

    Abstract: Scaling neural networks to "large" sizes, with billions of parameters, has been shown to yield impressive results on many challenging problems. However, the inference cost incurred by such large models often prevents their application in most real-world settings. In this paper, we propose a two-stage framework based on distillation that realizes the modelling benefits of the large models, while la…

    Submitted 19 October, 2021; originally announced October 2021.

  38. arXiv:2109.02532  [pdf, other]

    cs.LG

    Automated Robustness with Adversarial Training as a Post-Processing Step

    Authors: Ambrish Rawat, Mathieu Sinn, Beat Buesser

    Abstract: Adversarial training is a computationally expensive task and hence searching for neural network architectures with robustness as the criterion can be challenging. As a step towards practical automation, this work explores the efficacy of a simple post processing step in yielding robust deep learning model. To achieve this, we adopt adversarial training as a post-processing step for optimised netwo…

    Submitted 6 September, 2021; originally announced September 2021.

  39. arXiv:2108.01644  [pdf, other]

    cs.CR cs.AI cs.LG

    The Devil is in the GAN: Backdoor Attacks and Defenses in Deep Generative Models

    Authors: Ambrish Rawat, Killian Levacher, Mathieu Sinn

    Abstract: Deep Generative Models (DGMs) are a popular class of deep learning models which find widespread use because of their ability to synthesize data from complex, high-dimensional manifolds. However, even with their increasing industrial adoption, they haven't been subject to rigorous security and privacy analysis. In this work we examine one such aspect, namely backdoor attacks on DGMs which can signi…

    Submitted 14 December, 2022; v1 submitted 3 August, 2021; originally announced August 2021.

    Comments: 17 pages, 11 figures, 3 tables

  40. arXiv:2105.05736  [pdf, other]

    cs.LG stat.ML

    Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

    Abstract: Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off pe…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: To appear in ICML 2021

  41. arXiv:2102.06849  [pdf, other]

    cs.LG cs.AI stat.ML

    Distilling Double Descent

    Authors: Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

    Abstract: Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset. The most common explanations for why distillation "works" are predicated on the assumption that student is provided with soft labels, e.g., probabilities or confidences, from the teacher model. In this work, we show, that,…

    Submitted 12 February, 2021; originally announced February 2021.

  42. arXiv:2102.03349  [pdf, other]

    cs.LG

    On the Reproducibility of Neural Network Predictions

    Authors: Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar

    Abstract: Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, such randomness can cause churn -- for the same input, disagreements between predictions of the two models independently trained by the same algorithm, con…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: 19 pages, 7 figures

  43. arXiv:2012.01791  [pdf, other]

    cs.LG cs.CR

    FAT: Federated Adversarial Training

    Authors: Giulio Zizzo, Ambrish Rawat, Mathieu Sinn, Beat Buesser

    Abstract: Federated learning (FL) is one of the most important paradigms addressing privacy and data governance issues in machine learning (ML). Adversarial training has emerged, so far, as the most promising approach against evasion threats on ML models. In this paper, we take the first known steps towards federated adversarial training (FAT) combining both methods to reduce the threat of evasion during in…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: NeurIPS 2020 Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL)

  44. arXiv:2012.00363  [pdf, other]

    cs.CL cs.LG

    Modifying Memories in Transformer Models

    Authors: Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar

    Abstract: Large Transformer models have achieved impressive performance in many natural language tasks. In particular, Transformer based language models have been shown to have great capabilities in encoding factual knowledge in their vast amount of parameters. While the tasks of improving the memorization and generalization of Transformers have been widely studied, it is not well known how to make transfor…

    Submitted 1 December, 2020; originally announced December 2020.

  45. arXiv:2007.10987  [pdf, other]

    cs.LG cs.CR cs.DC

    IBM Federated Learning: an Enterprise Framework White Paper V0.1

    Authors: Heiko Ludwig, Nathalie Baracaldo, Gegi Thomas, Yi Zhou, Ali Anwar, Shashank Rajamoni, Yuya Ong, Jayaram Radhakrishnan, Ashish Verma, Mathieu Sinn, Mark Purcell, Ambrish Rawat, Tran Minh, Naoise Holohan, Supriyo Chakraborty, Shalisha Whitherspoon, Dean Steuer, Laura Wynter, Hifaz Hassan, Sean Laguna, Mikhail Yurochkin, Mayank Agarwal, Ebube Chuba, Annie Abay

    Abstract: Federated Learning (FL) is an approach to conduct machine learning without centralizing training data in a single place, for reasons of privacy, confidentiality or data volume. However, solving federated machine learning problems raises issues above and beyond those of centralized machine learning. These issues include setting up communication infrastructure between parties, coordinating the learn…

    Submitted 22 July, 2020; originally announced July 2020.

    Comments: 17 pages

    ACM Class: I.2.6; I.2.11

  46. arXiv:2007.07314  [pdf, other]

    cs.LG stat.ML

    Long-tail learning via logit adjustment

    Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

    Abstract: Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these chall…

    Submitted 9 July, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: Published as a conference paper in ICLR 2021

  47. arXiv:2007.06555  [pdf, other]

    cs.LG cs.DS stat.ML

    Adversarial robustness via robust low rank representations

    Authors: Pranjal Awasthi, Himanshu Jain, Ankit Singh Rawat, Aravindan Vijayaraghavan

    Abstract: Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations that often exist for real data such as images, for training neural networks with certified robustness guarantees. Our first contribution is for certified robustness to perturbations measured i…

    Submitted 1 August, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: fixed a bug in the proof of Proposition B.2

  48. arXiv:2006.04862  [pdf, other]

    cs.LG stat.ML

    $O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers

    Authors: Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Recently, Transformer networks have redefined the state of the art in many NLP tasks. However, these models suffer from quadratic computational cost in the input sequence length $n$ to compute pairwise attention in each layer. This has prompted recent research into sparse Transformers that sparsify the connections in the attention layers. While empirically promising for long sequences, fundamental…

    Submitted 19 December, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 31 pages, NeurIPS 2020 Camera-ready

  49. arXiv:2005.10419  [pdf, other]

    cs.LG stat.ML

    Why distillation helps: a statistical perspective

    Authors: Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

    Abstract: Knowledge distillation is a technique for improving the performance of a simple "student" model by replacing its one-hot training labels with a distribution over labels obtained from a complex "teacher" model. While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help? In this paper, we present a statistical perspective on distillation w…

    Submitted 20 May, 2020; originally announced May 2020.

  50. arXiv:2004.10915  [pdf, other]

    cs.LG stat.ML

    Doubly-stochastic mining for heterogeneous retrieval

    Authors: Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge. The first challenge concerns scalability: with a large number of labels, standard losses are difficult to optimise even on a single example.…

    Submitted 22 April, 2020; originally announced April 2020.