Showing 1–29 of 29 results for author: Sahu, A K

  1. arXiv:2405.02774  [pdf, other]

    cs.LG cs.AI cs.CL

    Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

    Authors: Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia

    Abstract: This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model. The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels. While many data selection algorithms have been designed for small-scale applications, rendering them unsuitable for our context, some emerg…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  2. arXiv:2310.13681  [pdf, other]

    cs.GT cs.CY cs.DC cs.LG econ.TH

    Towards Realistic Mechanisms That Incentivize Federated Participation and Contribution

    Authors: Marco Bornstein, Amrit Singh Bedi, Anit Kumar Sahu, Furqan Khan, Furong Huang

    Abstract: Edge device participation in federated learning (FL) is typically studied through the lens of device-server communication (e.g., device dropout) and assumes an undying desire from edge devices to participate in FL. As a result, current FL frameworks are flawed when implemented in realistic settings, with many encountering the free-rider dilemma. In a step to push FL towards realistic settings, we…

    Submitted 22 May, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 24 pages, 11 figures

  3. arXiv:2308.02013  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Federated Representation Learning for Automatic Speech Recognition

    Authors: Guruprasad V Ramesh, Gopinath Chennupati, Milind Rao, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo

    Abstract: Federated Learning (FL) is a privacy-preserving paradigm, allowing edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respec…

    Submitted 7 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted at the ISCA SPSC 3rd Symposium on Security and Privacy in Speech Communication, 2023

  4. arXiv:2307.02460  [pdf, other]

    cs.LG cs.AI cs.CE cs.CV

    Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources

    Authors: Feiyang Kang, Hoang Anh Just, Anit Kumar Sahu, Ruoxi Jia

    Abstract: Traditionally, data selection has been studied in settings where all samples from prospective sources are fully revealed to a machine learning developer. However, in practical data exchange scenarios, data providers often reveal only a limited subset of samples before an acquisition decision is made. Recently, there have been efforts to fit scaling laws that predict model performance at any size a…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: An extended abstract of this work appears in Data-centric Machine Learning Research (DMLR) Workshop at 40th International Conference on Machine Learning, Honolulu HI, USA. July 29, 2023

  5. arXiv:2306.12015  [pdf, other]

    eess.AS cs.SD

    Federated Self-Learning with Weak Supervision for Speech Recognition

    Authors: Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

    Abstract: Automatic speech recognition (ASR) models with low-footprint are increasingly being deployed on edge devices for conversational agents, which enhances privacy. We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground truth human transcripts or machine t…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Proceedings of ICASSP 2023

  6. Learning When to Trust Which Teacher for Weakly Supervised ASR

    Authors: Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath Chennupati, Andreas Stolcke

    Abstract: Automatic speech recognition (ASR) training can utilize multiple experts as teacher models, each trained on a specific domain or accent. Teacher models may be opaque in nature since their architecture may not be known or their training cadence is different from that of the student ASR model. Still, the student models are updated incrementally using the pseudo-labels generated independently by t…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Proceedings of INTERSPEECH 2023

    Journal ref: Proc. Interspeech, Aug. 2023, pp. 381-385

  7. ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale

    Authors: Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure

    Abstract: Incremental learning is one paradigm to enable model building and updating at scale with streaming data. For end-to-end automatic speech recognition (ASR) tasks, the absence of human annotated labels along with the need for privacy preserving policies for model building makes it a daunting challenge. Motivated by these challenges, in this paper we use a cloud based framework for production systems…

    Submitted 22 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: 9 pages

  8. arXiv:2206.10815  [pdf, other]

    cs.LG cs.DC math.OC

    FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

    Authors: Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

    Abstract: In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program. The objective of a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considerin…

    Submitted 1 February, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

  9. arXiv:2204.08069  [pdf, other]

    cs.LG cs.AI

    Self-Aware Personalized Federated Learning

    Authors: Huili Chen, Jie Ding, Eric Tramel, Shuang Wu, Anit Kumar Sahu, Salman Avestimehr, Tao Zhang

    Abstract: In the context of personalized federated learning (FL), the critical challenge is to balance local model improvement and global model tuning when the personal and global objectives may not be exactly aligned. Inspired by Bayesian hierarchical models, we develop a self-aware personalized FL method where each client can automatically balance the training of its local personal model and the global mo…

    Submitted 17 April, 2022; originally announced April 2022.

  10. arXiv:2204.02593  [pdf, other]

    math.OC cs.IT cs.LG

    Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise

    Authors: Dusan Jakovetic, Dragana Bajovic, Anit Kumar Sahu, Soummya Kar, Nemanja Milosevic, Dusan Stamenkovic

    Abstract: We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios when gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, like clipped, normalized, signed or quantized gradient, but we also consider novel nonlinearity choices. We establish for the considered class of methods strong convergence guarantees assum…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: Submitted for publication Nov 2021

  11. arXiv:2202.00807  [pdf, other]

    cs.LG cs.AI cs.DC

    Federated Learning Challenges and Opportunities: An Outlook

    Authors: Jie Ding, Eric Tramel, Anit Kumar Sahu, Shuang Wu, Salman Avestimehr, Tao Zhang

    Abstract: Federated learning (FL) has been developed as a promising framework to leverage the resources of edge devices, enhance customers' privacy, comply with regulations, and reduce development costs. Although many methods and applications have been developed for FL, several critical challenges for practical FL systems remain unaddressed. This paper provides an outlook on FL development, categorized into…

    Submitted 1 February, 2022; originally announced February 2022.

    Comments: This paper provides an outlook on FL development as part of the ICASSP 2022 special session entitled "Frontiers of Federated Learning: Applications, Challenges, and Opportunities"

  12. arXiv:2201.03789  [pdf, other]

    cs.LG stat.ML

    Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits

    Authors: Sunwoo Lee, Anit Kumar Sahu, Chaoyang He, Salman Avestimehr

    Abstract: Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm independently runs SGD on multiple workers and periodically averages the model across all the workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers making the global loss c…

    Submitted 11 January, 2022; originally announced January 2022.

  13. arXiv:2102.00029  [pdf, other]

    cs.LG cs.CR

    You Only Query Once: Effective Black Box Adversarial Attacks with Minimal Repeated Queries

    Authors: Devin Willmott, Anit Kumar Sahu, Fatemeh Sheikholeslami, Filipe Condessa, Zico Kolter

    Abstract: Researchers have repeatedly shown that it is possible to craft adversarial attacks on deep classifiers (small perturbations that significantly change the class label), even in the "black-box" setting where one only has query access to the classifier. However, all prior work in the black-box setting attacks the classifier by repeatedly querying the same image with minor modifications, usually thous…

    Submitted 29 January, 2021; originally announced February 2021.

  14. arXiv:2010.04205  [pdf, other]

    cs.LG

    Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks

    Authors: Anit Kumar Sahu, Satya Narayan Shukla, J. Zico Kolter

    Abstract: We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle, providing us with loss function evaluations. Although this setting has been investigated in previous work, most past approaches using zeroth order optimization implicitly assume that the gradients of the loss function with respect to the input images are \emph{unstruc…

    Submitted 8 October, 2020; originally announced October 2020.

  15. arXiv:2007.07210  [pdf, other]

    cs.LG stat.ML

    Simple and Efficient Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes

    Authors: Satya Narayan Shukla, Anit Kumar Sahu, Devin Willmott, J. Zico Kolter

    Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models solely based on information limited to output label (hard label) to a queried data input. We propose a simple and efficient Bayesian Optimization (BO) based approach for developing black-box adversarial attacks. Issues with BO's performance in high dimensions are avo…

    Submitted 11 June, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Accepted at KDD 2021. arXiv admin note: substantial text overlap with arXiv:1909.13857

  16. Data-driven Thermal Model Inference with ARMAX, in Smart Environments, based on Normalized Mutual Information

    Authors: Zhanhong Jiang, Jonathan Francis, Anit Kumar Sahu, Sirajum Munir, Charles Shelton, Anthony Rowe, Mario Bergés

    Abstract: Understanding the models that characterize the thermal dynamics in a smart building is important for the comfort of its occupants and for its energy optimization. A significant amount of research has attempted to utilize thermodynamics (physical) models for smart building control, but these approaches remain challenging due to the stochastic nature of the intermittent environmental disturbances. T…

    Submitted 10 June, 2020; originally announced June 2020.

    Journal ref: American Control Conference (2018) 4634-4639

  17. arXiv:2001.01920  [pdf, other]

    cs.LG stat.ML

    FedDANE: A Federated Newton-Type Method

    Authors: Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

    Abstract: Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions.…

    Submitted 7 January, 2020; originally announced January 2020.

    Comments: Asilomar Conference on Signals, Systems, and Computers 2019

  18. arXiv:1909.13857  [pdf, other]

    cs.LG stat.ML

    Black-box Adversarial Attacks with Bayesian Optimization

    Authors: Satya Narayan Shukla, Anit Kumar Sahu, Devin Willmott, J. Zico Kolter

    Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization (BO) to specifically cater to scenarios involving low query budgets to develop query efficient adversarial attacks. We alleviate the issues surrounding BO in regards to optimizing high…

    Submitted 30 September, 2019; originally announced September 2019.

  19. arXiv:1909.12473  [pdf, other]

    cs.LG stat.ML

    Noisy Batch Active Learning with Deterministic Annealing

    Authors: Gaurav Gupta, Anit Kumar Sahu, Wan-Yi Lin

    Abstract: We study the problem of training machine learning models incrementally with batches of samples annotated with noisy oracles. We select each batch of samples that are important and also diverse via clustering and importance sampling. More importantly, we incorporate model uncertainty into the sampling probability to compensate for poor estimation of the importance scores when the training data is t…

    Submitted 28 October, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

  20. arXiv:1908.07873  [pdf, other]

    cs.LG cs.DC stat.ML

    Federated Learning: Challenges, Methods, and Future Directions

    Authors: Tian Li, Anit Kumar Sahu, Ameet Talwalkar, Virginia Smith

    Abstract: Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving da…

    Submitted 21 August, 2019; originally announced August 2019.

  21. arXiv:1905.09435  [pdf, other]

    cs.LG eess.SY math.OC stat.ML

    MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling

    Authors: Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, Soummya Kar

    Abstract: This paper studies the problem of error-runtime trade-off, typically encountered in decentralized training based on stochastic gradient descent (SGD) using a given network. While a denser (sparser) network topology results in faster (slower) error convergence in terms of iterations, it incurs more (less) communication time/delay per iteration. In this paper, we propose MATCHA, an algorithm that ca…

    Submitted 18 November, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

  22. arXiv:1903.07266  [pdf, other]

    cs.LG cs.DC cs.MA eess.SY stat.ML

    Distributed stochastic optimization with gradient tracking over strongly-connected networks

    Authors: Ran Xin, Anit Kumar Sahu, Usman A. Khan, Soummya Kar

    Abstract: In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to a stochastic first-order oracle ($\mathcal{SFO}$), we propose a novel distributed method, called $\mathcal{S}$-$\mathcal{AB}$, where each agent uses an auxilia…

    Submitted 9 April, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

  23. arXiv:1812.06127  [pdf, other]

    cs.LG stat.ML

    Federated Optimization in Heterogeneous Networks

    Authors: Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

    Abstract: Federated Learning is a distributed learning paradigm with two key challenges that differentiate it from traditional distributed optimization: (1) significant variability in terms of the systems characteristics on each device in the network (systems heterogeneity), and (2) non-identically distributed data across the network (statistical heterogeneity). In this work, we introduce a framework, FedPr…

    Submitted 21 April, 2020; v1 submitted 14 December, 2018; originally announced December 2018.

    Comments: MLSys 2020

  24. arXiv:1811.04475  [pdf, other]

    cs.GT cs.LG stat.ML

    Managing App Install Ad Campaigns in RTB: A Q-Learning Approach

    Authors: Anit Kumar Sahu, Shaunak Mishra, Narayan Bhamidipati

    Abstract: Real time bidding (RTB) enables demand side platforms (bidders) to scale ad campaigns across multiple publishers affiliated to an RTB ad exchange. While driving multiple campaigns for mobile app install ads via RTB, the bidder typically has to: (i) maintain each campaign's efficiency (i.e., meet advertiser's target cost-per-install), (ii) be sensitive to advertiser's budget, and (iii) make profit…

    Submitted 11 November, 2018; originally announced November 2018.

    Comments: 6 pages

  25. arXiv:1810.03233  [pdf, other]

    math.OC cs.LG

    Towards Gradient Free and Projection Free Stochastic Optimization

    Authors: Anit Kumar Sahu, Manzil Zaheer, Soummya Kar

    Abstract: This paper focuses on the problem of \emph{constrained} \emph{stochastic} optimization. A zeroth order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient free. Under convexity and smoothness assumption, we show that the proposed algorithm converges to the optimal objective function at a rate…

    Submitted 18 February, 2019; v1 submitted 7 October, 2018; originally announced October 2018.

    Comments: To appear in Proceedings of AISTATS 2019

  26. arXiv:1802.04943  [pdf, other]

    math.OC cs.IT math.PR math.ST

    $\mathcal{CIRFE}$: A Distributed Random Fields Estimator

    Authors: Anit Kumar Sahu, Dusan Jakovetic, Soummya Kar

    Abstract: This paper presents a communication efficient distributed algorithm, $\mathcal{CIRFE}$ of the \emph{consensus}+\emph{innovations} type, to estimate a high-dimensional parameter in a multi-agent network, in which each agent is interested in reconstructing only a few components of the parameter. This problem arises for example when monitoring the high-dimensional distributed state of a large-scale i…

    Submitted 11 June, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: 30 pages. Submitted for journal publication. Initial Submission: Feb 2018, Revised: June 2018

  27. arXiv:1711.09739  [pdf]

    cs.CY

    Automatic Pill Reminder for Easy Supervision

    Authors: A. Jabeena, Animesh Kumar Sahu, Rohit Roy, N. Sardar Basha

    Abstract: In this paper we present a working model of an automatic pill reminder and dispenser setup that can alleviate irregularities in taking prescribed dosage of medicines at the right time dictated by the medical practitioner and switch from approaches predominantly dependent on human memory to automation with negligible supervision, thus relieving persons from error-prone tasks of giving wrong medicin…

    Submitted 17 November, 2017; originally announced November 2017.

    Comments: 5 pages, 7 figures, ICISS- 2017 (IEEE Conference)

  28. arXiv:1602.00382  [pdf, other]

    math.OC cs.IT math.PR math.ST

    Distributed Constrained Recursive Nonlinear Least-Squares Estimation: Algorithms and Asymptotics

    Authors: Anit Kumar Sahu, Soummya Kar, José M. F. Moura, H. Vincent Poor

    Abstract: This paper focuses on the problem of recursive nonlinear least squares parameter estimation in multi-agent networks, in which the individual agents observe sequentially over time an independent and identically distributed (i.i.d.) time-series consisting of a nonlinear function of the true but unknown parameter corrupted by noise. A distributed recursive estimator of the \emph{consensus} + \emph{in…

    Submitted 19 October, 2016; v1 submitted 31 January, 2016; originally announced February 2016.

    Comments: 28 pages. Initial Submission: Feb. 2016, Revised: July 2016, Accepted: September 2016, To appear in IEEE Transactions on Signal and Information Processing over Networks: Special Issue on Inference and Learning over Networks

  29. arXiv:1601.04779  [pdf, other]

    cs.IT math.PR

    Recursive Distributed Detection for Composite Hypothesis Testing: Nonlinear Observation Models in Additive Gaussian Noise

    Authors: Anit Kumar Sahu, Soummya Kar

    Abstract: This paper studies recursive composite hypothesis testing in a network of sparsely connected agents. The network objective is to test a simple null hypothesis against a composite alternative concerning the state of the field, modeled as a vector of (continuous) unknown parameters determining the parametric family of probability measures induced on the agents' observation spaces under the hypothese…

    Submitted 21 February, 2017; v1 submitted 18 January, 2016; originally announced January 2016.

    Comments: To appear in IEEE Transactions on Information Theory