Skip to main content

Showing 1–50 of 81 results for author: Kamath, G

  1. arXiv:2406.17814  [pdf, ps, other

    stat.ML cs.DS cs.IT cs.LG math.ST

    Distribution Learnability and Robustness

    Authors: Shai Ben-David, Alex Bie, Gautam Kamath, Tosca Lechner

    Abstract: We examine the relationship between learnability and robust (or agnostic) learnability for the problem of distribution learning. We show that, contrary to other learning settings (e.g., PAC learning of function classes), realizable learnability of a class of probability distributions does not imply its agnostic learnability. We go on to examine what type of data corruption can disrupt the learnabi… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: In NeurIPS 2023

  2. arXiv:2406.17216  [pdf, other

    cs.LG cs.AI cs.CR cs.CY

    Machine Unlearning Fails to Remove Data Poisoning Attacks

    Authors: Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel

    Abstract: We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of training on poisoned data. We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be ef… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2405.20769  [pdf, other

    cs.CR cs.DS cs.LG stat.ML

    Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

    Authors: Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke

    Abstract: We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the compo… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2405.20405  [pdf, other

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Private Mean Estimation with Person-Level Differential Privacy

    Authors: Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

    Abstract: We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 67 pages, 3 figures

  5. arXiv:2405.04034  [pdf, other

    cs.LG cs.CR cs.CY

    Differentially Private Post-Processing for Fair Regression

    Authors: Ruicheng Xian, Qiaobo Li, Gautam Kamath, Han Zhao

    Abstract: This paper describes a differentially private post-processing algorithm for learning fair regressors satisfying statistical parity, addressing privacy concerns of machine learning models trained on sensitive data, as well as fairness concerns of their potential to propagate historical biases. Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outp… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: ICML 2024. Code is at https://github.com/rxian/fair-regression

  6. arXiv:2404.06737  [pdf, other

    cs.LG cs.CR

    Disguised Copyright Infringement of Latent Diffusion Models

    Authors: Yiwei Lu, Matthew Y. R. Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu

    Abstract: Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase. The notion of access usually refers to including copyrighted samples directly in the training dataset, which one may inspect to identify an infringement. We argue that such visual auditing largely overlooks a concealed copyright i… ▽ More

    Submitted 3 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML 2024

  7. Scope Ambiguities in Large Language Models

    Authors: Gaurav Kamath, Sebastian Schuster, Sowmya Vajjala, Siva Reddy

    Abstract: Sentences containing multiple semantic operators with overlapping scope often create ambiguities in interpretation, known as scope ambiguities. These ambiguities offer rich insights into the interaction between semantic structure and world knowledge in language processing. Despite this, there has been little research into how modern large language models treat them. In this paper, we investigate h… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: To be published in Transactions of the Association for Computational Linguistics

  8. arXiv:2402.12626  [pdf, other

    cs.LG cs.CR

    Indiscriminate Data Poisoning Attacks on Pre-trained Feature Extractors

    Authors: Yiwei Lu, Matthew Y. R. Yang, Gautam Kamath, Yaoliang Yu

    Abstract: Machine learning models have achieved great success in supervised learning tasks for end-to-end training, which requires a large amount of labeled data that is not always feasible. Recently, many practitioners have shifted to self-supervised learning methods that utilize cheap unlabeled data to learn a general feature extractor via pre-training, which can be further applied to personalized downstr… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to SaTML 2024

  9. arXiv:2402.00267  [pdf, ps, other

    cs.DS cs.CR stat.ML

    Not All Learnable Distribution Classes are Privately Learnable

    Authors: Mark Bun, Gautam Kamath, Argyris Mouzakis, Vikrant Singhal

    Abstract: We give an example of a class of distributions that is learnable in total variation distance with a finite number of samples, but not learnable under $(\varepsilon, δ)$-differential privacy. This refutes a conjecture of Ashtiani.

    Submitted 5 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: To appear in ALT 2024. Added a minor clarification to the construction and an acknowledgement of the Fields Institute

  10. arXiv:2311.06477  [pdf, other

    cs.CY

    Report of the 1st Workshop on Generative AI and Law

    Authors: A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry , et al. (10 additional authors not shown)

    Abstract: This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report… ▽ More

    Submitted 2 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

  11. arXiv:2308.06239  [pdf, ps, other

    cs.LG cs.CR stat.ML

    Private Distribution Learning with Public Data: The View from Sample Compression

    Authors: Shai Ben-David, Alex Bie, Clément L. Canonne, Gautam Kamath, Vikrant Singhal

    Abstract: We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal Q$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respec… ▽ More

    Submitted 14 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: 31 pages

  12. arXiv:2304.06929  [pdf

    cs.CR

    Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

    Authors: Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

    Abstract: In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus of advancing DP's deployment in real-world applications. Key points and high-level contents of the article were originated from the discussions from "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 20… ▽ More

    Submitted 12 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  13. arXiv:2303.03592  [pdf, other

    cs.LG cs.CR

    Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks

    Authors: Yiwei Lu, Gautam Kamath, Yaoliang Yu

    Abstract: Indiscriminate data poisoning attacks aim to decrease a model's test accuracy by injecting a small amount of corrupted training data. Despite significant interest, existing attacks remain relatively ineffective against modern machine learning (ML) architectures. In this work, we introduce the notion of model poisoning reachability as a technical tool to explore the intrinsic limits of data poisoni… ▽ More

    Submitted 6 June, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted to ICML 2023

  14. arXiv:2303.01256  [pdf, other

    stat.ML cs.CR cs.CV cs.DS cs.LG

    Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance

    Authors: Xin Gu, Gautam Kamath, Zhiwei Steven Wu

    Abstract: Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters. Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data. However, given a choice of public… ▽ More

    Submitted 2 March, 2023; originally announced March 2023.

  15. arXiv:2302.02936  [pdf, other

    cs.LG cs.CR cs.CV

    Private GANs, Revisited

    Authors: Alex Bie, Gautam Kamath, Guojun Zhang

    Abstract: We show that the canonical approach for training differentially private GANs -- updating the discriminator with differentially private stochastic gradient descent (DPSGD) -- can yield significantly improved results after modifications to training. Specifically, we propose that existing instantiations of this approach neglect to consider how adding noise only to discriminator updates inhibits discr… ▽ More

    Submitted 5 October, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: 28 pages; revisions and new experiments from TMLR camera-ready + code release at https://github.com/alexbie98/dpgan-revisit

  16. arXiv:2301.13334  [pdf, ps, other

    math.ST cs.CR cs.DS stat.ML

    A Bias-Variance-Privacy Trilemma for Statistical Estimation

    Authors: Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

    Abstract: The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low varia… ▽ More

    Submitted 28 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  17. arXiv:2212.10717  [pdf, other

    cs.LG cs.AI cs.CR cs.CY

    Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks

    Authors: Jimmy Z. Di, Jack Douglas, Jayadev Acharya, Gautam Kamath, Ayush Sekhari

    Abstract: We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings when model retraining may be induced. An adversary first adds a few carefully crafted points to the training dataset such that the impact on the model's predictions is minimal. The adversary subsequently triggers a request to remove a subset of the introduced poi… ▽ More

    Submitted 20 December, 2022; originally announced December 2022.

  18. arXiv:2212.06470  [pdf, ps, other

    cs.LG cs.CR stat.ML

    Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

    Authors: Florian Tramèr, Gautam Kamath, Nicholas Carlini

    Abstract: The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets. We critically review this approach. We primarily question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving. We caution that publicizing these models pret… ▽ More

    Submitted 2 June, 2024; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: ICML 2024

  19. arXiv:2212.05015  [pdf, ps, other

    cs.DS cs.CR cs.IT stat.ML

    Robustness Implies Privacy in Statistical Estimation

    Authors: Samuel B. Hopkins, Gautam Kamath, Mahbod Majid, Shyam Narayanan

    Abstract: We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and cov… ▽ More

    Submitted 15 June, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: 90 pages, 2 tables. Appeared in STOC, 2023

  20. arXiv:2208.07984  [pdf, other

    cs.LG cs.CR stat.ML

    Private Estimation with Public Data

    Authors: Alex Bie, Gautam Kamath, Vikrant Singhal

    Abstract: We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that under the constraints of pure or concentrated DP, d+1 public… ▽ More

    Submitted 5 April, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: 55 pages; updated funding acknowledgement + simulation results from NeurIPS 2022 camera-ready

  21. arXiv:2206.02617  [pdf, other

    cs.LG cs.CR cs.DS stat.ML

    Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

    Authors: Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang

    Abstract: Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose output-specific $(\varepsilon,δ)$-DP to characterize privacy guarantees for individual examples when releasing models trained by DP-SGD. We also design an efficient algorithm to inves… ▽ More

    Submitted 2 September, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Published in Transactions on Machine Learning Research (TMLR)

  22. arXiv:2205.12224  [pdf

    physics.ao-ph cs.CV

    GLOBUS: GLObal Building heights for Urban Studies

    Authors: Harsh G. Kamath, Manmeet Singh, Lori A. Magruder, Zong-Liang Yang, Dev Niyogi

    Abstract: Urban weather and climate studies continue to be important as extreme events cause economic loss and impact public health. Weather models seek to represent urban areas but are oversimplified due to data availability, especially building information. This paper introduces a novel Level of Detail-1 (LoD-1) building dataset derived from a Deep Neural Network (DNN) called GLObal Building heights for U… ▽ More

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: 9 pages, 5 figures

  23. arXiv:2205.08532  [pdf, ps, other

    cs.DS cs.CR stat.ML

    New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma

    Authors: Gautam Kamath, Argyris Mouzakis, Vikrant Singhal

    Abstract: We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, δ)$-differential privacy. First, we provide tight lower bounds for private covariance estimation of Gaussian distributions. We show that estimating the covariance matrix in Frobenius norm requires $Ω(d^2)$ samples, and in spectral norm requires $Ω(d^{3/2})$ samples, both matching upper bounds up to lo… ▽ More

    Submitted 28 March, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022. Minor correction to the discussion of independent work

  24. arXiv:2204.09092  [pdf, other

    cs.LG cs.CR

    Indiscriminate Data Poisoning Attacks on Neural Networks

    Authors: Yiwei Lu, Gautam Kamath, Yaoliang Yu

    Abstract: Data poisoning attacks, in which a malicious adversary aims to influence a model by injecting "poisoned" data into the training process, have attracted significant recent attention. In this work, we take a closer look at existing poisoning attacks and connect them with old and new algorithms for solving sequential Stackelberg games. By choosing an appropriate loss function for the attacker and opt… ▽ More

    Submitted 15 February, 2024; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Accepted to TMLR in 2022

  25. arXiv:2201.11308  [pdf, other

    cs.CR cs.IT

    Calibration with Privacy in Peer Review

    Authors: Wenxin Ding, Gautam Kamath, Weina Wang, Nihar B. Shah

    Abstract: Reviewers in peer review are often miscalibrated: they may be strict, lenient, extreme, moderate, etc. A number of algorithms have previously been proposed to calibrate reviews. Such attempts of calibration can however leak sensitive information about which reviewer reviewed which paper. In this paper, we identify this problem of calibration with privacy, and provide a foundational building block… ▽ More

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: 31 pages, 6 figures

  26. Efficient Mean Estimation with Pure Differential Privacy via a Sum-of-Squares Exponential Mechanism

    Authors: Samuel B. Hopkins, Gautam Kamath, Mahbod Majid

    Abstract: We give the first polynomial-time algorithm to estimate the mean of a $d$-variate probability distribution with bounded covariance from $\tilde{O}(d)$ independent samples subject to pure differential privacy. Prior algorithms for this problem either incur exponential running time, require $Ω(d^{1.5})$ samples, or satisfy only the weaker concentrated or approximate differential privacy conditions.… ▽ More

    Submitted 2 June, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: 66 pages, STOC 2022

  27. arXiv:2111.05320  [pdf, ps, other

    cs.DS cs.IT math.ST stat.ML

    Robust Estimation for Random Graphs

    Authors: Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh, Huanyu Zhang

    Abstract: We study the problem of robustly estimating the parameter $p$ of an Erdős-Rényi random graph on $n$ nodes, where a $γ$ fraction of nodes may be adversarially corrupted. After showing the deficiencies of canonical estimators, we design a computationally-efficient spectral algorithm which estimates $p$ up to accuracy $\tilde O(\sqrt{p(1-p)}/n + γ\sqrt{p(1-p)} /\sqrt{n}+ γ/n)$ for $γ< 1/60$. Furtherm… ▽ More

    Submitted 15 February, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

  28. arXiv:2111.04906  [pdf, other

    stat.ML cs.CR cs.LG

    The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection

    Authors: Shubhankar Mohapatra, Sajin Sasy, Xi He, Gautam Kamath, Om Thakkar

    Abstract: Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon their effective selection. While a rich set of tools exist for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private m… ▽ More

    Submitted 8 November, 2021; originally announced November 2021.

  29. arXiv:2111.04609  [pdf, ps, other

    stat.ML cs.CR cs.DS cs.IT cs.LG

    A Private and Computationally-Efficient Estimator for Unbounded Gaussians

    Authors: Gautam Kamath, Argyris Mouzakis, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

    Abstract: We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is… ▽ More

    Submitted 11 February, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  30. arXiv:2110.14465  [pdf, other

    stat.ME cs.CR math.ST

    Unbiased Statistical Estimation and Valid Confidence Intervals Under Differential Privacy

    Authors: Christian Covington, Xi He, James Honaker, Gautam Kamath

    Abstract: We present a method for producing unbiased parameter estimates and valid confidence intervals under the constraints of differential privacy, a formal framework for limiting individual information leakage from sensitive data. Prior work in this area is limited in that it is tailored to calculating confidence intervals for specific statistical procedures, such as mean estimation or simple linear reg… ▽ More

    Submitted 14 February, 2024; v1 submitted 27 October, 2021; originally announced October 2021.

  31. arXiv:2110.06500  [pdf, other

    cs.LG cs.CL cs.CR stat.ML

    Differentially Private Fine-tuning of Language Models

    Authors: Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

    Abstract: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially… ▽ More

    Submitted 14 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: ICLR 2022. Code available at https://github.com/huseyinatahaninan/Differentially-Private-Fine-tuning-of-Language-Models

  32. arXiv:2107.04202  [pdf, other

    cs.IT

    Sketching and Sequence Alignment: A Rate-Distortion Perspective

    Authors: Ilan Shomorony, Govinda M. Kamath

    Abstract: Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. A standard approach to speed up this task is to compute "sketches" of the DNA reads (typically via hashing-based techniques) that allow the efficient computation of pairwise alignment scores. We propose a rate-distortion framework to study the problem of computing… ▽ More

    Submitted 9 July, 2021; originally announced July 2021.

  33. arXiv:2106.13414  [pdf, other

    cs.DS cs.IT math.PR math.ST stat.ML

    The Price of Tolerance in Distribution Testing

    Authors: Clément L. Canonne, Ayush Jain, Gautam Kamath, Jerry Li

    Abstract: We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution $p$ over $\{1, \dots, n\}$, is it $\varepsilon_1$-close to or $\varepsilon_2$-far from a reference distribution $q$ (in total variation distance)? Despite significant interest over the past decade, this problem is well understood only in the extreme cases. In the noiseless setting (i.e.,… ▽ More

    Submitted 8 November, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Added a result on instance-optimal testing, and further discussion in the introduction

  34. arXiv:2106.01336  [pdf, ps, other

    cs.LG cs.CR cs.DS math.OC stat.ML

    Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data

    Authors: Gautam Kamath, Xingtu Liu, Huanyu Zhang

    Abstract: We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study general convex loss functions with the assumption that the distribution of gradients has bounded $k$-th momen… ▽ More

    Submitted 1 November, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

  35. arXiv:2104.09732  [pdf, other

    stat.ML cs.LG

    Knowledge Distillation as Semiparametric Inference

    Authors: Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

    Abstract: A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to higher accuracy than training the student directly on labeled data. To explain and enhance this phenomenon, we cast knowledge distillation as a semiparametric in… ▽ More

    Submitted 19 April, 2021; originally announced April 2021.

  36. arXiv:2103.03279  [pdf, ps, other

    cs.LG cs.AI

    Remember What You Want to Forget: Algorithms for Machine Unlearning

    Authors: Ayush Sekhari, Jayadev Acharya, Gautam Kamath, Ananda Theertha Suresh

    Abstract: We study the problem of unlearning datapoints from a learnt model. The learner first receives a dataset $S$ drawn i.i.d. from an unknown distribution, and outputs a model $\widehat{w}$ that performs well on unseen samples from the same distribution. However, at some point in the future, any training datapoint $z \in S$ can request to be unlearned, thus prompting the learner to modify its output mo… ▽ More

    Submitted 22 July, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  37. arXiv:2011.04832  [pdf, other

    cs.LG cs.IT q-bio.GN stat.ML

    Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment

    Authors: Govinda M. Kamath, Tavor Z. Baharav, Ilan Shomorony

    Abstract: Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. State-of-the-art approaches to speed up this task use hashing to identify short segments (k-mers) that are shared by pairs of reads, which can then be used to estimate alignment scores. However, when the number of reads is large, accurately estimating alignment sc… ▽ More

    Submitted 12 February, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020

  38. arXiv:2010.09929  [pdf, ps, other

    stat.ML cs.CR cs.DS cs.IT cs.LG

    On the Sample Complexity of Privately Learning Unbounded High-Dimensional Gaussians

    Authors: Ishaq Aden-Ali, Hassan Ashtiani, Gautam Kamath

    Abstract: We provide sample complexity upper bounds for agnostically learning multivariate Gaussians under the constraint of approximate differential privacy. These are the first finite sample upper bounds for general Gaussians which do not impose restrictions on the parameters of the distribution. Our bounds are near-optimal in the case when the covariance is known to be the identity, and conjectured to be… ▽ More

    Submitted 19 October, 2020; originally announced October 2020.

  39. arXiv:2010.09063  [pdf, other

    cs.LG cs.CR cs.PF

    Enabling Fast Differentially Private SGD via Just-in-Time Compilation and Vectorization

    Authors: Pranav Subramani, Nicholas Vadivelu, Gautam Kamath

    Abstract: A common pain point in differentially private machine learning is the significant runtime overhead incurred when executing Differentially Private Stochastic Gradient Descent (DPSGD), which may be as large as two orders of magnitude. We thoroughly demonstrate that by exploiting powerful language primitives, including vectorization, just-in-time compilation, and static graph optimization, one can dr… ▽ More

    Submitted 26 October, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: To appear in NeurIPS 2021

  40. arXiv:2006.06618  [pdf, other

    stat.ML cs.CR cs.DS cs.IT cs.LG math.ST

    CoinPress: Practical Private Mean and Covariance Estimation

    Authors: Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman

    Abstract: We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes. We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets -- showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely ou… ▽ More

    Submitted 9 October, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Code is available at https://github.com/twistedcubic/coin-press

  41. arXiv:2005.00010  [pdf, other

    stat.ML cs.CR cs.DS cs.IT cs.LG

    A Primer on Private Statistics

    Authors: Gautam Kamath, Jonathan Ullman

    Abstract: Differentially private statistical estimation has seen a flurry of developments over the last several years. Study has been divided into two schools of thought, focusing on empirical statistics versus population statistics. We suggest that these two lines of work are more similar than different by giving examples of methods that were initially framed for empirical statistics, but can be applied ju… ▽ More

    Submitted 30 April, 2020; originally announced May 2020.

    Comments: 20 pages. Comments welcome

  42. arXiv:2004.00010  [pdf, other

    cs.DS cs.CR stat.ML

    The Discrete Gaussian for Differential Privacy

    Authors: Clément L. Canonne, Gautam Kamath, Thomas Steinke

    Abstract: A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset. Unfortunately, using a continuous distribution presents several practical challenges. First and foremost, finite computers cannot exactly represent samples from continuous distributions, and previous work has demonstrated that seemingly innocuous numerical e… ▽ More

    Submitted 18 January, 2021; v1 submitted 31 March, 2020; originally announced April 2020.

    Comments: Improved time analysis, and generalisation to the multivariate case

  43. arXiv:2002.12321  [pdf, other

    stat.ML cs.CR cs.DS cs.LG math.ST stat.ME

    PAPRIKA: Private Online False Discovery Rate Control

    Authors: Wanrong Zhang, Gautam Kamath, Rachel Cummings

    Abstract: In hypothesis testing, a false discovery occurs when a hypothesis is incorrectly rejected due to noise in the sample. When adaptively testing multiple hypotheses, the probability of a false discovery increases as more tests are performed. Thus the problem of False Discovery Rate (FDR) control is to find a procedure for testing multiple hypotheses that accounts for this effect in determining the se… ▽ More

    Submitted 20 October, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

  44. arXiv:2002.09465  [pdf, other

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Locally Private Hypothesis Selection

    Authors: Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Zhiwei Steven Wu, Huanyu Zhang

    Abstract: We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. T… ▽ More

    Submitted 19 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: To appear in COLT 2020

  45. arXiv:2002.09464  [pdf, other

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Private Mean Estimation of Heavy-Tailed Distributions

    Authors: Gautam Kamath, Vikrant Singhal, Jonathan Ullman

    Abstract: We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded $k$-th moments. Roughly speaking, in the univariate case, we show that $n = Θ\left(\frac{1}{α^2} + \frac{1}{α^{\frac{k}{k-1}}\varepsilon}\right)$ samples are necessary and sufficient to estimate the mean to $α$-accuracy under $\varepsilon$-differential privacy… ▽ More

    Submitted 16 February, 2021; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: Appeared in COLT 2020

  46. arXiv:2002.09463  [pdf, ps, other

    cs.DS cs.CR cs.LG stat.ML

    Privately Learning Markov Random Fields

    Authors: Huanyu Zhang, Gautam Kamath, Janardhan Kulkarni, Zhiwei Steven Wu

    Abstract: We consider the problem of learning Markov Random Fields (including the prototypical example, the Ising model) under the constraint of differential privacy. Our learning goals include both structure learning, where we try to estimate the underlying graph structure of the model, as well as the harder goal of parameter learning, in which we additionally estimate the parameter on each edge. We provid… ▽ More

    Submitted 14 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

  47. arXiv:1911.07357  [pdf, ps, other

    cs.DS cs.IT cs.LG math.PR math.ST

    Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning

    Authors: Clément L. Canonne, Xi Chen, Gautam Kamath, Amit Levi, Erik Waingarten

    Abstract: We give a nearly-optimal algorithm for testing uniformity of distributions supported on $\{-1,1\}^n$, which makes $\tilde O (\sqrt{n}/\varepsilon^2)$ queries to a subcube conditional sampling oracle (Bhattacharyya and Chakraborty (2018)). The key technical component is a natural notion of random restriction for distributions on $\{-1,1\}^n$, and a quantitative analysis of how such a restriction af… ▽ More

    Submitted 4 February, 2021; v1 submitted 17 November, 2019; originally announced November 2019.

    Comments: Added Remark 4.4, which discusses the time complexity (the algorithms are polynomial-time, based on an observation from [CJLW20]); removing log log log n factor for the Gaussian testing algorithm. These changes reflect those included in the conference version (SODA'21)

  48. arXiv:1909.03951  [pdf, other

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Differentially Private Algorithms for Learning Mixtures of Separated Gaussians

    Authors: Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman

    Abstract: Learning the parameters of Gaussian mixture models is a fundamental and widely studied problem with numerous applications. In this work, we give new algorithms for learning the parameters of a high-dimensional, well separated, Gaussian mixture model subject to the strong constraint of differential privacy. In particular, we give a differentially private analogue of the algorithm of Achlioptas and… ▽ More

    Submitted 15 October, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: To appear in NeurIPS 2019

  49. arXiv:1908.01119  [pdf, other

    cs.IT cs.NI

    Optimal Information Updating based on Value of Information

    Authors: Rahul Singh, Gopal Krishna Kamath, P. R. Kumar

    Abstract: We address the problem of how to optimally schedule data packets over an unreliable channel in order to minimize the estimation error of a simple-to-implement remote linear estimator using a constant "Kalman'' gain to track the state of a Gauss Markov process. The remote estimator receives time-stamped data packets which contain noisy observations of the process. Additionally, they also contain th… ▽ More

    Submitted 3 August, 2019; originally announced August 2019.

    Comments: Accepted in Allerton 2019

  50. arXiv:1905.13229  [pdf, ps, other

    cs.DS cs.CR cs.LG stat.ML

    Private Hypothesis Selection

    Authors: Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu

    Abstract: We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in a $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we deno… ▽ More

    Submitted 4 January, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: Appeared in NeurIPS 2019. Final version to appear in IEEE Transactions on Information Theory