
Showing 1–26 of 26 results for author: Kurakin, A

  1. arXiv:2403.11981  [pdf, other]

    cs.CR cs.CV cs.LG

    Diffusion Denoising as a Certified Defense against Clean-label Poisoning

    Authors: Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

    Abstract: We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by $denoised$ $smoothing$, we show how an off-the-shelf diffusion mo…

    Submitted 18 March, 2024; originally announced March 2024.

  2. arXiv:2402.11120  [pdf, other]

    cs.LG cs.CV stat.ML

    DART: A Principled Approach to Adversarially Robust Unsupervised Domain Adaptation

    Authors: Yunjuan Wang, Hussein Hazimeh, Natalia Ponomareva, Alexey Kurakin, Ibrahim Hammoud, Raman Arora

    Abstract: Distribution shifts and adversarial examples are two major challenges for deploying machine learning models. While these challenges have been studied individually, their combination is an important topic that remains relatively under-explored. In this work, we study the problem of adversarial robustness under a common setting of distribution shift - unsupervised domain adaptation (UDA). Specifical…

    Submitted 16 February, 2024; originally announced February 2024.

  3. arXiv:2306.01684  [pdf, other]

    cs.LG cs.CR

    Harnessing large-language models to generate private synthetic text

    Authors: Alexey Kurakin, Natalia Ponomareva, Umar Syed, Liam MacDermed, Andreas Terzis

    Abstract: Differentially private training algorithms like DP-SGD protect sensitive training data by ensuring that trained models do not reveal private information. An alternative approach, which this paper studies, is to use a sensitive dataset to generate synthetic data that is differentially private with respect to the original data, and then non-privately training a model on the synthetic data. Doing so…

    Submitted 10 January, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: 31 pages; 7 figures; compared to the previous version, adds results of LoRA fine-tuning

  4. arXiv:2305.05973  [pdf, other]

    cs.CL cs.CR cs.IR

    Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models

    Authors: Aldo Gael Carranza, Rezsa Farahani, Natalia Ponomareva, Alex Kurakin, Matthew Jagielski, Milad Nasr

    Abstract: We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable, making them difficult to directly DP-train with since common techniques require per-example gradients. To address this issue, we propose an approach that prioritizes…

    Submitted 23 May, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted to NAACL 2024

  5. arXiv:2303.00654  [pdf, other]

    cs.LG cs.CR stat.ML

    How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

    Authors: Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta

    Abstract: ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP t…

    Submitted 31 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Journal ref: Journal of Artificial Intelligence Research 77 (2023) 1113-1201

  6. arXiv:2302.09207  [pdf, other]

    cs.CL cs.AI

    RETVec: Resilient and Efficient Text Vectorizer

    Authors: Elie Bursztein, Marina Zhang, Owen Vallis, Xinyu Jia, Alexey Kurakin

    Abstract: This paper describes RETVec, an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing. RETVec combines a novel character encoding with an optional small embedding model to embed words into a 256-dimensional vector space. The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial att…

    Submitted 22 April, 2024; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  7. arXiv:2212.13700  [pdf, other]

    cs.CR cs.LG

    Publishing Efficient On-device Models Increases Adversarial Vulnerability

    Authors: Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

    Abstract: Recent increases in the computational demands of deep neural networks (DNNs) have sparked interest in efficient deep learning mechanisms, e.g., quantization or pruning. These mechanisms enable the construction of a small, efficient version of commercial-scale models with comparable accuracy, accelerating their deployment to resource-constrained devices. In this paper, we study the security consi…

    Submitted 28 December, 2022; originally announced December 2022.

    Comments: Accepted to IEEE SaTML 2023

  8. arXiv:2211.13403  [pdf, other]

    cs.LG cs.CR cs.CV

    Differentially Private Image Classification from Features

    Authors: Harsh Mehta, Walid Krichene, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

    Abstract: Leveraging transfer learning has recently been shown to be an effective strategy for training large models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies largely rely on algorithms like DP-SGD for training large models, in the specific c…

    Submitted 23 November, 2022; originally announced November 2022.

  9. arXiv:2205.02973  [pdf, other]

    cs.LG cs.CR cs.CV

    Large Scale Transfer Learning for Differentially Private Image Classification

    Authors: Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

    Abstract: Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. Unfortunately, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private trai…

    Submitted 20 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

  10. arXiv:2201.12328  [pdf, other]

    cs.LG

    Toward Training at ImageNet Scale with Differential Privacy

    Authors: Alexey Kurakin, Shuang Song, Steve Chien, Roxana Geambasu, Andreas Terzis, Abhradeep Thakurta

    Abstract: Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set ou…

    Submitted 8 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: 25 pages, 7 figures. Code available at https://github.com/google-research/dp-imagenet

  11. arXiv:2106.04732  [pdf, other]

    cs.LG cs.AI cs.CV

    AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

    Authors: David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin

    Abstract: We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one. With the goal of generality, we introduce AdaMatch, a method that unifies the tasks of unsupervised domain adaptation (UDA), semi-supervised learning (SSL), and semi-supervised domain adaptation (SSDA). In an extensive…

    Submitted 15 March, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to ICLR 2022

  12. arXiv:2106.04690  [pdf, other]

    cs.CR cs.LG

    Handcrafted Backdoors in Deep Neural Networks

    Authors: Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

    Abstract: When machine learning training is outsourced to third parties, $backdoor$ $attacks$ become practical as the third party who trains the model may act maliciously to inject hidden behaviors into the otherwise accurate model. Until now, the mechanism to inject backdoors has been limited to $poisoning$. We argue that a supply-chain attacker has more attack techniques available by introducing a…

    Submitted 15 November, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to NeurIPS 2022 [Oral]

  13. arXiv:2010.11645  [pdf, other]

    cs.LG cs.AI

    Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming

    Authors: Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli

    Abstract: Convex relaxations have emerged as a promising approach for verifying desirable properties of neural networks like robustness to adversarial perturbations. Widely used Linear Programming (LP) relaxations only work well when networks are trained to facilitate verification. This precludes applications that involve verification-agnostic networks, i.e., networks not specially trained for verification.…

    Submitted 3 November, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

  14. arXiv:2001.07685  [pdf]

    cs.LG cs.CV stat.ML

    FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

    Authors: Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel

    Abstract: Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given imag…

    Submitted 25 November, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

    Comments: Published at NeurIPS 2020 as a conference paper

  15. arXiv:1911.09785  [pdf, other]

    cs.LG cs.CV stat.ML

    ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring

    Authors: David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel

    Abstract: We improve the recently-proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input…

    Submitted 13 February, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

  16. arXiv:1909.01838  [pdf, other]

    cs.LG cs.CR stat.ML

    High Accuracy and High Fidelity Extraction of Neural Networks

    Authors: Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot

    Abstract: In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying learning task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we dev…

    Submitted 3 March, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: USENIX Security 2020, 18 pages, 6 figures

  17. arXiv:1902.06705  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    On Evaluating Adversarial Robustness

    Authors: Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, Alexey Kurakin

    Abstract: Correctly evaluating defenses against adversarial examples has proven to be extremely difficult. Despite the significant amount of recent work attempting to design defenses that withstand adaptive attacks, few have succeeded; most papers that propose defenses are quickly shown to be incorrect. We believe a large contributing factor is the difficulty of performing security evaluations. In this pa…

    Submitted 20 February, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: Living document; source available at https://github.com/evaluating-adversarial-robustness/adv-eval-paper/

  18. arXiv:1808.01976  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Adversarial Vision Challenge

    Authors: Wieland Brendel, Jonas Rauber, Alexey Kurakin, Nicolas Papernot, Behar Veliqi, Marcel Salathé, Sharada P. Mohanty, Matthias Bethge

    Abstract: The NIPS 2018 Adversarial Vision Challenge is a competition to facilitate measurable progress towards robust machine vision models and more generally applicable adversarial attacks. This document is an updated version of our competition proposal that was accepted in the competition track of 32nd Conference on Neural Information Processing Systems (NIPS 2018).

    Submitted 6 December, 2018; v1 submitted 6 August, 2018; originally announced August 2018.

    Comments: https://www.crowdai.org/challenges/adversarial-vision-challenge

  19. arXiv:1804.00097  [pdf, other]

    cs.CV cs.CR cs.LG stat.ML

    Adversarial Attacks and Defences Competition

    Authors: Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe

    Abstract: To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them. In this chapter, we describe the structure and organization of the competition and the solutions developed by several o…

    Submitted 30 March, 2018; originally announced April 2018.

    Comments: 36 pages, 10 figures

  20. arXiv:1803.06373  [pdf, ps, other]

    cs.LG stat.ML

    Adversarial Logit Pairing

    Authors: Harini Kannan, Alexey Kurakin, Ian Goodfellow

    Abstract: In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state of the art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting - an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit…

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: 10 pages

  21. arXiv:1802.08195  [pdf, other]

    cs.LG cs.CV q-bio.NC stat.ML

    Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

    Authors: Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein

    Abstract: Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with…

    Submitted 21 May, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

    Journal ref: Advances in Neural Information Processing Systems, 2018

  22. arXiv:1705.07204  [pdf, other]

    stat.ML cs.CR cs.LG

    Ensemble Adversarial Training: Attacks and Defenses

    Authors: Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel

    Abstract: Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate…

    Submitted 26 April, 2020; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: 22 pages, 5 figures, International Conference on Learning Representations (ICLR) 2018 (amended in April 2020 to include subsequent attacks that significantly reduced the robustness of our models)

  23. arXiv:1703.01041  [pdf, other]

    cs.NE cs.AI cs.CV cs.DC

    Large-Scale Evolution of Image Classifiers

    Authors: Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc Le, Alex Kurakin

    Abstract: Neural networks have proven effective at solving difficult problems but designing their architectures can be challenging, even for image classification problems alone. Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. Despite significant computational requirements, we show that it is now possible to evolve models with accurac…

    Submitted 11 June, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

    Comments: Accepted for publication at ICML 2017 (34th International Conference on Machine Learning)

    ACM Class: I.2.6; I.5.1; I.5.2

  24. arXiv:1611.01236  [pdf, other]

    cs.CV cs.CR cs.LG stat.ML

    Adversarial Machine Learning at Scale

    Authors: Alexey Kurakin, Ian Goodfellow, Samy Bengio

    Abstract: Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean…

    Submitted 10 February, 2017; v1 submitted 3 November, 2016; originally announced November 2016.

    Comments: 17 pages, 5 figures

  25. arXiv:1610.00768  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Technical Report on the CleverHans v2.1.0 Adversarial Examples Library

    Authors: Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas Rauber, Rujun Long, et al. (1 additional author not shown)

    Abstract: CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial exam…

    Submitted 27 June, 2018; v1 submitted 3 October, 2016; originally announced October 2016.

    Comments: Technical report for https://github.com/tensorflow/cleverhans

  26. arXiv:1607.02533  [pdf, other]

    cs.CV cs.CR cs.LG stat.ML

    Adversarial examples in the physical world

    Authors: Alexey Kurakin, Ian Goodfellow, Samy Bengio

    Abstract: Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier…

    Submitted 10 February, 2017; v1 submitted 8 July, 2016; originally announced July 2016.

    Comments: 14 pages, 6 figures. Demo available at https://youtu.be/zQ_uMenoBCk