-
Smart Choices and the Selection Monad
Authors:
Martin Abadi,
Gordon Plotkin
Abstract:
Describing systems in terms of choices and their resulting costs and rewards offers the promise of freeing algorithm designers and programmers from specifying how those choices should be made; in implementations, the choices can be realized by optimization techniques and, increasingly, by machine-learning methods. We study this approach from a programming-language perspective. We define two small…
▽ More
Describing systems in terms of choices and their resulting costs and rewards offers the promise of freeing algorithm designers and programmers from specifying how those choices should be made; in implementations, the choices can be realized by optimization techniques and, increasingly, by machine-learning methods. We study this approach from a programming-language perspective. We define two small languages that support decision-making abstractions: one with choices and rewards, and the other additionally with probabilities. We give both operational and denotational semantics.
In the case of the second language we consider three denotational semantics, with varying degrees of correlation between possible program values and expected rewards. The operational semantics combine the usual semantics of standard constructs with optimization over spaces of possible execution strategies. The denotational semantics, which are compositional, rely on the selection monad, to handle choice, augmented with an auxiliary monad to handle other effects, such as rewards or probability.
We establish adequacy theorems that the two semantics coincide in all cases. We also prove full abstraction at base types, with varying notions of observation in the probabilistic case corresponding to the various degrees of correlation. We present axioms for choice combined with rewards and probability, establishing completeness at base types for the case of rewards without probability.
△ Less
Submitted 19 April, 2023; v1 submitted 17 July, 2020;
originally announced July 2020.
-
A Simple Differentiable Programming Language
Authors:
Martin Abadi,
Gordon D. Plotkin
Abstract:
Automatic differentiation plays a prominent role in scientific computing and in modern machine learning, often in the context of powerful programming systems. The relation of the various embodiments of automatic differentiation to the mathematical notion of derivative is not always entirely clear---discrepancies can arise, sometimes inadvertently. In order to study automatic differentiation in suc…
▽ More
Automatic differentiation plays a prominent role in scientific computing and in modern machine learning, often in the context of powerful programming systems. The relation of the various embodiments of automatic differentiation to the mathematical notion of derivative is not always entirely clear---discrepancies can arise, sometimes inadvertently. In order to study automatic differentiation in such programming contexts, we define a small but expressive programming language that includes a construct for reverse-mode differentiation. We give operational and denotational semantics for this language. The operational semantics employs popular implementation techniques, while the denotational semantics employs notions of differentiation familiar from real analysis. We establish that these semantics coincide.
△ Less
Submitted 1 February, 2020; v1 submitted 11 November, 2019;
originally announced November 2019.
-
Dynamic Control Flow in Large-Scale Machine Learning
Authors:
Yuan Yu,
Martín Abadi,
Paul Barham,
Eugene Brevdo,
Mike Burrows,
Andy Davis,
Jeff Dean,
Sanjay Ghemawat,
Tim Harley,
Peter Hawkins,
Michael Isard,
Manjunath Kudlur,
Rajat Monga,
Derek Murray,
Xiaoqiang Zheng
Abstract:
Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions a…
▽ More
Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. For performance, scalability, and expressiveness, a machine learning system must support dynamic control flow in distributed and heterogeneous environments.
This paper presents a programming model for distributed machine learning that supports dynamic control flow. We describe the design of the programming model, and its implementation in TensorFlow, a distributed machine learning system. Our approach extends the use of dataflow graphs to represent machine learning models, offering several distinctive features. First, the branches of conditionals and bodies of loops can be partitioned across many machines to run on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs. Second, programs written in our model support automatic differentiation and distributed gradient computations, which are necessary for training machine learning models that use control flow. Third, our choice of non-strict semantics enables multiple loop iterations to execute in parallel across machines, and to overlap compute and I/O operations.
We have done our work in the context of TensorFlow, and it has been used extensively in research and production. We evaluate it using several real-world applications, and demonstrate its performance and scalability.
△ Less
Submitted 4 May, 2018;
originally announced May 2018.
-
Adversarial Patch
Authors:
Tom B. Brown,
Dandelion Mané,
Aurko Roy,
Martín Abadi,
Justin Gilmer
Abstract:
We present a method to create universal, robust, targeted adversarial image patches in the real world. The patches are universal because they can be used to attack any scene, robust because they work under a wide variety of transformations, and targeted because they can cause a classifier to output any target class. These adversarial patches can be printed, added to any scene, photographed, and pr…
▽ More
We present a method to create universal, robust, targeted adversarial image patches in the real world. The patches are universal because they can be used to attack any scene, robust because they work under a wide variety of transformations, and targeted because they can cause a classifier to output any target class. These adversarial patches can be printed, added to any scene, photographed, and presented to image classifiers; even when the patches are small, they cause the classifiers to ignore the other items in the scene and report a chosen target class.
To reproduce the results from the paper, our code is available at https://github.com/tensorflow/cleverhans/tree/master/examples/adversarial_patch
△ Less
Submitted 16 May, 2018; v1 submitted 27 December, 2017;
originally announced December 2017.
-
On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches
Authors:
Martín Abadi,
Úlfar Erlingsson,
Ian Goodfellow,
H. Brendan McMahan,
Ilya Mironov,
Nicolas Papernot,
Kunal Talwar,
Li Zhang
Abstract:
The recent, remarkable growth of machine learning has led to intense interest in the privacy of the data on which machine learning relies, and to new techniques for preserving privacy. However, older ideas about privacy may well remain valid and useful. This note reviews two recent works on privacy in the light of the wisdom of some of the early literature, in particular the principles distilled b…
▽ More
The recent, remarkable growth of machine learning has led to intense interest in the privacy of the data on which machine learning relies, and to new techniques for preserving privacy. However, older ideas about privacy may well remain valid and useful. This note reviews two recent works on privacy in the light of the wisdom of some of the early literature, in particular the principles distilled by Saltzer and Schroeder in the 1970s.
△ Less
Submitted 26 August, 2017;
originally announced August 2017.
-
AMIGOS: A Dataset for Affect, Personality and Mood Research on Individuals and Groups
Authors:
Juan Abdon Miranda-Correa,
Mojtaba Khomami Abadi,
Nicu Sebe,
Ioannis Patras
Abstract:
We present AMIGOS-- A dataset for Multimodal research of affect, personality traits and mood on Individuals and GrOupS. Different to other databases, we elicited affect using both short and long videos in two social contexts, one with individual viewers and one with groups of viewers. The database allows the multimodal study of the affective responses, by means of neuro-physiological signals of in…
▽ More
We present AMIGOS-- A dataset for Multimodal research of affect, personality traits and mood on Individuals and GrOupS. Different to other databases, we elicited affect using both short and long videos in two social contexts, one with individual viewers and one with groups of viewers. The database allows the multimodal study of the affective responses, by means of neuro-physiological signals of individuals in relation to their personality and mood, and with respect to the social context and videos' duration. The data is collected in two experimental settings. In the first one, 40 participants watched 16 short emotional videos. In the second one, the participants watched 4 long videos, some of them alone and the rest in groups. The participants' signals, namely, Electroencephalogram (EEG), Electrocardiogram (ECG) and Galvanic Skin Response (GSR), were recorded using wearable sensors. Participants' frontal HD video and both RGB and depth full body videos were also recorded. Participants emotions have been annotated with both self-assessment of affective levels (valence, arousal, control, familiarity, liking and basic emotions) felt during the videos as well as external-assessment of levels of valence and arousal. We present a detailed correlation analysis of the different dimensions as well as baseline methods and results for single-trial classification of valence and arousal, personality traits, mood and social context. The database is made publicly available.
△ Less
Submitted 13 April, 2017; v1 submitted 2 February, 2017;
originally announced February 2017.
-
Learning a Natural Language Interface with Neural Programmer
Authors:
Arvind Neelakantan,
Quoc V. Le,
Martin Abadi,
Andrew McCallum,
Dario Amodei
Abstract:
Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network…
▽ More
Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. We enhance the objective function of Neural Programmer, a neural network with built-in discrete operations, and apply it on WikiTableQuestions, a natural language question-answering dataset. The model is trained end-to-end with weak supervision of question-answer pairs, and does not require domain-specific grammars, rules, or annotations that are key elements in previous approaches to program induction. The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision. An ensemble of 15 models, with a trivial combination technique, achieves 37.7% accuracy, which is competitive to the current state-of-the-art accuracy of 37.1% obtained by a traditional natural language semantic parser.
△ Less
Submitted 2 March, 2017; v1 submitted 27 November, 2016;
originally announced November 2016.
-
Learning to Protect Communications with Adversarial Neural Cryptography
Authors:
Martín Abadi,
David G. Andersen
Abstract:
We ask whether neural networks can learn to use secret keys to protect information from other neural networks. Specifically, we focus on ensuring confidentiality properties in a multiagent system, and we specify those properties in terms of an adversary. Thus, a system may consist of neural networks named Alice and Bob, and we aim to limit what a third neural network named Eve learns from eavesdro…
▽ More
We ask whether neural networks can learn to use secret keys to protect information from other neural networks. Specifically, we focus on ensuring confidentiality properties in a multiagent system, and we specify those properties in terms of an adversary. Thus, a system may consist of neural networks named Alice and Bob, and we aim to limit what a third neural network named Eve learns from eavesdropping on the communication between Alice and Bob. We do not prescribe specific cryptographic algorithms to these neural networks; instead, we train end-to-end, adversarially. We demonstrate that the neural networks can learn how to perform forms of encryption and decryption, and also how to apply these operations selectively in order to meet confidentiality goals.
△ Less
Submitted 21 October, 2016;
originally announced October 2016.
-
Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
Authors:
Nicolas Papernot,
Martín Abadi,
Úlfar Erlingsson,
Ian Goodfellow,
Kunal Talwar
Abstract:
Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information.
To address this problem, we demonstrate a generally applicable approach to providing strong privacy guarantees…
▽ More
Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information.
To address this problem, we demonstrate a generally applicable approach to providing strong privacy guarantees for training data: Private Aggregation of Teacher Ensembles (PATE). The approach combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users. Because they rely directly on sensitive data, these models are not published, but instead used as "teachers" for a "student" model. The student learns to predict an output chosen by noisy voting among all of the teachers, and cannot directly access an individual teacher or the underlying data or parameters. The student's privacy properties can be understood both intuitively (since no single teacher and thus no single dataset dictates the student's training) and formally, in terms of differential privacy. These properties hold even if an adversary can not only query the student but also inspect its internal workings.
Compared with previous work, the approach imposes only weak assumptions on how teachers are trained: it applies to any model, including non-convex models like DNNs. We achieve state-of-the-art privacy/utility trade-offs on MNIST and SVHN thanks to an improved privacy analysis and semi-supervised learning.
△ Less
Submitted 3 March, 2017; v1 submitted 18 October, 2016;
originally announced October 2016.
-
The Applied Pi Calculus: Mobile Values, New Names, and Secure Communication
Authors:
Martín Abadi,
Bruno Blanchet,
Cédric Fournet
Abstract:
We study the interaction of the programming construct "new", which generates statically scoped names, with communication via messages on channels. This interaction is crucial in security protocols, which are the main motivating examples for our work, it also appears in other programming-language contexts.
We define the applied pi calculus, a simple, general extension of the pi calculus in which…
▽ More
We study the interaction of the programming construct "new", which generates statically scoped names, with communication via messages on channels. This interaction is crucial in security protocols, which are the main motivating examples for our work, it also appears in other programming-language contexts.
We define the applied pi calculus, a simple, general extension of the pi calculus in which values can be formed from names via the application of built-in functions, subject to equations, and be sent as messages. (In contrast, the pure pi calculus lacks built-in functions, its only messages are atomic names.) We develop semantics and proof techniques for this extended language and apply them in reasoning about security protocols.
This paper essentially subsumes the conference paper that introduced the applied pi calculus in 2001. It fills gaps, incorporates improvements, and further explains and studies the applied pi calculus. Since 2001, the applied pi calculus has been the basis for much further work, described in many research publications and sometimes embodied in useful software, such as the tool ProVerif, which relies on the applied pi calculus to support the specification and automatic analysis of security protocols. Although this paper does not aim to be a complete review of the subject, it benefits from that further work and provides better foundations for some of it. In particular, the applied pi calculus has evolved through its implementation in ProVerif, and the present definition reflects that evolution.
△ Less
Submitted 28 July, 2017; v1 submitted 10 September, 2016;
originally announced September 2016.
-
Deep Learning with Differential Privacy
Authors:
Martín Abadi,
Andy Chu,
Ian Goodfellow,
H. Brendan McMahan,
Ilya Mironov,
Kunal Talwar,
Li Zhang
Abstract:
Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refin…
▽ More
Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
△ Less
Submitted 24 October, 2016; v1 submitted 1 July, 2016;
originally announced July 2016.
-
TensorFlow: A system for large-scale machine learning
Authors:
Martín Abadi,
Paul Barham,
Jianmin Chen,
Zhifeng Chen,
Andy Davis,
Jeffrey Dean,
Matthieu Devin,
Sanjay Ghemawat,
Geoffrey Irving,
Michael Isard,
Manjunath Kudlur,
Josh Levenberg,
Rajat Monga,
Sherry Moore,
Derek G. Murray,
Benoit Steiner,
Paul Tucker,
Vijay Vasudevan,
Pete Warden,
Martin Wicke,
Yuan Yu,
Xiaoqiang Zheng
Abstract:
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs,…
▽ More
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
△ Less
Submitted 31 May, 2016; v1 submitted 27 May, 2016;
originally announced May 2016.
-
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Authors:
Martín Abadi,
Ashish Agarwal,
Paul Barham,
Eugene Brevdo,
Zhifeng Chen,
Craig Citro,
Greg S. Corrado,
Andy Davis,
Jeffrey Dean,
Matthieu Devin,
Sanjay Ghemawat,
Ian Goodfellow,
Andrew Harp,
Geoffrey Irving,
Michael Isard,
Yangqing Jia,
Rafal Jozefowicz,
Lukasz Kaiser,
Manjunath Kudlur,
Josh Levenberg,
Dan Mane,
Rajat Monga,
Sherry Moore,
Derek Murray,
Chris Olah
, et al. (15 additional authors not shown)
Abstract:
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational de…
▽ More
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
△ Less
Submitted 16 March, 2016; v1 submitted 14 March, 2016;
originally announced March 2016.
-
Falkirk Wheel: Rollback Recovery for Dataflow Systems
Authors:
Michael Isard,
Martín Abadi
Abstract:
We present a new model for rollback recovery in distributed dataflow systems. We explain existing rollback schemes by assigning a logical time to each event such as a message delivery. If some processors fail during an execution, the system rolls back by selecting a set of logical times for each processor. The effect of events at times within the set is retained or restored from saved state, while…
▽ More
We present a new model for rollback recovery in distributed dataflow systems. We explain existing rollback schemes by assigning a logical time to each event such as a message delivery. If some processors fail during an execution, the system rolls back by selecting a set of logical times for each processor. The effect of events at times within the set is retained or restored from saved state, while the effect of other events is undone and re-executed. We show that, by adopting different logical time "domains" at different processors, an application can adopt appropriate checkpointing schemes for different parts of its computation. We illustrate with an example of an application that combines batch processing with low-latency streaming updates. We show rules, and an algorithm, to determine a globally consistent state for rollback in a system that uses multiple logical time domains. We also introduce selective rollback at a processor, which can selectively preserve the effect of events at some logical times and not others, independent of the original order of execution of those events. Selective rollback permits new checkpointing policies that are particularly well suited to iterative streaming algorithms. We report on an implementation of our new framework in the context of the Naiad system.
△ Less
Submitted 30 March, 2015;
originally announced March 2015.
-
A Model of Cooperative Threads
Authors:
Martín Abadi,
Gordon D. Plotkin
Abstract:
We develop a model of concurrent imperative programming with threads. We focus on a small imperative language with cooperative threads which execute without interruption until they terminate or explicitly yield control. We define and study a trace-based denotational semantics for this language; this semantics is fully abstract but mathematically elementary. We also give an equational theory for t…
▽ More
We develop a model of concurrent imperative programming with threads. We focus on a small imperative language with cooperative threads which execute without interruption until they terminate or explicitly yield control. We define and study a trace-based denotational semantics for this language; this semantics is fully abstract but mathematically elementary. We also give an equational theory for the computational effects that underlie the language, including thread spawning. We then analyze threads in terms of the free algebra monad for this theory.
△ Less
Submitted 20 October, 2010; v1 submitted 13 September, 2010;
originally announced September 2010.