Skip to main content

Showing 1–30 of 30 results for author: Miklau, G

  1. Measure-Observe-Remeasure: An Interactive Paradigm for Differentially-Private Exploratory Analysis

    Authors: Priyanka Nanayakkara, Hyeok Kim, Yifan Wu, Ali Sarvghad, Narges Mahyar, Gerome Miklau, Jessica Hullman

    Abstract: Differential privacy (DP) has the potential to enable privacy-preserving analysis on sensitive data, but requires analysts to judiciously spend a limited ``privacy loss budget'' $ε$ across queries. Analysts conducting exploratory analyses do not, however, know all queries in advance and seldom have DP expertise. Thus, they are limited in their ability to specify $ε$ allotments across queries prior… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Published in IEEE Symposium on Security and Privacy (SP) 2024

    Journal ref: in 2024 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 2024 pp. 231-231

  2. arXiv:2403.07797  [pdf, other

    cs.LG cs.AI

    Joint Selection: Adaptively Incorporating Public Information for Private Synthetic Data

    Authors: Miguel Fuentes, Brett Mullins, Ryan McKenna, Gerome Miklau, Daniel Sheldon

    Abstract: Mechanisms for generating differentially private synthetic data based on marginals and graphical models have been successful in a wide range of settings. However, one limitation of these methods is their inability to incorporate public data. Initializing a data generating model by pre-training on public data has shown to improve the quality of synthetic data, but this technique is not applicable w… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  3. arXiv:2212.04133  [pdf, other

    cs.CR

    Tumult Analytics: a robust, easy-to-use, scalable, and expressive framework for differential privacy

    Authors: Skye Berghel, Philip Bohannon, Damien Desfontaines, Charles Estes, Sam Haney, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Tom Magerlein, Gerome Miklau, Amritha Pai, William Sexton, Ruchit Shrestha

    Abstract: In this short paper, we outline the design of Tumult Analytics, a Python framework for differential privacy used at institutions such as the U.S. Census Bureau, the Wikimedia Foundation, or the Internal Revenue Service.

    Submitted 8 December, 2022; originally announced December 2022.

  4. arXiv:2201.12677  [pdf, other

    cs.DB

    AIM: An Adaptive and Iterative Mechanism for Differentially Private Synthetic Data

    Authors: Ryan McKenna, Brett Mullins, Daniel Sheldon, Gerome Miklau

    Abstract: We propose AIM, a new algorithm for differentially private synthetic data generation. AIM is a workload-adaptive algorithm within the paradigm of algorithms that first selects a set of queries, then privately measures those queries, and finally generates synthetic data from the noisy measurements. It uses a set of innovative features to iteratively select the most useful measurements, reflecting b… ▽ More

    Submitted 12 June, 2024; v1 submitted 29 January, 2022; originally announced January 2022.

  5. arXiv:2112.09238  [pdf, other

    cs.CR

    Benchmarking Differentially Private Synthetic Data Generation Algorithms

    Authors: Yuchao Tao, Ryan McKenna, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau

    Abstract: This work presents a systematic benchmark of differentially private synthetic data generation algorithms that can generate tabular data. Utility of the synthetic data is evaluated by measuring whether the synthetic data preserve the distribution of individual and pairs of attributes, pairwise correlation as well as on the accuracy of an ML classification model. In a comprehensive empirical evaluat… ▽ More

    Submitted 15 February, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  6. arXiv:2109.06153  [pdf, other

    cs.LG cs.CR

    Relaxed Marginal Consistency for Differentially Private Query Answering

    Authors: Ryan McKenna, Siddhant Pradhan, Daniel Sheldon, Gerome Miklau

    Abstract: Many differentially private algorithms for answering database queries involve a step that reconstructs a discrete data distribution from noisy measurements. This provides consistent query answers and reduces error, but often requires space that grows exponentially with dimension. Private-PGM is a recent approach that uses graphical models to represent the data distribution, with complexity proport… ▽ More

    Submitted 25 October, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

  7. arXiv:2108.04978  [pdf, other

    cs.CR

    Winning the NIST Contest: A scalable and general approach to differentially private synthetic data

    Authors: Ryan McKenna, Gerome Miklau, Daniel Sheldon

    Abstract: We propose a general approach for differentially private synthetic data generation, that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that is used to estimate a hi… ▽ More

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: 22 pages

  8. arXiv:2107.10659  [pdf, ps, other

    cs.CR cs.DB stat.AP

    Differentially Private Algorithms for 2020 Census Detailed DHC Race \& Ethnicity

    Authors: Sam Haney, William Sexton, Ashwin Machanavajjhala, Michael Hay, Gerome Miklau

    Abstract: This article describes a proposed differentially private (DP) algorithms that the US Census Bureau is considering to release the Detailed Demographic and Housing Characteristics (DHC) Race & Ethnicity tabulations as part of the 2020 Census. The tabulations contain statistics (counts) of demographic and housing characteristics of the entire population of the US crossed with detailed races and tribe… ▽ More

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: Presented at Theory and Practice of Differential Privacy Workshop (TPDP) 2021

  9. arXiv:2106.12118  [pdf, other

    cs.DB cs.CR

    HDMM: Optimizing error of high-dimensional statistical queries under differential privacy

    Authors: Ryan McKenna, Gerome Miklau, Michael Hay, Ashwin Machanavajjhala

    Abstract: In this work we describe the High-Dimensional Matrix Mechanism (HDMM), a differentially private algorithm for answering a workload of predicate counting queries. HDMM represents query workloads using a compact implicit matrix representation and exploits this representation to efficiently optimize over (a subset of) the space of differentially private algorithms for one that is unbiased and answers… ▽ More

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: text overlap with arXiv:1808.03537

  10. arXiv:2002.01582  [pdf, other

    cs.DB cs.CR

    A workload-adaptive mechanism for linear queries under local differential privacy

    Authors: Ryan McKenna, Raj Kumar Maity, Arya Mazumdar, Gerome Miklau

    Abstract: We propose a new mechanism to accurately answer a user-provided set of linear counting queries under local differential privacy (LDP). Given a set of linear counting queries (the workload) our mechanism automatically adapts to provide accuracy on the workload queries. We define a parametric class of mechanisms that produce unbiased estimates of the workload, and formulate a constrained optimizatio… ▽ More

    Submitted 18 May, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

  11. arXiv:1905.12744  [pdf, other

    cs.DB

    Fair Decision Making using Privacy-Protected Data

    Authors: Satya Kuppam, Ryan Mckenna, David Pujol, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau

    Abstract: Data collected about individuals is regularly used to make decisions that impact those same individuals. We consider settings where sensitive personal data is used to decide who will receive resources or benefits. While it is well known that there is a tradeoff between protecting privacy and the accuracy of decisions, we initiate a first-of-its-kind study into the impact of formally private mechan… ▽ More

    Submitted 24 January, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: 12 pages, 4 figures

  12. arXiv:1902.00174  [pdf, other

    cs.LG stat.ML

    Privacy Preserving Off-Policy Evaluation

    Authors: Tengyang Xie, Philip S. Thomas, Gerome Miklau

    Abstract: Many reinforcement learning applications involve the use of data that is sensitive, such as medical records of patients or financial information. However, most current reinforcement learning methods can leak information contained within the (possibly sensitive) data on which they are trained. To address this problem, we present the first differentially private approach for off-policy evaluation. W… ▽ More

    Submitted 31 January, 2019; originally announced February 2019.

  13. arXiv:1901.09136  [pdf, other

    cs.LG cs.CR stat.ML

    Graphical-model based estimation and inference for differential privacy

    Authors: Ryan McKenna, Daniel Sheldon, Gerome Miklau

    Abstract: Many privacy mechanisms reveal high-level information about a data distribution through noisy measurements. It is common to use this information to estimate the answers to new queries. In this work, we provide an approach to solve this estimation problem efficiently using graphical models, which is particularly effective when the distribution is high-dimensional but the measurements are over low-d… ▽ More

    Submitted 25 January, 2019; originally announced January 2019.

  14. Ektelo: A Framework for Defining Differentially-Private Computations

    Authors: Dan Zhang, Ryan McKenna, Ios Kotsogiannis, George Bissias, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau

    Abstract: The adoption of differential privacy is growing but the complexity of designing private, efficient and accurate algorithms is still high. We propose a novel programming framework and system, Ektelo, for implementing both existing and new privacy algorithms. For the task of answering linear counting queries, we show that nearly all existing algorithms can be composed from operators, each conforming… ▽ More

    Submitted 24 May, 2019; v1 submitted 10 August, 2018; originally announced August 2018.

    Comments: Journal version under submission

  15. Optimizing error of high-dimensional statistical queries under differential privacy

    Authors: Ryan McKenna, Gerome Miklau, Michael Hay, Ashwin Machanavajjhala

    Abstract: Differentially private algorithms for answering sets of predicate counting queries on a sensitive database have many applications. Organizations that collect individual-level data, such as statistical agencies and medical institutions, use them to safely release summary tabulations. However, existing techniques are accurate only on a narrow class of query workloads, or are extremely slow, especial… ▽ More

    Submitted 10 August, 2018; originally announced August 2018.

    Journal ref: PVLDB, 11 (10): 1206-1219, 2018

  16. On Obtaining Stable Rankings

    Authors: Abolfazl Asudeh, H. V. Jagadish, Gerome Miklau, Julia Stoyanovich

    Abstract: Decision making is challenging when there is more than one criterion to consider. In such cases, it is common to assign a goodness score to each item as a weighted sum of its attribute values and rank them accordingly. Clearly, the ranking obtained depends on the weights used for this summation. Ideally, one would want the ranked order not to change if the weights are changed slightly. We call thi… ▽ More

    Submitted 18 December, 2018; v1 submitted 29 April, 2018; originally announced April 2018.

    Journal ref: Abolfazl Asudeh, H. V. Jagadish, Gerome Miklau, Julia Stoyanovich. On Obtaining Stable Rankings. PVLDB , 12(3): 237-250, 2018

  17. arXiv:1804.07890  [pdf, other

    cs.CY cs.DB cs.HC

    A Nutritional Label for Rankings

    Authors: Ke Yang, Julia Stoyanovich, Abolfazl Asudeh, Bill Howe, HV Jagadish, Gerome Miklau

    Abstract: Algorithmic decisions often result in scoring and ranking individuals to determine credit worthiness, qualifications for college admissions and employment, and compatibility as dating partners. While automatic and seemingly objective, ranking algorithms can discriminate against individuals and protected groups, and exhibit low diversity. Furthermore, ranked results are often unstable --- small cha… ▽ More

    Submitted 21 April, 2018; originally announced April 2018.

    Comments: 4 pages, SIGMOD demo, 3 figuress, ACM SIGMOD 2018

    MSC Class: 68U01; 68P01 ACM Class: H.2, H.2.8, K.4.1

  18. arXiv:1706.04646  [pdf, other

    cs.LG cs.CR stat.ML

    Differentially Private Learning of Undirected Graphical Models using Collective Graphical Models

    Authors: Garrett Bernstein, Ryan McKenna, Tao Sun, Daniel Sheldon, Michael Hay, Gerome Miklau

    Abstract: We investigate the problem of learning discrete, undirected graphical models in a differentially private way. We show that the approach of releasing noisy sufficient statistics using the Laplace mechanism achieves a good trade-off between privacy, utility, and practicality. A naive learning algorithm that uses the noisy sufficient statistics "as is" outperforms general-purpose differentially priva… ▽ More

    Submitted 14 June, 2017; originally announced June 2017.

    Comments: Accepted to ICML 2017

  19. arXiv:1609.02104  [pdf, other

    cs.DB cs.DC

    A Consumer-Centric Market for Database Computation in the Cloud

    Authors: Yue Wang, Alexandra Meliou, Gerome Miklau

    Abstract: The availability of public computing resources in the cloud has revolutionized data analysis, but requesting cloud resources often involves complex decisions for consumers. Under the current pricing mechanisms, cloud service providers offer several service options and charge consumers based on the resources they use. Before they can decide which cloud resources to request, consumers have to estima… ▽ More

    Submitted 16 June, 2017; v1 submitted 7 September, 2016; originally announced September 2016.

  20. arXiv:1512.04817  [pdf, other

    cs.DB cs.CR

    Principled Evaluation of Differentially Private Algorithms using DPBench

    Authors: Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, Dan Zhang

    Abstract: Differential privacy has become the dominant standard in the research community for strong privacy protection. There has been a flood of research into query answering algorithms that meet this standard. Algorithms are becoming increasingly complex, and in particular, the performance of many emerging algorithms is {\em data dependent}, meaning the distribution of the noise added to query answers ma… ▽ More

    Submitted 15 December, 2015; originally announced December 2015.

  21. arXiv:1410.0265  [pdf, other

    cs.DB

    A Data- and Workload-Aware Algorithm for Range Queries Under Differential Privacy

    Authors: Chao Li, Michael Hay, Gerome Miklau, Yue Wang

    Abstract: We describe a new algorithm for answering a given set of range queries under $ε$-differential privacy which often achieves substantially lower error than competing methods. Our algorithm satisfies differential privacy by adding noise that is adapted to the input data and to the given query set. We first privately learn a partitioning of the domain into buckets that suit the input data well. Then w… ▽ More

    Submitted 1 October, 2014; originally announced October 2014.

    Comments: VLDB 2014

  22. arXiv:1307.8269  [pdf, ps, other

    cs.DB

    Introducing Access Control in Webdamlog

    Authors: Serge Abiteboul, Émilien Antoine, Gerome Miklau, Julia Stoyanovich, Vera Zaychik Moffitt

    Abstract: We survey recent work on the specification of an access control mechanism in a collaborative environment. The work is presented in the context of the WebdamLog language, an extension of datalog to a distributed context. We discuss a fine-grained access control mechanism for intentional data based on provenance as well as a control mechanism for delegation, i.e., for deploying rules at remote peers… ▽ More

    Submitted 31 July, 2013; originally announced July 2013.

    Comments: Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Italy

  23. arXiv:1305.3058  [pdf, ps, other

    cs.DB

    Rule-Based Application Development using Webdamlog

    Authors: Serge Abiteboul, Émilien Antoine, Gerome Miklau, Julia Stoyanovich, Jules Testard

    Abstract: We present the WebdamLog system for managing distributed data on the Web in a peer-to-peer manner. We demonstrate the main features of the system through an application called Wepic for sharing pictures between attendees of the sigmod conference. Using Wepic, the attendees will be able to share, download, rate and annotate pictures in a highly decentralized manner. We show how WebdamLog handles he… ▽ More

    Submitted 14 May, 2013; originally announced May 2013.

    Comments: SIGMOD - Special Interest Group on Management Of Data (2013)

  24. A Theory of Pricing Private Data

    Authors: Chao Li, Daniel Yang Li, Gerome Miklau, Dan Suciu

    Abstract: Personal data has value to both its owner and to institutions who would like to analyze it. Privacy mechanisms protect the owner's data while releasing to analysts noisy versions of aggregate query results. But such strict protections of individual's data have not yet found wide use in practice. Instead, Internet companies, for example, commonly provide free services in return for valuable sensiti… ▽ More

    Submitted 17 December, 2012; v1 submitted 26 August, 2012; originally announced August 2012.

    Comments: 25 pages, 2 figures. Best Paper Award, to appear in the 16th International Conference on Database Theory (ICDT), 2013

    Journal ref: ICDT '13 Proceedings of the 16th International Conference on Database Theory Pages 33-44, 2013

  25. arXiv:1202.3807  [pdf, other

    cs.DB

    An Adaptive Mechanism for Accurate Query Answering under Differential Privacy

    Authors: Chao Li, Gerome Miklau

    Abstract: We propose a novel mechanism for answering sets of count- ing queries under differential privacy. Given a workload of counting queries, the mechanism automatically selects a different set of "strategy" queries to answer privately, using those answers to derive answers to the workload. The main algorithm proposed in this paper approximates the optimal strategy for any workload of linear counting qu… ▽ More

    Submitted 16 February, 2012; originally announced February 2012.

    Comments: VLDB2012. arXiv admin note: substantial text overlap with arXiv:1103.1367

  26. arXiv:1202.3399  [pdf, ps, other

    cs.DB cs.CR

    Optimal error of query sets under the differentially-private matrix mechanism

    Authors: Chao Li, Gerome Miklau

    Abstract: A common goal of privacy research is to release synthetic data that satisfies a formal privacy guarantee and can be used by an analyst in place of the original data. To achieve reasonable accuracy, a synthetic data set must be tuned to support a specified set of queries accurately, sacrificing fidelity for other queries. This work considers methods for producing synthetic data under differential… ▽ More

    Submitted 17 December, 2012; v1 submitted 15 February, 2012; originally announced February 2012.

    Comments: 35 pages; Short version to appear in the 16th International Conference on Database Theory (ICDT), 2013

    ACM Class: H.2.8; F.2

  27. arXiv:1103.1367  [pdf, other

    cs.DB

    Efficient Batch Query Answering Under Differential Privacy

    Authors: Chao Li, Gerome Miklau

    Abstract: Differential privacy is a rigorous privacy condition achieved by randomizing query answers. This paper develops efficient algorithms for answering multiple queries under differential privacy with low error. We pursue this goal by advancing a recent approach called the matrix mechanism, which generalizes standard differentially private mechanisms. This new mechanism works by first answering a diffe… ▽ More

    Submitted 7 March, 2011; originally announced March 2011.

    Comments: 6 figues, 22 pages

  28. arXiv:1005.1934  [pdf, other

    cs.DB cs.AI

    Scalable Probabilistic Databases with Factor Graphs and MCMC

    Authors: Michael Wick, Andrew McCallum, Gerome Miklau

    Abstract: Probabilistic databases play a crucial role in the management and understanding of uncertain data. However, incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or restrict the class of relational algebra formula under which they are closed. We propose an alternative approach where the underlying… ▽ More

    Submitted 11 May, 2010; originally announced May 2010.

    Comments: Submitted to VLDB 2010

  29. arXiv:0912.4742  [pdf, other

    cs.DB cs.CR

    Optimizing Histogram Queries under Differential Privacy

    Authors: Chao Li, Michael Hay, Vibhor Rastogi, Gerome Miklau, Andrew McGregor

    Abstract: Differential privacy is a robust privacy standard that has been successfully applied to a range of data analysis tasks. Despite much recent work, optimal strategies for answering a collection of correlated queries are not known. We study the problem of devising a set of strategy queries, to be submitted and answered privately, that will support the answers to a given workload of queries. We prop… ▽ More

    Submitted 6 September, 2010; v1 submitted 23 December, 2009; originally announced December 2009.

    Comments: 22 pages, 1 figure

  30. arXiv:0904.0942  [pdf, other

    cs.DB cs.CR

    Boosting the Accuracy of Differentially-Private Histograms Through Consistency

    Authors: Michael Hay, Vibhor Rastogi, Gerome Miklau, Dan Suciu

    Abstract: We show that it is possible to significantly improve the accuracy of a general class of histogram queries while satisfying differential privacy. Our approach carefully chooses a set of queries to evaluate, and then exploits consistency constraints that should hold over the noisy output. In a post-processing phase, we compute the consistent input most likely to have produced the noisy output. The f… ▽ More

    Submitted 8 July, 2010; v1 submitted 6 April, 2009; originally announced April 2009.

    Comments: 15 pages, 7 figures, minor revisions to previous version