-
Debiased Distribution Compression
Authors:
Lingxiao Li,
Raaz Dwivedi,
Lester Mackey
Abstract:
Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kerne…
▽ More
Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kernel thinning (SKT) returns $\sqrt{n}$ equal-weighted points with $\widetilde{O}(n^{-1/2})$ maximum mean discrepancy (MMD) to $\mathbb{P}$. For larger-scale compression tasks, low-rank SKT achieves the same feat in sub-quadratic time using an adaptive low-rank debiasing procedure that may be of independent interest. For downstream tasks that support simplex or constant-preserving weights, Stein recombination and Stein Cholesky achieve even greater parsimony, matching the guarantees of SKT with as few as $\text{poly-log}(n)$ weighted points. Underlying these advances are new guarantees for the quality of simplex-weighted coresets, the spectral decay of kernel matrices, and the covering numbers of Stein kernel Hilbert spaces. In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering.
△ Less
Submitted 26 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
FaceFilterSense: A Filter-Resistant Face Recognition and Facial Attribute Analysis Framework
Authors:
Shubham Tiwari,
Yash Sethia,
Ritesh Kumar,
Ashwani Tanwar,
Rudresh Dwivedi
Abstract:
With the advent of social media, fun selfie filters have come into tremendous mainstream use affecting the functioning of facial biometric systems as well as image recognition systems. These filters vary from beautification filters and Augmented Reality (AR)-based filters to filters that modify facial landmarks. Hence, there is a need to assess the impact of such filters on the performance of exis…
▽ More
With the advent of social media, fun selfie filters have come into tremendous mainstream use affecting the functioning of facial biometric systems as well as image recognition systems. These filters vary from beautification filters and Augmented Reality (AR)-based filters to filters that modify facial landmarks. Hence, there is a need to assess the impact of such filters on the performance of existing face recognition systems. The limitation associated with existing solutions is that these solutions focus more on the beautification filters. However, the current AR-based filters and filters which distort facial key points are in vogue recently and make the faces highly unrecognizable even to the naked eye. Also, the filters considered are mostly obsolete with limited variations. To mitigate these limitations, we aim to perform a holistic impact analysis of the latest filters and propose an user recognition model with the filtered images. We have utilized a benchmark dataset for baseline images, and applied the latest filters over them to generate a beautified/filtered dataset. Next, we have introduced a model FaceFilterNet for beautified user recognition. In this framework, we also utilize our model to comment on various attributes of the person including age, gender, and ethnicity. In addition, we have also presented a filter-wise impact analysis on face recognition, age estimation, gender, and ethnicity prediction. The proposed method affirms the efficacy of our dataset with an accuracy of 87.25% and an optimal accuracy for facial attribute analysis.
△ Less
Submitted 18 April, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
FairPair: A Robust Evaluation of Biases in Language Models through Paired Perturbations
Authors:
Jane Dwivedi-Yu,
Raaz Dwivedi,
Timo Schick
Abstract:
The accurate evaluation of differential treatment in language models to specific groups is critical to ensuring a positive and safe user experience. An ideal evaluation should have the properties of being robust, extendable to new groups or attributes, and being able to capture biases that appear in typical usage (rather than just extreme, rare cases). Relatedly, bias evaluation should surface not…
▽ More
The accurate evaluation of differential treatment in language models to specific groups is critical to ensuring a positive and safe user experience. An ideal evaluation should have the properties of being robust, extendable to new groups or attributes, and being able to capture biases that appear in typical usage (rather than just extreme, rare cases). Relatedly, bias evaluation should surface not only egregious biases but also ones that are subtle and commonplace, such as a likelihood for talking about appearances with regard to women. We present FairPair, an evaluation framework for assessing differential treatment that occurs during ordinary usage. FairPair operates through counterfactual pairs, but crucially, the paired continuations are grounded in the same demographic group, which ensures equivalent comparison. Additionally, unlike prior work, our method factors in the inherent variability that comes from the generation process itself by measuring the sampling variability. We present an evaluation of several commonly used generative models and a qualitative analysis that indicates a preference for discussing family and hobbies with regard to women.
△ Less
Submitted 9 April, 2024;
originally announced April 2024.
-
Doubly Robust Inference in Causal Latent Factor Models
Authors:
Alberto Abadie,
Anish Agarwal,
Raaz Dwivedi,
Abhin Shah
Abstract:
This article introduces a new estimator of average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show t…
▽ More
This article introduces a new estimator of average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show that the error of the new estimator converges to a mean-zero Gaussian distribution at a parametric rate. Simulation results demonstrate the practical relevance of the formal properties of the estimators analyzed in this article.
△ Less
Submitted 15 April, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Incorporating Zero-Knowledge Succinct Non-interactive Argument of Knowledge for Blockchain-based Identity Management with off-chain computations
Authors:
Pranay Kothari,
Deepak Chopra,
Manjot Singh,
Shivam Bhardwaj,
Rudresh Dwivedi
Abstract:
In today's world, secure and efficient biometric authentication is of keen importance. Traditional authentication methods are no longer considered reliable due to their susceptibility to cyber-attacks. Biometric authentication, particularly fingerprint authentication, has emerged as a promising alternative, but it raises concerns about the storage and use of biometric data, as well as centralized…
▽ More
In today's world, secure and efficient biometric authentication is of keen importance. Traditional authentication methods are no longer considered reliable due to their susceptibility to cyber-attacks. Biometric authentication, particularly fingerprint authentication, has emerged as a promising alternative, but it raises concerns about the storage and use of biometric data, as well as centralized storage, which could make it vulnerable to cyber-attacks. In this paper, a novel blockchain-based fingerprint authentication system is proposed that integrates zk-SNARKs, which are zero-knowledge proofs that enable secure and efficient authentication without revealing sensitive biometric information. A KNN-based approach on the FVC2002, FVC2004 and FVC2006 datasets is used to generate a cancelable template for secure, faster, and robust biometric registration and authentication which is stored using the Interplanetary File System. The proposed approach provides an average accuracy of 99.01%, 98.97% and 98.52% over the FVC2002, FVC2004 and FVC2006 datasets respectively for fingerprint authentication. Incorporation of zk-SNARK facilitates smaller proof size. Overall, the proposed method has the potential to provide a secure and efficient solution for blockchain-based identity management.
△ Less
Submitted 30 October, 2023;
originally announced October 2023.
-
Blockchain-Based Transferable Digital Rights of Land
Authors:
Ras Dwivedi,
Sumit Patel,
Prof. Sandeep Shukla
Abstract:
Land, being a scarce and valuable resource, is in high demand, especially in densely populated areas of older cities. Development authorities require land for infrastructure projects and other amenities, while landowners hold onto their land for both its usage and its financial value. Transferable Development Rights (TDRs) serve as a mechanism to separate the development rights associated with the…
▽ More
Land, being a scarce and valuable resource, is in high demand, especially in densely populated areas of older cities. Development authorities require land for infrastructure projects and other amenities, while landowners hold onto their land for both its usage and its financial value. Transferable Development Rights (TDRs) serve as a mechanism to separate the development rights associated with the land from the physical land itself. Development authorities acquire the land by offering compensation in the form of TDRs, which hold monetary value. In this paper, we present the tokenization of development rights, focusing on the implementation in collaboration with a development authority. While there have been previous implementations of land tokenization, we believe our approach is the first to tokenize development rights specifically. Our implementation addresses practical challenges related to record-keeping, ground verification of land, and the unique identification of stakeholders. We ensure the accurate evaluation of development rights by incorporating publicly available circle rates, which consider the ground development of the land and its surrounding areas.
△ Less
Submitted 11 August, 2023;
originally announced August 2023.
-
An Efficient Ensemble Explainable AI (XAI) Approach for Morphed Face Detection
Authors:
Rudresh Dwivedi,
Ritesh Kumar,
Deepak Chopra,
Pranay Kothari,
Manjot Singh
Abstract:
The extensive utilization of biometric authentication systems have emanated attackers / imposters to forge user identity based on morphed images. In this attack, a synthetic image is produced and merged with genuine. Next, the resultant image is user for authentication. Numerous deep neural convolutional architectures have been proposed in literature for face Morphing Attack Detection (MADs) to pr…
▽ More
The extensive utilization of biometric authentication systems have emanated attackers / imposters to forge user identity based on morphed images. In this attack, a synthetic image is produced and merged with genuine. Next, the resultant image is user for authentication. Numerous deep neural convolutional architectures have been proposed in literature for face Morphing Attack Detection (MADs) to prevent such attacks and lessen the risks associated with them. Although, deep learning models achieved optimal results in terms of performance, it is difficult to understand and analyse these networks since they are black box/opaque in nature. As a consequence, incorrect judgments may be made. There is, however, a dearth of literature that explains decision-making methods of black box deep learning models for biometric Presentation Attack Detection (PADs) or MADs that can aid the biometric community to have trust in deep learning-based biometric systems for identification and authentication in various security applications such as border control, criminal database establishment etc. In this work, we present a novel visual explanation approach named Ensemble XAI integrating Saliency maps, Class Activation Maps (CAM) and Gradient-CAM (Grad-CAM) to provide a more comprehensive visual explanation for a deep learning prognostic model (EfficientNet-B1) that we have employed to predict whether the input presented to a biometric authentication system is morphed or genuine. The experimentations have been performed on three publicly available datasets namely Face Research Lab London Set, Wide Multi-Channel Presentation Attack (WMCA), and Makeup Induced Face Spoofing (MIFS). The experimental evaluations affirms that the resultant visual explanations highlight more fine-grained details of image features/areas focused by EfficientNet-B1 to reach decisions along with appropriate reasoning.
△ Less
Submitted 23 April, 2023;
originally announced April 2023.
-
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling
Authors:
Susobhan Ghosh,
Raphael Kim,
Prasidh Chhabria,
Raaz Dwivedi,
Predrag Klasnja,
Peng Liao,
Kelly Zhang,
Susan Murphy
Abstract:
There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this pro…
▽ More
There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an ``optimized'' intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study.
△ Less
Submitted 7 August, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Compress Then Test: Powerful Kernel Testing in Near-linear Time
Authors:
Carles Domingo-Enrich,
Raaz Dwivedi,
Lester Mackey
Abstract:
Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. However, existing kernel tests either run in $n^2$ time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximate…
▽ More
Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. However, existing kernel tests either run in $n^2$ time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximates an expensive test by compressing each $n$ point sample into a small but provably high-fidelity coreset. For standard kernels and subexponential distributions, CTT inherits the statistical behavior of a quadratic-time test -- recovering the same optimal detection boundary -- while running in near-linear time. We couple these advances with cheaper permutation testing, justified by new power analyses; improved time-vs.-quality guarantees for low-rank approximation; and a fast aggregation procedure for identifying especially discriminating kernels. In our experiments with real and simulated data, CTT and its extensions provide 20--200x speed-ups over state-of-the-art approximate MMD tests with no loss of power.
△ Less
Submitted 23 February, 2023; v1 submitted 14 January, 2023;
originally announced January 2023.
-
Doubly robust nearest neighbors in factor models
Authors:
Raaz Dwivedi,
Katherine Tian,
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy,
Devavrat Shah
Abstract:
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for esti…
▽ More
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for estimating the mean $f(u_i, v_t)$ relies on existence of other rows $j$ with $u_j \approx u_i$. Similarly, time-time NN strategy relies on existence of columns $t'$ with $v_{t'} \approx v_t$. These strategies provide poor performance respectively when similar rows or similar columns are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate provides a consistent estimate. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval when compared to both unit-unit or time-time NN.
△ Less
Submitted 29 January, 2024; v1 submitted 25 November, 2022;
originally announced November 2022.
-
On counterfactual inference with unobserved confounding
Authors:
Abhin Shah,
Raaz Dwivedi,
Devavrat Shah,
Gregory W. Wornell
Abstract:
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes as well as exacerbates the heterogeneit…
▽ More
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes as well as exacerbates the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are $s$-sparse linear combination of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.
△ Less
Submitted 14 September, 2023; v1 submitted 13 November, 2022;
originally announced November 2022.
-
Counterfactual inference for sequential experiments
Authors:
Raaz Dwivedi,
Katherine Tian,
Sabina Tomkins,
Predrag Klasnja,
Susan Murphy,
Devavrat Shah
Abstract:
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumpt…
▽ More
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.
△ Less
Submitted 16 April, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Distribution Compression in Near-linear Time
Authors:
Abhishek Shetty,
Raaz Dwivedi,
Lester Mackey
Abstract:
In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadrat…
▽ More
In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime in the sample size $n$. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor of $4$ in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n/n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $\mathcal{O}(n \log^3 n)$ time and $\mathcal{O}( \sqrt{n} \log^2 n )$ space. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.
△ Less
Submitted 17 October, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Generalized Kernel Thinning
Authors:
Raaz Dwivedi,
Lester Mackey
Abstract:
The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel,…
▽ More
The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel, any distribution, and any fixed function in the RKHS. Second, we show that, for analytic kernels like Gaussian, inverse multiquadric, and sinc, target KT admits maximum mean discrepancy (MMD) guarantees comparable to or better than those of square-root KT without making explicit use of a square-root kernel. Third, we prove that KT with a fractional power kernel yields better-than-Monte-Carlo MMD guarantees for non-smooth kernels, like Laplace and Matérn, that do not have square-roots. Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual function guarantees of target KT. In our experiments with target KT and KT+, we witness significant improvements in integration error even in $100$ dimensions and when compressing challenging differential equation posteriors.
△ Less
Submitted 19 July, 2022; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Kernel Thinning
Authors:
Raaz Dwivedi,
Lester Mackey
Abstract:
We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated…
▽ More
We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $O_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $O_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $Ω(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $O(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
△ Less
Submitted 11 May, 2024; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Towards Designing Computer Vision-based Explainable-AI Solution: A Use Case of Livestock Mart Industry
Authors:
Devam Dave,
Het Naik,
Smiti Singhal,
Rudresh Dwivedi,
Pankesh Patel
Abstract:
The objective of an online Mart is to match buyers and sellers, to weigh animals and to oversee their sale. A reliable pricing method can be developed by ML models that can read through historical sales data. However, when AI models suggest or recommend a price, that in itself does not reveal too much (i.e., it acts like a black box) about the qualities and the abilities of an animal. An intereste…
▽ More
The objective of an online Mart is to match buyers and sellers, to weigh animals and to oversee their sale. A reliable pricing method can be developed by ML models that can read through historical sales data. However, when AI models suggest or recommend a price, that in itself does not reveal too much (i.e., it acts like a black box) about the qualities and the abilities of an animal. An interested buyer would like to know more about the salient features of an animal before making the right choice based on his requirements. A model capable of explaining the different factors that impact the price point is essential for the needs of the market. It can also inspire confidence in buyers and sellers about the price point offered. To achieve these objectives, we have been working with the team at MartEye, a startup based in Portershed in Galway City, Ireland. Through this paper, we report our work-in-progress research towards building a smart video analytic platform, leveraging Explainable AI techniques.
△ Less
Submitted 8 February, 2021;
originally announced March 2021.
-
A New Mask R-CNN Based Method for Improved Landslide Detection
Authors:
Silvia Liberata Ullo,
Amrita Mohan,
Alessandro Sebastianelli,
Shaik Ejaz Ahamed,
Basant Kumar,
Ramji Dwivedi,
G. R. Sinha
Abstract:
This paper presents a novel method of landslide detection by exploiting the Mask R-CNN capability of identifying an object layout by using a pixel-based segmentation, along with transfer learning used to train the proposed model. A data set of 160 elements is created containing landslide and non-landslide images. The proposed method consists of three steps: (i) augmenting training image samples to…
▽ More
This paper presents a novel method of landslide detection by exploiting the Mask R-CNN capability of identifying an object layout by using a pixel-based segmentation, along with transfer learning used to train the proposed model. A data set of 160 elements is created containing landslide and non-landslide images. The proposed method consists of three steps: (i) augmenting training image samples to increase the volume of the training data, (ii) fine tuning with limited image samples, and (iii) performance evaluation of the algorithm in terms of precision, recall and F1 measure, on the considered landslide images, by adopting ResNet-50 and 101 as backbone models. The experimental results are quite encouraging as the proposed method achieves Precision equals to 1.00, Recall 0.93 and F1 measure 0.97, when ResNet-101 is used as backbone model, and with a low number of landslide photographs used as training samples. The proposed algorithm can be potentially useful for land use planners and policy makers of hilly areas where intermittent slope deformations necessitate landslide detection as prerequisite before planning.
△ Less
Submitted 4 October, 2020;
originally announced October 2020.
-
Stable discovery of interpretable subgroups via calibration in causal studies
Authors:
Raaz Dwivedi,
Yan Shuo Tan,
Briton Park,
Mian Wei,
Kevin Horgan,
David Madigan,
Bin Yu
Abstract:
Building on Yu and Kumbier's PCS framework and for randomized experiments, we introduce a novel methodology for Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), with large heterogeneous treatment effects. StaDISC was developed during our re-analysis of the 1999-2000 VIGOR study, an 8076 patient randomized controlled trial (RCT), that compared the risk of adverse events from a…
▽ More
Building on Yu and Kumbier's PCS framework and for randomized experiments, we introduce a novel methodology for Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), with large heterogeneous treatment effects. StaDISC was developed during our re-analysis of the 1999-2000 VIGOR study, an 8076 patient randomized controlled trial (RCT), that compared the risk of adverse events from a then newly approved drug, Rofecoxib (Vioxx), to that from an older drug Naproxen. Vioxx was found to, on average and in comparison to Naproxen, reduce the risk of gastrointestinal (GI) events but increase the risk of thrombotic cardiovascular (CVT) events. Applying StaDISC, we fit 18 popular conditional average treatment effect (CATE) estimators for both outcomes and use calibration to demonstrate their poor global performance. However, they are locally well-calibrated and stable, enabling the identification of patient groups with larger than (estimated) average treatment effects. In fact, StaDISC discovers three clinically interpretable subgroups each for the GI outcome (totaling 29.4% of the study size) and the CVT outcome (totaling 11.0%). Complementary analyses of the found subgroups using the 2001-2004 APPROVe study, a separate independently conducted RCT with 2587 patients, provides further supporting evidence for the promise of StaDISC.
△ Less
Submitted 28 September, 2020; v1 submitted 23 August, 2020;
originally announced August 2020.
-
Revisiting minimum description length complexity in overparameterized models
Authors:
Raaz Dwivedi,
Chandan Singh,
Bin Yu,
Martin J. Wainwright
Abstract:
Complexity is a fundamental concept underlying statistical learning theory that aims to inform generalization performance. Parameter count, while successful in low-dimensional settings, is not well-justified for overparameterized settings when the number of parameters is more than the number of training samples. We revisit complexity measures based on Rissanen's principle of minimum description le…
▽ More
Complexity is a fundamental concept underlying statistical learning theory that aims to inform generalization performance. Parameter count, while successful in low-dimensional settings, is not well-justified for overparameterized settings when the number of parameters is more than the number of training samples. We revisit complexity measures based on Rissanen's principle of minimum description length (MDL) and define a novel MDL-based complexity (MDL-COMP) that remains valid for overparameterized models. MDL-COMP is defined via an optimality criterion over the encodings induced by a good Ridge estimator class. We provide an extensive theoretical characterization of MDL-COMP for linear models and kernel methods and show that it is not just a function of parameter count, but rather a function of the singular values of the design or the kernel matrix and the signal-to-noise ratio. For a linear model with $n$ observations, $d$ parameters, and i.i.d. Gaussian predictors, MDL-COMP scales linearly with $d$ when $d<n$, but the scaling is exponentially smaller -- $\log d$ for $d>n$. For kernel methods, we show that MDL-COMP informs minimax in-sample error, and can decrease as the dimensionality of the input increases. We also prove that MDL-COMP upper bounds the in-sample mean squared error (MSE). Via an array of simulations and real-data experiments, we show that a data-driven Prac-MDL-COMP informs hyper-parameter tuning for optimizing test MSE with ridge regression in limited data settings, sometimes improving upon cross-validation and (always) saving computational costs. Finally, our findings also suggest that the recently observed double decent phenomenons in overparameterized models might be a consequence of the choice of non-ideal estimators.
△ Less
Submitted 12 October, 2023; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Instability, Computational Efficiency and Statistical Accuracy
Authors:
Nhat Ho,
Koulik Khamaru,
Raaz Dwivedi,
Martin J. Wainwright,
Michael I. Jordan,
Bin Yu
Abstract:
Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an important special case. The limiting performance of such estimators depends on the properties of the population-level operator in the idealized limit of infinitely many samples. We develop a general framework that yields bounds on statistical accurac…
▽ More
Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an important special case. The limiting performance of such estimators depends on the properties of the population-level operator in the idealized limit of infinitely many samples. We develop a general framework that yields bounds on statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples. Using this framework, we analyze both stable forms of gradient descent and some higher-order and unstable algorithms, including Newton's method and its cubic-regularized variant, as well as the EM algorithm. We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models. We exhibit cases in which an unstable algorithm can achieve the same statistical accuracy as a stable algorithm in exponentially fewer steps -- namely, with the number of iterations being reduced from polynomial to logarithmic in sample size $n$.
△ Less
Submitted 20 March, 2022; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Curating a COVID-19 data repository and forecasting county-level death counts in the United States
Authors:
Nick Altieri,
Rebecca L. Barter,
James Duncan,
Raaz Dwivedi,
Karl Kumbier,
Xiao Li,
Robert Netzorg,
Briton Park,
Chandan Singh,
Yan Shuo Tan,
Tiffany Tang,
Yu Wang,
Chao Zhang,
Bin Yu
Abstract:
As the COVID-19 outbreak evolves, accurate forecasting continues to play an extremely important role in informing policy decisions. In this paper, we present our continuous curation of a large data repository containing COVID-19 information from a range of sources. We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative de…
▽ More
As the COVID-19 outbreak evolves, accurate forecasting continues to play an extremely important role in informing policy decisions. In this paper, we present our continuous curation of a large data repository containing COVID-19 information from a range of sources. We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative death counts at the county-level in the United States up to two weeks ahead. Using data from January 22 to June 20, 2020, we develop and combine multiple forecasts using ensembling techniques, resulting in an ensemble we refer to as Combined Linear and Exponential Predictors (CLEP). Our individual predictors include county-specific exponential and linear predictors, a shared exponential predictor that pools data together across counties, an expanded shared exponential predictor that uses data from neighboring counties, and a demographics-based shared exponential predictor. We use prediction errors from the past five days to assess the uncertainty of our death predictions, resulting in generally-applicable prediction intervals, Maximum (absolute) Error Prediction Intervals (MEPI). MEPI achieves a coverage rate of more than 94% when averaged across counties for predicting cumulative recorded death counts two weeks in the future. Our forecasts are currently being used by the non-profit organization, Response4Life, to determine the medical supply need for individual hospitals and have directly contributed to the distribution of medical supplies across the country. We hope that our forecasts and data repository at https://covidseverity.com can help guide necessary county-specific decision-making and help counties prepare for their continued fight against COVID-19.
△ Less
Submitted 9 August, 2020; v1 submitted 16 May, 2020;
originally announced May 2020.
-
Trees and Islands -- Machine learning approach to nuclear physics
Authors:
Nishchal R. Dwivedi
Abstract:
We implement machine learning algorithms to nuclear data. These algorithms are purely data driven and generate models that are capable to capture intricate trends. Gradient boosted trees algorithm is employed to generate a trained model from existing nuclear data, which is used for prediction for data of damping parameter, shell correction energies, quadrupole deformation, pairing gaps, level dens…
▽ More
We implement machine learning algorithms to nuclear data. These algorithms are purely data driven and generate models that are capable to capture intricate trends. Gradient boosted trees algorithm is employed to generate a trained model from existing nuclear data, which is used for prediction for data of damping parameter, shell correction energies, quadrupole deformation, pairing gaps, level densities and giant dipole resonance for large number of nuclei. We, in particular, predict level density parameter for superheavy elements which is of great current interest. The predictions made by the machine learning algorithm is found to have standard deviation from 0.00035 to 0.73.
△ Less
Submitted 23 July, 2019;
originally announced July 2019.
-
Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients
Authors:
Yuansi Chen,
Raaz Dwivedi,
Martin J. Wainwright,
Bin Yu
Abstract:
Hamiltonian Monte Carlo (HMC) is a state-of-the-art Markov chain Monte Carlo sampling algorithm for drawing samples from smooth probability densities over continuous spaces. We study the variant most widely used in practice, Metropolized HMC with the Störmer-Verlet or leapfrog integrator, and make two primary contributions. First, we provide a non-asymptotic upper bound on the mixing time of the M…
▽ More
Hamiltonian Monte Carlo (HMC) is a state-of-the-art Markov chain Monte Carlo sampling algorithm for drawing samples from smooth probability densities over continuous spaces. We study the variant most widely used in practice, Metropolized HMC with the Störmer-Verlet or leapfrog integrator, and make two primary contributions. First, we provide a non-asymptotic upper bound on the mixing time of the Metropolized HMC with explicit choices of step-size and number of leapfrog steps. This bound gives a precise quantification of the faster convergence of Metropolized HMC relative to simpler MCMC algorithms such as the Metropolized random walk, or Metropolized Langevin algorithm. Second, we provide a general framework for sharpening mixing time bounds of Markov chains initialized at a substantial distance from the target distribution over continuous spaces. We apply this sharpening device to the Metropolized random walk and Langevin algorithms, thereby obtaining improved mixing time bounds from a non-warm initial distribution.
△ Less
Submitted 11 January, 2021; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models
Authors:
Raaz Dwivedi,
Nhat Ho,
Koulik Khamaru,
Martin J. Wainwright,
Michael I. Jordan,
Bin Yu
Abstract:
We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{- \frac{1}{2}}$ error. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We provide a rigorous characterization of EM for fitting a weakly identif…
▽ More
We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{- \frac{1}{2}}$ error. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We provide a rigorous characterization of EM for fitting a weakly identifiable Gaussian mixture in a univariate setting where we prove that the EM algorithm converges in order $n^{\frac{3}{4}}$ steps and returns estimates that are at a Euclidean distance of order ${ n^{- \frac{1}{8}}}$ and ${ n^{-\frac{1} {4}}}$ from the true location and scale parameter respectively. Establishing the slow rates in the univariate setting requires a novel localization argument with two stages, with each stage involving an epoch-based argument applied to a different surrogate EM operator at the population level. We demonstrate several multivariate ($d \geq 2$) examples that exhibit the same slow rates as the univariate case. We also prove slow statistical rates in higher dimensions in a special case, when the fitted covariance is constrained to be a multiple of the identity.
△ Less
Submitted 15 November, 2021; v1 submitted 1 February, 2019;
originally announced February 2019.
-
A non-invertible cancelable fingerprint template generation based on ridge feature transformation
Authors:
Rudresh Dwivedi,
Somnath Dey
Abstract:
In a biometric verification system, leakage of biometric data leads to permanent identity loss since original biometric data is inherently linked to a user. Further, various types of attacks on a biometric system may reveal the original template and utility in other applications. To address these security and privacy concerns cancelable biometric has been introduced. Cancelable biometric construct…
▽ More
In a biometric verification system, leakage of biometric data leads to permanent identity loss since original biometric data is inherently linked to a user. Further, various types of attacks on a biometric system may reveal the original template and utility in other applications. To address these security and privacy concerns cancelable biometric has been introduced. Cancelable biometric constructs a protected template from the original biometric template using transformation functions and performs the comparison between templates in the transformed domain. Recent approaches towards cancelable fingerprint generation either rely on aligning minutiae points with respect to singular points (core/delta) or utilize the absolute coordinate positions of minutiae points. In this paper, we propose a novel non-invertible ridge feature transformation method to protect the original fingerprint template information. The proposed method partitions the fingerprint region into a number of sectors with reference to each minutia point employing a ridge-based co-ordinate system. The nearest neighbor minutiae in each sector are identified, and ridge-based features are computed. Further, a cancelable template is generated by applying the Cantor pairing function followed by random projection. We have evaluated our method with FVC2002, FVC2004 and FVC2006 databases. It is evident from the experimental results that the proposed method outperforms existing methods in the literature. Moreover, the security analysis demonstrates that the proposed method fulfills the necessary requirements of non-invertibility, revocability, and diversity with a minor performance degradation caused due to cancelable transformation.
△ Less
Submitted 28 May, 2018;
originally announced May 2018.
-
A novel hybrid score level and decision level fusion scheme for cancelable multi-biometric verification
Authors:
Rudresh Dwivedi,
Somnath Dey
Abstract:
In spite of the benefits of biometric-based authentication systems, there are few concerns raised because of the sensitivity of biometric data to outliers, low performance caused due to intra-class variations and privacy invasion caused by information leakage. To address these issues, we propose a hybrid fusion framework where only the protected modalities are combined to fulfill the requirement o…
▽ More
In spite of the benefits of biometric-based authentication systems, there are few concerns raised because of the sensitivity of biometric data to outliers, low performance caused due to intra-class variations and privacy invasion caused by information leakage. To address these issues, we propose a hybrid fusion framework where only the protected modalities are combined to fulfill the requirement of secrecy and performance improvement. This paper presents a method to integrate cancelable modalities utilizing mean-closure weighting (MCW) score level and Dempster-Shafer (DS) theory based decision level fusion for iris and fingerprint to mitigate the limitations in the individual score or decision fusion mechanisms. The proposed hybrid fusion scheme incorporates the similarity scores from different matchers corresponding to each protected modality. The individual scores obtained from different matchers for each modality are combined using MCW score fusion method. The MCW technique achieves the optimal weight for each matcher involved in the score computation. Further, DS theory is applied to the induced scores to output the final decision. The rigorous experimental evaluations on three virtual databases indicate that the proposed hybrid fusion framework outperforms over the component level or individual fusion methods (score level and decision level fusion). As a result, we achieve (48%,66%), (72%,86%) and (49%,38%) of performance improvement over unimodal cancelable iris and unimodal cancelable fingerprint verification systems for Virtual_A, Virtual_B and Virtual_C databases, respectively. Also, the proposed method is robust enough to the variability of scores and outliers satisfying the requirement of secure authentication.
△ Less
Submitted 26 May, 2018;
originally announced May 2018.
-
Generating protected fingerprint template utilizing coprime mapping transformation
Authors:
Rudresh Dwivedi,
Somnath Dey
Abstract:
The identity of a user is permanently lost if biometric data gets compromised since the biometric information is irreplaceable and irrevocable. To revoke and reissue a new template in place of the compromised biometric template, the idea of cancelable biometrics has been introduced. The concept behind cancelable biometric is to irreversibly transform the original biometric template and perform the…
▽ More
The identity of a user is permanently lost if biometric data gets compromised since the biometric information is irreplaceable and irrevocable. To revoke and reissue a new template in place of the compromised biometric template, the idea of cancelable biometrics has been introduced. The concept behind cancelable biometric is to irreversibly transform the original biometric template and perform the comparison in the protected domain. In this paper, a coprime transformation scheme has been proposed to derive a protected fingerprint template. The method divides the fingerprint region into a number of sectors with respect to each minutiae point and identifies the nearest-neighbor minutiae in each sector. Then, ridge features for all neighboring minutiae points are computed and mapped onto co-prime positions of a random matrix to generate the cancelable template. The proposed approach achieves an EER of 1.82, 1.39, 4.02 and 5.77 on DB1, DB2, DB3 and DB4 datasets of the FVC2002 and an EER of 8.70, 7.95, 5.23 and 4.87 on DB1, DB2, DB3 and DB4 datasets of FVC2004 databases, respectively. Experimental evaluations indicate that the method outperforms in comparison to the current state-of-the-art. Moreover, it has been confirmed from the security analysis that the proposed method fulfills the desired characteristics of diversity, revocability, and non-invertibility with a minor performance degradation caused by the transformation.
△ Less
Submitted 25 May, 2018;
originally announced May 2018.
-
A fingerprint based crypto-biometric system for secure communication
Authors:
Rudresh Dwivedi,
Somnath Dey,
Mukul Anand Sharma,
Apurv Goel
Abstract:
To ensure the secure transmission of data, cryptography is treated as the most effective solution. Cryptographic key is an important entity in this procedure. In general, randomly generated cryptographic key (of 256 bits) is difficult to remember. However, such a key needs to be stored in a protected place or transported through a shared communication line which, in fact, poses another threat to s…
▽ More
To ensure the secure transmission of data, cryptography is treated as the most effective solution. Cryptographic key is an important entity in this procedure. In general, randomly generated cryptographic key (of 256 bits) is difficult to remember. However, such a key needs to be stored in a protected place or transported through a shared communication line which, in fact, poses another threat to security. As an alternative, researchers advocate the generation of cryptographic key using the biometric traits of both sender and receiver during the sessions of communication, thus avoiding key storing and at the same time without compromising the strength in security. Nevertheless, the biometric-based cryptographic key generation possesses few concerns such as privacy of biometrics, sharing of biometric data between both communicating users (i.e., sender and receiver), and generating revocable key from irrevocable biometric. This work addresses the above-mentioned concerns.
In this work, a framework for secure communication between two users using fingerprint based crypto-biometric system has been proposed. For this, Diffie-Hellman (DH) algorithm is used to generate public keys from private keys of both sender and receiver which are shared and further used to produce a symmetric cryptographic key at both ends. In this approach, revocable key for symmetric cryptography is generated from irrevocable fingerprint. The biometric data is neither stored nor shared which ensures the security of biometric data, and perfect forward secrecy is achieved using session keys. This work also ensures the long-term security of messages communicated between two users. Based on the experimental evaluation over four datasets of FVC2002 and NIST special database, the proposed framework is privacy-preserving and could be utilized onto real access control systems.
△ Less
Submitted 22 May, 2018;
originally announced May 2018.
-
The power of online thinning in reducing discrepancy
Authors:
Raaz Dwivedi,
Ohad N. Feldheim,
Ori Gurel-Gurevich,
Aaditya Ramdas
Abstract:
Consider an infinite sequence of independent, uniformly chosen points from $[0,1]^d$. After looking at each point in the sequence, an overseer is allowed to either keep it or reject it, and this choice may depend on the locations of all previously kept points. However, the overseer must keep at least one of every two consecutive points. We call a sequence generated in this fashion a \emph{two-thin…
▽ More
Consider an infinite sequence of independent, uniformly chosen points from $[0,1]^d$. After looking at each point in the sequence, an overseer is allowed to either keep it or reject it, and this choice may depend on the locations of all previously kept points. However, the overseer must keep at least one of every two consecutive points. We call a sequence generated in this fashion a \emph{two-thinning} sequence. Here, the purpose of the overseer is to control the discrepancy of the empirical distribution of points, that is, after selecting $n$ points, to reduce the maximal deviation of the number of points inside any axis-parallel hyper-rectangle of volume $A$ from $nA$. Our main result is an explicit low complexity two-thinning strategy which guarantees discrepancy of $O(\log^{2d+1} n)$ for all $n$ with high probability (compare with $Θ(\sqrt{n\log\log n})$ without thinning). The case $d=1$ of this result answers a question of Benjamini.
We also extend the construction to achieve the same asymptotic bound for ($1+β$)-thinning, a set-up in which rejecting is only allowed with probability $β$ independently for each point. In addition, we suggest an improved and simplified strategy which we conjecture to guarantee discrepancy of $O(\log^{d+1} n)$ (compare with $θ(\log^d n)$, the best known construction of a low discrepancy sequence). Finally, we provide theoretical and empirical evidence for our conjecture, and provide simulations supporting the viability of our construction for applications.
△ Less
Submitted 4 September, 2017; v1 submitted 9 August, 2016;
originally announced August 2016.
-
Close Clustering Based Automated Color Image Annotation
Authors:
Ankit Garg,
Rahul Dwivedi,
Krishna Asawa
Abstract:
Most image-search approaches today are based on the text based tags associated with the images which are mostly human generated and are subject to various kinds of errors. The results of a query to the image database thus can often be misleading and may not satisfy the requirements of the user. In this work we propose our approach to automate this tagging process of images, where image results gen…
▽ More
Most image-search approaches today are based on the text based tags associated with the images which are mostly human generated and are subject to various kinds of errors. The results of a query to the image database thus can often be misleading and may not satisfy the requirements of the user. In this work we propose our approach to automate this tagging process of images, where image results generated can be fine filtered based on a probabilistic tagging mechanism. We implement a tool which helps to automate the tagging process by maintaining a training database, wherein the system is trained to identify certain set of input images, the results generated from which are used to create a probabilistic tagging mechanism. Given a certain set of segments in an image it calculates the probability of presence of particular keywords. This probability table is further used to generate the candidate tags for input images.
△ Less
Submitted 2 August, 2010;
originally announced August 2010.