-
U-MixFormer: UNet-like Transformer with Mix-Attention for Efficient Semantic Segmentation
Authors:
Seul-Ki Yeom,
Julian von Klitzing
Abstract:
Semantic segmentation has witnessed remarkable advancements with the adaptation of the Transformer architecture. Parallel to the strides made by the Transformer, CNN-based U-Net has seen significant progress, especially in high-resolution medical imaging and remote sensing. This dual success inspired us to merge the strengths of both, leading to the inception of a U-Net-based vision transformer decoder tailored for efficient contextual encoding. Here, we propose a novel transformer decoder, U-MixFormer, built upon the U-Net structure, designed for efficient semantic segmentation. Our approach distinguishes itself from previous transformer methods by leveraging lateral connections between the encoder and decoder stages as feature queries for the attention modules, apart from the traditional reliance on skip connections. Moreover, we mix hierarchical feature maps from various encoder and decoder stages to form a unified representation for keys and values, giving rise to our mix-attention module. Our approach demonstrates state-of-the-art performance across various configurations. Extensive experiments show that U-MixFormer outperforms SegFormer, FeedFormer, and SegNeXt by a large margin. For example, U-MixFormer-B0 surpasses SegFormer-B0 and FeedFormer-B0 with 3.8% and 2.0% higher mIoU and 27.3% and 21.8% less computation, respectively, and outperforms SegNeXt with 3.3% higher mIoU with the MSCAN-T encoder on ADE20K. Code available at https://github.com/julian-klitzing/u-mixformer.
Submitted 11 December, 2023;
originally announced December 2023.
-
Black-Box Audits for Group Distribution Shifts
Authors:
Marc Juarez,
Samuel Yeom,
Matt Fredrikson
Abstract:
When a model informs decisions about people, distribution shifts can create undue disparities. However, it is hard for external entities to check for distribution shift, as the model and its training set are often proprietary. In this paper, we introduce and study a black-box auditing method to detect cases of distribution shift that lead to a performance disparity of the model across demographic groups. By extending techniques used in membership and property inference attacks (which are designed to expose private information from learned models), we demonstrate that an external auditor can gain the information needed to identify these distribution shifts solely by querying the model. Our experimental results on real-world datasets show that this approach is effective, achieving 80-100% AUC-ROC in detecting shifts involving the underrepresentation of a demographic group in the training set. Researchers and investigative journalists can use our tools to perform non-collaborative audits of proprietary models and expose cases of underrepresentation in the training datasets.
Submitted 8 September, 2022;
originally announced September 2022.
-
Weighted Isolation and Random Cut Forest Algorithms for Anomaly Detection
Authors:
Sijin Yeom,
Jae-Hun Jung
Abstract:
Random cut forest (RCF) algorithms have been developed for anomaly detection, particularly in time series data. The RCF algorithm is an improved version of the isolation forest (IF) algorithm. Unlike the IF algorithm, the RCF algorithm can determine whether real-time input contains an anomaly by inserting the input into the constructed tree network. Various RCF algorithms, including Robust RCF (RRCF), have been developed, where the cutting procedure is adaptively chosen probabilistically. The RRCF algorithm demonstrates better performance than the IF algorithm, as dimension cuts are decided based on the geometric range of the data, whereas the IF algorithm randomly chooses dimension cuts. However, the overall data structure is not considered in both IF and RRCF, given that split values are chosen randomly. In this paper, we propose new IF and RCF algorithms, referred to as the weighted IF (WIF) and weighted RCF (WRCF) algorithms, respectively. Their split values are determined by considering the density of the given data. To introduce the WIF and WRCF, we first present a new geometric measure, a density measure, which is crucial for constructing the WIF and WRCF. We provide various mathematical properties of the density measure, accompanied by theorems that support and validate our claims through numerical examples.
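The contrast the abstract draws between IF and RRCF cut selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' WIF/WRCF method: the function names are hypothetical, the density measure they propose is not reproduced, and the split value is drawn uniformly within the chosen dimension's range, as the abstract says of both baselines.

```python
import random

def feature_ranges(points):
    """Per-dimension range (max - min) over a list of equal-length points."""
    dims = len(points[0])
    return [max(p[d] for p in points) - min(p[d] for p in points)
            for d in range(dims)]

def choose_cut_dimension_if(points, rng):
    """IF-style choice: every dimension is equally likely to be cut."""
    return rng.randrange(len(points[0]))

def choose_cut_dimension_rrcf(points, rng):
    """RRCF-style choice: a dimension is cut with probability
    proportional to its range. Raises ValueError if every range is zero."""
    ranges = feature_ranges(points)
    return rng.choices(range(len(ranges)), weights=ranges, k=1)[0]

def choose_split_value(points, dim, rng):
    """Both baselines draw the split value uniformly inside the range,
    which is why neither reflects where the data is dense."""
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    return rng.uniform(lo, hi)
```

With points that vary only along the first coordinate, the range-proportional rule always cuts that coordinate, while the uniform rule may spend cuts on the constant one.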
Submitted 8 January, 2024; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Automatic Neural Network Pruning that Efficiently Preserves the Model Accuracy
Authors:
Thibault Castells,
Seul-Ki Yeom
Abstract:
Neural network performance has improved significantly in the last few years, at the cost of an increasing number of floating point operations (FLOPs). However, more FLOPs can be an issue when computational resources are limited. As an attempt to solve this problem, pruning filters is a common solution, but most existing pruning methods do not preserve the model accuracy efficiently and therefore require a large number of finetuning epochs. In this paper, we propose an automatic pruning method that learns which neurons to preserve in order to maintain the model accuracy while reducing the FLOPs to a predefined target. To accomplish this task, we introduce a trainable bottleneck that only requires one single epoch with 25.6% (CIFAR-10) or 7.49% (ILSVRC2012) of the dataset to learn which filters to prune. Experiments on various architectures and datasets show that the proposed method can not only preserve the accuracy after pruning but also outperform existing methods after finetuning. We achieve a 52.00% FLOPs reduction on ResNet-50, with a Top-1 accuracy of 47.51% after pruning and a state-of-the-art (SOTA) accuracy of 76.63% after finetuning on ILSVRC2012. Code available at https://github.com/nota-github/autobot_AAAI23.
Submitted 7 December, 2022; v1 submitted 18 November, 2021;
originally announced November 2021.
-
Toward Compact Deep Neural Networks via Energy-Aware Pruning
Authors:
Seul-Ki Yeom,
Kyung-Hwan Shim,
Jee-Hyun Hwang
Abstract:
Despite their remarkable performance, modern deep neural networks are inevitably accompanied by a significant amount of computational cost for learning and deployment, which may be incompatible with their usage on edge devices. Recent efforts to reduce these overheads involve pruning and decomposing the parameters of various layers without performance deterioration. Inspired by several decomposition studies, in this paper, we propose a novel energy-aware pruning method that quantifies the importance of each filter in the network using the nuclear norm (NN). The proposed energy-aware pruning leads to state-of-the-art performance for Top-1 accuracy, FLOPs, and parameter reduction across a wide range of scenarios with multiple network architectures on CIFAR-10 and ImageNet after fine-grained classification tasks. In a toy experiment, without fine-tuning, we can visually observe that NN has a minute change in decision boundaries across classes and outperforms the previous popular criteria. We achieve competitive results with 40.4/49.8% of FLOPs and 45.9/52.9% of parameter reduction with 94.13/94.61% in the Top-1 accuracy with ResNet-56/110 on CIFAR-10, respectively. In addition, our observations are consistent across a variety of pruning settings in terms of data size as well as data quality, which underscores the stability of the acceleration and compression with negligible accuracy loss.
Submitted 10 March, 2022; v1 submitted 19 March, 2021;
originally announced March 2021.
-
GPSPiChain-Blockchain based Self-Contained Family Security System in Smart Home
Authors:
Ali Raza,
Lachlan Hardy,
Erin Roehrer,
Soonja Yeom,
Byeong ho Kang
Abstract:
With advancements in technology, personal computing devices are better adapted for and further integrated into people's lives and homes. The integration of technology into society also results in an increasing desire to control who and what has access to sensitive information, especially for vulnerable people including children and the elderly. With blockchain coming into the picture as a technology that can revolutionise the world, it is now possible to have an immutable audit trail of locational data over time. By controlling the process through inexpensive equipment in the home, it is possible to control who has access to such personal data. This paper presents a blockchain-based family security system for tracking the location of consenting family members' smart phones. The locations of the family members' smart phones are logged and stored in a private blockchain which can be accessed through a node installed in the family home on a computer. The data for the whereabouts of family members stays within the family unit and does not go to any third party. The system is implemented on a small scale (one miner and two other nodes) and the technical feasibility is discussed along with the limitations of the system. Further research will cover the integration of the system into a smart home environment, and ethical implementations of tracking, especially of vulnerable people, using the immutability of blockchain.
Submitted 13 February, 2021;
originally announced February 2021.
-
Individual Fairness Revisited: Transferring Techniques from Adversarial Robustness
Authors:
Samuel Yeom,
Matt Fredrikson
Abstract:
We turn the definition of individual fairness on its head: rather than ascertaining the fairness of a model given a predetermined metric, we find a metric for a given model that satisfies individual fairness. This can facilitate the discussion on the fairness of a model, addressing the issue that it may be difficult to specify a priori a suitable metric. Our contributions are twofold: First, we introduce the definition of a minimal metric and characterize the behavior of models in terms of minimal metrics. Second, for more complicated models, we apply the mechanism of randomized smoothing from adversarial robustness to make them individually fair under a given weighted $L^p$ metric. Our experiments show that adapting the minimal metrics of linear models to more complicated neural networks can lead to meaningful and interpretable fairness guarantees at little cost to utility.
Submitted 13 October, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Exploring an Application of Virtual Reality for Early Detection of Dementia
Authors:
Yiming Zhong,
Yuan Tian,
Mira Park,
Soonja Yeom
Abstract:
Facing the severe global dementia problem, an exploration was conducted adopting the technology of virtual reality (VR). This report lays a technical foundation for the further research project "Early Detection of Dementia Using Testing Tools in VR Environment" and illustrates the process of developing a VR application using Unity 3D software on Oculus Go. This preliminary exploration is composed of three steps: 3D virtual scene construction, VR interaction design, and monitoring. The exploration was recorded to provide basic technical guidance and detailed methods for subsequent research.
Submitted 14 January, 2020;
originally announced January 2020.
-
Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning
Authors:
Seul-Ki Yeom,
Philipp Seegerer,
Sebastian Lapuschkin,
Alexander Binder,
Simon Wiedemann,
Klaus-Robert Müller,
Wojciech Samek
Abstract:
The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the weights of various layers while at the same time aiming to not sacrifice performance. In this paper, we propose a novel criterion for CNN pruning inspired by neural network interpretability: The most relevant units, i.e. weights or filters, are automatically found using their relevance scores obtained from concepts of explainable AI (XAI). By exploring this idea, we connect the lines of interpretability and model compression research. We show that our proposed method can efficiently prune CNN models in transfer-learning setups in which networks pre-trained on large corpora are adapted to specialized tasks. The method is evaluated on a broad range of computer vision datasets. Notably, our novel criterion is not only competitive or better compared to state-of-the-art pruning criteria when successive retraining is performed, but clearly outperforms these previous criteria in the resource-constrained application scenario in which the data of the task to be transferred to is very scarce and one chooses to refrain from fine-tuning. Our method is able to compress the model iteratively while maintaining or even improving accuracy. At the same time, it has a computational cost in the order of gradient computation and is comparatively simple to apply without the need for tuning hyperparameters for pruning.
Submitted 12 March, 2021; v1 submitted 18 December, 2019;
originally announced December 2019.
-
Learning Fair Representations for Kernel Models
Authors:
Zilong Tan,
Samuel Yeom,
Matt Fredrikson,
Ameet Talwalkar
Abstract:
Fair representations are a powerful tool for establishing criteria like statistical parity, proxy non-discrimination, and equality of opportunity in learned models. Existing techniques for learning these representations are typically model-agnostic, as they preprocess the original data such that the output satisfies some fairness criterion, and can be used with arbitrary learning methods. In contrast, we demonstrate the promise of learning a model-aware fair representation, focusing on kernel-based models. We leverage the classical Sufficient Dimension Reduction (SDR) framework to construct representations as subspaces of the reproducing kernel Hilbert space (RKHS), whose member functions are guaranteed to satisfy fairness. Our method supports several fairness criteria, continuous and discrete data, and multiple protected attributes. We further show how to calibrate the accuracy tradeoff by characterizing it in terms of the principal angles between subspaces of the RKHS. Finally, we apply our approach to obtain the first Fair Gaussian Process (FGP) prior for fair Bayesian learning, and show that it is competitive with, and in some cases outperforms, state-of-the-art methods on real data.
Submitted 20 January, 2020; v1 submitted 27 June, 2019;
originally announced June 2019.
-
FlipTest: Fairness Testing via Optimal Transport
Authors:
Emily Black,
Samuel Yeom,
Matt Fredrikson
Abstract:
We present FlipTest, a black-box technique for uncovering discrimination in classifiers. FlipTest is motivated by the intuitive question: had an individual been of a different protected status, would the model have treated them differently? Rather than relying on causal information to answer this question, FlipTest leverages optimal transport to match individuals in different protected groups, creating similar pairs of in-distribution samples. We show how to use these instances to detect discrimination by constructing a "flipset": the set of individuals whose classifier output changes post-translation, which corresponds to the set of people who may be harmed because of their group membership. To shed light on why the model treats a given subgroup differently, FlipTest produces a "transparency report": a ranking of features that are most associated with the model's behavior on the flipset. Evaluating the approach on three case studies, we show that this provides a computationally inexpensive way to identify subgroups that may be harmed by model discrimination, including in cases where the model satisfies group fairness criteria.
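The flipset definition in the abstract can be sketched in a few lines. This is only an illustration under assumptions: `transport_map` is a hypothetical stand-in for the learned optimal transport mapping between protected groups (how it is fitted is not shown), and `classifier` is any function returning a label.

```python
def flipset(individuals, transport_map, classifier):
    """Return the individuals whose classifier output changes after they
    are mapped to their counterpart in the other protected group."""
    return [x for x in individuals
            if classifier(x) != classifier(transport_map(x))]

# Toy usage: one numeric feature, a threshold classifier, and a
# hypothetical mapping that shifts the feature by a constant.
approve = lambda income: income >= 50
shift_to_other_group = lambda income: income + 10

flipped = flipset([30, 45, 60], shift_to_other_group, approve)
```

Here only the individual with feature value 45 is in the flipset, because the mapped counterpart (55) crosses the decision threshold while the original does not.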
Submitted 6 December, 2019; v1 submitted 21 June, 2019;
originally announced June 2019.
-
Hunting for Discriminatory Proxies in Linear Regression Models
Authors:
Samuel Yeom,
Anupam Datta,
Matt Fredrikson
Abstract:
A machine learning model may exhibit discrimination when used to make decisions involving people. One potential cause for such outcomes is that the model uses a statistical proxy for a protected demographic attribute. In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies. Our definition follows recent work on proxies in classification models, and characterizes a model's constituent behavior that: 1) correlates closely with a protected random variable, and 2) is causally influential in the overall behavior of the model. We show that proxies in linear regression models can be efficiently identified by solving a second-order cone program, and further extend this result to account for situations where the use of a certain input variable is justified as a "business necessity". Finally, we present empirical results on two law enforcement datasets that exhibit varying degrees of racial disparity in prediction outcomes, demonstrating that proxies shed useful light on the causes of discriminatory behavior in models.
Submitted 27 November, 2018; v1 submitted 16 October, 2018;
originally announced October 2018.
-
Avoiding Disparity Amplification under Different Worldviews
Authors:
Samuel Yeom,
Michael Carl Tschantz
Abstract:
We mathematically compare four competing definitions of group-level nondiscrimination: demographic parity, equalized odds, predictive parity, and calibration. Using the theoretical framework of Friedler et al., we study the properties of each definition under various worldviews, which are assumptions about how, if at all, the observed data is biased. We argue that different worldviews call for different definitions of fairness, and we specify the worldviews that, when combined with the desire to avoid a criterion for discrimination that we call disparity amplification, motivate demographic parity and equalized odds. We also argue that predictive parity and calibration are insufficient for avoiding disparity amplification because predictive parity allows an arbitrarily large inter-group disparity and calibration is not robust to post-processing. Finally, we define a worldview that is more realistic than the previously considered ones, and we introduce a new notion of fairness that corresponds to this worldview.
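Two of the four definitions named above, demographic parity and equalized odds, can be measured as gaps between groups. The sketch below uses the standard textbook forms for binary predictions and two groups; the function names are illustrative and are not taken from the paper.

```python
def rate(values):
    """Fraction of positive (1) entries; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def demographic_parity_gap(y_pred, group):
    """|P(pred=1 | group=0) - P(pred=1 | group=1)|."""
    r0 = rate([p for p, g in zip(y_pred, group) if g == 0])
    r1 = rate([p for p, g in zip(y_pred, group) if g == 1])
    return abs(r0 - r1)

def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group gap in P(pred=1 | true label, group),
    taken over true label 0 (false positive rate) and 1 (true positive rate)."""
    gaps = []
    for label in (0, 1):
        r0 = rate([p for t, p, g in zip(y_true, y_pred, group)
                   if t == label and g == 0])
        r1 = rate([p for t, p, g in zip(y_true, y_pred, group)
                   if t == label and g == 1])
        gaps.append(abs(r0 - r1))
    return max(gaps)

# A perfect predictor on groups with different base rates.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = list(y_true)

dp_gap = demographic_parity_gap(y_pred, group)
eo_gap = equalized_odds_gap(y_true, y_pred, group)
```

In this toy data the perfect predictor has no equalized odds gap but a nonzero demographic parity gap, which shows how the two definitions can disagree when the observed base rates differ between groups.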
Submitted 9 March, 2021; v1 submitted 26 August, 2018;
originally announced August 2018.
-
Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting
Authors:
Samuel Yeom,
Irene Giacomelli,
Matt Fredrikson,
Somesh Jha
Abstract:
Machine learning algorithms, when applied to sensitive data, pose a distinct threat to privacy. A growing body of prior work demonstrates that models produced by these algorithms may leak specific private information in the training data to an attacker, either through the models' structure or their observable behavior. However, the underlying cause of this privacy risk is not well understood beyond a handful of anecdotal accounts that suggest overfitting and influence might play a role.
This paper examines the effect that overfitting and influence have on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks. Using both formal and empirical analyses, we illustrate a clear relationship between these factors and the privacy risk that arises in several popular machine learning algorithms. We find that overfitting is sufficient to allow an attacker to perform membership inference and, when the target attribute meets certain conditions about its influence, attribute inference attacks. Interestingly, our formal analysis also shows that overfitting is not necessary for these attacks and begins to shed light on what other factors may be in play. Finally, we explore the connection between membership inference and attribute inference, showing that there are deep connections between the two that lead to effective new attacks.
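The link between overfitting and membership inference can be illustrated with a loss-threshold heuristic: guess that a record was in the training set when the model's loss on it is small. The sketch below is an assumption-laden illustration rather than the attack analyzed in the paper; the function names, the threshold, and the loss values are all hypothetical.

```python
def infer_membership(losses, threshold):
    """Guess 'member' for every record whose loss is at or below the threshold."""
    return [loss <= threshold for loss in losses]

def membership_advantage(member_losses, nonmember_losses, threshold):
    """True positive rate minus false positive rate of the threshold guess.
    A value near 0 means the guess is no better than chance."""
    member_guesses = infer_membership(member_losses, threshold)
    nonmember_guesses = infer_membership(nonmember_losses, threshold)
    tpr = sum(member_guesses) / len(member_guesses)
    fpr = sum(nonmember_guesses) / len(nonmember_guesses)
    return tpr - fpr

# Hypothetical per-record losses from an overfit model: training
# records tend to have lower loss than held-out records.
train_losses = [0.05, 0.1, 0.2, 0.9]
heldout_losses = [0.6, 0.8, 1.2, 0.15]

advantage = membership_advantage(train_losses, heldout_losses, threshold=0.3)
```

The larger the gap between training and held-out losses, the more such a threshold separates members from non-members, which matches the intuition that overfitting increases this privacy risk.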
Submitted 4 May, 2018; v1 submitted 5 September, 2017;
originally announced September 2017.