QBI: Quantile-based Bias Initialization for Efficient Private Data Reconstruction in Federated Learning

Micha V. Nowak1  Tim P. Bott2  David Khachaturov3

Frank Puppe1Adrian Krenzer1Amar Hekalo1

1University of Würzburg  2University of Augsburg  3University of Cambridge
Abstract

Federated learning enables the training of machine learning models on distributed data without compromising user privacy, as data remains on personal devices and only model updates, such as gradients, are shared with a central coordinator. However, recent research has shown that the central entity can perfectly reconstruct private data from shared model updates by maliciously initializing the model’s parameters. In this paper, we propose QBI, a novel bias initialization method that significantly enhances reconstruction capabilities. This is accomplished by directly solving for bias values yielding sparse activation patterns. Further, we propose PAIRS, an algorithm that builds on QBI. PAIRS can be deployed when a separate dataset from the target domain is available to further increase the percentage of data that can be fully recovered. Measured by the percentage of samples that can be perfectly reconstructed from batches of various sizes, our approach achieves significant improvements over previous methods with gains of up to 50% on ImageNet and up to 60% on the IMDB sentiment analysis text dataset. Furthermore, we establish theoretical limits for attacks leveraging stochastic gradient sparsity, providing a foundation for understanding the fundamental constraints of these attacks. We empirically assess these limits using synthetic datasets. Finally, we propose and evaluate AGGP, a defensive framework designed to prevent gradient sparsity attacks, contributing to the development of more secure and private federated learning systems. Source code is available at: https://github.com/mvnowak/QBI

1 Introduction

The proliferation of mobile devices and the Internet of Things has led to an unprecedented amount of data being generated at the edge of the network [1]. This data, often in the form of user-generated content, sensor readings, or other types of user interactions, holds immense value for training machine learning (ML) models [2]. Due to the sensitive and often private nature of this data, traditional ML approaches that rely on centralized data collection and processing are often inadequate from a legal or ethical perspective [3]. Furthermore, regulations such as data sovereignty laws and cross-border data transfer restrictions (e.g., GDPR, CCPA) can hinder the movement of data between jurisdictions [4, 5].

Federated learning (FL) was proposed as a solution to these challenges [6]. It enables the collaborative training of ML models while preventing the need for users’ data to leave their devices – each device computes model updates locally, which are then sent to a central entity for aggregation into a shared model. FL, in theory, should preserve users’ privacy and adhere to data transfer restrictions, as the computed gradient updates should not expose any user data.

However, a large body of prior work has demonstrated that the FL protocol is vulnerable to multiple forms of data reconstruction attacks, which can be roughly classified into two major categories.

Passive gradient leakage attacks

In this scenario, the attacker is assumed to have no control over the model’s architecture, its parameters, or the specific FL protocol that is being used. The gradients created through a standard FL protocol could either be obtained by an honest but curious central entity, or an external actor via a man-in-the-middle (MITM) attack. A long line of work has demonstrated, that simply by obtaining these gradients, an attacker could learn about properties of the data [7, 8], data membership [8], and even partially reconstruct user data [9, 10, 11].

Malicious model modifications

The second major threat model assumes a malicious central entity, that has the ability to modify the model’s architecture or parameters in an attempt to break the user’s privacy. A broad range of research has outlined a multitude of possible active attacks, that often leverage induced gradient sparsity [12, 9, 13] or sample disaggregation [14, 15] to recover user data.

Each of these approaches strikes a different balance between computational overhead, quality of the reconstructed data, client-side detectability and the requirement for knowledge about the user data, including certain features and their distributions.

Efficient perfect user data reconstruction with bias tuning

In this work, we propose a novel method of maliciously initializing a model, by only modifying the bias values of a fully connected layer, while leaving the weights completely randomly initialized. This reduces client-side detectability in parameter space compared to methods that either maliciously train the full model to compromise privacy [16] or introduce a shift in the magnitude of all weight values [12]. Additionally our method removes the need for experimentally determined hyperparameters [12] or handpicked target features or classes [17]. We provide two variants of our approach: Quantile-Based Bias Initialization (QBI), which directly determines the optimal bias with near-zero computational cost, and Pattern-Aware Iterative Random Search (PAIRS) which builds on QBI, but further enhances reconstruction success by incorporating auxiliary data and incurring marginally higher computational overhead. Additionally, we derive boundaries for the expected success of attacks leveraging stochastic gradient sparsity, which enables us to identify the inherent constraints of such attacks. Using the findings described in this paper, we propose a novel defensive framework – AGGP – aimed at preventing gradient sparsity attacks, contributing to the development of more secure and private FL systems.

Contributions

  • We establish theoretical limits for attacks leveraging stochastic gradient sparsity and empirically assess these limits using synthetic datasets.

  • We propose a novel, compute efficient method of maliciously initializing fully-connected layers in a server-side attack on the FL protocol, which achieves state-of-the-art results in the domain of perfect user data reconstruction.

  • We provide two variants of our approach: QBI which can be deployed with near-zero computational cost, and PAIRS which can be used to achieve increased performance if auxiliary data and increased compute is available. We release an open-source implementation of our method at https://github.com/mvnowak/QBI, including dedicated scripts to reproduce all results reported in this paper.

  • We extensively evaluate both QBI and PAIRS and find that we achieve improvements above the previous state-of-the-art in perfect rectonstruction with gains of up to 50%percent5050\%50 % on ImageNet and up to 60%percent6060\%60 % on the IMDB sentiment analysis text dataset.

  • We propose and evaluate AGGP, a novel and compute efficient defensive framework that prevents data leakage from both active and passive attacks that leverage gradient sparsity in fully connected layers.

2 Background

Federated learning

FL is a decentralized approach to machine learning that enables multiple parties to collaboratively train a shared model on their local data without sharing the data itself. Each client has a local dataset and computes an update to the model parameters based on their local data. The update is typically computed as the gradient of the local loss function with respect to the model parameters. The local updates are then aggregated to update the global model parameters. One common aggregation method is federated averaging, which computes the weighted average of the local updates. This process is repeated over multiple rounds to train the global model [6].

Gradient sparsity attacks

As demonstrated by Geiping et al. [18], it is feasible to extract a single input from the gradients of a fully-connected layer, given that the layer is preceded only by fully connected layers, has a bias b𝑏bitalic_b, uses the ReLU activation function, and the gradient of the loss with respect to the layer’s output contains at least one non-zero entry (see Proposition D.1 in Geiping et al. [18] for a full proof). If this layer is placed at the beginning of the network, this corresponds to reconstructing the original input data point x𝑥xitalic_x. As shown in Section 5.1 of Boenisch et al. [12], for any non-zero output yisubscript𝑦𝑖y_{i}italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, the gradient of the corresponding weight row directly contains the input scaled by the gradient of the loss with respect to the bias. In practice, gradients are typically computed as the average over an entire batch of samples. Since a single neuron is often activated by multiple samples, the gradients of its weight row contain the average of multiple input data points, effectively obscuring them and preventing individual extraction. To extract an input sample x𝑥xitalic_x, there has to exist a neuron nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT such that L(x)i>0𝐿subscript𝑥𝑖0L(x)_{i}>0italic_L ( italic_x ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 and L(x)i<0𝐿subscriptsuperscript𝑥𝑖0L(x^{\prime})_{i}<0italic_L ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT < 0 for all xxsuperscript𝑥𝑥x^{\prime}\neq xitalic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ italic_x, where L(x)i𝐿subscript𝑥𝑖L(x)_{i}italic_L ( italic_x ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT denotes the activation of the i𝑖iitalic_i-th neuron for sample x𝑥xitalic_x. Since the samples that are to be extracted are unknown, initializing a layer to achieve this specific activation pattern is challenging. Therefore, the goal of stochastic gradient sparsity attacks is to increase the probability that a neuron is activated only by a single sample, while producing a variety of neurons with diverse activation patterns, to capture as much data from the target domain as possible.

Adaptability to CNN-based architectures

Boenisch et al. [12] extend the gradient sparsity attack to Convolutional Neural Networks (CNNs), where one or more convolutional layers precede the first linear layer. By utilizing zero-padding, a stride of one, and maliciously initialized filters, convolutional layers can effectively be transformed into identity functions, which perfectly transmit the inputs deeper into the network, allowing them to be extracted once they pass through a fully connected layer. For a detailed description, including visualizations, see Appendix B of Boenisch et al. [12].

3 Related Work

In FL, a common threat model is the passive gradient leakage scenario, where a malicious entity obtains a set of gradients to recover the original data points. Previous research has demonstrated that these gradients can be used to infer data properties [7, 8] or data membership [8]. Several optimization-based attacks have been proposed [19, 11, 20], which optimize a batch of random noise to generate gradients similar to those observed and thereby achieve partial reconstruction of user data. Although applicable to various architectures, these methods often require significant computational resources and are unable to achieve perfect reconstructions. The second threat model involves a malicious server (MS) that can manipulate the model’s architecture or parameters to compromise user privacy. Research in this area has identified various active attacks, which exploit induced gradient sparsity [12, 9, 13] or sample disaggregation techniques [14, 15] to recover user data. The SEER framework [14] is a notable example of an MS that avoids client-side detectability in gradient space by disaggregating samples in an embedding space that is unknown by the client. Although it achieves a high percentage of well-reconstructed images, it does so at a high computational cost (14 GPU days to train on an A100 with 80GB) and falls short of perfectly reconstructing data.

We mainly build on the work of Boenisch et al. [12] that introduced the concept of trap weights, a computationally efficient way to initialize a model to induce gradient sparsity. By adding a slight negative shift to the weight values, their approach achieves perfect reconstruction on multiple datasets. However, their method relies on an experimentally determined scaling factor. In contrast, we automate the model’s initialization process and achieve substantially higher rates of perfect reconstruction.

4 Method: Adversarial Bias Tuning

Intuition of bias tuning

Given a linear layer L𝐿Litalic_L of shape N×M𝑁𝑀N\times Mitalic_N × italic_M, that uses the ReLU activation function, and a batch X~~𝑋\tilde{X}over~ start_ARG italic_X end_ARG of B𝐵Bitalic_B samples x1,,xBsubscript𝑥1subscript𝑥𝐵x_{1},\cdots,x_{B}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ⋯ , italic_x start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT, the objective is that for every x𝑥xitalic_x there exists one and only one neuron nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT such that L(x)i>0𝐿subscript𝑥𝑖0L(x)_{i}>0italic_L ( italic_x ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 and L(x)i<0𝐿subscriptsuperscript𝑥𝑖0L(x^{\prime})_{i}<0italic_L ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT < 0 for all xxsuperscript𝑥𝑥x^{\prime}\neq xitalic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ italic_x, where L(x)i𝐿subscript𝑥𝑖L(x)_{i}italic_L ( italic_x ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT denotes the activation of the i𝑖iitalic_i-th neuron for sample x𝑥xitalic_x. This ensures that neuron nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT allows for perfect reconstruction of x𝑥xitalic_x from the gradients of its weight row, while producing zero gradients for all other samples. Therefore, for every neuron nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT the desired probability to activate for any sample in the batch is 1/B1𝐵1/B1 / italic_B. We take the model’s weights and our approximated normalized features to be independently and identically distributed (i.i.d.) random variables sampled from a normal distribution:

(𝐰𝐢𝐱𝐢)𝒩(0,IM)similar-tomatrixsubscript𝐰𝐢subscript𝐱𝐢𝒩0subscript𝐼𝑀\begin{pmatrix}\bf{w}_{i}\\ \bf{x}_{i}\end{pmatrix}\sim\mathcal{N}(0,I_{M})( start_ARG start_ROW start_CELL bold_w start_POSTSUBSCRIPT bold_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_x start_POSTSUBSCRIPT bold_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) ∼ caligraphic_N ( 0 , italic_I start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT ) (1)

Using this assumption, the probability of a single neuron, with corresponding weight row wisubscript𝑤𝑖w_{i}italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and bias bisubscript𝑏𝑖b_{i}italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, activating for a sample xiMsubscript𝑥𝑖superscript𝑀x_{i}\in\mathbb{R}^{M}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT can be expanded as:

P(L(xi)>0)=P(wiTxi+bi>0)=P(wi1xi1++wiMxiM+bi>0)𝑃𝐿subscript𝑥𝑖0𝑃superscriptsubscript𝑤𝑖𝑇subscript𝑥𝑖subscript𝑏𝑖0𝑃subscript𝑤𝑖1subscript𝑥𝑖1subscript𝑤𝑖𝑀subscript𝑥𝑖𝑀subscript𝑏𝑖0P\bigl{(}L(x_{i})>0\bigr{)}=P\bigl{(}w_{i}^{T}x_{i}+b_{i}>0\bigr{)}=P\bigl{(}w% _{i1}x_{i1}+\ldots+w_{iM}x_{iM}+b_{i}>0\bigr{)}italic_P ( italic_L ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) > 0 ) = italic_P ( italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 ) = italic_P ( italic_w start_POSTSUBSCRIPT italic_i 1 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i 1 end_POSTSUBSCRIPT + … + italic_w start_POSTSUBSCRIPT italic_i italic_M end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i italic_M end_POSTSUBSCRIPT + italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 ) (2)

Now the left-hand side is a sum of i.i.d. random variables, each of which has characteristic function:

(1+t2)1/2superscript1superscript𝑡212\bigl{(}1+t^{2}\bigr{)}^{\nicefrac{{1}}{{2}}}( 1 + italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT / start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT (3)

In turn, the characteristic function of the entire sum is:

(1+2it)M/2=(12it/2)M/2(1+2it/2)M/2superscript12𝑖𝑡𝑀2superscript12𝑖𝑡2𝑀2superscript12𝑖𝑡2𝑀2(1+2it)^{-M/2}=(1-2it/2)^{-M/2}\cdot(1+2it/2)^{-M/2}( 1 + 2 italic_i italic_t ) start_POSTSUPERSCRIPT - italic_M / 2 end_POSTSUPERSCRIPT = ( 1 - 2 italic_i italic_t / 2 ) start_POSTSUPERSCRIPT - italic_M / 2 end_POSTSUPERSCRIPT ⋅ ( 1 + 2 italic_i italic_t / 2 ) start_POSTSUPERSCRIPT - italic_M / 2 end_POSTSUPERSCRIPT (4)

This immediately identifies it as distributed like:

P(wiTxi+bi>0)=P(Q1/2Q2/2bi)𝑃superscriptsubscript𝑤𝑖𝑇subscript𝑥𝑖subscript𝑏𝑖0𝑃subscript𝑄12subscript𝑄22subscript𝑏𝑖P\bigl{(}w_{i}^{T}x_{i}+b_{i}>0\bigr{)}=P\bigl{(}Q_{1}/2-Q_{2}/2\leq b_{i}% \bigr{)}italic_P ( italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 ) = italic_P ( italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT / 2 - italic_Q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT / 2 ≤ italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) (5)

where Q1subscript𝑄1Q_{1}italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and Q2subscript𝑄2Q_{2}italic_Q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are independent and both of the chi-square distribution with M𝑀Mitalic_M degrees of freedom. Note that the variance of each addend is V(wijxij)=1𝑉subscript𝑤𝑖𝑗subscript𝑥𝑖𝑗1V(w_{ij}x_{ij})=1italic_V ( italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) = 1 (derived by evaluating the second derivative of Equation 3 at 0). We can drop the assumption that our features are normally-distributed and instead assume that the data has been normalized, as we do in our experiments. Even with this change, the variance of each addend still evaluates to 1111 as we maintain the assumption that the weights are independently- and normally-distributed. This enables us to apply the Central Limit Theorem to the original sum in Equation 2. As a sum of i.i.d. random variables with existing second moments, we can make a statement about convergence in distribution:

P(wi1xi1++wiMxiM+bi>0)𝑑Φ(biM)𝑑𝑃subscript𝑤𝑖1subscript𝑥𝑖1subscript𝑤𝑖𝑀subscript𝑥𝑖𝑀subscript𝑏𝑖0Φsubscript𝑏𝑖𝑀P\bigl{(}w_{i1}x_{i1}+\ldots+w_{iM}x_{iM}+b_{i}>0\bigr{)}\xrightarrow{d}\Phi% \biggl{(}\frac{b_{i}}{\sqrt{M}}\biggr{)}italic_P ( italic_w start_POSTSUBSCRIPT italic_i 1 end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i 1 end_POSTSUBSCRIPT + … + italic_w start_POSTSUBSCRIPT italic_i italic_M end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i italic_M end_POSTSUBSCRIPT + italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 ) start_ARROW overitalic_d → end_ARROW roman_Φ ( divide start_ARG italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG square-root start_ARG italic_M end_ARG end_ARG ) (6)

where ΦΦ\Phiroman_Φ is the cumulative distribution function of a standard normal distribution. Meaning we can solve for the asymptotically optimal bias bsubscript𝑏b_{*}italic_b start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT by using the inverse cdf or quantile function:

b=Φ1(1B)Msubscript𝑏superscriptΦ11𝐵𝑀b_{*}=\Phi^{-1}(\frac{1}{B})\cdot\sqrt{M}italic_b start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT = roman_Φ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_B end_ARG ) ⋅ square-root start_ARG italic_M end_ARG (7)

It can be helpful to conceptualize the relationship between this linear layer and a set of samples as a bipartite random graph. The nodes of one partition class are neurons from the neuron set N~={n1,,nn}~𝑁subscript𝑛1subscript𝑛𝑛\smash{\tilde{N}}=\{n_{1},\ldots,n_{n}\}over~ start_ARG italic_N end_ARG = { italic_n start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_n start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT }, and those in the other partition are from the sample set X~={x1,,xB}~𝑋subscript𝑥1subscript𝑥𝐵\smash{\tilde{X}}=\{x_{1},\ldots,x_{B}\}over~ start_ARG italic_X end_ARG = { italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT }. A vertex (ni,xj)N~×X~subscript𝑛𝑖subscript𝑥𝑗~𝑁~𝑋(n_{i},x_{j})\in\smash{\tilde{N}}\times\smash{\tilde{X}}( italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ∈ over~ start_ARG italic_N end_ARG × over~ start_ARG italic_X end_ARG is said to exist if nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is activated by xjsubscript𝑥𝑗x_{j}italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. If the malicious actor wants to reconstruct a sample xjsubscript𝑥𝑗x_{j}italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, that sample needs to have a neighbor nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT with degree δ(ni)=1𝛿subscript𝑛𝑖1\delta(n_{i})=1italic_δ ( italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = 1.

Extraction metrics

We closely follow the extraction metrics proposed by Boenisch et al. [12]. Given a linear layer with N𝑁Nitalic_N output neurons and a batch X~~𝑋\tilde{X}over~ start_ARG italic_X end_ARG of B𝐵Bitalic_B samples x1,,xBsubscript𝑥1subscript𝑥𝐵x_{1},\cdots,x_{B}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ⋯ , italic_x start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT, we define the following metrics:

  1. 1.

    Active neurons (A𝐴Aitalic_A): This metric represents the percentage of neurons in layer L𝐿Litalic_L that activate for at least one of the B𝐵Bitalic_B samples. Formally, let NAsubscript𝑁𝐴N_{A}italic_N start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT be the number of neurons nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT that satisfy the condition that there exists at least one input x𝑥xitalic_x in X~~𝑋\tilde{X}over~ start_ARG italic_X end_ARG such that L(x)i>0𝐿subscript𝑥𝑖0L(x)_{i}>0italic_L ( italic_x ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0, where L(x)i𝐿subscript𝑥𝑖L(x)_{i}italic_L ( italic_x ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT denotes the activation of the i𝑖iitalic_i-th neuron in L𝐿Litalic_L for sample x𝑥xitalic_x. For the neuron conceptualized as a graph node, this condition is equivalent to δ(ni)1𝛿subscript𝑛𝑖1\delta(n_{i})\geq 1italic_δ ( italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ≥ 1. The active neurons metric A𝐴Aitalic_A is therefore defined as the ratio of NAsubscript𝑁𝐴N_{A}italic_N start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT to the total number of neurons N𝑁Nitalic_N:

    A=NAN𝐴subscript𝑁𝐴𝑁A=\frac{N_{A}}{N}italic_A = divide start_ARG italic_N start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT end_ARG start_ARG italic_N end_ARG (8)
  2. 2.

    Extraction-Precision (P𝑃Pitalic_P): This metric measures the percentage of neurons that allow for the extraction of individual data points. Specifically, let Nusubscript𝑁𝑢N_{u}italic_N start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT be the number of neurons nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in layer L𝐿Litalic_L with unique activations, i.e. those that satisfy the following condition: L(x)i>0𝐿subscript𝑥𝑖0L(x)_{i}>0italic_L ( italic_x ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > 0 for one input x𝑥xitalic_x in X~~𝑋\tilde{X}over~ start_ARG italic_X end_ARG, and L(x)i<0𝐿subscriptsuperscript𝑥𝑖0L(x^{\prime})_{i}<0italic_L ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT < 0 for all other inputs xxsuperscript𝑥𝑥x^{\prime}\neq xitalic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ italic_x. This condition is equivalent to δ(ni)=1𝛿subscript𝑛𝑖1\delta(n_{i})=1italic_δ ( italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = 1, which effectively counts neuron nodes that are leaves of their graphs. The extraction-precision P𝑃Pitalic_P is defined as the following ratio:

    P=NuN𝑃subscript𝑁𝑢𝑁P=\frac{N_{u}}{N}italic_P = divide start_ARG italic_N start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT end_ARG start_ARG italic_N end_ARG (9)
  3. 3.

    Extraction-Recall (R𝑅Ritalic_R): The extraction-recall measures the percentage of input data points that can be perfectly reconstructed from any gradient row. Let B0subscript𝐵0B_{0}italic_B start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT be the number of data points that can be extracted with an l2subscript𝑙2l_{2}italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-error of zero, then R𝑅Ritalic_R is denoted as:

    R=B0B𝑅subscript𝐵0𝐵R=\frac{B_{0}}{B}italic_R = divide start_ARG italic_B start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG start_ARG italic_B end_ARG (10)

Notably, R𝑅Ritalic_R is the most significant metric, as neither A𝐴Aitalic_A nor P𝑃Pitalic_P can be used in isolation to meaningfully assess the effectiveness of an attack. A high A𝐴Aitalic_A value could lead to overlapping activations that prevent individual extraction, while a high P𝑃Pitalic_P value could be observed in a scenario where all neurons activated for the same sample. If the true probability of a neuron activating is 1/B1𝐵1/B1 / italic_B, we can derive the explicit probabilities pA;Bsubscript𝑝𝐴𝐵p_{A;B}italic_p start_POSTSUBSCRIPT italic_A ; italic_B end_POSTSUBSCRIPT and pu;Bsubscript𝑝𝑢𝐵p_{u;B}italic_p start_POSTSUBSCRIPT italic_u ; italic_B end_POSTSUBSCRIPT for a neuron to be counted as a success in the context of the A𝐴Aitalic_A and P𝑃Pitalic_P metrics, respectively. Since the success of one neuron does not influence the remaining neurons, the entire batch follows a binomial distribution: NA(N,pA;B)similar-tosubscript𝑁𝐴𝑁subscript𝑝𝐴𝐵N_{A}\sim\mathcal{B}(N,p_{A;B})italic_N start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT ∼ caligraphic_B ( italic_N , italic_p start_POSTSUBSCRIPT italic_A ; italic_B end_POSTSUBSCRIPT ) and Nu(N,pu;B)similar-tosubscript𝑁𝑢𝑁subscript𝑝𝑢𝐵N_{u}\sim\mathcal{B}(N,p_{u;B})italic_N start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT ∼ caligraphic_B ( italic_N , italic_p start_POSTSUBSCRIPT italic_u ; italic_B end_POSTSUBSCRIPT ). Consequently, the expected activation share A𝐴Aitalic_A and the expected extraction-precision P𝑃Pitalic_P are:

pA;B=𝔼[AB]=1(B1B)B and pu;B=𝔼[PB]=(B1B)B1formulae-sequencesubscript𝑝𝐴𝐵𝔼delimited-[]subscript𝐴𝐵1superscript𝐵1𝐵𝐵 and subscript𝑝𝑢𝐵𝔼delimited-[]subscript𝑃𝐵superscript𝐵1𝐵𝐵1p_{A;B}=\mathbb{E}[A_{B}]=1-\bigl{(}\frac{B-1}{B}\bigr{)}^{B}\qquad\text{ and % }\qquad p_{u;B}=\mathbb{E}[P_{B}]=\bigl{(}\frac{B-1}{B}\bigr{)}^{B-1}italic_p start_POSTSUBSCRIPT italic_A ; italic_B end_POSTSUBSCRIPT = blackboard_E [ italic_A start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ] = 1 - ( divide start_ARG italic_B - 1 end_ARG start_ARG italic_B end_ARG ) start_POSTSUPERSCRIPT italic_B end_POSTSUPERSCRIPT and italic_p start_POSTSUBSCRIPT italic_u ; italic_B end_POSTSUBSCRIPT = blackboard_E [ italic_P start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ] = ( divide start_ARG italic_B - 1 end_ARG start_ARG italic_B end_ARG ) start_POSTSUPERSCRIPT italic_B - 1 end_POSTSUPERSCRIPT (11)

For growing batch sizes, these converge to:

limB𝔼[AB]=11e63.2%andlimB𝔼[PB]=1e36.8%formulae-sequencesubscript𝐵𝔼delimited-[]subscript𝐴𝐵11𝑒percent63.2andsubscript𝐵𝔼delimited-[]subscript𝑃𝐵1𝑒percent36.8\lim\limits_{B\to\infty}\mathbb{E}[A_{B}]=1-\frac{1}{e}\approx 63.2\%\quad% \text{and}\quad\lim\limits_{B\to\infty}\mathbb{E}[P_{B}]=\frac{1}{e}\approx 36% .8\%roman_lim start_POSTSUBSCRIPT italic_B → ∞ end_POSTSUBSCRIPT blackboard_E [ italic_A start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ] = 1 - divide start_ARG 1 end_ARG start_ARG italic_e end_ARG ≈ 63.2 % and roman_lim start_POSTSUBSCRIPT italic_B → ∞ end_POSTSUBSCRIPT blackboard_E [ italic_P start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT ] = divide start_ARG 1 end_ARG start_ARG italic_e end_ARG ≈ 36.8 % (12)

From a graph theory perspective, RB=B0𝑅𝐵subscript𝐵0R\cdot B=B_{0}italic_R ⋅ italic_B = italic_B start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT denotes the size of the largest neuron subset VN~𝑉~𝑁V\subset\tilde{N}italic_V ⊂ over~ start_ARG italic_N end_ARG, such that the subgraph induced by the union of V𝑉Vitalic_V and all neighbors of vertices in V𝑉Vitalic_V forms a perfect matching. However in contrast to the previous metrics, the expected extraction recall R𝑅Ritalic_R exhibits a crucial difference. Our assumptions state that existence of a particular edge (representing the activation of a neuron by a data point) is independent of the existence of any other edges. Due to this, and because neurons do not have edges in common with one another, the event of inclusion of any neuron in the count for NAsubscript𝑁𝐴N_{A}italic_N start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT and Nusubscript𝑁𝑢N_{u}italic_N start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT is also independent of the inclusion of any other neuron therein. This allows us to assert the success probabilities (see Equation 11) and that NAsubscript𝑁𝐴N_{A}italic_N start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT and Nusubscript𝑁𝑢N_{u}italic_N start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT will both follow a binomial distribution. This binomial approach breaks down for R𝑅Ritalic_R, however. We can and will still derive the non-zero probability (see Equation 13) for one specific data point to be included in the count (for NAsubscript𝑁𝐴N_{A}italic_N start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT and Nusubscript𝑁𝑢N_{u}italic_N start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT we were counting neurons, for B0subscript𝐵0B_{0}italic_B start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT we count data points). This value will depend on the size of both partition classes of the graph, not just the one being counted. Namely, the expected share of data points that the malicious actor can perfectly reconstruct is:

pR;B;N=𝔼[R]=1(11B(B1B)B1)Nsubscript𝑝𝑅𝐵𝑁𝔼delimited-[]𝑅1superscript11𝐵superscript𝐵1𝐵𝐵1𝑁p_{R;B;N}=\mathbb{E}[R]=1-\left(1-\frac{1}{B}\left(\frac{B-1}{B}\right)^{B-1}% \right)^{N}italic_p start_POSTSUBSCRIPT italic_R ; italic_B ; italic_N end_POSTSUBSCRIPT = blackboard_E [ italic_R ] = 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_B end_ARG ( divide start_ARG italic_B - 1 end_ARG start_ARG italic_B end_ARG ) start_POSTSUPERSCRIPT italic_B - 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT (13)

See Section B.3 for a more detailed explanation. To check that this does not yield the exact distribution of R𝑅Ritalic_R and that the binomial approach is no longer relevant, consider a graph with more data points than neurons, i.e. B>N𝐵𝑁B>Nitalic_B > italic_N. It is clear that P(R=100%)=0(pR;B;N)B𝑃𝑅percent1000superscriptsubscript𝑝𝑅𝐵𝑁BP(R=100\%)=0\neq(p_{R;B;N})^{\textsc{B}}italic_P ( italic_R = 100 % ) = 0 ≠ ( italic_p start_POSTSUBSCRIPT italic_R ; italic_B ; italic_N end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT B end_POSTSUPERSCRIPT, since there are not enough neurons to cover each data point. As Equation 13 assumes the optimal scenario, only obtainable with per-neuron activation probabilities of 1/B1𝐵1/B1 / italic_B and truly normally distributed data, it provides an upper limit for the expected success of stochastic gradient sparsity attacks on real-world data, which will yield lower expected results, the further the data deviates from being normally distributed. In Section A.3 we assess these boundaries by evaluating the extraction success on synthetic truly random datasets.

Quantile-based Bias Initialization (QBI)

QBI maliciously initializes a linear layer L𝐿Litalic_L before sending it to a client k𝑘kitalic_k targeted for extraction. The weight values w𝑤witalic_w of L𝐿Litalic_L are initialized from a standard normal distribution. Given a batch size B𝐵Bitalic_B used on the client side and the number of input features M𝑀Mitalic_M, QBI determines the bias value bsuperscript𝑏b^{*}italic_b start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT using Equation 7, which approximately leads to an activation probability of 1/B1𝐵1/B1 / italic_B for each neuron in L𝐿Litalic_L. Even though the true distribution of features on the user side is unknown and the features are neither independent nor truly normally-distributed, our approximation is effective in practice.

Pattern-Aware Iterative Random Search (PAIRS)

We propose the PAIRS algorithm that further adapts the malicious QBI linear layer to the target domain when auxiliary data is available. By acknowledging that real-world data, such as images, rarely exhibits the assumed i.i.d. properties, PAIRS iteratively searches the weight space to better capture the underlying patterns. The procedure, outlined in Algorithm 1, begins by performing a forward pass with a batch of auxiliary data. It then identifies and re-initializes the weight rows of neurons that are either overactive or underactive, as well as those that exhibit redundant activation patterns. Through this process, PAIRS builds neuron-sample pairs, iteratively searching the weight space until all samples are covered or a fixed number of iterations is reached. Specifically, with an output shape of N𝑁Nitalic_N and a batch size of B𝐵Bitalic_B, PAIRS uses N/B𝑁𝐵N/Bitalic_N / italic_B batches of B𝐵Bitalic_B samples for groups of B𝐵Bitalic_B neurons each. By randomly re-initializing weight values, PAIRS avoids detectability in weight space while increasing the percentage of data that can be perfectly reconstructed, surpassing the performance of plain QBI.

Algorithm 1 Pattern-Aware Iterative Random Search (PAIRS)
1:Input: Linear layer L𝐿Litalic_L of shape M×N𝑀𝑁M\times Nitalic_M × italic_N, Number of retries T𝑇Titalic_T
2:K𝐾Kitalic_K Batches X~1,,X~Ksubscript~𝑋1subscript~𝑋𝐾\tilde{X}_{1},\cdots,\tilde{X}_{K}over~ start_ARG italic_X end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ⋯ , over~ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT of size BN/K𝐵𝑁𝐾B\leftarrow\lceil N/K\rceilitalic_B ← ⌈ italic_N / italic_K ⌉
3:Initialize: L.biasϕ1(1B)Mformulae-sequence𝐿𝑏𝑖𝑎𝑠superscriptitalic-ϕ11𝐵𝑀L.bias\leftarrow\phi^{-1}(\frac{1}{B})\cdot\sqrt{M}italic_L . italic_b italic_i italic_a italic_s ← italic_ϕ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG italic_B end_ARG ) ⋅ square-root start_ARG italic_M end_ARG \triangleright Fill bias values using quantile function
4:for all X~ksubscript~𝑋𝑘\tilde{X}_{k}over~ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT do
5:     Initialize: FDsubscript𝐹𝐷F_{D}\leftarrow\emptysetitalic_F start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ← ∅ \triangleright Frozen data points
6:     for neuron n=(k1)B𝑛𝑘1𝐵n=(k-1)Bitalic_n = ( italic_k - 1 ) italic_B to kB1𝑘𝐵1kB-1italic_k italic_B - 1 do
7:         for t=1𝑡1t=1italic_t = 1 to T𝑇Titalic_T do \triangleright Random resets for this neuron
8:              AL(X~k)n𝐴𝐿subscriptsubscript~𝑋𝑘𝑛A\leftarrow L(\tilde{X}_{k})_{n}italic_A ← italic_L ( over~ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT \triangleright Get activations via forward pass for neuron n𝑛nitalic_n
9:              I{i|A[i]>0}𝐼conditional-set𝑖𝐴delimited-[]𝑖0I\leftarrow\{i\,|\,A[i]>0\}italic_I ← { italic_i | italic_A [ italic_i ] > 0 } \triangleright Get indices of active samples
10:              sI[0]𝑠𝐼delimited-[]0s\leftarrow I[0]italic_s ← italic_I [ 0 ]
11:              if |I|1sFD𝐼1𝑠subscript𝐹𝐷|I|\neq 1\vee s\in F_{D}| italic_I | ≠ 1 ∨ italic_s ∈ italic_F start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT then \triangleright Check if sample is not isolated, or already covered
12:                  L.Wi𝒩formulae-sequence𝐿similar-tosubscript𝑊𝑖𝒩L.W_{i}\sim\mathcal{N}italic_L . italic_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∼ caligraphic_N \triangleright Randomly re-initialize weight row i𝑖iitalic_i
13:                  continue
14:              end if
15:              FDFD{s}subscript𝐹𝐷subscript𝐹𝐷𝑠F_{D}\leftarrow F_{D}\cup\left\{s\right\}italic_F start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ← italic_F start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ∪ { italic_s } \triangleright Mark sample as frozen
16:         end for
17:     end for
18:end for

5 Defence: Activation-based Greedy Gradient Pruning

To counter attacks that exploit gradient sparsity in fully connected layers, we propose Activation-based Greedy Gradient Pruning (AGGP). This is a novel approach that detects and mitigates both passive and active data leakage. Unlike previous works that suggest skipping entire training rounds when potential data leakage is detected [14], we adopt a more targeted strategy by selectively pruning gradients of suspect neurons, scaled by their activation pattern. We account for the fact that even benign networks may occasionally leak data points, and that skipping entire updates could withhold valuable training information.

A forward hook is registered at a potentially vulnerable linear layer L𝐿Litalic_L. The hook records and caches activation counts, i.e., the number of samples in the batch that lead to a positive activation in a particular neuron. As outlined in Algorithm 2, after the loss and gradients have been calculated, AGGP iterates over all neurons in L𝐿Litalic_L. We take ansubscript𝑎𝑛a_{n}italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT to be the number of samples that activate a specific neuron n𝑛nitalic_n. Take c𝑐citalic_c to be an arbitrary cut-off sample count. Neurons that did not activate (i.e., with an=0subscript𝑎𝑛0a_{n}=0italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = 0) or those that activated for more than c𝑐citalic_c samples (i.e., with ancsubscript𝑎𝑛𝑐a_{n}\geq citalic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≥ italic_c) are skipped. For all other neurons with 0<an<c0subscript𝑎𝑛𝑐0<a_{n}<c0 < italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT < italic_c, the percentage pkeep,nsubscript𝑝𝑘𝑒𝑒𝑝𝑛p_{keep,n}italic_p start_POSTSUBSCRIPT italic_k italic_e italic_e italic_p , italic_n end_POSTSUBSCRIPT of gradient values to retain is calculated as:

pkeep,n=(an1)2(pupl)(c2)2+plsubscript𝑝𝑘𝑒𝑒𝑝𝑛superscriptsubscript𝑎𝑛12subscript𝑝𝑢subscript𝑝𝑙superscript𝑐22subscript𝑝𝑙p_{keep,n}=\frac{(a_{n}-1)^{2}\cdot\left(p_{u}-p_{l}\right)}{(c-2)^{2}}+p_{l}italic_p start_POSTSUBSCRIPT italic_k italic_e italic_e italic_p , italic_n end_POSTSUBSCRIPT = divide start_ARG ( italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⋅ ( italic_p start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT - italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) end_ARG start_ARG ( italic_c - 2 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT (14)

where plsubscript𝑝𝑙p_{l}italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT and pusubscript𝑝𝑢p_{u}italic_p start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT represent the lower and upper bound for the percentage of gradient values that are retained for a=1𝑎1a=1italic_a = 1 and a=c1𝑎𝑐1a=c-1italic_a = italic_c - 1 respectively. In our experiments on the ImageNet dataset, we set c=16𝑐16c=16italic_c = 16, pl=0.01subscript𝑝𝑙0.01p_{l}=0.01italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = 0.01 and pu=0.95subscript𝑝𝑢0.95p_{u}=0.95italic_p start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT = 0.95. See Section B.4 for a detailed explanation of how we arrived at these hyperparameters. For higher ansubscript𝑎𝑛a_{n}italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT values pkeep,nsubscript𝑝𝑘𝑒𝑒𝑝𝑛p_{keep,n}italic_p start_POSTSUBSCRIPT italic_k italic_e italic_e italic_p , italic_n end_POSTSUBSCRIPT increases as the overlapping samples lead to more diffuse representations, where individual samples are increasingly obscured. AGGP then continues by sorting the gradients of the corresponding weight row by their absolute magnitude. Among the top pkeep,nsubscript𝑝𝑘𝑒𝑒𝑝𝑛p_{keep,n}italic_p start_POSTSUBSCRIPT italic_k italic_e italic_e italic_p , italic_n end_POSTSUBSCRIPT percent of values, 25%percent2525\%25 % are randomly selected to be retained, while all other values are set to zero, to further obscure potentially connected features. This effectively prevents the perfect reconstruction of individual samples, while 25%percent2525\%25 % of the gradient values corresponding to the pkeep,nsubscript𝑝𝑘𝑒𝑒𝑝𝑛p_{keep,n}italic_p start_POSTSUBSCRIPT italic_k italic_e italic_e italic_p , italic_n end_POSTSUBSCRIPT percent largest magnitudes are maintained, allowing the propagation of valuable training information.

Algorithm 2 Activation-based Greedy Gradient Pruning (AGGP)
1:Input: Linear layer L𝐿Litalic_L of shape M×N𝑀𝑁M\times Nitalic_M × italic_N, Batch X~~𝑋\tilde{X}over~ start_ARG italic_X end_ARG of size B𝐵Bitalic_B, cut-off threshold c𝑐citalic_c, lower and upper pruning bounds plsubscript𝑝𝑙p_{l}italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT and pusubscript𝑝𝑢p_{u}italic_p start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT
2:𝐀L(X~)>0𝐀𝐿~𝑋0\mathbf{A}\leftarrow L(\tilde{X})>0bold_A ← italic_L ( over~ start_ARG italic_X end_ARG ) > 0 \triangleright Store layer activations during forward pass
3:loss=𝑙𝑜𝑠𝑠loss=\cdotsitalic_l italic_o italic_s italic_s = ⋯
4:Gloss.backward()formulae-sequenceG𝑙𝑜𝑠𝑠𝑏𝑎𝑐𝑘𝑤𝑎𝑟𝑑\textbf{G}\leftarrow loss.backward()G ← italic_l italic_o italic_s italic_s . italic_b italic_a italic_c italic_k italic_w italic_a italic_r italic_d ( ) \triangleright Compute gradients
5:for all neuron n𝑛nitalic_n in L𝐿Litalic_L do
6:     ani=1M𝕀(An,i>0)subscript𝑎𝑛superscriptsubscript𝑖1𝑀𝕀subscript𝐴𝑛𝑖0a_{n}\leftarrow\sum_{i=1}^{M}\mathbb{I}(A_{n,i}>0)italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ← ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT blackboard_I ( italic_A start_POSTSUBSCRIPT italic_n , italic_i end_POSTSUBSCRIPT > 0 ) \triangleright Number of samples that activate neuron n𝑛nitalic_n
7:     if an=0an>csubscript𝑎𝑛0subscript𝑎𝑛𝑐a_{n}=0\vee a_{n}>citalic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = 0 ∨ italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT > italic_c then \triangleright Neurons with no activation or >>> cut-off are skipped
8:         continue
9:     end if
10:     𝐩keep(an1)2(pupl)(c2)2+plsubscript𝐩𝑘𝑒𝑒𝑝superscriptsubscript𝑎𝑛12subscript𝑝𝑢subscript𝑝𝑙superscript𝑐22subscript𝑝𝑙\mathbf{p}_{keep}\leftarrow\frac{(a_{n}-1)^{2}\cdot\left(p_{u}-p_{l}\right)}{(% c-2)^{2}}+p_{l}bold_p start_POSTSUBSCRIPT italic_k italic_e italic_e italic_p end_POSTSUBSCRIPT ← divide start_ARG ( italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⋅ ( italic_p start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT - italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) end_ARG start_ARG ( italic_c - 2 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT \triangleright Determine percentage of gradient values to retain
11:     k𝐩keepN𝑘subscript𝐩𝑘𝑒𝑒𝑝𝑁k\leftarrow\lfloor\mathbf{p}_{keep}\cdot N\rflooritalic_k ← ⌊ bold_p start_POSTSUBSCRIPT italic_k italic_e italic_e italic_p end_POSTSUBSCRIPT ⋅ italic_N ⌋ \triangleright Number of elements to fully prune
12:     𝐬argsort(|𝐆𝐧|)𝐬argsortsubscript𝐆𝐧\mathbf{s}\leftarrow\text{argsort}(\mathbf{|G_{n}|)}bold_s ← argsort ( | bold_G start_POSTSUBSCRIPT bold_n end_POSTSUBSCRIPT | ) \triangleright Get sorted indices of absolute gradient values of n𝑛nitalic_n-th weight row
13:     𝐈𝐬[:k]\mathbf{I}\leftarrow\mathbf{s}[:k]bold_I ← bold_s [ : italic_k ] \triangleright Retrieve indices of lowest k𝑘kitalic_k and random 75% of top Nk𝑁𝑘N-kitalic_N - italic_k values
14:     𝐈𝐈𝐬[k:][randperm(Nk)[:0.75(Nk)]]\mathbf{I}\leftarrow\mathbf{I}\cup\mathbf{s}[k:][\text{randperm}(N-k)[:\lfloor 0% .75\cdot(N-k)\rfloor]]bold_I ← bold_I ∪ bold_s [ italic_k : ] [ randperm ( italic_N - italic_k ) [ : ⌊ 0.75 ⋅ ( italic_N - italic_k ) ⌋ ] ]
15:     𝐆𝐧[𝐈]0subscript𝐆𝐧delimited-[]𝐈0\mathbf{G_{n}}[\mathbf{I}]\leftarrow 0bold_G start_POSTSUBSCRIPT bold_n end_POSTSUBSCRIPT [ bold_I ] ← 0 \triangleright Zero out gradients
16:end for

6 Experimental Evaluation

We closely follow the experimental setup of Boenisch et al. [12] for both image and text data, to allow a direct comparison to be made. See Algorithm 3 for further implementation details.

Image Data Extraction

We evaluated our method on two benchmark vision datasets: ImageNet [21] at a resolution of 224×224224224224\times 224224 × 224, and CIFAR-10 [22] at a resolution of 32×32323232\times 3232 × 32. As our method requires normalized data, we used publically available normalization parameters for both datasets. The model we used consisted of convolutional layers maliciously initialized to transfer the input further into the network, followed by a linear layer initialized with either QBI or PAIRS. The rate of perfectly reconstructed images R𝑅Ritalic_R (see Equation 10) was evaluated using batch sizes of 20, 50, 100, and 200, and layer sizes of 200, 500, and 1000. Our results, presented in Table 1, surpass those reported by Boenisch et al. [12] using their trap weights approach. Our method yields consistent improvements for both the CIFAR-10 and ImageNet datasets across various layer size and batch size combinations. Notably, the most significant gains are observed on ImageNet with smaller batch sizes, where our method achieves up to 50%percent5050\%50 % higher reconstruction rates. Figure 2 in Appendix A.1 presents an example batch of 20 images and the perfectly reconstructed subset of 16 images, obtained from a layer size of 200. We also conducted experiments using unnormalized data, inserting either a batch normalization layer or a layer normalization layer before the maliciously initialized linear layer. See Section B.1, B.2 and Table 6 for more details.

Text Data Extraction

The IMDB sentiment analysis dataset [23] was used to evaluate the extraction of text data. Our model operates on 250-token sentences with an embedding dimension of 250, where the embedded tokens are directly fed to a fully connected layer of size 1000. We re-trained the bert-base-uncased tokenizer [24] on the IMDB dataset, resulting in a vocabulary size of 10,0001000010,00010 , 000. The extraction was evaluated on batch sizes of 20, 50, 100, and 200. The results, presented in Table 2, are compared to those reported by Boenisch et al. [12]. Our approach performs similarly across batch sizes 20 and 50, but achieves performance gains of 25%percent2525\%25 % and 47%percent4747\%47 % on batch sizes 100 and 200, respectively.

Secondary Metrics

The advantage of our method can be explained by examining the secondary metrics precision P𝑃Pitalic_P and activation value A𝐴Aitalic_A (see Equation 11). As established in Equation 12, the theoretical optimum for these values lies at A=11/e63.2%𝐴11𝑒percent63.2A=1-1/e\approx 63.2\%italic_A = 1 - 1 / italic_e ≈ 63.2 % and P=1/e36.8%𝑃1𝑒percent36.8P=1/e\approx 36.8\%italic_P = 1 / italic_e ≈ 36.8 %. Although these can only be achieved if the target data is truly normally distributed (see Table 5 in the Appendix), values that lie closer to this optimum will lead to better reconstruction rates. Table 4 in the Appendix compares the A𝐴Aitalic_A and P𝑃Pitalic_P values achieved using QBI and PAIRS to those obtained using trap weights [12] on image data. Specifically, on the ImageNet dataset, our A𝐴Aitalic_A values range from 72%percent7272\%72 % to 87%percent8787\%87 %, whereas those reported in Boenisch et al. [12] show a stronger variance across batch sizes, ranging from 9%percent99\%9 % to 89%percent8989\%89 %. Similarly, our precision values range from 26%percent2626\%26 % to 34%percent3434\%34 % on ImageNet, while those reported in [12] vary from 23%percent2323\%23 % to 94%percent9494\%94 %.

Table 1: Comparing the percentage of perfectly reconstructed images (R𝑅Ritalic_R, see Equation 13) taken from both the ImageNet and CIFAR-10 datasets, with the results reported in [12] using their trap weights, across various combinations of neuron counts N𝑁Nitalic_N and batch sizes B𝐵Bitalic_B. The listed values represent the mean across 10 random initializations, with each initialization evaluated on 10 random batches. The error margins indicate the 95% confidence interval.
ImageNet CIFAR-10
(N, B) [12] QBI (Ours) PAIRS (Ours) [12] QBI (Ours) PAIRS (Ours)
(200, 20) 35.5 82.5 ±plus-or-minus\pm± 2.43 85.5 ±plus-or-minus\pm± 1.25 69.5 75.7 ±plus-or-minus\pm± 1.85 77.1 ±plus-or-minus\pm± 1.19
(200, 50) 30.4 52.0 ±plus-or-minus\pm± 1.46 56.0 ±plus-or-minus\pm± 1.13 45.2 46.5 ±plus-or-minus\pm± 1.06 48.4 ±plus-or-minus\pm± 0.43
(200, 100) 24.0 29.0 ±plus-or-minus\pm± 0.92 34.6 ±plus-or-minus\pm± 0.67 26.9 28.4 ±plus-or-minus\pm± 0.60 31.5 ±plus-or-minus\pm± 0.73
(200, 200) 11.3 15.1 ±plus-or-minus\pm± 0.62 19.5 ±plus-or-minus\pm± 0.60 9.60 15.8 ±plus-or-minus\pm± 0.49 18.6 ±plus-or-minus\pm± 0.32
(500, 20) 49.0 93.6 ±plus-or-minus\pm± 1.10 94.5 ±plus-or-minus\pm± 0.84 87.0 87.6 ±plus-or-minus\pm± 1.18 87.8 ±plus-or-minus\pm± 1.36
(500, 50) 42.6 73.5 ±plus-or-minus\pm± 1.50 76.8 ±plus-or-minus\pm± 1.27 61.4 63.8 ±plus-or-minus\pm± 1.09 67.3 ±plus-or-minus\pm± 1.28
(500, 100) 35.8 49.0 ±plus-or-minus\pm± 1.09 55.8 ±plus-or-minus\pm± 0.87 42.2 45.1 ±plus-or-minus\pm± 0.74 48.1 ±plus-or-minus\pm± 0.80
(500, 200) 19.9 29.2 ±plus-or-minus\pm± 0.49 35.9 ±plus-or-minus\pm± 0.35 17.7 28.4 ±plus-or-minus\pm± 0.43 32.6 ±plus-or-minus\pm± 0.60
(1000, 20) 59.5 96.6 ±plus-or-minus\pm± 0.78 96.7 ±plus-or-minus\pm± 0.66 91.5 91.3 ±plus-or-minus\pm± 1.40 91.8 ±plus-or-minus\pm± 1.49
(1000, 50) 51.6 84.3 ±plus-or-minus\pm± 0.59 86.6 ±plus-or-minus\pm± 0.67 72.4 74.3 ±plus-or-minus\pm± 0.93 77.7 ±plus-or-minus\pm± 1.17
(1000, 100) 45.7 64.8 ±plus-or-minus\pm± 0.57 68.8 ±plus-or-minus\pm± 1.00 55.6 57.2 ±plus-or-minus\pm± 0.76 59.4 ±plus-or-minus\pm± 0.52
(1000, 200) 28.8 42.7 ±plus-or-minus\pm± 0.72 49.4 ±plus-or-minus\pm± 3.00 25.6 39.2 ±plus-or-minus\pm± 0.49 42.6 ±plus-or-minus\pm± 0.60
Table 2: Comparing the percentage of perfectly reconstructed text samples R𝑅Ritalic_R from the IMDB dataset [23], with the results reported in [12], across various batch sizes B𝐵Bitalic_B using a layer size of 1000. The values represent the mean across 10 random initializations, with each initialization evaluated on 10 random batches. The error margins indicate the 95% confidence interval.
B Trap weights[12] QBI (Ours) PAIRS (Ours)
20 100.0 100 ±plus-or-minus\pm± 0.00 99.9 ±plus-or-minus\pm± 0.20
50 96.2 98.9 ±plus-or-minus\pm± 0.46 98.6 ±plus-or-minus\pm± 0.43
100 65.4 90.5 ±plus-or-minus\pm± 0.33 90.8 ±plus-or-minus\pm± 0.82
200 25.5 72.8 ±plus-or-minus\pm± 0.64 73.3 ±plus-or-minus\pm± 0.78

AGGP

We find that our proposed defense framework reduces the percentage of perfectly reconstructable samples obtained via gradient sparsity in linear layers to zero. This occurs as any neuron that meets the condition for perfect extraction (an=1subscript𝑎𝑛1a_{n}=1italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = 1), triggers the pruning of 10.25pl10.25subscript𝑝𝑙1-0.25\cdot p_{l}1 - 0.25 ⋅ italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT percent of gradient values of its corresponding weight row (see Equation 14). The effect of our defense is best presented visually – the left-hand side of Figure 1, displays the data that is leaked passively from the first 20 neurons of a benign network after a single training step with a batch of 20 samples from the ImageNet dataset. The right-hand side depicts the impact of AGGP on the same scenario, where gradients of neurons with low activation counts are aggressively pruned, while those with high activation counts remain unaffected. Further visualizations of AGGP’s impact on a maliciously initialized model, along with preliminary observations of its effect on training performance, are provided in Section A.5.

Refer to caption
Figure 1: Visualization of the passive data leakage of the first 20 neurons of a linear layer of size 200 (left) and the impact of AGGP (right). Sparsely activated neurons are aggressively pruned, while the gradients of neurons with activation counts exceeding the cut-off threshold remain unaffected.

7 Discussion

Impact on secure aggregation and distributed differential privacy

Secure aggregation (SA) [25] and distributed differential privacy (DDP) [26] are proposed modifications to the traditional FL protocol, designed to decrease the amount of trust users have to place in the central entity. SA employs a multiparty computation protocol to perform decentralized gradient aggregation, while in DDP users locally add a small amount of noise to their gradient updates. Boenisch et al. [27] demonstrated that attacks leveraging gradient sparsity can undermine these protocols if the server is able to introduce malicious nodes in the form of so-called sybil devices. Since their work uses the previously described trap weights [12], replacing the model initialization with our QBI or PAIRS approach could significantly improve the performance of this attack vector.

Detectability

Our method leaves all weight values completely randomly initialized, making it virtually undetectable in weight space. However, it introduces a negative shift in the bias values, which could potentially be detected by the client. Additionally, as our method relies on gradient sparsity, it would be feasible to detect it in gradient space, e.g., by leveraging measures like the disaggregation signal-to-noise ratio [14].

Limitations

Boenisch et al. [12] and our attack rely on the existence of a fully connected layer, either at the beginning of the network or positioned such that preceding layers can be maliciously initialized to perfectly transmit the input deeper into the network. Preceding layers that reduce dimensionality, for example, through pooling operations, will diminish fidelity and thereby prevent perfect data extraction. We conduct preliminary tests on the impact of AGGP on training performance that indicate little-to-no adverse effects, however, a comprehensive evaluation across a wider range of architectures, datasets, and pruning functions is needed to achieve generalizable insights.

8 Conclusion

In this work, we introduce a novel bias tuning method designed to enhance attacks targeting private data reconstruction in federated learning systems. Our approach, encompassing the QBI and PAIRS variants, outperforms comparable gradient sparsity methods across various datasets and batch sizes, achieving superior rates of perfect user data reconstruction. By establishing theoretical limits for stochastic gradient sparsity attacks, our work provides a crucial step towards a more comprehensive understanding of the fundamental constraints imposed by the probabilistic nature of these attacks. Additionally, we introduce AGGP as a defense mechanism against gradient sparsity attacks, effectively mitigating data leakage in linear layers. While our attack method poses privacy risks, we believe that by sharing the details of our approach, we can encourage systematic exploration of privacy safeguards and enable practitioners to better assess and mitigate privacy risks in FL deployments, ultimately promoting safer machine learning practices.

Acknowledgments

David Khachaturov is supported by the University of Cambridge Harding Distinguished Postgraduate Scholars Programme.

References

  • Zhang et al. [2021] Tuo Zhang, Chaoyang He, Tianhao Ma, Lei Gao, Mark Ma, and Salman Avestimehr. Federated learning for internet of things. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, pages 413–419, 2021.
  • Wang et al. [2021] Yi Wang, Imane Lahmam Bennani, Xiufeng Liu, Mingyang Sun, and Yao Zhou. Electricity consumer characteristics identification: A federated learning approach. IEEE Transactions on Smart Grid, 12(4):3637–3647, 2021.
  • Basu et al. [2020] Treena Basu, Sebastian Engel-Wolf, and Olaf Menzer. The ethics of machine learning in medical sciences: Where do we stand today? Indian Journal of Dermatology, 65(5):358–364, 2020.
  • gdp [2016] General data protection regulation (gdpr). https://eur-lex.europa.eu/eli/reg/2016/679/oj, 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016.
  • ccp [2018] California consumer privacy act (ccpa). https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?lawCode=CIV&division=3.&title=1.81.&part=4.&chapter=&article=, 2018. California Civil Code, Title 1.81.5, Section 1798.100 et seq.
  • McMahan et al. [2017] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
  • Ganju et al. [2018] Karan Ganju, Qi Wang, Wei Yang, Carl A Gunter, and Nikita Borisov. Property inference attacks on fully connected neural networks using permutation invariant representations. In Proceedings of the 2018 ACM SIGSAC conference on computer and communications security, pages 619–633, 2018.
  • Melis et al. [2019] Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE symposium on security and privacy (SP), pages 691–706. IEEE, 2019.
  • Fowl et al. [2021] Liam Fowl, Jonas Geiping, Wojtek Czaja, Micah Goldblum, and Tom Goldstein. Robbing the fed: Directly obtaining private data in federated learning with modified models. arXiv preprint arXiv:2110.13057, 2021.
  • Wang et al. [2019] Zhibo Wang, Mengkai Song, Zhifei Zhang, Yang Song, Qian Wang, and Hairong Qi. Beyond inferring class representatives: User-level privacy leakage from federated learning. In IEEE INFOCOM 2019-IEEE conference on computer communications, pages 2512–2520. IEEE, 2019.
  • Zhao et al. [2020] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. idlg: Improved deep leakage from gradients. arXiv preprint arXiv:2001.02610, 2020.
  • Boenisch et al. [2023a] Franziska Boenisch, Adam Dziedzic, Roei Schuster, Ali Shahin Shamsabadi, Ilia Shumailov, and Nicolas Papernot. When the curious abandon honesty: Federated learning is not private. In 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), pages 175–199. IEEE, 2023a.
  • Zhao et al. [2023] Joshua C Zhao, Atul Sharma, Ahmed Roushdy Elkordy, Yahya H Ezzeldin, Salman Avestimehr, and Saurabh Bagchi. Secure aggregation in federated learning is not private: Leaking user data at large scale through model modification. arXiv preprint arXiv:2303.12233, 2023.
  • Garov et al. [2023] Kostadin Garov, Dimitar I Dimitrov, Nikola Jovanović, and Martin Vechev. Hiding in plain sight: Disguising data stealing attacks in federated learning. arXiv preprint arXiv:2306.03013, 2023.
  • Pasquini et al. [2022] Dario Pasquini, Danilo Francati, and Giuseppe Ateniese. Eluding secure aggregation in federated learning via model inconsistency. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 2429–2443, 2022.
  • Zhang et al. [2022] Shuaishuai Zhang, Jie Huang, Zeping Zhang, and Chunyang Qi. Compromise privacy in large-batch federated learning via malicious model parameters. In International Conference on Algorithms and Architectures for Parallel Processing, pages 63–80. Springer, 2022.
  • Wen et al. [2022] Yuxin Wen, Jonas Geiping, Liam Fowl, Micah Goldblum, and Tom Goldstein. Fishing for user data in large-batch federated learning via gradient magnification. arXiv preprint arXiv:2202.00580, 2022.
  • Geiping et al. [2020] Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. Inverting gradients-how easy is it to break privacy in federated learning? Advances in neural information processing systems, 33:16937–16947, 2020.
  • Yin et al. [2021] Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M Alvarez, Jan Kautz, and Pavlo Molchanov. See through gradients: Image batch recovery via gradinversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16337–16346, 2021.
  • Zhu et al. [2019] Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients. Advances in neural information processing systems, 32, 2019.
  • Deng et al. [2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
  • Krizhevsky et al. [2009] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • Maas et al. [2011] Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, pages 142–150, 2011.
  • Devlin et al. [2018] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018. URL http://arxiv.org/abs/1810.04805.
  • Bonawitz et al. [2017] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
  • Kim et al. [2021] Muah Kim, Onur Günlü, and Rafael F Schaefer. Federated learning with local differential privacy: Trade-offs between privacy, utility, and communication. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2650–2654. IEEE, 2021.
  • Boenisch et al. [2023b] Franziska Boenisch, Adam Dziedzic, Roei Schuster, Ali Shahin Shamsabadi, Ilia Shumailov, and Nicolas Papernot. Reconstructing individual data points in federated learning hardened with differential privacy and secure aggregation. In 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), pages 241–257. IEEE, 2023b.
  • Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Wang et al. [2004] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.

Appendix A Additional Experimental Results

A.1 Images

Refer to caption
Figure 2: True user data (left), a batch of 20 images from the ImageNet dataset and reconstructed user data (right), using a linear layer of size 200 that was maliciously initialized with our QBI approach. Fully black images denote data points that could not be recovered. Despite the small layer size, in this particular setting, our method achieves perfect reconstruction of around 82.5% of the original data points, on average.

A.2 Activation values (A) and precision (P)

Table 3: Comparing A𝐴Aitalic_A and P𝑃Pitalic_P obtained on the IMDB text dataset, (see Equation 11) with the results reported by Boenisch et al. [12], across various batch sizes B𝐵Bitalic_B using a layer size of 1000. The corresponding R𝑅Ritalic_R values measuring the extraction success are listed in Table 2. Results are averaged over 10 random initializations, each evaluated on 10 random batches.
A P
B [12] QBI PAIRS [12] QBI PAIRS
20 51.9 58.7 59.2 61.0 33.3 33.6
50 77.6 57.1 57.0 37.6 32.6 32.5
100 91.0 55.1 55.5 19.2 31.8 31.5
200 97.8 53.4 53.8 0.07 30.6 30.7
Table 4: Comparing A𝐴Aitalic_A and P𝑃Pitalic_P (see Equation 11) with the results reported by Boenisch et al. [12], across various batch sizes B𝐵Bitalic_B and layer sizes N𝑁Nitalic_N. The corresponding R𝑅Ritalic_R values measuring the extraction success are listed in Table 1. Results are averaged over 10 random initializations, each evaluated on 10 random batches.
ImageNet CIFAR-10
A P A P
(N, B) [12] QBI PAIRS [12] QBI PAIRS [12] QBI PAIRS [12] QBI PAIRS
(200, 20) 0.09 75.6 72.6 94.8 31.4 33.9 45.4 55.7 56.2 67.0 31.2 31.6
(200, 50) 38.1 80.8 76.2 76.3 25.1 29.7 66.2 53.9 57.1 49.4 26.7 29.4
(200, 100) 65.3 84.6 79.0 50.0 21.4 27.8 84.6 54.6 57.6 28.0 24.9 28.7
(200, 200) 88.6 86.5 80.6 23.3 18.8 26.3 95.4 54.6 56.6 12.4 22.7 27.6
(500, 20) 0.09 75.8 72.7 93.9 31.4 33.0 45.2 56.3 56.9 68.9 30.7 32.7
(500, 50) 38.7 81.4 76.2 76.7 26.2 30.2 65.3 54.8 56.7 50.5 26.2 30.3
(500, 100) 64.6 84.6 78.6 50.8 21.6 27.7 84.5 55.7 57.9 29.0 24.4 29.0
(500, 200) 89.2 87.1 80.4 24.0 18.8 26.3 95.0 56.0 57.0 11.9 22.8 27.8
(1000, 20) 10.2 75.5 73.4 94.2 31.4 33.1 44.1 55.5 57.0 70.3 30.2 32.2
(1000, 50) 38.8 80.9 76.4 77.0 26.1 30.1 64.8 55.6 57.5 50.4 27.0 30.1
(1000, 100) 65.5 84.4 79.3 51.4 22.0 28.0 84.4 55.7 56.5 29.7 24.6 28.5
(1000, 200) 89.2 87.3 81.5 23.8 18.8 26.0 95.1 56.0 58.2 12.0 23.0 27.5

A.3 Synthetic Data

Table 5: Comparing the predicted values Apredsubscript𝐴𝑝𝑟𝑒𝑑A_{pred}italic_A start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT, Ppredsubscript𝑃𝑝𝑟𝑒𝑑P_{pred}italic_P start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT and Rpredsubscript𝑅𝑝𝑟𝑒𝑑R_{pred}italic_R start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT, obtained via Equations 11 and 13, to Aexpsubscript𝐴𝑒𝑥𝑝A_{exp}italic_A start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT, Pexpsubscript𝑃𝑒𝑥𝑝P_{exp}italic_P start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT and Rexpsubscript𝑅𝑒𝑥𝑝R_{exp}italic_R start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT observed experimentally when using a synthetic, fully random dataset. All numbers are averaged over 300 random initalizations, tested with 10 batches of random data each. The experimental setup was idential to that when evaluating the extraction on the CIFAR-10 dataset using QBI, replacing the image data with normal random noise of shape 3×32×32332323\times 32\times 323 × 32 × 32.
(N, B) Apredsubscript𝐴𝑝𝑟𝑒𝑑A_{pred}italic_A start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT Aexpsubscript𝐴𝑒𝑥𝑝A_{exp}italic_A start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT Ppredsubscript𝑃𝑝𝑟𝑒𝑑P_{pred}italic_P start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT Pexpsubscript𝑃𝑒𝑥𝑝P_{exp}italic_P start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT Rpredsubscript𝑅𝑝𝑟𝑒𝑑R_{pred}italic_R start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT Rexpsubscript𝑅𝑒𝑥𝑝R_{exp}italic_R start_POSTSUBSCRIPT italic_e italic_x italic_p end_POSTSUBSCRIPT
(200, 20) 64.2 64.1 37.7 37.5 97.8 97.7
(200, 50) 63.6 64.2 37.2 37.3 77.5 77.1
(200, 100) 63.4 63.0 37.0 36.7 52.3 52.1
(200, 200) 63.3 63.1 36.9 36.5 30.9 30.7
(500, 20) 64.2 64.2 37.7 38.0 100 100
(500, 50) 63.6 63.4 37.2 37.0 97.6 97.5
(500, 100) 63.4 63.3 37.0 36.8 84.3 83.9
(500, 200) 63.3 63.1 36.9 36.5 60.3 59.8
(1000, 20) 64.2 64.2 37.7 37.6 100 100
(1000, 50) 63.6 63.6 37.2 37.0 99.9 100
(1000, 100) 63.4 63.2 37.0 36.8 97.5 97.1
(1000, 200) 63.3 63.4 36.9 36.8 84.2 83.6

A.4 DataNorm vs. BatchNorm vs. LayerNorm

Table 6: Comparing extraction recall R𝑅Ritalic_R on image data achieved using QBI in combination with regular data normalization (DN), batch normalization (BN) and layer normalization (LN) across ImageNet and CIFAR-10. The values represent the mean across 10 random initializations, with each initialization evaluated on 10 random batches. Error margins indicating the 95% confidence interval ranged from ±0.2plus-or-minus0.2\pm 0.2± 0.2 to ±1.57plus-or-minus1.57\pm 1.57± 1.57 and are omitted for readability. (*) For layer normalization, extraction recall is defined as the percentage of samples that are isolated by a single neuron, see Section B.2 for further explanation.
ImageNet CIFAR-10
(N, B) DN BN LN* DN BN LN*
(200, 20) 82.5 81.3 95.5 75.7 77.7 94.8
(200, 50) 52.0 51.9 70.2 46.5 47.6 67.7
(200, 100) 29.0 32.5 44.2 28.4 28.9 39.1
(200, 200) 15.1 18.7 24.2 15.8 16.1 21.7
(500, 20) 93.6 91.4 99.9 87.6 87.6 99.5
(500, 50) 73.5 71.3 94.4 63.8 63.9 90.4
(500, 100) 49.0 50.0 75.0 45.1 44.8 70.2
(500, 200) 29.2 32.6 49.3 28.4 28.0 44.0
(1000, 20) 96.6 94.9 100 91.3 91.6 99.9
(1000, 50) 84.3 81.4 99.6 74.3 74.5 98.8
(1000, 100) 64.8 63.4 92.4 57.2 56.2 89.5
(1000, 200) 42.7 44.1 72.4 39.2 38.7 66.8

A.5 AGGP

Refer to caption
Figure 3: Visualization of the active data leakage of the first 20 neurons of a linear layer of size 200 (left), that was maliciously initialized using QBI, and the impact of AGGP (right). The artificially induced sparsity leads to aggressive gradient pruning across the entire layer.

We evaluated the impact of AGGP on a CNN-based architecture (see Table 7), which could, in theory, be maliciously initialized by changing parameter values without modifying the architecture. Since model performance is not a concern when the central entity is malicious or compromised, the impact is assessed on a benign network. The model’s validation accuracy on the CIFAR-10 dataset was evaluated across 10 random initializations, recorded across 25 epochs, using a batch size of 64. The results of these preliminary experiments, presented in Figure 4, show no significant impact on training performance.

Refer to caption
Figure 4: Performance of a benign CNN-based image model (Table 7) on the CIFAR-10 dataset, using a batch size of 64. Comparing the unmodified version (Base) to one protected using AGGP. The experiment used the Adam optimizer [28] with a learning rate of 1e-5 and optimized the cross-entropy loss. Results are averaged across 10 runs using different seeds. The shaded regions correspond to the 95% confidence interval.

Appendix B Background

B.1 Extraction under batch normalization

When a normalization layer, such as a batch normalization layer, precedes the maliciously initialized linear layer, the reconstruction process must be adapted accordingly. This section will specifically focus on the BatchNorm2d111https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html layer as it is implemented in PyTorch. Batch normalization uses the following equation to perform normalization:

y=xE[x]Var[x]+ϵγ+β𝑦𝑥Edelimited-[]𝑥Vardelimited-[]𝑥italic-ϵ𝛾𝛽y=\frac{x-\text{E}[x]}{\sqrt{\text{Var}[x]+\epsilon}}*\gamma+\betaitalic_y = divide start_ARG italic_x - E [ italic_x ] end_ARG start_ARG square-root start_ARG Var [ italic_x ] + italic_ϵ end_ARG end_ARG ∗ italic_γ + italic_β (15)

where x𝑥xitalic_x is the input, γ𝛾\gammaitalic_γ and β𝛽\betaitalic_β are learnable parameters with default values 1111 and 00 respectively, and ϵitalic-ϵ\epsilonitalic_ϵ is a small constant added to the denominator for numerical stability, with default value 105superscript10510^{-5}10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT. The mean and standard-deviation are calculated per-dimension over the mini-batches. Additionally BatchNorm2d keeps running estimates of its computed mean μ^^𝜇\hat{\mu}over^ start_ARG italic_μ end_ARG and variance V^^𝑉\smash{\hat{V}}over^ start_ARG italic_V end_ARG, which are both updated using a momentum of 0.10.10.10.1 using the following equation:

x^new=(1momentum)x^+momentumxtsubscript^𝑥𝑛𝑒𝑤1momentum^𝑥momentumsubscript𝑥𝑡\hat{x}_{new}=(1-\textit{momentum})\cdot\hat{x}+\textit{momentum}\cdot x_{t}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT = ( 1 - momentum ) ⋅ over^ start_ARG italic_x end_ARG + momentum ⋅ italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT (16)

where x^^𝑥\hat{x}over^ start_ARG italic_x end_ARG is the estimated statistic and xtsubscript𝑥𝑡x_{t}italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is the new observed value. As extraction is only done on the initial training step after model initialization, μ^^𝜇\hat{\mu}over^ start_ARG italic_μ end_ARG and V^^𝑉\hat{V}over^ start_ARG italic_V end_ARG will be set to their default values 0 and 1 respectively. Hence, the values that we obtain after the first training step are:

μ^new=momentumμtsubscript^𝜇𝑛𝑒𝑤momentumsubscript𝜇𝑡\hat{\mu}_{new}=\textit{momentum}\cdot\mu_{t}over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT = momentum ⋅ italic_μ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT (17)

and

V^new=(1momentum)+momentumVtsubscript^𝑉𝑛𝑒𝑤1momentummomentumsubscript𝑉𝑡\hat{V}_{new}=(1-\textit{momentum})+\textit{momentum}\cdot V_{t}over^ start_ARG italic_V end_ARG start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT = ( 1 - momentum ) + momentum ⋅ italic_V start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT (18)

allowing us to simply rearrange the equations to solve for the batch statistics:

μ^t=μ^newmomentumsubscript^𝜇𝑡subscript^𝜇𝑛𝑒𝑤momentum\hat{\mu}_{t}=\frac{\hat{\mu}_{new}}{\textit{momentum}}over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = divide start_ARG over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT end_ARG start_ARG momentum end_ARG (19)
V^t=V^new(1momentum)momentumsubscript^𝑉𝑡subscript^𝑉𝑛𝑒𝑤1momentummomentum\hat{V}_{t}=\frac{\hat{V}_{new}-(1-\textit{momentum})}{\textit{momentum}}over^ start_ARG italic_V end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = divide start_ARG over^ start_ARG italic_V end_ARG start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT - ( 1 - momentum ) end_ARG start_ARG momentum end_ARG (20)

Since the learnable parameters γ𝛾\gammaitalic_γ and β𝛽\betaitalic_β are intialized to 1 and 0 respectively, they have no effect during the first training step and can be ignored. Finally, given the extracted sample y𝑦yitalic_y from the gradients, normalization can be reversed to obtain the original sample x𝑥xitalic_x by rearranging equation Equation 15 as:

x=μ^t+yV^t+ϵ𝑥subscript^𝜇𝑡𝑦subscript^𝑉𝑡italic-ϵx=\hat{\mu}_{t}+y\sqrt{\hat{V}_{t}+\epsilon}italic_x = over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_y square-root start_ARG over^ start_ARG italic_V end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_ϵ end_ARG (21)

B.2 Extraction under layer normalization

This section will specifically focus on the LayerNorm222https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html layer as it is implemented in PyTorch. Layer normalization uses the same equation as batch normalization (see Equation 15) with the difference, that the mean and standard-deviation are calculated over the last D𝐷Ditalic_D dimensions of every sample, where D𝐷Ditalic_D is the dimension of the normalized_shape. In our case, normalized_shape is a 1-dimensional vector representing the flattened input image. Since each sample is normalized with respect to its own mean and standard deviation, perfect reconstruction, defined by an L2 loss of zero, is not possible, as the central entity does not have knowledge of these per-sample normalization parameters which are typically not shared or cached. However, normalization can be reversed by substituting with commonly available normalization parameters of the target domain, as an approximation for the true sample statistics. Figure 5 displays a mini-batch of true user data compared to the images that could be reconstructed when the QBI-initialized linear layer was preceded by a LayerNorm layer, where normalization was reversed using the publicly available ImageNet normalization parameters. It is evident, that while the reversal of normalization with approximated parameters introduces a slight shift to the images color and brightness, detail and structure are preserved. When a LayerNorm layer precedes a linear layer that was maliciously initialized with QBI, reconstruction success, as measured by samples that are isolated by a single neuron, is greatly increased, as per-sample normalization is more effective at bringing input features closer to being normally distributed, compared to regular data normalization or batch normalization.

Refer to caption
Figure 5: A mini-batch of true user data from the ImageNet dataset, compared to the images that could be reconstructed when the QBI-initialized linear layer was preceded by a LayerNorm layer. Normalization was reversed using the publicly available ImageNet normalization parameters. The reversal of normalization with imperfect parameters introduces a slight shift to the images color and brightness, however detail and structure are preserved, leading to a high structural similarity index (SSIM), ranging from 0.820.820.820.82 to 0.960.960.960.96 for these samples.

B.3 Intuition of Equation 13

Equation 13 estimates the expected percentage of reconstructed samples in the optimal case of normally distributed features. To do this, we calculate the probability of a single sample being successfully reconstructed, which is equal to the expected recall percentage. Let B𝐵Bitalic_B be the number of samples in our batch, and N𝑁Nitalic_N be the number of neurons in our linear layer. Additionally, we assume that we have achieved the optimal probability of activation of 1/B1𝐵1/B1 / italic_B for every neuron-sample pair. Given a single neuron nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and a single sample x𝑥xitalic_x, the probability that this neuron activates only for x𝑥xitalic_x and not for all other samples xxsuperscript𝑥𝑥x^{\prime}\neq xitalic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ italic_x is

1B(B1B)B1,1𝐵superscript𝐵1𝐵𝐵1\frac{1}{B}\left(\frac{B-1}{B}\right)^{B-1},divide start_ARG 1 end_ARG start_ARG italic_B end_ARG ( divide start_ARG italic_B - 1 end_ARG start_ARG italic_B end_ARG ) start_POSTSUPERSCRIPT italic_B - 1 end_POSTSUPERSCRIPT , (22)

which we also refer to as the probability that nisubscript𝑛𝑖n_{i}italic_n start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT isolates x𝑥xitalic_x. The complement of this event is the probability that the neuron does not isolate x𝑥xitalic_x:

11B(B1B)B1.11𝐵superscript𝐵1𝐵𝐵11-\frac{1}{B}\left(\frac{B-1}{B}\right)^{B-1}.1 - divide start_ARG 1 end_ARG start_ARG italic_B end_ARG ( divide start_ARG italic_B - 1 end_ARG start_ARG italic_B end_ARG ) start_POSTSUPERSCRIPT italic_B - 1 end_POSTSUPERSCRIPT . (23)

Raising this result to the power of N𝑁Nitalic_N yields the probability that all neurons do not isolate x𝑥xitalic_x. Finally, the complement of this event is the probability that at least one neuron isolates x𝑥xitalic_x, which in turn results in x𝑥xitalic_x being reconstructed. This leads to the final formula:

pR;B;N=𝔼[R]=1(11B(B1B)B1)Nsubscript𝑝𝑅𝐵𝑁𝔼delimited-[]𝑅1superscript11𝐵superscript𝐵1𝐵𝐵1𝑁p_{R;B;N}=\mathbb{E}[R]=1-\left(1-\frac{1}{B}\left(\frac{B-1}{B}\right)^{B-1}% \right)^{N}italic_p start_POSTSUBSCRIPT italic_R ; italic_B ; italic_N end_POSTSUBSCRIPT = blackboard_E [ italic_R ] = 1 - ( 1 - divide start_ARG 1 end_ARG start_ARG italic_B end_ARG ( divide start_ARG italic_B - 1 end_ARG start_ARG italic_B end_ARG ) start_POSTSUPERSCRIPT italic_B - 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT (24)

B.4 AGGP Hyperparameters

Refer to caption
Figure 6: Effect of averaging an increasing number of images n𝑛nitalic_n from the ImageNet dataset, on the level of obfuscation.

To determine suitable hyperparameters for AGGP applied to ImageNet data, we investigated the level of obfuscation achieved when multiple images are averaged together in a gradient row. Figure 6 displays an example of this effect, where the average is computed over an increasing number of images n𝑛nitalic_n. To quantify this effect more objectively, we conducted a systematic evaluation. We randomly selected one image and averaged it with an increasing number of additional images (from 1 to 25), and then computed three commonly used metrics to quantify the similarity between the original image and the averaged image: the Structural Similarity Index Measure (SSIM, [29]), L1 Distance, and Peak Signal-to-Noise Ratio (PSNR). By repeating this process 100 times and averaging the metrics across all experiments, we obtained the results shown in Figure 8. The results clearly show that PSNR and SSIM decrease sharply in the range [0,15]015[0,15][ 0 , 15 ], while the L1 distance increases. After that, the values appear to converge slowly with little to no movement, which is why we decided to select 16 as our cut-off value c𝑐citalic_c. As evident from Figure 6 and Figure 8, low activation counts provide little obscurity, while higher activation counts require minimal pruning to obscure data, which is why we set the bounds for pkeepsubscript𝑝keepp_{\text{keep}}italic_p start_POSTSUBSCRIPT keep end_POSTSUBSCRIPT to pl=0.01subscript𝑝𝑙0.01p_{l}=0.01italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = 0.01 and pu=0.95subscript𝑝𝑢0.95p_{u}=0.95italic_p start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT = 0.95.

Refer to caption
Figure 7:
Figure 8: Average similarity metrics (SSIM, L1 Distance, PSNR) between an original image from the ImageNet dataset and its average with 1 to 25 additional images, repeated 100 times. PSNR and SSIM drop significantly up to N=15, while L1 Distance increases.

Appendix C Experiment details

All results that we report are averaged across 10 runs, each evaluated on 10 batches of unseen test data. For each individual run, the weights are randomly initialized, and the train / test split is shuffled randomly. In the case of malicious model initialization and data extraction, train images refer to those used by the PAIRS algorithm to tune the model’s parameters, while test images refer to those used to evaluate the reconstruction percentage. For each (N,B) setting, the standard error across these 10 runs is calculated, and multiplied by 1.961.961.961.96 to determine the 95% CI. For every batch, a single forward pass is performed, and the metrics A𝐴Aitalic_A, P𝑃Pitalic_P and R𝑅Ritalic_R (see Equations 11 and 13) are obtained. Since during evaluation, we have access to the internals of the model, we don’t have to examine the gradients to determine the data leakage. Rather we directly determine which samples will be leaked by observing the activation patterns in the maliciously initialized layer during the forward pass. Furthermore this direct access allows us to feed the input directly to our maliciously initialized layer, circumventing compute heavy convolutional layers that were initialized as identity functions.

C.1 Datasets

We used three datasets to evaluate our method: The ImageNet-1k [21] validation set (6GB, 50k images) 333https://image-net.org/challenges/LSVRC/2012, the CIFAR-10 [22] dataset 444https://www.cs.toronto.edu/ kriz/cifar.html and the IMDB [23] binary sentiment classification dataset 555https://ai.stanford.edu/ amaas/data/sentiment. All these datasets can be used for non-commercial research purposes.

C.2 Compute Resources

All experiments were run on an RTX 2060 Super with 8GB VRAM. Data-extraction runs in near real-time, scaling linearly with batch size and data dimensionality, as it is simply done in one step, by dividing the weight gradients by the bias gradients. Time to maliciously initialize the model varied across methods: QBI required less than 1 second in all settings, while PAIRS initialization times ranged from 10 seconds for CIFAR-10 (N=200, B=20) to 12 minutes for ImageNet (N=1000, B=200). For even larger layer sizes N, initialization time for PAIRS will scale linearly. Obtaining all results, for both QBI and PAIRS, across all possible combinations of N and B and all three datasets, averaged across 10 runs with random seeds for model intialization and train / test split, required approximately 8 hours of compute. Evaluation of AGGP’s impact on training performance across 20 runs (10 Base, 10 protected with AGGP) required about 2 hours of training time. Experiments evaluating QBI on synthetic data (see Table 5) took about 1 hour, averaging over 300 random runs per (N, B) setting.

C.3 Image models

Table 7 outlines the implementation of the model used for image data extraction. As Boenisch et al. [12] explain in their Appendix B, convolutional layers can be modified to pass the input to the next layer, effectively acting as an identity function. This can be achieved through various methods. Algorithm 3 provides a minimal working example using a 2D convolutional layer, suitable for RGB images. To preserve the image shape, we employ a kernel size of 3, padding of 1, and stride of 1. Since we have three channels to transmit, we initialize three filters, each acting as the identity function for its respective channel. This is achieved by setting the weight values of the i𝑖iitalic_i-th filter to zero and then setting the center value of its weight matrix for the i𝑖iitalic_i-th channel to one. Additionally, randomly initialized filters could be added to obscure the modifications made to the model.

Algorithm 3 Conv2D Identity Initialization Example
1:num_channels3𝑛𝑢𝑚_𝑐𝑎𝑛𝑛𝑒𝑙𝑠3num\_channels\leftarrow 3italic_n italic_u italic_m _ italic_c italic_h italic_a italic_n italic_n italic_e italic_l italic_s ← 3
2:conv2dConv2d(in=3,out=3,k=3,s=1,p=1)𝑐𝑜𝑛𝑣2𝑑Conv2dformulae-sequence𝑖𝑛3formulae-sequence𝑜𝑢𝑡3formulae-sequence𝑘3formulae-sequence𝑠1𝑝1conv2d\leftarrow\textsc{Conv2d}(in=3,out=3,k=3,s=1,p=1)italic_c italic_o italic_n italic_v 2 italic_d ← Conv2d ( italic_i italic_n = 3 , italic_o italic_u italic_t = 3 , italic_k = 3 , italic_s = 1 , italic_p = 1 )
3:for i0𝑖0i\leftarrow 0italic_i ← 0 to num_channels𝑛𝑢𝑚_𝑐𝑎𝑛𝑛𝑒𝑙𝑠num\_channelsitalic_n italic_u italic_m _ italic_c italic_h italic_a italic_n italic_n italic_e italic_l italic_s do
4:     conv2d.weight.data[i,:,:,:]0formulae-sequence𝑐𝑜𝑛𝑣2𝑑𝑤𝑒𝑖𝑔𝑡𝑑𝑎𝑡𝑎𝑖:::0conv2d.weight.data[i,:,:,:]\leftarrow 0italic_c italic_o italic_n italic_v 2 italic_d . italic_w italic_e italic_i italic_g italic_h italic_t . italic_d italic_a italic_t italic_a [ italic_i , : , : , : ] ← 0
5:     conv2d.weight.data[i,i,1,1]1formulae-sequence𝑐𝑜𝑛𝑣2𝑑𝑤𝑒𝑖𝑔𝑡𝑑𝑎𝑡𝑎𝑖𝑖111conv2d.weight.data[i,i,1,1]\leftarrow 1italic_c italic_o italic_n italic_v 2 italic_d . italic_w italic_e italic_i italic_g italic_h italic_t . italic_d italic_a italic_t italic_a [ italic_i , italic_i , 1 , 1 ] ← 1
6:end for
Table 7: Architecture of models used in the experiments on image data. f: number of filters, k: kernel size, s: stride, p: padding act: activation function, n: number of neurons. The size of the second to last layer was varied across experiments.
CNN Architecture
Conv(f=128, k=(3, 3), s=1, p=1)
Conv(f=256, k=(3, 3), s=1, p=1)
Conv(f=3, k=(3, 3), s=1, p=1)
Flatten
<Optional> BatchNorm / LayerNorm
Dense(n=1000, act=ReLU)
Dense(n=#classes, act=None)
Table 8: Architecture of models used in the experiments on the IMDB dataset. feat: vocabulary size, dim: embedding size, act: activation function, n: number of neurons.
IMDB-Model Architecture
Embedding(feat=10_000, dim=250)
<Optional> BatchNorm / LayerNorm
Dense(n=1000, act=ReLU)
Dense(n=1, act=None)