QBI: Quantile-based Bias Initialization for Efficient Private Data Reconstruction in Federated Learning

Micha V. Nowak¹ Tim P. Bott² David Khachaturov³

Frank Puppe¹ Adrian Krenzer¹ Amar Hekalo¹

¹University of Würzburg ²University of Augsburg ³University of Cambridge

Abstract

Federated learning enables the training of machine learning models on distributed data without compromising user privacy, as data remains on personal devices and only model updates, such as gradients, are shared with a central coordinator. However, recent research has shown that the central entity can perfectly reconstruct private data from shared model updates by maliciously initializing the model’s parameters. In this paper, we propose QBI, a novel bias initialization method that significantly enhances reconstruction capabilities. This is accomplished by directly solving for bias values yielding sparse activation patterns. Further, we propose PAIRS, an algorithm that builds on QBI. PAIRS can be deployed when a separate dataset from the target domain is available to further increase the percentage of data that can be fully recovered. Measured by the percentage of samples that can be perfectly reconstructed from batches of various sizes, our approach achieves significant improvements over previous methods with gains of up to 50% on ImageNet and up to 60% on the IMDB sentiment analysis text dataset. Furthermore, we establish theoretical limits for attacks leveraging stochastic gradient sparsity, providing a foundation for understanding the fundamental constraints of these attacks. We empirically assess these limits using synthetic datasets. Finally, we propose and evaluate AGGP, a defensive framework designed to prevent gradient sparsity attacks, contributing to the development of more secure and private federated learning systems. Source code is available at: https://github.com/mvnowak/QBI

1 Introduction

The proliferation of mobile devices and the Internet of Things has led to an unprecedented amount of data being generated at the edge of the network [1]. This data, often in the form of user-generated content, sensor readings, or other types of user interactions, holds immense value for training machine learning (ML) models [2]. Due to the sensitive and often private nature of this data, traditional ML approaches that rely on centralized data collection and processing are often inadequate from a legal or ethical perspective [3]. Furthermore, regulations such as data sovereignty laws and cross-border data transfer restrictions (e.g., GDPR, CCPA) can hinder the movement of data between jurisdictions [4, 5].

Federated learning (FL) was proposed as a solution to these challenges [6]. It enables the collaborative training of ML models while preventing the need for users’ data to leave their devices – each device computes model updates locally, which are then sent to a central entity for aggregation into a shared model. FL, in theory, should preserve users’ privacy and adhere to data transfer restrictions, as the computed gradient updates should not expose any user data.

However, a large body of prior work has demonstrated that the FL protocol is vulnerable to multiple forms of data reconstruction attacks, which can be roughly classified into two major categories.

Passive gradient leakage attacks

In this scenario, the attacker is assumed to have no control over the model’s architecture, its parameters, or the specific FL protocol that is being used. The gradients created through a standard FL protocol could either be obtained by an honest but curious central entity, or an external actor via a man-in-the-middle (MITM) attack. A long line of work has demonstrated, that simply by obtaining these gradients, an attacker could learn about properties of the data [7, 8], data membership [8], and even partially reconstruct user data [9, 10, 11].

Malicious model modifications

The second major threat model assumes a malicious central entity, that has the ability to modify the model’s architecture or parameters in an attempt to break the user’s privacy. A broad range of research has outlined a multitude of possible active attacks, that often leverage induced gradient sparsity [12, 9, 13] or sample disaggregation [14, 15] to recover user data.

Each of these approaches strikes a different balance between computational overhead, quality of the reconstructed data, client-side detectability and the requirement for knowledge about the user data, including certain features and their distributions.

Efficient perfect user data reconstruction with bias tuning

In this work, we propose a novel method of maliciously initializing a model, by only modifying the bias values of a fully connected layer, while leaving the weights completely randomly initialized. This reduces client-side detectability in parameter space compared to methods that either maliciously train the full model to compromise privacy [16] or introduce a shift in the magnitude of all weight values [12]. Additionally our method removes the need for experimentally determined hyperparameters [12] or handpicked target features or classes [17]. We provide two variants of our approach: Quantile-Based Bias Initialization (QBI), which directly determines the optimal bias with near-zero computational cost, and Pattern-Aware Iterative Random Search (PAIRS) which builds on QBI, but further enhances reconstruction success by incorporating auxiliary data and incurring marginally higher computational overhead. Additionally, we derive boundaries for the expected success of attacks leveraging stochastic gradient sparsity, which enables us to identify the inherent constraints of such attacks. Using the findings described in this paper, we propose a novel defensive framework – AGGP – aimed at preventing gradient sparsity attacks, contributing to the development of more secure and private FL systems.

Contributions

•

We establish theoretical limits for attacks leveraging stochastic gradient sparsity and empirically assess these limits using synthetic datasets.
•

We propose a novel, compute efficient method of maliciously initializing fully-connected layers in a server-side attack on the FL protocol, which achieves state-of-the-art results in the domain of perfect user data reconstruction.
•

We provide two variants of our approach: QBI which can be deployed with near-zero computational cost, and PAIRS which can be used to achieve increased performance if auxiliary data and increased compute is available. We release an open-source implementation of our method at https://github.com/mvnowak/QBI, including dedicated scripts to reproduce all results reported in this paper.
•

We extensively evaluate both QBI and PAIRS and find that we achieve improvements above the previous state-of-the-art in perfect rectonstruction with gains of up to $50\%$ on ImageNet and up to $60\%$ on the IMDB sentiment analysis text dataset.
•

We propose and evaluate AGGP, a novel and compute efficient defensive framework that prevents data leakage from both active and passive attacks that leverage gradient sparsity in fully connected layers.

2 Background

Federated learning

FL is a decentralized approach to machine learning that enables multiple parties to collaboratively train a shared model on their local data without sharing the data itself. Each client has a local dataset and computes an update to the model parameters based on their local data. The update is typically computed as the gradient of the local loss function with respect to the model parameters. The local updates are then aggregated to update the global model parameters. One common aggregation method is federated averaging, which computes the weighted average of the local updates. This process is repeated over multiple rounds to train the global model [6].

Gradient sparsity attacks

As demonstrated by Geiping et al. [18], it is feasible to extract a single input from the gradients of a fully-connected layer, given that the layer is preceded only by fully connected layers, has a bias $b$ , uses the ReLU activation function, and the gradient of the loss with respect to the layer’s output contains at least one non-zero entry (see Proposition D.1 in Geiping et al. [18] for a full proof). If this layer is placed at the beginning of the network, this corresponds to reconstructing the original input data point $x$ . As shown in Section 5.1 of Boenisch et al. [12], for any non-zero output $y_{i}$ , the gradient of the corresponding weight row directly contains the input scaled by the gradient of the loss with respect to the bias. In practice, gradients are typically computed as the average over an entire batch of samples. Since a single neuron is often activated by multiple samples, the gradients of its weight row contain the average of multiple input data points, effectively obscuring them and preventing individual extraction. To extract an input sample $x$ , there has to exist a neuron $n_{i}$ such that $L(x)_{i}>0$ and $L(x^{\prime})_{i}<0$ for all $x^{\prime}\neq x$ , where $L(x)_{i}$ denotes the activation of the $i$ -th neuron for sample $x$ . Since the samples that are to be extracted are unknown, initializing a layer to achieve this specific activation pattern is challenging. Therefore, the goal of stochastic gradient sparsity attacks is to increase the probability that a neuron is activated only by a single sample, while producing a variety of neurons with diverse activation patterns, to capture as much data from the target domain as possible.

Adaptability to CNN-based architectures

Boenisch et al. [12] extend the gradient sparsity attack to Convolutional Neural Networks (CNNs), where one or more convolutional layers precede the first linear layer. By utilizing zero-padding, a stride of one, and maliciously initialized filters, convolutional layers can effectively be transformed into identity functions, which perfectly transmit the inputs deeper into the network, allowing them to be extracted once they pass through a fully connected layer. For a detailed description, including visualizations, see Appendix B of Boenisch et al. [12].

3 Related Work

In FL, a common threat model is the passive gradient leakage scenario, where a malicious entity obtains a set of gradients to recover the original data points. Previous research has demonstrated that these gradients can be used to infer data properties [7, 8] or data membership [8]. Several optimization-based attacks have been proposed [19, 11, 20], which optimize a batch of random noise to generate gradients similar to those observed and thereby achieve partial reconstruction of user data. Although applicable to various architectures, these methods often require significant computational resources and are unable to achieve perfect reconstructions. The second threat model involves a malicious server (MS) that can manipulate the model’s architecture or parameters to compromise user privacy. Research in this area has identified various active attacks, which exploit induced gradient sparsity [12, 9, 13] or sample disaggregation techniques [14, 15] to recover user data. The SEER framework [14] is a notable example of an MS that avoids client-side detectability in gradient space by disaggregating samples in an embedding space that is unknown by the client. Although it achieves a high percentage of well-reconstructed images, it does so at a high computational cost (14 GPU days to train on an A100 with 80GB) and falls short of perfectly reconstructing data.

We mainly build on the work of Boenisch et al. [12] that introduced the concept of trap weights, a computationally efficient way to initialize a model to induce gradient sparsity. By adding a slight negative shift to the weight values, their approach achieves perfect reconstruction on multiple datasets. However, their method relies on an experimentally determined scaling factor. In contrast, we automate the model’s initialization process and achieve substantially higher rates of perfect reconstruction.

4 Method: Adversarial Bias Tuning

Intuition of bias tuning

Given a linear layer $L$ of shape $N\times M$ , that uses the ReLU activation function, and a batch $\tilde{X}$ of $B$ samples $x_{1},\cdots,x_{B}$ , the objective is that for every $x$ there exists one and only one neuron $n_{i}$ such that $L(x)_{i}>0$ and $L(x^{\prime})_{i}<0$ for all $x^{\prime}\neq x$ , where $L(x)_{i}$ denotes the activation of the $i$ -th neuron for sample $x$ . This ensures that neuron $n_{i}$ allows for perfect reconstruction of $x$ from the gradients of its weight row, while producing zero gradients for all other samples. Therefore, for every neuron $n_{i}$ the desired probability to activate for any sample in the batch is $1/B$ . We take the model’s weights and our approximated normalized features to be independently and identically distributed (i.i.d.) random variables sampled from a normal distribution:

\begin{pmatrix}\bf{w}_{i}\\ \bf{x}_{i}\end{pmatrix}\sim\mathcal{N}(0,I_{M})

(1)

Using this assumption, the probability of a single neuron, with corresponding weight row $w_{i}$ and bias $b_{i}$ , activating for a sample $x_{i}\in\mathbb{R}^{M}$ can be expanded as:

P\bigl{(}L(x_{i})>0\bigr{)}=P\bigl{(}w_{i}^{T}x_{i}+b_{i}>0\bigr{)}=P\bigl{(}w% _{i1}x_{i1}+\ldots+w_{iM}x_{iM}+b_{i}>0\bigr{)}

(2)

Now the left-hand side is a sum of i.i.d. random variables, each of which has characteristic function:

\bigl{(}1+t^{2}\bigr{)}^{\nicefrac{{1}}{{2}}}

(3)

In turn, the characteristic function of the entire sum is:

(1+2it)^{-M/2}=(1-2it/2)^{-M/2}\cdot(1+2it/2)^{-M/2}

(4)

This immediately identifies it as distributed like:

P\bigl{(}w_{i}^{T}x_{i}+b_{i}>0\bigr{)}=P\bigl{(}Q_{1}/2-Q_{2}/2\leq b_{i}% \bigr{)}

(5)

where $Q_{1}$ and $Q_{2}$ are independent and both of the chi-square distribution with $M$ degrees of freedom. Note that the variance of each addend is $V(w_{ij}x_{ij})=1$ (derived by evaluating the second derivative of Equation 3 at 0). We can drop the assumption that our features are normally-distributed and instead assume that the data has been normalized, as we do in our experiments. Even with this change, the variance of each addend still evaluates to $1$ as we maintain the assumption that the weights are independently- and normally-distributed. This enables us to apply the Central Limit Theorem to the original sum in Equation 2. As a sum of i.i.d. random variables with existing second moments, we can make a statement about convergence in distribution:

P\bigl{(}w_{i1}x_{i1}+\ldots+w_{iM}x_{iM}+b_{i}>0\bigr{)}\xrightarrow{d}\Phi% \biggl{(}\frac{b_{i}}{\sqrt{M}}\biggr{)}

(6)

where $\Phi$ is the cumulative distribution function of a standard normal distribution. Meaning we can solve for the asymptotically optimal bias $b_{*}$ by using the inverse cdf or quantile function:

b_{*}=\Phi^{-1}(\frac{1}{B})\cdot\sqrt{M}

(7)

It can be helpful to conceptualize the relationship between this linear layer and a set of samples as a bipartite random graph. The nodes of one partition class are neurons from the neuron set $\smash{\tilde{N}}=\{n_{1},\ldots,n_{n}\}$ , and those in the other partition are from the sample set $\smash{\tilde{X}}=\{x_{1},\ldots,x_{B}\}$ . A vertex $(n_{i},x_{j})\in\smash{\tilde{N}}\times\smash{\tilde{X}}$ is said to exist if $n_{i}$ is activated by $x_{j}$ . If the malicious actor wants to reconstruct a sample $x_{j}$ , that sample needs to have a neighbor $n_{i}$ with degree $\delta(n_{i})=1$ .

Extraction metrics

We closely follow the extraction metrics proposed by Boenisch et al. [12]. Given a linear layer with $N$ output neurons and a batch $\tilde{X}$ of $B$ samples $x_{1},\cdots,x_{B}$ , we define the following metrics:

1.

Active neurons ( $A$ ): This metric represents the percentage of neurons in layer $L$ that activate for at least one of the $B$ samples. Formally, let $N_{A}$ be the number of neurons $n_{i}$ that satisfy the condition that there exists at least one input $x$ in $\tilde{X}$ such that $L(x)_{i}>0$ , where $L(x)_{i}$ denotes the activation of the $i$ -th neuron in $L$ for sample $x$ . For the neuron conceptualized as a graph node, this condition is equivalent to $\delta(n_{i})\geq 1$ . The active neurons metric $A$ is therefore defined as the ratio of $N_{A}$ to the total number of neurons $N$ :

$A=\frac{N_{A}}{N}$ (8)
2.

Extraction-Precision ( $P$ ): This metric measures the percentage of neurons that allow for the extraction of individual data points. Specifically, let $N_{u}$ be the number of neurons $n_{i}$ in layer $L$ with unique activations, i.e. those that satisfy the following condition: $L(x)_{i}>0$ for one input $x$ in $\tilde{X}$ , and $L(x^{\prime})_{i}<0$ for all other inputs $x^{\prime}\neq x$ . This condition is equivalent to $\delta(n_{i})=1$ , which effectively counts neuron nodes that are leaves of their graphs. The extraction-precision $P$ is defined as the following ratio:

$P=\frac{N_{u}}{N}$ (9)
3.

Extraction-Recall ( $R$ ): The extraction-recall measures the percentage of input data points that can be perfectly reconstructed from any gradient row. Let $B_{0}$ be the number of data points that can be extracted with an $l_{2}$ -error of zero, then $R$ is denoted as:

$R=\frac{B_{0}}{B}$ (10)

Notably, $R$ is the most significant metric, as neither $A$ nor $P$ can be used in isolation to meaningfully assess the effectiveness of an attack. A high $A$ value could lead to overlapping activations that prevent individual extraction, while a high $P$ value could be observed in a scenario where all neurons activated for the same sample. If the true probability of a neuron activating is $1/B$ , we can derive the explicit probabilities $p_{A;B}$ and $p_{u;B}$ for a neuron to be counted as a success in the context of the $A$ and $P$ metrics, respectively. Since the success of one neuron does not influence the remaining neurons, the entire batch follows a binomial distribution: $N_{A}\sim\mathcal{B}(N,p_{A;B})$ and $N_{u}\sim\mathcal{B}(N,p_{u;B})$ . Consequently, the expected activation share $A$ and the expected extraction-precision $P$ are:

p_{A;B}=\mathbb{E}[A_{B}]=1-\bigl{(}\frac{B-1}{B}\bigr{)}^{B}\qquad\text{ and % }\qquad p_{u;B}=\mathbb{E}[P_{B}]=\bigl{(}\frac{B-1}{B}\bigr{)}^{B-1}

(11)

For growing batch sizes, these converge to:

\lim\limits_{B\to\infty}\mathbb{E}[A_{B}]=1-\frac{1}{e}\approx 63.2\%\quad% \text{and}\quad\lim\limits_{B\to\infty}\mathbb{E}[P_{B}]=\frac{1}{e}\approx 36% .8\%

(12)

From a graph theory perspective, $R\cdot B=B_{0}$ denotes the size of the largest neuron subset $V\subset\tilde{N}$ , such that the subgraph induced by the union of $V$ and all neighbors of vertices in $V$ forms a perfect matching. However in contrast to the previous metrics, the expected extraction recall $R$ exhibits a crucial difference. Our assumptions state that existence of a particular edge (representing the activation of a neuron by a data point) is independent of the existence of any other edges. Due to this, and because neurons do not have edges in common with one another, the event of inclusion of any neuron in the count for $N_{A}$ and $N_{u}$ is also independent of the inclusion of any other neuron therein. This allows us to assert the success probabilities (see Equation 11) and that $N_{A}$ and $N_{u}$ will both follow a binomial distribution. This binomial approach breaks down for $R$ , however. We can and will still derive the non-zero probability (see Equation 13) for one specific data point to be included in the count (for $N_{A}$ and $N_{u}$ we were counting neurons, for $B_{0}$ we count data points). This value will depend on the size of both partition classes of the graph, not just the one being counted. Namely, the expected share of data points that the malicious actor can perfectly reconstruct is:

p_{R;B;N}=\mathbb{E}[R]=1-\left(1-\frac{1}{B}\left(\frac{B-1}{B}\right)^{B-1}% \right)^{N}

(13)

See Section B.3 for a more detailed explanation. To check that this does not yield the exact distribution of $R$ and that the binomial approach is no longer relevant, consider a graph with more data points than neurons, i.e. $B>N$ . It is clear that $P(R=100\%)=0\neq(p_{R;B;N})^{\textsc{B}}$ , since there are not enough neurons to cover each data point. As Equation 13 assumes the optimal scenario, only obtainable with per-neuron activation probabilities of $1/B$ and truly normally distributed data, it provides an upper limit for the expected success of stochastic gradient sparsity attacks on real-world data, which will yield lower expected results, the further the data deviates from being normally distributed. In Section A.3 we assess these boundaries by evaluating the extraction success on synthetic truly random datasets.

Quantile-based Bias Initialization (QBI)

QBI maliciously initializes a linear layer $L$ before sending it to a client $k$ targeted for extraction. The weight values $w$ of $L$ are initialized from a standard normal distribution. Given a batch size $B$ used on the client side and the number of input features $M$ , QBI determines the bias value $b^{*}$ using Equation 7, which approximately leads to an activation probability of $1/B$ for each neuron in $L$ . Even though the true distribution of features on the user side is unknown and the features are neither independent nor truly normally-distributed, our approximation is effective in practice.

Pattern-Aware Iterative Random Search (PAIRS)

We propose the PAIRS algorithm that further adapts the malicious QBI linear layer to the target domain when auxiliary data is available. By acknowledging that real-world data, such as images, rarely exhibits the assumed i.i.d. properties, PAIRS iteratively searches the weight space to better capture the underlying patterns. The procedure, outlined in Algorithm 1, begins by performing a forward pass with a batch of auxiliary data. It then identifies and re-initializes the weight rows of neurons that are either overactive or underactive, as well as those that exhibit redundant activation patterns. Through this process, PAIRS builds neuron-sample pairs, iteratively searching the weight space until all samples are covered or a fixed number of iterations is reached. Specifically, with an output shape of $N$ and a batch size of $B$ , PAIRS uses $N/B$ batches of $B$ samples for groups of $B$ neurons each. By randomly re-initializing weight values, PAIRS avoids detectability in weight space while increasing the percentage of data that can be perfectly reconstructed, surpassing the performance of plain QBI.

Algorithm 1 Pattern-Aware Iterative Random Search (PAIRS)

1:Input: Linear layer

L

of shape

M\times N

, Number of retries

T

K

Batches

\tilde{X}_{1},\cdots,\tilde{X}_{K}

of size

B\leftarrow\lceil N/K\rceil

3:Initialize:

L.bias\leftarrow\phi^{-1}(\frac{1}{B})\cdot\sqrt{M}

\triangleright

Fill bias values using quantile function

4:for all

\tilde{X}_{k}

5: Initialize:

F_{D}\leftarrow\emptyset

\triangleright

Frozen data points

6: for neuron

n=(k-1)B

kB-1

7: for

t=1

T

\triangleright

Random resets for this neuron

A\leftarrow L(\tilde{X}_{k})_{n}

\triangleright

Get activations via forward pass for neuron

n

I\leftarrow\{i\,|\,A[i]>0\}

\triangleright

Get indices of active samples

10:

s\leftarrow I[0]

11: if

|I|\neq 1\vee s\in F_{D}

then

\triangleright

Check if sample is not isolated, or already covered

12:

L.W_{i}\sim\mathcal{N}

\triangleright

Randomly re-initialize weight row

i

13: continue

14: end if

15:

F_{D}\leftarrow F_{D}\cup\left\{s\right\}

\triangleright

Mark sample as frozen

16: end for

17: end for

18:end for

5 Defence: Activation-based Greedy Gradient Pruning

To counter attacks that exploit gradient sparsity in fully connected layers, we propose Activation-based Greedy Gradient Pruning (AGGP). This is a novel approach that detects and mitigates both passive and active data leakage. Unlike previous works that suggest skipping entire training rounds when potential data leakage is detected [14], we adopt a more targeted strategy by selectively pruning gradients of suspect neurons, scaled by their activation pattern. We account for the fact that even benign networks may occasionally leak data points, and that skipping entire updates could withhold valuable training information.

A forward hook is registered at a potentially vulnerable linear layer $L$ . The hook records and caches activation counts, i.e., the number of samples in the batch that lead to a positive activation in a particular neuron. As outlined in Algorithm 2, after the loss and gradients have been calculated, AGGP iterates over all neurons in $L$ . We take $a_{n}$ to be the number of samples that activate a specific neuron $n$ . Take $c$ to be an arbitrary cut-off sample count. Neurons that did not activate (i.e., with $a_{n}=0$ ) or those that activated for more than $c$ samples (i.e., with $a_{n}\geq c$ ) are skipped. For all other neurons with $0<a_{n}<c$ , the percentage $p_{keep,n}$ of gradient values to retain is calculated as:

p_{keep,n}=\frac{(a_{n}-1)^{2}\cdot\left(p_{u}-p_{l}\right)}{(c-2)^{2}}+p_{l}

(14)

where $p_{l}$ and $p_{u}$ represent the lower and upper bound for the percentage of gradient values that are retained for $a=1$ and $a=c-1$ respectively. In our experiments on the ImageNet dataset, we set $c=16$ , $p_{l}=0.01$ and $p_{u}=0.95$ . See Section B.4 for a detailed explanation of how we arrived at these hyperparameters. For higher $a_{n}$ values $p_{keep,n}$ increases as the overlapping samples lead to more diffuse representations, where individual samples are increasingly obscured. AGGP then continues by sorting the gradients of the corresponding weight row by their absolute magnitude. Among the top $p_{keep,n}$ percent of values, $25\%$ are randomly selected to be retained, while all other values are set to zero, to further obscure potentially connected features. This effectively prevents the perfect reconstruction of individual samples, while $25\%$ of the gradient values corresponding to the $p_{keep,n}$ percent largest magnitudes are maintained, allowing the propagation of valuable training information.

Algorithm 2 Activation-based Greedy Gradient Pruning (AGGP)

1:Input: Linear layer

L

of shape

M\times N

, Batch

\tilde{X}

of size

B

, cut-off threshold

c

, lower and upper pruning bounds

p_{l}

and

p_{u}

\mathbf{A}\leftarrow L(\tilde{X})>0

\triangleright

Store layer activations during forward pass

loss=\cdots

\textbf{G}\leftarrow loss.backward()

\triangleright

Compute gradients

5:for all neuron

n

L

a_{n}\leftarrow\sum_{i=1}^{M}\mathbb{I}(A_{n,i}>0)

\triangleright

Number of samples that activate neuron

n

7: if

a_{n}=0\vee a_{n}>c

then

\triangleright

Neurons with no activation or

>

cut-off are skipped

8: continue

9: end if

10:

\mathbf{p}_{keep}\leftarrow\frac{(a_{n}-1)^{2}\cdot\left(p_{u}-p_{l}\right)}{(% c-2)^{2}}+p_{l}

\triangleright

Determine percentage of gradient values to retain

11:

k\leftarrow\lfloor\mathbf{p}_{keep}\cdot N\rfloor

\triangleright

Number of elements to fully prune

12:

\mathbf{s}\leftarrow\text{argsort}(\mathbf{|G_{n}|)}

\triangleright

Get sorted indices of absolute gradient values of

n

-th weight row

13:

\mathbf{I}\leftarrow\mathbf{s}[:k]

\triangleright

Retrieve indices of lowest

k

and random 75% of top

N-k

values

14:

\mathbf{I}\leftarrow\mathbf{I}\cup\mathbf{s}[k:][\text{randperm}(N-k)[:\lfloor 0% .75\cdot(N-k)\rfloor]]

15:

\mathbf{G_{n}}[\mathbf{I}]\leftarrow 0

\triangleright

Zero out gradients

16:end for

6 Experimental Evaluation

We closely follow the experimental setup of Boenisch et al. [12] for both image and text data, to allow a direct comparison to be made. See Algorithm 3 for further implementation details.

Image Data Extraction

We evaluated our method on two benchmark vision datasets: ImageNet [21] at a resolution of $224\times 224$ , and CIFAR-10 [22] at a resolution of $32\times 32$ . As our method requires normalized data, we used publically available normalization parameters for both datasets. The model we used consisted of convolutional layers maliciously initialized to transfer the input further into the network, followed by a linear layer initialized with either QBI or PAIRS. The rate of perfectly reconstructed images $R$ (see Equation 10) was evaluated using batch sizes of 20, 50, 100, and 200, and layer sizes of 200, 500, and 1000. Our results, presented in Table 1, surpass those reported by Boenisch et al. [12] using their trap weights approach. Our method yields consistent improvements for both the CIFAR-10 and ImageNet datasets across various layer size and batch size combinations. Notably, the most significant gains are observed on ImageNet with smaller batch sizes, where our method achieves up to $50\%$ higher reconstruction rates. Figure 2 in Appendix A.1 presents an example batch of 20 images and the perfectly reconstructed subset of 16 images, obtained from a layer size of 200. We also conducted experiments using unnormalized data, inserting either a batch normalization layer or a layer normalization layer before the maliciously initialized linear layer. See Section B.1, B.2 and Table 6 for more details.

Text Data Extraction

The IMDB sentiment analysis dataset [23] was used to evaluate the extraction of text data. Our model operates on 250-token sentences with an embedding dimension of 250, where the embedded tokens are directly fed to a fully connected layer of size 1000. We re-trained the bert-base-uncased tokenizer [24] on the IMDB dataset, resulting in a vocabulary size of $10,000$ . The extraction was evaluated on batch sizes of 20, 50, 100, and 200. The results, presented in Table 2, are compared to those reported by Boenisch et al. [12]. Our approach performs similarly across batch sizes 20 and 50, but achieves performance gains of $25\%$ and $47\%$ on batch sizes 100 and 200, respectively.

Secondary Metrics

The advantage of our method can be explained by examining the secondary metrics precision $P$ and activation value $A$ (see Equation 11). As established in Equation 12, the theoretical optimum for these values lies at $A=1-1/e\approx 63.2\%$ and $P=1/e\approx 36.8\%$ . Although these can only be achieved if the target data is truly normally distributed (see Table 5 in the Appendix), values that lie closer to this optimum will lead to better reconstruction rates. Table 4 in the Appendix compares the $A$ and $P$ values achieved using QBI and PAIRS to those obtained using trap weights [12] on image data. Specifically, on the ImageNet dataset, our $A$ values range from $72\%$ to $87\%$ , whereas those reported in Boenisch et al. [12] show a stronger variance across batch sizes, ranging from $9\%$ to $89\%$ . Similarly, our precision values range from $26\%$ to $34\%$ on ImageNet, while those reported in [12] vary from $23\%$ to $94\%$ .

Table 1: Comparing the percentage of perfectly reconstructed images (

R

, see Equation 13) taken from both the ImageNet and CIFAR-10 datasets, with the results reported in [12] using their trap weights, across various combinations of neuron counts

N

and batch sizes

B

. The listed values represent the mean across 10 random initializations, with each initialization evaluated on 10 random batches. The error margins indicate the 95% confidence interval.

(N, B)	[12]	QBI (Ours)	PAIRS (Ours)	[12]	QBI (Ours)	PAIRS (Ours)
	ImageNet			CIFAR-10
(200, 20)	35.5	82.5 $\pm$ 2.43	85.5 $\pm$ 1.25	69.5	75.7 $\pm$ 1.85	77.1 $\pm$ 1.19
(200, 50)	30.4	52.0 $\pm$ 1.46	56.0 $\pm$ 1.13	45.2	46.5 $\pm$ 1.06	48.4 $\pm$ 0.43
(200, 100)	24.0	29.0 $\pm$ 0.92	34.6 $\pm$ 0.67	26.9	28.4 $\pm$ 0.60	31.5 $\pm$ 0.73
(200, 200)	11.3	15.1 $\pm$ 0.62	19.5 $\pm$ 0.60	9.60	15.8 $\pm$ 0.49	18.6 $\pm$ 0.32
(500, 20)	49.0	93.6 $\pm$ 1.10	94.5 $\pm$ 0.84	87.0	87.6 $\pm$ 1.18	87.8 $\pm$ 1.36
(500, 50)	42.6	73.5 $\pm$ 1.50	76.8 $\pm$ 1.27	61.4	63.8 $\pm$ 1.09	67.3 $\pm$ 1.28
(500, 100)	35.8	49.0 $\pm$ 1.09	55.8 $\pm$ 0.87	42.2	45.1 $\pm$ 0.74	48.1 $\pm$ 0.80
(500, 200)	19.9	29.2 $\pm$ 0.49	35.9 $\pm$ 0.35	17.7	28.4 $\pm$ 0.43	32.6 $\pm$ 0.60
(1000, 20)	59.5	96.6 $\pm$ 0.78	96.7 $\pm$ 0.66	91.5	91.3 $\pm$ 1.40	91.8 $\pm$ 1.49
(1000, 50)	51.6	84.3 $\pm$ 0.59	86.6 $\pm$ 0.67	72.4	74.3 $\pm$ 0.93	77.7 $\pm$ 1.17
(1000, 100)	45.7	64.8 $\pm$ 0.57	68.8 $\pm$ 1.00	55.6	57.2 $\pm$ 0.76	59.4 $\pm$ 0.52
(1000, 200)	28.8	42.7 $\pm$ 0.72	49.4 $\pm$ 3.00	25.6	39.2 $\pm$ 0.49	42.6 $\pm$ 0.60

Table 2: Comparing the percentage of perfectly reconstructed text samples

R

from the IMDB dataset [23], with the results reported in [12], across various batch sizes

B

using a layer size of 1000. The values represent the mean across 10 random initializations, with each initialization evaluated on 10 random batches. The error margins indicate the 95% confidence interval.

B	Trap weights[12]	QBI (Ours)	PAIRS (Ours)
20	100.0	100 $\pm$ 0.00	99.9 $\pm$ 0.20
50	96.2	98.9 $\pm$ 0.46	98.6 $\pm$ 0.43
100	65.4	90.5 $\pm$ 0.33	90.8 $\pm$ 0.82
200	25.5	72.8 $\pm$ 0.64	73.3 $\pm$ 0.78

AGGP

We find that our proposed defense framework reduces the percentage of perfectly reconstructable samples obtained via gradient sparsity in linear layers to zero. This occurs as any neuron that meets the condition for perfect extraction ( $a_{n}=1$ ), triggers the pruning of $1-0.25\cdot p_{l}$ percent of gradient values of its corresponding weight row (see Equation 14). The effect of our defense is best presented visually – the left-hand side of Figure 1, displays the data that is leaked passively from the first 20 neurons of a benign network after a single training step with a batch of 20 samples from the ImageNet dataset. The right-hand side depicts the impact of AGGP on the same scenario, where gradients of neurons with low activation counts are aggressively pruned, while those with high activation counts remain unaffected. Further visualizations of AGGP’s impact on a maliciously initialized model, along with preliminary observations of its effect on training performance, are provided in Section A.5.

Refer to caption — Figure 1: Visualization of the passive data leakage of the first 20 neurons of a linear layer of size 200 (left) and the impact of AGGP (right). Sparsely activated neurons are aggressively pruned, while the gradients of neurons with activation counts exceeding the cut-off threshold remain unaffected.

7 Discussion

Impact on secure aggregation and distributed differential privacy

Secure aggregation (SA) [25] and distributed differential privacy (DDP) [26] are proposed modifications to the traditional FL protocol, designed to decrease the amount of trust users have to place in the central entity. SA employs a multiparty computation protocol to perform decentralized gradient aggregation, while in DDP users locally add a small amount of noise to their gradient updates. Boenisch et al. [27] demonstrated that attacks leveraging gradient sparsity can undermine these protocols if the server is able to introduce malicious nodes in the form of so-called sybil devices. Since their work uses the previously described trap weights [12], replacing the model initialization with our QBI or PAIRS approach could significantly improve the performance of this attack vector.

Detectability

Our method leaves all weight values completely randomly initialized, making it virtually undetectable in weight space. However, it introduces a negative shift in the bias values, which could potentially be detected by the client. Additionally, as our method relies on gradient sparsity, it would be feasible to detect it in gradient space, e.g., by leveraging measures like the disaggregation signal-to-noise ratio [14].

Limitations

Boenisch et al. [12] and our attack rely on the existence of a fully connected layer, either at the beginning of the network or positioned such that preceding layers can be maliciously initialized to perfectly transmit the input deeper into the network. Preceding layers that reduce dimensionality, for example, through pooling operations, will diminish fidelity and thereby prevent perfect data extraction. We conduct preliminary tests on the impact of AGGP on training performance that indicate little-to-no adverse effects, however, a comprehensive evaluation across a wider range of architectures, datasets, and pruning functions is needed to achieve generalizable insights.

8 Conclusion

In this work, we introduce a novel bias tuning method designed to enhance attacks targeting private data reconstruction in federated learning systems. Our approach, encompassing the QBI and PAIRS variants, outperforms comparable gradient sparsity methods across various datasets and batch sizes, achieving superior rates of perfect user data reconstruction. By establishing theoretical limits for stochastic gradient sparsity attacks, our work provides a crucial step towards a more comprehensive understanding of the fundamental constraints imposed by the probabilistic nature of these attacks. Additionally, we introduce AGGP as a defense mechanism against gradient sparsity attacks, effectively mitigating data leakage in linear layers. While our attack method poses privacy risks, we believe that by sharing the details of our approach, we can encourage systematic exploration of privacy safeguards and enable practitioners to better assess and mitigate privacy risks in FL deployments, ultimately promoting safer machine learning practices.

Acknowledgments

David Khachaturov is supported by the University of Cambridge Harding Distinguished Postgraduate Scholars Programme.

References

Zhang et al. [2021] Tuo Zhang, Chaoyang He, Tianhao Ma, Lei Gao, Mark Ma, and Salman Avestimehr. Federated learning for internet of things. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, pages 413–419, 2021.
Wang et al. [2021] Yi Wang, Imane Lahmam Bennani, Xiufeng Liu, Mingyang Sun, and Yao Zhou. Electricity consumer characteristics identification: A federated learning approach. IEEE Transactions on Smart Grid, 12(4):3637–3647, 2021.
Basu et al. [2020] Treena Basu, Sebastian Engel-Wolf, and Olaf Menzer. The ethics of machine learning in medical sciences: Where do we stand today? Indian Journal of Dermatology, 65(5):358–364, 2020.
gdp [2016] General data protection regulation (gdpr). https://eur-lex.europa.eu/eli/reg/2016/679/oj, 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016.
ccp [2018] California consumer privacy act (ccpa). https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?lawCode=CIV&division=3.&title=1.81.&part=4.&chapter=&article=, 2018. California Civil Code, Title 1.81.5, Section 1798.100 et seq.
McMahan et al. [2017] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
Ganju et al. [2018] Karan Ganju, Qi Wang, Wei Yang, Carl A Gunter, and Nikita Borisov. Property inference attacks on fully connected neural networks using permutation invariant representations. In Proceedings of the 2018 ACM SIGSAC conference on computer and communications security, pages 619–633, 2018.
Melis et al. [2019] Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE symposium on security and privacy (SP), pages 691–706. IEEE, 2019.
Fowl et al. [2021] Liam Fowl, Jonas Geiping, Wojtek Czaja, Micah Goldblum, and Tom Goldstein. Robbing the fed: Directly obtaining private data in federated learning with modified models. arXiv preprint arXiv:2110.13057, 2021.
Wang et al. [2019] Zhibo Wang, Mengkai Song, Zhifei Zhang, Yang Song, Qian Wang, and Hairong Qi. Beyond inferring class representatives: User-level privacy leakage from federated learning. In IEEE INFOCOM 2019-IEEE conference on computer communications, pages 2512–2520. IEEE, 2019.
Zhao et al. [2020] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. idlg: Improved deep leakage from gradients. arXiv preprint arXiv:2001.02610, 2020.
Boenisch et al. [2023a] Franziska Boenisch, Adam Dziedzic, Roei Schuster, Ali Shahin Shamsabadi, Ilia Shumailov, and Nicolas Papernot. When the curious abandon honesty: Federated learning is not private. In 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), pages 175–199. IEEE, 2023a.
Zhao et al. [2023] Joshua C Zhao, Atul Sharma, Ahmed Roushdy Elkordy, Yahya H Ezzeldin, Salman Avestimehr, and Saurabh Bagchi. Secure aggregation in federated learning is not private: Leaking user data at large scale through model modification. arXiv preprint arXiv:2303.12233, 2023.
Garov et al. [2023] Kostadin Garov, Dimitar I Dimitrov, Nikola Jovanović, and Martin Vechev. Hiding in plain sight: Disguising data stealing attacks in federated learning. arXiv preprint arXiv:2306.03013, 2023.
Pasquini et al. [2022] Dario Pasquini, Danilo Francati, and Giuseppe Ateniese. Eluding secure aggregation in federated learning via model inconsistency. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 2429–2443, 2022.
Zhang et al. [2022] Shuaishuai Zhang, Jie Huang, Zeping Zhang, and Chunyang Qi. Compromise privacy in large-batch federated learning via malicious model parameters. In International Conference on Algorithms and Architectures for Parallel Processing, pages 63–80. Springer, 2022.
Wen et al. [2022] Yuxin Wen, Jonas Geiping, Liam Fowl, Micah Goldblum, and Tom Goldstein. Fishing for user data in large-batch federated learning via gradient magnification. arXiv preprint arXiv:2202.00580, 2022.
Geiping et al. [2020] Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. Inverting gradients-how easy is it to break privacy in federated learning? Advances in neural information processing systems, 33:16937–16947, 2020.
Yin et al. [2021] Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M Alvarez, Jan Kautz, and Pavlo Molchanov. See through gradients: Image batch recovery via gradinversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16337–16346, 2021.
Zhu et al. [2019] Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients. Advances in neural information processing systems, 32, 2019.
Deng et al. [2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
Krizhevsky et al. [2009] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
Maas et al. [2011] Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, pages 142–150, 2011.
Devlin et al. [2018] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018. URL http://arxiv.org/abs/1810.04805.
Bonawitz et al. [2017] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
Kim et al. [2021] Muah Kim, Onur Günlü, and Rafael F Schaefer. Federated learning with local differential privacy: Trade-offs between privacy, utility, and communication. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2650–2654. IEEE, 2021.
Boenisch et al. [2023b] Franziska Boenisch, Adam Dziedzic, Roei Schuster, Ali Shahin Shamsabadi, Ilia Shumailov, and Nicolas Papernot. Reconstructing individual data points in federated learning hardened with differential privacy and secure aggregation. In 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), pages 241–257. IEEE, 2023b.
Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Wang et al. [2004] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.

Appendix A Additional Experimental Results

A.1 Images

A.2 Activation values (A) and precision (P)

Table 3: Comparing

A

and

P

obtained on the IMDB text dataset, (see Equation 11) with the results reported by Boenisch et al. [12], across various batch sizes

B

using a layer size of 1000. The corresponding

R

values measuring the extraction success are listed in Table 2. Results are averaged over 10 random initializations, each evaluated on 10 random batches.

	A			P
B	[12]	QBI	PAIRS	[12]	QBI	PAIRS
20	51.9	58.7	59.2	61.0	33.3	33.6
50	77.6	57.1	57.0	37.6	32.6	32.5
100	91.0	55.1	55.5	19.2	31.8	31.5
200	97.8	53.4	53.8	0.07	30.6	30.7

Table 4: Comparing

A

and

P

(see Equation 11) with the results reported by Boenisch et al. [12], across various batch sizes

B

and layer sizes

N

. The corresponding

R

values measuring the extraction success are listed in Table 1. Results are averaged over 10 random initializations, each evaluated on 10 random batches.

(N, B)	[12]	QBI	PAIRS	[12]	QBI	PAIRS	[12]	QBI	PAIRS	[12]	QBI	PAIRS
	ImageNet						CIFAR-10
	A			P			A			P
(200, 20)	0.09	75.6	72.6	94.8	31.4	33.9	45.4	55.7	56.2	67.0	31.2	31.6
(200, 50)	38.1	80.8	76.2	76.3	25.1	29.7	66.2	53.9	57.1	49.4	26.7	29.4
(200, 100)	65.3	84.6	79.0	50.0	21.4	27.8	84.6	54.6	57.6	28.0	24.9	28.7
(200, 200)	88.6	86.5	80.6	23.3	18.8	26.3	95.4	54.6	56.6	12.4	22.7	27.6
(500, 20)	0.09	75.8	72.7	93.9	31.4	33.0	45.2	56.3	56.9	68.9	30.7	32.7
(500, 50)	38.7	81.4	76.2	76.7	26.2	30.2	65.3	54.8	56.7	50.5	26.2	30.3
(500, 100)	64.6	84.6	78.6	50.8	21.6	27.7	84.5	55.7	57.9	29.0	24.4	29.0
(500, 200)	89.2	87.1	80.4	24.0	18.8	26.3	95.0	56.0	57.0	11.9	22.8	27.8
(1000, 20)	10.2	75.5	73.4	94.2	31.4	33.1	44.1	55.5	57.0	70.3	30.2	32.2
(1000, 50)	38.8	80.9	76.4	77.0	26.1	30.1	64.8	55.6	57.5	50.4	27.0	30.1
(1000, 100)	65.5	84.4	79.3	51.4	22.0	28.0	84.4	55.7	56.5	29.7	24.6	28.5
(1000, 200)	89.2	87.3	81.5	23.8	18.8	26.0	95.1	56.0	58.2	12.0	23.0	27.5

A.3 Synthetic Data

Table 5: Comparing the predicted values

A_{pred}

P_{pred}

and

R_{pred}

, obtained via Equations 11 and 13, to

A_{exp}

P_{exp}

and

R_{exp}

observed experimentally when using a synthetic, fully random dataset. All numbers are averaged over 300 random initalizations, tested with 10 batches of random data each. The experimental setup was idential to that when evaluating the extraction on the CIFAR-10 dataset using QBI, replacing the image data with normal random noise of shape

3\times 32\times 32

(N, B)	$A_{pred}$	$A_{exp}$	$P_{pred}$	$P_{exp}$	$R_{pred}$	$R_{exp}$
(200, 20)	64.2	64.1	37.7	37.5	97.8	97.7
(200, 50)	63.6	64.2	37.2	37.3	77.5	77.1
(200, 100)	63.4	63.0	37.0	36.7	52.3	52.1
(200, 200)	63.3	63.1	36.9	36.5	30.9	30.7
(500, 20)	64.2	64.2	37.7	38.0	100	100
(500, 50)	63.6	63.4	37.2	37.0	97.6	97.5
(500, 100)	63.4	63.3	37.0	36.8	84.3	83.9
(500, 200)	63.3	63.1	36.9	36.5	60.3	59.8
(1000, 20)	64.2	64.2	37.7	37.6	100	100
(1000, 50)	63.6	63.6	37.2	37.0	99.9	100
(1000, 100)	63.4	63.2	37.0	36.8	97.5	97.1
(1000, 200)	63.3	63.4	36.9	36.8	84.2	83.6

A.4 DataNorm vs. BatchNorm vs. LayerNorm

Table 6: Comparing extraction recall

R

on image data achieved using QBI in combination with regular data normalization (DN), batch normalization (BN) and layer normalization (LN) across ImageNet and CIFAR-10. The values represent the mean across 10 random initializations, with each initialization evaluated on 10 random batches. Error margins indicating the 95% confidence interval ranged from

\pm 0.2

\pm 1.57

and are omitted for readability. (*) For layer normalization, extraction recall is defined as the percentage of samples that are isolated by a single neuron, see Section B.2 for further explanation.

(N, B)	DN	BN	LN*	DN	BN	LN*
	ImageNet			CIFAR-10
(200, 20)	82.5	81.3	95.5	75.7	77.7	94.8
(200, 50)	52.0	51.9	70.2	46.5	47.6	67.7
(200, 100)	29.0	32.5	44.2	28.4	28.9	39.1
(200, 200)	15.1	18.7	24.2	15.8	16.1	21.7
(500, 20)	93.6	91.4	99.9	87.6	87.6	99.5
(500, 50)	73.5	71.3	94.4	63.8	63.9	90.4
(500, 100)	49.0	50.0	75.0	45.1	44.8	70.2
(500, 200)	29.2	32.6	49.3	28.4	28.0	44.0
(1000, 20)	96.6	94.9	100	91.3	91.6	99.9
(1000, 50)	84.3	81.4	99.6	74.3	74.5	98.8
(1000, 100)	64.8	63.4	92.4	57.2	56.2	89.5
(1000, 200)	42.7	44.1	72.4	39.2	38.7	66.8

A.5 AGGP

We evaluated the impact of AGGP on a CNN-based architecture (see Table 7), which could, in theory, be maliciously initialized by changing parameter values without modifying the architecture. Since model performance is not a concern when the central entity is malicious or compromised, the impact is assessed on a benign network. The model’s validation accuracy on the CIFAR-10 dataset was evaluated across 10 random initializations, recorded across 25 epochs, using a batch size of 64. The results of these preliminary experiments, presented in Figure 4, show no significant impact on training performance.

Appendix B Background

B.1 Extraction under batch normalization

When a normalization layer, such as a batch normalization layer, precedes the maliciously initialized linear layer, the reconstruction process must be adapted accordingly. This section will specifically focus on the BatchNorm2d¹¹1https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html layer as it is implemented in PyTorch. Batch normalization uses the following equation to perform normalization:

y=\frac{x-\text{E}[x]}{\sqrt{\text{Var}[x]+\epsilon}}*\gamma+\beta

(15)

where $x$ is the input, $\gamma$ and $\beta$ are learnable parameters with default values $1$ and $0$ respectively, and $\epsilon$ is a small constant added to the denominator for numerical stability, with default value $10^{-5}$ . The mean and standard-deviation are calculated per-dimension over the mini-batches. Additionally BatchNorm2d keeps running estimates of its computed mean $\hat{\mu}$ and variance $\smash{\hat{V}}$ , which are both updated using a momentum of $0.1$ using the following equation:

\hat{x}_{new}=(1-\textit{momentum})\cdot\hat{x}+\textit{momentum}\cdot x_{t}

(16)

where $\hat{x}$ is the estimated statistic and $x_{t}$ is the new observed value. As extraction is only done on the initial training step after model initialization, $\hat{\mu}$ and $\hat{V}$ will be set to their default values 0 and 1 respectively. Hence, the values that we obtain after the first training step are:

\hat{\mu}_{new}=\textit{momentum}\cdot\mu_{t}

(17)

and

\hat{V}_{new}=(1-\textit{momentum})+\textit{momentum}\cdot V_{t}

(18)

allowing us to simply rearrange the equations to solve for the batch statistics:

\hat{\mu}_{t}=\frac{\hat{\mu}_{new}}{\textit{momentum}}

(19)

\hat{V}_{t}=\frac{\hat{V}_{new}-(1-\textit{momentum})}{\textit{momentum}}

(20)

Since the learnable parameters $\gamma$ and $\beta$ are intialized to 1 and 0 respectively, they have no effect during the first training step and can be ignored. Finally, given the extracted sample $y$ from the gradients, normalization can be reversed to obtain the original sample $x$ by rearranging equation Equation 15 as:

x=\hat{\mu}_{t}+y\sqrt{\hat{V}_{t}+\epsilon}

(21)

B.2 Extraction under layer normalization

This section will specifically focus on the LayerNorm²²2https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html layer as it is implemented in PyTorch. Layer normalization uses the same equation as batch normalization (see Equation 15) with the difference, that the mean and standard-deviation are calculated over the last $D$ dimensions of every sample, where $D$ is the dimension of the normalized_shape. In our case, normalized_shape is a 1-dimensional vector representing the flattened input image. Since each sample is normalized with respect to its own mean and standard deviation, perfect reconstruction, defined by an L2 loss of zero, is not possible, as the central entity does not have knowledge of these per-sample normalization parameters which are typically not shared or cached. However, normalization can be reversed by substituting with commonly available normalization parameters of the target domain, as an approximation for the true sample statistics. Figure 5 displays a mini-batch of true user data compared to the images that could be reconstructed when the QBI-initialized linear layer was preceded by a LayerNorm layer, where normalization was reversed using the publicly available ImageNet normalization parameters. It is evident, that while the reversal of normalization with approximated parameters introduces a slight shift to the images color and brightness, detail and structure are preserved. When a LayerNorm layer precedes a linear layer that was maliciously initialized with QBI, reconstruction success, as measured by samples that are isolated by a single neuron, is greatly increased, as per-sample normalization is more effective at bringing input features closer to being normally distributed, compared to regular data normalization or batch normalization.

B.3 Intuition of Equation 13

Equation 13 estimates the expected percentage of reconstructed samples in the optimal case of normally distributed features. To do this, we calculate the probability of a single sample being successfully reconstructed, which is equal to the expected recall percentage. Let $B$ be the number of samples in our batch, and $N$ be the number of neurons in our linear layer. Additionally, we assume that we have achieved the optimal probability of activation of $1/B$ for every neuron-sample pair. Given a single neuron $n_{i}$ and a single sample $x$ , the probability that this neuron activates only for $x$ and not for all other samples $x^{\prime}\neq x$ is

\frac{1}{B}\left(\frac{B-1}{B}\right)^{B-1},

(22)

which we also refer to as the probability that $n_{i}$ isolates $x$ . The complement of this event is the probability that the neuron does not isolate $x$ :

1-\frac{1}{B}\left(\frac{B-1}{B}\right)^{B-1}.

(23)

Raising this result to the power of $N$ yields the probability that all neurons do not isolate $x$ . Finally, the complement of this event is the probability that at least one neuron isolates $x$ , which in turn results in $x$ being reconstructed. This leads to the final formula:

p_{R;B;N}=\mathbb{E}[R]=1-\left(1-\frac{1}{B}\left(\frac{B-1}{B}\right)^{B-1}% \right)^{N}

(24)

B.4 AGGP Hyperparameters

To determine suitable hyperparameters for AGGP applied to ImageNet data, we investigated the level of obfuscation achieved when multiple images are averaged together in a gradient row. Figure 6 displays an example of this effect, where the average is computed over an increasing number of images $n$ . To quantify this effect more objectively, we conducted a systematic evaluation. We randomly selected one image and averaged it with an increasing number of additional images (from 1 to 25), and then computed three commonly used metrics to quantify the similarity between the original image and the averaged image: the Structural Similarity Index Measure (SSIM, [29]), L1 Distance, and Peak Signal-to-Noise Ratio (PSNR). By repeating this process 100 times and averaging the metrics across all experiments, we obtained the results shown in Figure 8. The results clearly show that PSNR and SSIM decrease sharply in the range $[0,15]$ , while the L1 distance increases. After that, the values appear to converge slowly with little to no movement, which is why we decided to select 16 as our cut-off value $c$ . As evident from Figure 6 and Figure 8, low activation counts provide little obscurity, while higher activation counts require minimal pruning to obscure data, which is why we set the bounds for $p_{\text{keep}}$ to $p_{l}=0.01$ and $p_{u}=0.95$ .

Appendix C Experiment details

All results that we report are averaged across 10 runs, each evaluated on 10 batches of unseen test data. For each individual run, the weights are randomly initialized, and the train / test split is shuffled randomly. In the case of malicious model initialization and data extraction, train images refer to those used by the PAIRS algorithm to tune the model’s parameters, while test images refer to those used to evaluate the reconstruction percentage. For each (N,B) setting, the standard error across these 10 runs is calculated, and multiplied by $1.96$ to determine the 95% CI. For every batch, a single forward pass is performed, and the metrics $A$ , $P$ and $R$ (see Equations 11 and 13) are obtained. Since during evaluation, we have access to the internals of the model, we don’t have to examine the gradients to determine the data leakage. Rather we directly determine which samples will be leaked by observing the activation patterns in the maliciously initialized layer during the forward pass. Furthermore this direct access allows us to feed the input directly to our maliciously initialized layer, circumventing compute heavy convolutional layers that were initialized as identity functions.

C.1 Datasets

We used three datasets to evaluate our method: The ImageNet-1k [21] validation set (6GB, 50k images) ³³3https://image-net.org/challenges/LSVRC/2012, the CIFAR-10 [22] dataset ⁴⁴4https://www.cs.toronto.edu/ kriz/cifar.html and the IMDB [23] binary sentiment classification dataset ⁵⁵5https://ai.stanford.edu/ amaas/data/sentiment. All these datasets can be used for non-commercial research purposes.

C.2 Compute Resources

All experiments were run on an RTX 2060 Super with 8GB VRAM. Data-extraction runs in near real-time, scaling linearly with batch size and data dimensionality, as it is simply done in one step, by dividing the weight gradients by the bias gradients. Time to maliciously initialize the model varied across methods: QBI required less than 1 second in all settings, while PAIRS initialization times ranged from 10 seconds for CIFAR-10 (N=200, B=20) to 12 minutes for ImageNet (N=1000, B=200). For even larger layer sizes N, initialization time for PAIRS will scale linearly. Obtaining all results, for both QBI and PAIRS, across all possible combinations of N and B and all three datasets, averaged across 10 runs with random seeds for model intialization and train / test split, required approximately 8 hours of compute. Evaluation of AGGP’s impact on training performance across 20 runs (10 Base, 10 protected with AGGP) required about 2 hours of training time. Experiments evaluating QBI on synthetic data (see Table 5) took about 1 hour, averaging over 300 random runs per (N, B) setting.

C.3 Image models

Table 7 outlines the implementation of the model used for image data extraction. As Boenisch et al. [12] explain in their Appendix B, convolutional layers can be modified to pass the input to the next layer, effectively acting as an identity function. This can be achieved through various methods. Algorithm 3 provides a minimal working example using a 2D convolutional layer, suitable for RGB images. To preserve the image shape, we employ a kernel size of 3, padding of 1, and stride of 1. Since we have three channels to transmit, we initialize three filters, each acting as the identity function for its respective channel. This is achieved by setting the weight values of the $i$ -th filter to zero and then setting the center value of its weight matrix for the $i$ -th channel to one. Additionally, randomly initialized filters could be added to obscure the modifications made to the model.

Algorithm 3 Conv2D Identity Initialization Example

num\_channels\leftarrow 3

conv2d\leftarrow\textsc{Conv2d}(in=3,out=3,k=3,s=1,p=1)

3:for

i\leftarrow 0

num\_channels

conv2d.weight.data[i,:,:,:]\leftarrow 0

conv2d.weight.data[i,i,1,1]\leftarrow 1

6:end for

Table 7: Architecture of models used in the experiments on image data. f: number of filters, k: kernel size, s: stride, p: padding act: activation function, n: number of neurons. The size of the second to last layer was varied across experiments.

CNN Architecture
Conv(f=128, k=(3, 3), s=1, p=1)
Conv(f=256, k=(3, 3), s=1, p=1)
Conv(f=3, k=(3, 3), s=1, p=1)
Flatten
<Optional> BatchNorm / LayerNorm
Dense(n=1000, act=ReLU)
Dense(n=#classes, act=None)

Table 8: Architecture of models used in the experiments on the IMDB dataset. feat: vocabulary size, dim: embedding size, act: activation function, n: number of neurons.

IMDB-Model Architecture
Embedding(feat=10_000, dim=250)
<Optional> BatchNorm / LayerNorm
Dense(n=1000, act=ReLU)
Dense(n=1, act=None)