Secure Generalization through Stochastic Bidirectional Parameter Updates Using Dual-Gradient Mechanism
Abstract
Federated learning (FL) has gained increasing attention because it enables privacy-preserving collaborative training on decentralized clients, removing the need to upload sensitive data directly to a central server. Nonetheless, recent research has underscored the risk of exposing private data to adversaries even within FL frameworks. In general, existing methods sacrifice performance while ensuring resistance to privacy leakage in FL. We overcome these issues by generating diverse models at the global server through the proposed stochastic bidirectional parameter update mechanism. Using diverse models, we improve generalization and feature representation in the FL setup, which also improves robustness against privacy leakage without hurting model utility. We use global models from past FL rounds to apply systematic perturbations in parameter space at the server, ensuring model generalization and resistance against privacy attacks. We generate diverse models (in close neighborhoods) for each client through systematic perturbations of model parameters at a fine-grained level (i.e., altering each convolutional filter across the layers of the model) to improve both generalization and security. We evaluate the proposed approach on four benchmark datasets and surpass state-of-the-art methods in terms of model utility and robustness to privacy leakage, supported by extensive quantitative and qualitative results.
1 Introduction
In recent years, Federated Learning (FL) [26] has gained wide attention across various domains, including healthcare [12, 24, 15], autonomous driving [34], etc., since FL allows clients to train on local data and share only model parameters (not sensitive data) with the global server for aggregation. Recent studies [10, 8, 36, 39] highlight the issue of privacy leakage through shared model parameters, which exposes clients to different types of attacks. Several attempts have been made to solve the privacy-leakage issue and provide enhanced protection, including homomorphic encryption [2, 16], differential privacy [1, 22], and gradient perturbation [30]. These methods secure sensitive data at the cost of computational overhead or reduced model efficiency. Other works aim to maintain utility without sacrificing accuracy by encrypting the training data [5, 14]. These approaches still require sharing classifier model parameters to perform model aggregation at the server while defending against image reconstruction attacks in FL. Researchers have since shown that clients in these methods remain vulnerable to label inference and membership inference attacks, and hence they do not provide adequate security [29, 10].
Recently, Ma et al. [25] provided a theoretical analysis of different attacks in FL and highlighted the concern of privacy leakage due to the sharing of classifier parameters. To overcome this issue, they proposed a Generative Adversarial Network (GAN) based privacy-preserving image distribution sharing scheme (PPIDSG) in FL, which does not require sharing classifier model parameters. To secure federated learning, PPIDSG employs GAN-based parameter sharing to learn the distribution of encrypted images and updates client models with an aggregated model. However, learning in the encrypted domain involves a trade-off between utility and security. Also, sending the same global update to different clients limits client generalization. Moreover, the gradients communicated from the global server to clients are also susceptible to various attacks, which leaves room to make the model more secure. In particular, to avoid privacy leakage, researchers have proposed differential privacy (DP) [1, 22, 31] and gradient perturbation-based methods [40, 30]. These approaches ensure resistance to privacy leakage but also sacrifice FL performance due to a lack of systematic perturbation in gradients.
Motivated by these observations and gaps, we find scope to improve both the utility and security perspectives in FL. In particular, we focus on: 1) how to retain model utility while strengthening security by not sharing classifier model parameters in FL communication, and 2) how to improve the robustness of the FL setup against different attacks without sacrificing the model's classification accuracy. To achieve these objectives, we propose a novel approach that provides more generalized and robust updates from the global model to clients during FL, which improves robustness against different attacks without sacrificing model utility. Our stochastic bidirectional update approach uses a dual-gradient mechanism to generate diverse models (in close neighborhoods) for each client, which improves the generalization and security of FL. After obtaining the diverse global models, we make no further alterations, which helps retain utility and generalization.
Our Contributions: We propose a novel approach that follows our stochastic bidirectional parameter update mechanism to generate diverse and generalizable global models for different clients. The proposed approach improves the robustness of clients in FL against different data attacks without sacrificing model utility. Our approach makes systematic alterations to the global model using a dual-gradient mechanism, producing multiple diverse models from the global models of previous FL rounds. The diverse models generated by our method lie in a close neighborhood so that clients improve both generalization and robustness against privacy attacks. We validate the superiority of our approach on four datasets against state-of-the-art (SOTA) methods, evaluating both model utility and robustness against attacks, and surpass the SOTA methods.
2 Related work
Several optimization approaches have been proposed to improve the utility of FL methods. They can be categorized into global variable-based [20, 17], device grouping-based [7, 4, 19], and knowledge distillation-based [27, 23, 41] methods. In FedProx [20], a proximal term, computed as the squared distance between the global model and local models, regularizes the local loss and aids model convergence. SCAFFOLD [17] improves local training through global control variates that adjust the optimization direction in each FL round. Device-grouping FL approaches optimize local training by heuristically selecting local devices from device groups, which are formed based on a specific similarity metric (e.g., model similarity). CluSamp [7] groups clients based on sample size or model similarity. FedCluster [4] follows cyclic FL, wherein in each FL round clients are partitioned into multiple groups that perform FL.
Knowledge distillation-based methods improve FL inference by using the knowledge of a teacher network to teach a student network. FedAUX [27] uses an auxiliary dataset for knowledge distillation and to initialize the server model. FedDF [23] accelerates FL by using an ensemble model as the teacher and unlabelled data for knowledge distillation. Global variable-based methods are computationally demanding due to the additional communication of global variables and the proximal-term computation on clients. Device-grouping methods need access to all local models to estimate similarity for grouping, which creates vulnerability to privacy leakage. Knowledge distillation-based methods, in turn, require additional computation and datasets for the distillation process.
To ensure security in FL, researchers have proposed various defense mechanisms against attacks. Common attacks in FL include property inference, label inference, membership inference, and image reconstruction attacks. In a property inference attack, the adversary aims to determine specific attributes that belong to a subset of the training data [8]. In a label inference attack, the adversary aims to determine the label attribute. In an image reconstruction attack, the adversary uses the gradients sent from the client to the server to reconstruct the original image. DLG [40] performs this reconstruction by minimizing the gradient difference between the original image and a dummy image, and iDLG [38] enhances it through the extraction of ground-truth labels. GradInversion [35] reconstructs complex and high-fidelity images using group consistency regularization.
To secure FL, researchers have also utilized GAN-based methods. The Generative Adversarial Network (GAN) was proposed to generate images resembling real ones through min-max optimization with an adversarial loss between the generator and discriminator networks [11]. GANs can also be used for image translation, which has been explored for both defense and attack. An adversary can use a GAN to generate target-distribution images in real time [13], or perform a model extraction attack through a substitute network trained with a GAN [38]. Conditional GANs were utilized in FedCG to resist image reconstruction attacks for privacy preservation [32]. Recently, GANs have been used to establish secure FL by sharing GAN parameters instead of classifier parameters to avoid privacy leakage, since the GAN holds the encrypted-domain distribution [25]. To ensure security against attacks, researchers commonly employ DP-based methods [1, 22, 31] or gradient pruning and gradient perturbation-based methods [40, 30]. These methods sacrifice performance to resist privacy leakage because the gradient alteration used as a defense is non-systematic, which also affects the overall learning of the FL setup.
Considering the aforementioned limitations, our approach improves the utility-security trade-off in FL. To achieve this, we propose a stochastic bidirectional learning approach that enables generalized learning in local clients through diverse updates/models from the global server, such that these diverse updates lie in close neighborhoods. Using diverse but close-neighborhood updates, the clients follow generalized solutions and hence improve classification accuracy, improving the utility of FL. To ensure security without hurting utility, our approach applies systematic updates to the gradients of the diverse solutions sent from the global server to different clients so that the updates remain optimally close.
3 Preliminaries
3.1 Overview of Federated Learning
FL follows a cloud-server architecture consisting of a global server and multiple local clients. In this paper, we consider an FL system with homogeneous local client models, i.e., clients share a similar data distribution and have the same model structure as the global model. Assume our FL setup aims to map an input space $\mathcal{X}$ to an output space $\mathcal{Y}$ using a global model $w$ and local models $\{w_1, w_2, \ldots, w_N\}$ for $N$ clients. We denote the local dataset of client $k$ as $D_k = \{(x_i, y_i)\}_{i=1}^{n_k}$ such that $D = \bigcup_{k=1}^{N} D_k$. In FL, the global model weights are trained collaboratively by local clients, which share their learning (local models) with the global server. The conventional method to generate a global model by aggregating the local models is FedAvg [3]. We can formulate the primary objective of FL using Eq. 1:
$$\min_{w} F(w) = \sum_{k=1}^{N} \frac{|D_k|}{|D|} F_k(w), \qquad F_k(w) = \frac{1}{|D_k|} \sum_{(x_i, y_i) \in D_k} \ell(w; x_i, y_i) \tag{1}$$
where $F(w)$ denotes the loss of the global model, $\ell(w; x_i, y_i)$ denotes the loss of an individual sample, and $F_k(w)$ comprises the loss over all samples of local client $k$.
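To make the weighted structure of Eq. 1 concrete, the following is a minimal sketch of a FedAvg-style aggregation in PyTorch; the function name, the dictionary-of-tensors representation, and the use of sample counts as weights are illustrative assumptions rather than the exact implementation used here.

```python
import torch
from typing import Dict, List

def fedavg_aggregate(client_states: List[Dict[str, torch.Tensor]],
                     client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Weighted average of client model parameters, mirroring the |D_k|/|D| weights of Eq. 1."""
    total = float(sum(client_sizes))
    global_state = {}
    for name in client_states[0]:
        # weight each client's tensor by its share of the training data
        global_state[name] = sum(
            (n / total) * state[name].float()
            for state, n in zip(client_states, client_sizes)
        )
    return global_state
```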
3.2 Privacy Leakage Using Attacks in FL
In our work, we assume the adversary does not corrupt the training process. Client models share their parameters (weights or gradients) with the global model. When all local clients train their models for just one local epoch between two global aggregation operations, using their complete training datasets, we treat shared weights and gradients as equivalent. This scenario results in a white-box attack, where model parameters and structure are accessible to the adversary.
Label Inference Attack (LIA): Assume each client holds a local dataset $D_k = \{(x_i, y_i)\}$, where $x_i$ and $y_i$ denote the $i$-th sample and its ground-truth label, respectively. Consider local training with batch size $B$ and cross-entropy loss $\mathcal{L}_{CE}$ for the classification task; the gradient of $\mathcal{L}_{CE}$ with respect to (wrt) the output logit $z_c$ is given in Eq. 2:
$$\frac{\partial \mathcal{L}_{CE}}{\partial z_c} = p_c - \mathbb{1}[c = y] \tag{2}$$
where $c \in \{1, \ldots, C\}$, $C$ denotes the number of class labels, $p_c$ denotes the predicted (softmax) probability for class $c$, and $\mathbb{1}[c = y] = 1$ if the output index matches the ground truth and $0$ otherwise. In LIA, the aim is to count the images $\lambda_c$ present in a batch of size $B$ for each class $c$. Aggregating over the batch, the gradient wrt the network output at index $c$, as suggested by Yin et al. [35], is shown in Eq. 3:
$$\nabla_{z_c} \mathcal{L}_{B} = \frac{1}{B} \sum_{i=1}^{B} \left( p_{i,c} - \mathbb{1}[c = y_i] \right) \tag{3}$$
Further, the uploaded gradient from the classifier model can be used to perform LIA via multiple passes of random samples through the classifier to compute each category count $\lambda_c$, leading to privacy leakage [25]. Since we do not share the classifier parameters, our method is resistant to LIA.
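As an illustration of how a shared last-layer gradient leaks label counts (Eqs. 2-3), the sketch below inverts the relation between the bias gradient and the per-class counts after estimating the mean softmax output with random probe inputs; the probing procedure and all names are assumptions, and the attack evaluated in [25] may differ in its details.

```python
import torch

@torch.no_grad()
def estimate_label_counts(model: torch.nn.Module,
                          bias_grad: torch.Tensor,   # gradient of the last-layer bias, shape [C]
                          batch_size: int,
                          num_probe: int = 256,
                          input_shape=(3, 32, 32)) -> torch.Tensor:
    """Rough label-count inference from a shared last-layer bias gradient.

    With batch-averaged cross-entropy, the bias gradient for class c is
    mean_i(p_{i,c}) - count_c / B (Eq. 3), so counts can be estimated once
    the mean softmax output is approximated with random probe inputs.
    """
    probes = torch.randn(num_probe, *input_shape)
    mean_prob = torch.softmax(model(probes), dim=1).mean(dim=0)   # approx. E[p_c]
    counts = batch_size * (mean_prob - bias_grad)                 # lambda_c estimate
    return counts.round().clamp(min=0)
```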
Membership Inference Attack (MIA):
In our work, we consider the enhanced MIA of [25], defined as follows. Suppose the attacker holds a shadow dataset that contains some images from the target distribution but not from the target dataset. To improve the attack, shadow models are rebuilt for the victim and for the other users. Using the model parameters uploaded by clients, the attacker builds a copy of the victim model and of the other users' models (aggregated if there are two or more other users). The adversary randomly splits the shadow dataset into two non-overlapping subsets, which are fed to the victim copy and the other-users copy, respectively. The obtained predictions are manually labeled as member and non-member, respectively, and the adversary trains an inference model on these predictions and labels. Since the adversary only holds a suspect dataset, it is difficult to determine whether a sample comes from the victim or from other users; the adversary therefore feeds this dataset to the inference model to perform MIA, and the attack's success is measured by the fraction of correct membership inferences.
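The shadow-model pipeline described above can be sketched as follows; the choice of softmax prediction vectors as features and the small two-layer inference network are illustrative assumptions in the spirit of [29, 25], not the exact attack configuration.

```python
import torch
import torch.nn as nn

def build_mia_dataset(victim_copy, others_copy, shadow_victim_x, shadow_other_x):
    """Label victim-model outputs as 'member' (1) and other-model outputs as 'non-member' (0)."""
    with torch.no_grad():
        feats = torch.cat([torch.softmax(victim_copy(shadow_victim_x), dim=1),
                           torch.softmax(others_copy(shadow_other_x), dim=1)])
        labels = torch.cat([torch.ones(len(shadow_victim_x)),
                            torch.zeros(len(shadow_other_x))]).long()
    return feats, labels

def train_inference_model(feats, labels, num_classes, epochs=20):
    """Train a small binary classifier that predicts membership from prediction vectors."""
    attack = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(attack.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(attack(feats), labels)
        loss.backward()
        opt.step()
    return attack
```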
Image Reconstruction Attack (IR):
During IR, the adversary attempts to recover the original image from the encrypted image, causing privacy leakage. To achieve this, an optimization minimizes the difference between the gradients obtained from dummy images with dummy labels and the gradients shared by the victim. We can formally express IR using Eq. 4:
$$x'^{*}, y'^{*} = \arg\min_{x', y'} \left\| \nabla_{w} \mathcal{L}(w; x', y') - \nabla_{w} \mathcal{L}(w; x, y) \right\|^2 \tag{4}$$
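A minimal sketch of the gradient-matching optimization in Eq. 4, in the spirit of DLG [40]: a dummy image and a soft dummy label are optimized so that their gradients match the gradients shared by the victim. The L-BFGS optimizer, step count, and soft-label parameterization are assumptions.

```python
import torch

def reconstruct(model, true_grads, input_shape, num_classes, steps=300):
    """DLG-style image reconstruction by matching gradients (Eq. 4)."""
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    dummy_y = torch.randn(1, num_classes, requires_grad=True)   # soft dummy label

    opt = torch.optim.LBFGS([dummy_x, dummy_y])

    def closure():
        opt.zero_grad()
        pred = model(dummy_x)
        # cross-entropy with the (softmaxed) dummy label
        loss = torch.sum(-torch.softmax(dummy_y, dim=1) * torch.log_softmax(pred, dim=1))
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # squared distance between dummy gradients and the victim's shared gradients
        grad_diff = sum(((g - tg) ** 2).sum() for g, tg in zip(grads, true_grads))
        grad_diff.backward()
        return grad_diff

    for _ in range(steps):
        opt.step(closure)
    return dummy_x.detach(), dummy_y.detach()
```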
4 Methodology
We propose a stochastic bidirectional parameter update mechanism to improve the utility of clients as well as improve their defense against different attacks. To achieve this, our approach generates diverse and generalizable global models through systematic perturbations using a dual gradient mechanism such that the diverse global models for different clients are in close neighborhoods. Our approach improves the robustness of clients against different attacks without sacrificing model utility.
To validate the effectiveness of our method, we follow a setup similar to PPIDSG [25]: we do not share the classifier ($C$) parameters and instead use a GAN, sharing the generator ($G$) parameters in FL, where $G$ learns the image distribution in the encrypted domain. We augment the training data and encrypt it using the image distribution scheme proposed by PPIDSG, which encrypts training images through several transformations such as rotation and augmentation controlled by a pseudo-random bit, image block flipping, and pixel value exchange across channels. We learn the target distribution in $G$ by adversarial training with a discriminator $D$. We use an auto-encoder [28] to build a feature extractor and train it together with a separate classifier network. The obtained classifier loss is fed back to $G$ to help it learn class-specific distributions and improve classification.
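For intuition, the block-wise image encryption described above might look like the following simplified sketch, where a pseudo-random bit stream drives per-block flips and a channel exchange; the exact transformation set, ordering, and key handling of PPIDSG [25] are not reproduced here.

```python
import torch

def encrypt_image(img: torch.Tensor, key: int, block: int = 4) -> torch.Tensor:
    """Toy block-based encryption: per-block flips and channel shuffling
    driven by a pseudo-random bit stream (simplified stand-in for [25])."""
    g = torch.Generator().manual_seed(key)
    c, h, w = img.shape
    out = img.clone()
    for i in range(0, h, block):
        for j in range(0, w, block):
            bits = torch.randint(0, 2, (3,), generator=g)
            patch = out[:, i:i + block, j:j + block]
            if bits[0]:                      # horizontal flip of the block
                patch = patch.flip(-1)
            if bits[1]:                      # vertical flip of the block
                patch = patch.flip(-2)
            if bits[2]:                      # exchange pixel values across channels
                patch = patch.roll(shifts=1, dims=0)
            out[:, i:i + block, j:j + block] = patch
    return out
```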
Our FL setup consists of $N$ clients, each holding a local dataset, with $n$ samples in total. During each FL round, clients update their models locally and share the generator parameters with the central/global server. The central server takes the updates sent by the local clients and aggregates the model parameters as shown in Eq. 5, where $\eta$ and $\mathcal{L}$ denote the learning rate and loss function, respectively.
$$G^{t+1} = \sum_{k=1}^{N} \frac{n_k}{n} \left( G_k^{t} - \eta \nabla \mathcal{L}_k(G_k^{t}) \right) \tag{5}$$
To generate a diverse global model for each client, we propose a Stochastic Bidirectional Parameter Updates (SBPU) strategy, which utilizes the global models from previous FL rounds. We elaborate on our proposed approach as follows:
4.1 Stochastic Bidirectional Parameter Updates using Dual-Gradient Mechanism
Our novelty lies in providing generalized updates to local clients to improve utility and resist privacy leakage in FL. We validate the superiority of our method over the recent state-of-the-art PPIDSG [25] by following a similar FL setup (except for using the proposed approach to generate diverse global solutions) and showing further improvement in model utility and defense across attacks. To achieve this, we propose a stochastic bidirectional learning mechanism that generates diverse solutions at the global server to update local clients. The models generated at the server lie in a close neighborhood to provide generalized solutions for FL clients (refer to Fig. 1).
Our approach makes bidirectional systematic alterations to the gradients by modifying model parameters at a fine-grained level (i.e., altering each convolutional filter across the layers of the model), which improves the defense against attacks. These systematic updates provide a diverse model for each client and help clients learn in a generalized manner, improving model utility (refer to Fig. 1). An overview of the proposed method is provided in Fig. 2.
Each FL round mainly consists of four steps: 1) The global server sends a diverse model to each client, as shown in Algo. 1. 2) With the received diverse model, each client performs local training and then uploads its model parameters (in our case, the generator parameters) to the global server. 3) The global server aggregates the received models from the clients as shown in Eq. 5 to obtain the updated global model. 4) Using the current global model and the previous global models, we perform Stochastic Bidirectional Parameter Updates (SBPU) to generate diverse models, as shown in Algo. 2. For each client, the global server sends one diverse model to assist its learning in the next round. Finally, we update the stored global models ($G^{t}$, $G^{t-1}$, $G^{t-2}$) for the next stochastic update.
Please note that, for the initial two FL rounds, we initialize $G^{t-1}$ and $G^{t-2}$ with $G^{t}$. To perform SBPU, we use a stochastic list $S = [-1, \ldots, -1, 1, \ldots, 1, -2, \ldots, -2, 2, \ldots, 2]$, where the frequency of each stochastic term, i.e., $\{-1, 1, -2, 2\}$, equals $F_l/4$, with $F_l$ denoting the number of filters present in layer $l$ of the global model $G^{t}$. The list $S$ is randomly shuffled to create diverse models by performing bidirectional parameter updates on filter $f$ of layer $l$. If the stochastic term for a filter is $\pm 1$, we perform the update under the diversity rate $\beta_1$, as shown in Line 9 of Algo. 2; otherwise (i.e., for $\pm 2$), we update under the diversity rate $\beta_2$, as shown in Line 11 of Algo. 2. Here, $g^{1}_{l,f}$ and $g^{2}_{l,f}$ denote the gradients computed for the $f$-th filter of the $l$-th layer using the global models from the previous and the previous-to-previous FL rounds, respectively, and $\tilde{G}_{l,f}$ denotes the diverse weights obtained after SBPU for the $f$-th filter of the $l$-th layer. A PyTorch-style sketch of this procedure is given after the algorithm listings below.
Algorithm 1: Generation of diverse models at the server.
Input: i) $G^{t}$, the global model; ii) $G^{t-1}$, the previous global model; iii) $G^{t-2}$, the previous-to-previous global model; iv) $N$, the number of clients in FL.
Output: i) the list of $N$ diverse models.
Algorithm 2: Stochastic Bidirectional Parameter Updates (SBPU).
Input: i) $G^{t}$, the global model; ii) $g^{1}$, the gradients computed using the previous round; iii) $g^{2}$, the gradients computed using the previous-to-previous round; iv) $\beta_1$, $\beta_2$, the diversity rates.
Output: i) $\tilde{G}$, the mutated weights.
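Below is the PyTorch-style sketch referenced above, illustrating one reading of Algo. 1-2: for every convolutional filter, a shuffled stochastic term from {-1, 1, -2, 2} selects the update direction and whether the gradient from the previous round (with diversity rate $\beta_1$) or from the previous-to-previous round (with diversity rate $\beta_2$) is applied, and each client receives its own perturbed copy of the global model. The precise filter-wise update rule and all names here are our assumptions; the authoritative procedure is given by the algorithms and the supplementary material.

```python
import copy
import random
import torch.nn as nn

def sbpu_diverse_models(G_t, G_tm1, G_tm2, num_clients, beta1, beta2):
    """Sketch of Stochastic Bidirectional Parameter Updates (SBPU)."""
    diverse_models = []
    for _ in range(num_clients):
        model = copy.deepcopy(G_t)
        for m, m1, m2 in zip(model.modules(), G_tm1.modules(), G_tm2.modules()):
            if not isinstance(m, nn.Conv2d):
                continue
            num_filters = m.weight.shape[0]
            # dual gradients w.r.t. previous and previous-to-previous global models
            g1 = m.weight.data - m1.weight.data
            g2 = m.weight.data - m2.weight.data
            # roughly equal frequency of each stochastic term, randomly shuffled
            terms = ([-1, 1, -2, 2] * (num_filters // 4 + 1))[:num_filters]
            random.shuffle(terms)
            for f, s in enumerate(terms):
                if abs(s) == 1:   # assumed rule: bidirectional step along g1 under beta1
                    m.weight.data[f] += s * beta1 * g1[f]
                else:             # assumed rule: bidirectional step along g2 under beta2
                    m.weight.data[f] += (s // 2) * beta2 * g2[f]
        diverse_models.append(model)
    return diverse_models
```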
4.2 Overall Architecture
In the GAN setup, $G$ captures the distribution in the encrypted space and shares its parameters to perform FL. The original image is fed to $G$ instead of noise, as this improves privacy. The generator network $G$ consists of an encoder, ResNet blocks, and a decoder. The encoder extracts features from the original image, which are fed into the ResNet blocks to maintain and align the image features with the target domain; the decoder then restores the image from these features. The discriminator network $D$ utilizes an adversarial loss (without conditional labels) for effective conversion into the encrypted domain. Consider the original image domain with distribution $p_o$ and the target domain with distribution $p_t$. For training images with batch size $B$, we can express the objective using Eq. 6, where $D$ tries to maximize and $G$ tries to minimize the objective.
$$\mathcal{L}_{adv} = \mathbb{E}_{x_t \sim p_t}\left[\log D(x_t)\right] + \mathbb{E}_{x \sim p_o}\left[\log\left(1 - D(G(x))\right)\right] \tag{6}$$
To retain semantic information, we use a semantic loss based on the $L_1$ norm, as shown in Eq. 7, where $\theta_G$ denotes the generator parameters.
$$\mathcal{L}_{sem}(\theta_G) = \frac{1}{B} \sum_{i=1}^{B} \left\| G(x_i) - x_i^{t} \right\|_1 \tag{7}$$
where $x_i^{t}$ denotes the target-domain (encrypted) image corresponding to $x_i$.
To steer the distribution learning in $G$ towards class-specific features, we feed a classification loss into $G$. Over the training epochs, $G$ learns to align generated images with the encrypted domain, and the generator parameters can then be shared with the global server by different clients to facilitate FL. To improve feature learning, we use a feature extractor $F$, which consists of an encoder $E$ and a decoder network. The features extracted by $E$ are converted back into images by the decoder. To extract efficient features, $F$ tries to minimize the feature distance between the image produced by its decoder and the image generated by $G$, as shown in Eq. 8, where $\hat{x}$ denotes the generated image. The classifier consists of a simple convolutional network and takes features from $E$ to minimize the classification error $\mathcal{L}_{cls}$, as described in Eq. 9. We compute the total loss as shown in Eq. 10, where $\lambda_1$ and $\lambda_2$ are hyperparameters that control the influence of $\mathcal{L}_{sem}$ and $\mathcal{L}_{cls}$, respectively.
$$\mathcal{L}_{F} = \frac{1}{B} \sum_{i=1}^{B} \left\| F(\hat{x}_i) - \hat{x}_i \right\|_1, \qquad \hat{x}_i = G(x_i) \tag{8}$$
$$\mathcal{L}_{cls} = -\frac{1}{B} \sum_{i=1}^{B} \sum_{c} y_{i,c} \log p_{i,c} \tag{9}$$
$$\mathcal{L}_{total} = \mathcal{L}_{adv} + \lambda_1 \mathcal{L}_{sem} + \lambda_2 \mathcal{L}_{cls} \tag{10}$$
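A condensed sketch of Eqs. 6-10 for one batch, assuming the generator G, a discriminator D that outputs probabilities, the encoder/decoder of the feature extractor, and the classifier are given; the exact reconstruction target in Eq. 8 and the placement of the weights lam1 and lam2 are assumptions based on the description above.

```python
import torch
import torch.nn.functional as F

def training_losses(G, D, Fenc, Fdec, C, x, x_target, y, lam1=1.0, lam2=2.0):
    """Compute the combined objective of Eq. 10 for one batch (sketch)."""
    x_gen = G(x)                                   # generated (encrypted-domain) image

    # Eq. 6: adversarial loss (non-conditional GAN, D outputs probabilities)
    adv = torch.log(D(x_target)).mean() + torch.log(1.0 - D(x_gen)).mean()

    # Eq. 7: L1 semantic loss between generated and target-domain image
    sem = F.l1_loss(x_gen, x_target)

    # Eq. 8: feature-extractor reconstruction loss on the generated image (assumed target)
    feat = F.l1_loss(Fdec(Fenc(x_gen)), x_gen)

    # Eq. 9: cross-entropy classification loss on extracted features
    cls = F.cross_entropy(C(Fenc(x_gen)), y)

    # Eq. 10: total loss with hyperparameters lam1, lam2
    total = adv + lam1 * sem + lam2 * cls
    return total, feat
```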
4.3 Convergence Analysis
Our global model is aggregated from all the trained local models, similar to FedAvg. Let $t$ denote the SGD iteration on a local client, let $E$ denote the number of SGD iterations each local client performs between two aggregations, and let $\bar{w}^{t}$ denote the aggregated model. For every client $k$ and iteration $t$, our method satisfies the following property:
$$w_k^{t+1} = \begin{cases} w_k^{t} - \eta_t \nabla F_k(w_k^{t}, \xi_k^{t}), & \text{if } (t+1) \bmod E \neq 0, \\ \tilde{w}_k^{t+1}, & \text{if } (t+1) \bmod E = 0, \end{cases} \tag{11}$$
where $\tilde{w}_k^{t+1}$ denotes the mutated weights sent to client $k$ in that round. Inspired by [21], the following assumptions on the loss functions of the local clients (i.e., $F_k$) can be considered.
Assumption 1: For all $k$, $F_k$ is $L$-smooth, where $L > 0$.
Assumption 2: For all $k$, $F_k$ is $\mu$-strongly convex, where $\mu > 0$.
Assumption 3: The variance of the stochastic gradients is bounded by $\sigma_k^2$, i.e., $\mathbb{E}\left\| \nabla F_k(w_k^{t}, \xi_k^{t}) - \nabla F_k(w_k^{t}) \right\|^2 \leq \sigma_k^2$, where $\xi_k^{t}$ is a data batch of client $k$ in FL round $t$.
Assumption 4: The expected squared norm of the stochastic gradients is bounded by $G^2$, i.e., $\mathbb{E}\left\| \nabla F_k(w_k^{t}, \xi_k^{t}) \right\|^2 \leq G^2$.
Based on these assumptions, our convergence can be obtained as:
Theorem 1 (Convergence of SBPU). Let Assumptions 1-4 hold and consider the FL rounds of the training process. Let $T$ denote the total number of SGD iterations and $\eta_t$ the (diminishing) learning rate, chosen as in [21]. Let $\kappa = \frac{L}{\mu}$ and $\gamma = \max\{8\kappa, E\}$. We have
$$\mathbb{E}\left[ F(\bar{w}^{T}) \right] - F^{*} \leq \frac{\kappa}{\gamma + T - 1} \left( \frac{2B}{\mu} + \frac{\mu \gamma}{2}\, \mathbb{E}\left\| \bar{w}^{1} - w^{*} \right\|^2 \right) \tag{12}$$
where $B$ collects the gradient-variance, heterogeneity, and gradient-bound terms from Assumptions 3-4 (as in [21]). Theorem 1 bounds the gap between the optimal weights $w^{*}$ and the aggregated model in the $T$-th iteration for SBPU and indicates a convergence rate similar to FedAvg (detailed in [21]). We provide the full proof of the convergence analysis of SBPU in the supplementary material.
5 Experimental Setup and Results Obtained
5.1 Dataset Used and Implementation Setup
We implemented the proposed approach in the PyTorch framework on an NVIDIA RTX A5000 GPU with 24 GB of memory. We evaluated our model on four datasets, MNIST [6], FMNIST [33], CIFAR10 [18], and SVHN [37], using the official train-test splits. To perform the different attacks, we randomly select one client as the victim.
5.2 Implementation Details
For a fair comparison with the recent benchmark PPIDSG [25], we follow a similar setting, i.e., a homogeneous distribution across clients in the FL system [26] with 10 clients having equal access to the training data. We use a batch size of 64 for the GAN and set both block-size parameters for image encryption to 4. The generator and discriminator networks are trained using the Adam optimizer with a learning rate (lr) of 0.0002. For the feature extractor ($F$) and classifier ($C$), we use an SGD optimizer with an lr of 0.01 and a weight decay of 0.001. We keep the initial lr constant for the first 20 global rounds and then decrease it linearly until it converges to 0. In Eq. 10, we set $\lambda_1 = 1$ and $\lambda_2 = 2$ and train the model for 100 rounds. In our learning rule, we use a diversity-rate parameter of 0.025, 0.25, 0.15, and 1.1 for the MNIST, FMNIST, CIFAR10, and SVHN datasets, respectively; the diversity rates $\beta_1$ and $\beta_2$ are defined in terms of this parameter. For further details, please refer to the supplementary material.
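The optimizer and learning-rate settings above can be summarized in the following sketch; the LambdaLR-based schedule (constant for the first 20 of 100 rounds, then linear decay to 0) is our reading of the described schedule, and the function names are illustrative.

```python
import torch

def build_optimizers(G, D, Fext, C):
    """Adam for the GAN, SGD for the feature extractor and classifier (as described above)."""
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    opt_f = torch.optim.SGD(Fext.parameters(), lr=0.01, weight_decay=0.001)
    opt_c = torch.optim.SGD(C.parameters(), lr=0.01, weight_decay=0.001)
    return opt_g, opt_d, opt_f, opt_c

def lr_lambda(global_round, total_rounds=100, warm_rounds=20):
    """Keep the initial lr for the first 20 global rounds, then decay linearly to 0."""
    if global_round < warm_rounds:
        return 1.0
    return max(0.0, 1.0 - (global_round - warm_rounds) / float(total_rounds - warm_rounds))

# usage (assumed): scheduler = torch.optim.lr_scheduler.LambdaLR(opt_f, lr_lambda)
```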
5.3 Defense Baselines
To evaluate the robustness of our model against different attacks, we compare it with several defense mechanisms: 1) ATS [9], which finds optimal image transformations through an automatic transformation search; 2) EtC [5], which encrypts images using block-based image transformations; 3) DP [31], which uses clipped gradients with Gaussian noise during model training; 4) GC [40], which follows gradient pruning to avoid privacy leakage; 5) FedCG [32], which uses conditional GANs for privacy preservation in FL; and 6) PPIDSG [25], which shares GAN parameters during FL rather than classifier parameters to ensure privacy preservation. For differential privacy (DP), we fix the privacy budget over the global training epochs with a clipping hyperparameter; the resulting configurations are denoted as DP variants.
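For reference, the DP baseline (clipped gradients with Gaussian noise) can be approximated with the sketch below; it clips the aggregate batch gradient rather than per-sample gradients, which is a simplification of DP-SGD [1, 31], and the clipping norm and noise multiplier are placeholder values.

```python
import torch

def dp_sgd_step(model, optimizer, loss, clip_norm=1.0, noise_multiplier=0.1):
    """One DP-style update: clip the gradient norm, then add Gaussian noise (sketch of the DP baseline)."""
    optimizer.zero_grad()
    loss.backward()
    # clip the total gradient norm to clip_norm
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    # add calibrated Gaussian noise to each clipped gradient
    for p in model.parameters():
        if p.grad is not None:
            p.grad += noise_multiplier * clip_norm * torch.randn_like(p.grad)
    optimizer.step()
```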
5.4 Utility and Robustness Against Attacks
To validate model utility, we select a random user to evaluate classification performance, since our approach does not maintain a global classification model. We report the highest classification accuracy obtained by our model and compare it with different techniques in Table 1. Our method surpasses the state-of-the-art (SOTA) methods and obtains the highest classification accuracy; for CIFAR10, the improvement is significant. It is important to note that the ATS and EtC defense policies use ResNet-18 as a classifier, which largely determines model utility and affects classification accuracy. Our model uses a simple classifier network and still improves model utility, mainly due to the diverse and generalized updates sent to clients. To ensure effectiveness against attacks, our approach does not share the classifier parameters, which makes it robust against the Label Inference Attack (LIA). The robustness of other defense methods against LIA, using different activation functions and architectures, is shown in the supplementary material. We consider the enhanced Membership Inference Attack (MIA) proposed in [25] for evaluation and compare our defense against MIA with other defense mechanisms in Table 2.
Due to page-limit constraints, we provide additional results, such as model utility and robustness against MIA under the Part and All settings for the remaining datasets, in the supplementary material. For the Reconstruction Attack (RA), we do not share the classifier model parameters with the global model, so the adversary fails to perform RA on our approach and instead attempts to reconstruct images using the generator parameters. In Fig. 4, we provide quantitative and qualitative comparisons of the different mechanisms with the proposed approach against RA.
Table 1: Highest classification accuracy (%) on CIFAR10 and SVHN.

| Method | CIFAR10 | SVHN |
|---|---|---|
| ATS | 59.67 | 85.22 |
| EtC | 53.34 | 78.7 |
| DP1 | 49.29 | 82.7 |
| DP2 | 44.43 | 80.28 |
| GC1 | 54.07 | 84.36 |
| GC2 | 50.91 | 79.96 |
| FedCG | 53.2 | 79.71 |
| PPIDSG | 70.56 | 91.53 |
| Ours | 75.06 | 92.36 |
Table 2: Accuracy (%) of the enhanced MIA against different defense mechanisms on CIFAR10 and SVHN under the Part and All settings.

| Method | CIFAR10 (Part) | CIFAR10 (All) | SVHN (Part) | SVHN (All) |
|---|---|---|---|---|
| ATS | 84.3 | 73.51 | 54.78 | 52.41 |
| EtC | 55.84 | 46.83 | 50.04 | 63.89 |
| DP1 | 69.6 | 68.74 | 56.37 | 63.55 |
| DP2 | 70.28 | 65.95 | 59.37 | 55.76 |
| DP3 | 87.17 | 75.47 | 62.21 | 55.19 |
| DP4 | 73.71 | 56.44 | 59.31 | 57.53 |
| DP5 | 74.29 | 66.64 | 58.66 | 58.66 |
| GC1 | 75.56 | 72.43 | 60.74 | 58.31 |
| GC2 | 67.41 | 61.18 | 56.37 | 56.01 |
| GC3 | 83.71 | 84.16 | 58.13 | 54.24 |
| FedCG | 49.84 | 70.03 | 51.18 | 53.72 |
| PPIDSG | 54.39 | 52.54 | 52.35 | 47.21 |
| Ours | 40.31 | 36.11 | 49.44 | 39.52 |
Table 3: Ablation over SingleGrad, DualGrad, and TripleGrad on CIFAR10.

| Dataset |  | SingleGrad | DualGrad | TripleGrad |
|---|---|---|---|---|
| CIFAR10 | 0.15 | 0.17 (0.62) | 0.17 (0.54) | 0.19 (0.59) |
|  | 0.25 | 0.15 (0.66) | 0.15 (0.56) | 0.20 (0.60) |
|  | 0.5 | 0.16 (0.71) | 0.16 (0.59) | 0.10 (0.67) |
|  | 1.5 | 0.16 (0.67) | 0.15 (0.67) | 0.24 (0.78) |
Table 4: Comparison of train and test time (seconds) for PPIDSG and the proposed method.

| Dataset | Train: PPIDSG | Train: Proposed | Test: PPIDSG | Test: Proposed |
|---|---|---|---|---|
| MNIST | 111.76 | 112.90 |  |  |
| CIFAR10 | 123.52 | 125.26 |  |  |
| SVHN | 186.73 | 187.99 |  |  |
| FMNIST | 111.30 | 111.79 |  |  |
5.5 Ablation Study
To settle on the proposed stochastic bidirectional parameter update strategy, we performed an ablation analyzing the effect of the number of global models from previous FL rounds (one, two, or three, i.e., SingleGrad, DualGrad, and TripleGrad). SingleGrad, DualGrad, and TripleGrad use stochastic terms $\{\pm 1\}$ with one diversity rate; $\{\pm 1, \pm 2\}$ with two diversity rates; and $\{\pm 1, \pm 2, \pm 3\}$ with three diversity rates, respectively. These ablations are provided in Table 3.
We further performed ablations to analyze i) the effect of the block size for image encryption, ii) the number of clients, iii) the effect of varying diversity rates, and iv) the defense accuracy of our approach compared with SOTA settings (original image and no update); the results are provided in the supplementary material. When comparing our method with PPIDSG for different numbers of clients, we found our method consistently better, confirming its reliability and practical utility. The effect of increasing the number of clients for both methods on the CIFAR10 and FMNIST datasets is shown in Fig. 5.
Comparison of train and test time: Our proposed method (SBPU) takes almost the same time as PPIDSG during the training and testing phases while offering significant performance improvement. A comparative analysis of the time taken by both methods on different datasets is provided in Table 4, which supports its practical utility.
6 Conclusion and Future Work
Our work makes a significant contribution by generating diverse global models for clients to improve generalization and robustness against different privacy attacks. Our method follows a stochastic bidirectional update mechanism that applies systematic perturbations to the global model in parameter space to generate diverse updates for clients. We validated the significance and utility of the proposed method through extensive experimentation on four datasets and surpassed the available SOTA methods. While mitigating privacy leakage under attacks, our method does not sacrifice performance and improves the utility-security trade-off in FL. Our method offers an opportunity for future researchers to optimize the existing marginal computational overhead of SBPU and to explore more sophisticated bidirectional update methods. In the future, we plan to validate SBPU's robustness across diverse data types in security-critical areas such as healthcare, social media, and surveillance.
References
- Abadi et al. [2016] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pages 308–318, 2016.
- Aono et al. [2017] Yoshinori Aono, Takuya Hayashi, Lihua Wang, Shiho Moriai, et al. Privacy-preserving deep learning via additively homomorphic encryption. IEEE transactions on information forensics and security, 13(5):1333–1345, 2017.
- Bonawitz et al. [2017] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
- Chen et al. [2020] Cheng Chen, Ziyi Chen, Yi Zhou, and Bhavya Kailkhura. Fedcluster: Boosting the convergence of federated learning via cluster-cycling. In 2020 IEEE International Conference on Big Data (Big Data), pages 5017–5026. IEEE, 2020.
- Chuman et al. [2018] Tatsuya Chuman, Warit Sirichotedumrong, and Hitoshi Kiya. Encryption-then-compression systems using grayscale-based image encryption for jpeg images. IEEE Transactions on Information Forensics and security, 14(6):1515–1525, 2018.
- Deng [2012] Li Deng. The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE signal processing magazine, 29(6):141–142, 2012.
- Fraboni et al. [2021] Yann Fraboni, Richard Vidal, Laetitia Kameni, and Marco Lorenzi. Clustered sampling: Low-variance and improved representativity for clients selection in federated learning. In International Conference on Machine Learning, pages 3407–3416. PMLR, 2021.
- Fu et al. [2022] Chong Fu, Xuhong Zhang, Shouling Ji, Jinyin Chen, Jingzheng Wu, Shanqing Guo, Jun Zhou, Alex X Liu, and Ting Wang. Label inference attacks against vertical federated learning. In 31st USENIX security symposium (USENIX Security 22), pages 1397–1414, 2022.
- Gao et al. [2021] Wei Gao, Shangwei Guo, Tianwei Zhang, Han Qiu, Yonggang Wen, and Yang Liu. Privacy-preserving collaborative learning with automatic transformation search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 114–123, 2021.
- Geng et al. [2021] Jiahui Geng, Yongli Mou, Feifei Li, Qing Li, Oya Beyan, Stefan Decker, and Chunming Rong. Towards general deep leakage in federated learning. arXiv preprint arXiv:2110.09074, 2021.
- Goodfellow et al. [2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
- Guo et al. [2021] Pengfei Guo, Puyang Wang, Jinyuan Zhou, Shanshan Jiang, and Vishal M Patel. Multi-institutional collaborations for improving deep learning-based magnetic resonance image reconstruction using federated learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2423–2432, 2021.
- Hitaj et al. [2017] Briland Hitaj, Giuseppe Ateniese, and Fernando Perez-Cruz. Deep models under the gan: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pages 603–618, 2017.
- Huang et al. [2020] Yangsibo Huang, Zhao Song, Kai Li, and Sanjeev Arora. Instahide: Instance-hiding schemes for private distributed learning. In International conference on machine learning, pages 4507–4518. PMLR, 2020.
- Jiang et al. [2022] Meirui Jiang, Zirui Wang, and Qi Dou. Harmofl: Harmonizing local and global drifts in federated learning on heterogeneous medical images. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1087–1095, 2022.
- Jin et al. [2023] Weizhao Jin, Yuhang Yao, Shanshan Han, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, and Chaoyang He. Fedml-he: An efficient homomorphic-encryption-based privacy-preserving federated learning system. arXiv preprint arXiv:2303.10837, 2023.
- Karimireddy et al. [2020] Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. Scaffold: Stochastic controlled averaging for federated learning. In International conference on machine learning, pages 5132–5143. PMLR, 2020.
- Krizhevsky et al. [2009] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
- Li et al. [2021] Anran Li, Lan Zhang, Junhao Wang, Feng Han, and Xiang-Yang Li. Privacy-preserving efficient federated-learning model debugging. IEEE Transactions on Parallel and Distributed Systems, 33(10):2291–2303, 2021.
- Li et al. [2020] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. Proceedings of Machine learning and systems, 2:429–450, 2020.
- Li et al. [2019] Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of fedavg on non-iid data. arXiv preprint arXiv:1907.02189, 2019.
- Liao et al. [2023] Xinting Liao, Weiming Liu, Xiaolin Zheng, Binhui Yao, and Chaochao Chen. Ppgencdr: A stable and robust framework for privacy-preserving cross-domain recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4453–4461, 2023.
- Lin et al. [2020] Tao Lin, Lingjing Kong, Sebastian U Stich, and Martin Jaggi. Ensemble distillation for robust model fusion in federated learning. Advances in Neural Information Processing Systems, 33:2351–2363, 2020.
- Liu et al. [2021] Quande Liu, Cheng Chen, Jing Qin, Qi Dou, and Pheng-Ann Heng. Feddg: Federated domain generalization on medical image segmentation via episodic learning in continuous frequency space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1013–1023, 2021.
- Ma et al. [2024] Yuting Ma, Yuanzhi Yao, and Xiaohua Xu. Ppidsg: A privacy-preserving image distribution sharing scheme with gan in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 14272–14280, 2024.
- McMahan et al. [2017] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
- Sattler et al. [2021] Felix Sattler, Tim Korjakow, Roman Rischke, and Wojciech Samek. Fedaux: Leveraging unlabeled auxiliary data in federated learning. IEEE Transactions on Neural Networks and Learning Systems, 34(9):5531–5543, 2021.
- Sellami and Tabbone [2022] Akrem Sellami and Salvatore Tabbone. Deep neural networks-based relevant latent representation learning for hyperspectral image classification. Pattern Recognition, 121:108224, 2022.
- Shokri et al. [2017] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pages 3–18. IEEE, 2017.
- Sun et al. [2021] Jingwei Sun, Ang Li, Binghui Wang, Huanrui Yang, Hai Li, and Yiran Chen. Soteria: Provable defense against privacy leakage in federated learning from representation perspective. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9311–9319, 2021.
- Wei et al. [2020] Kang Wei, Jun Li, Ming Ding, Chuan Ma, Howard H Yang, Farhad Farokhi, Shi Jin, Tony QS Quek, and H Vincent Poor. Federated learning with differential privacy: Algorithms and performance analysis. IEEE transactions on information forensics and security, 15:3454–3469, 2020.
- Wu et al. [2021] Yuezhou Wu, Yan Kang, Jiahuan Luo, Yuanqin He, and Qiang Yang. Fedcg: Leverage conditional gan for protecting privacy and maintaining competitive performance in federated learning. arXiv preprint arXiv:2111.08211, 2021.
- Xiao et al. [2017] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
- Xie et al. [2022] Kan Xie, Zhe Zhang, Bo Li, Jiawen Kang, Dusit Niyato, Shengli Xie, and Yi Wu. Efficient federated learning with spike neural networks for traffic sign recognition. IEEE Transactions on Vehicular Technology, 71(9):9980–9992, 2022.
- Yin et al. [2021] Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M Alvarez, Jan Kautz, and Pavlo Molchanov. See through gradients: Image batch recovery via gradinversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16337–16346, 2021.
- Yu et al. [2023] Yang Yu, Qi Liu, Likang Wu, Runlong Yu, Sanshi Lei Yu, and Zaixi Zhang. Untargeted attack against federated recommendation systems via poisonous item embeddings and the defense. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4854–4863, 2023.
- Yuval [2011] Netzer Yuval. Reading digits in natural images with unsupervised feature learning. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- Zhao et al. [2020] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. idlg: Improved deep leakage from gradients. arXiv preprint arXiv:2001.02610, 2020.
- Zhu et al. [2023] Junyi Zhu, Ruicong Yao, and Matthew B Blaschko. Surrogate model extension (sme): A fast and accurate weight update attack on federated learning. arXiv preprint arXiv:2306.00127, 2023.
- Zhu et al. [2019] Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients. Advances in neural information processing systems, 32, 2019.
- Zhu et al. [2021] Zhuangdi Zhu, Junyuan Hong, and Jiayu Zhou. Data-free knowledge distillation for heterogeneous federated learning. In International conference on machine learning, pages 12878–12889. PMLR, 2021.