A privacy preserving federated clustering algorithm for data imbalance based on density peak clustering and Gaussian distribution simulation data


Abstract

Federated clustering is an unsupervised learning approach for distributed data environments that has emerged in recent years. By combining federated learning with clustering techniques, it aims to discover knowledge from data provided by multiple clients while protecting data privacy. However, data heterogeneity and communication pose significant challenges. This paper proposes a new federated clustering method, DPG-PFC, to address the data imbalance problem. Each client runs density peaks clustering locally to obtain statistical information about its clusters (such as the mean and variance), and the server uses this information to reconstruct a simulated dataset by sampling from Gaussian distributions, which it then re-clusters. Clustering locally by density resolves the inconsistency in the number of clusters caused by data imbalance. To enhance privacy protection, a differential privacy mechanism adds noise to the local data so that privacy is preserved during operation. The method requires only one communication round, which significantly improves communication efficiency. Experiments on the MNIST dataset validate the effectiveness of the method.
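The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that local cluster labels are already available (the density peaks clustering step is not shown), that the noise is Laplace noise with an arbitrary scale rather than one calibrated to a privacy budget, and that each cluster is modeled as a Gaussian with independent dimensions. The names `laplace_noise`, `client_summaries`, `server_simulate`, and `noise_scale` are hypothetical.

```python
import random
from statistics import fmean, pvariance


def laplace_noise(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def client_summaries(points, labels, noise_scale, rng):
    """Perturb local points, then summarize each local cluster.

    Returns one (mean, variance, count) tuple per cluster. `labels` is
    assumed to come from a local clustering step that is not shown here.
    """
    noisy = [[x + laplace_noise(rng, noise_scale) for x in p] for p in points]
    clusters = {}
    for point, label in zip(noisy, labels):
        clusters.setdefault(label, []).append(point)
    summaries = []
    for members in clusters.values():
        dims = list(zip(*members))           # group values by dimension
        mean = [fmean(d) for d in dims]
        var = [pvariance(d) for d in dims]   # population variance per dimension
        summaries.append((mean, var, len(members)))
    return summaries


def server_simulate(summaries, rng):
    """Draw `count` synthetic points per cluster from a diagonal Gaussian."""
    synthetic = []
    for mean, var, count in summaries:
        for _ in range(count):
            synthetic.append([rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)])
    return synthetic


# Toy example: one client with two clusters of unequal size.
rng = random.Random(0)
points = [[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.9, 5.2], [5.1, 4.8]]
labels = [0, 0, 1, 1, 1]
summaries = client_summaries(points, labels, noise_scale=0.05, rng=rng)
synthetic = server_simulate(summaries, rng)
```

In this sketch only the summaries leave the client, which corresponds to the single communication round mentioned in the abstract. The server would then run its own clustering on `synthetic`; that step is omitted here.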
