FedXGB-OptDP: A Privacy-Optimised Federated XGBoost Framework with Differential Privacy for IID and Non-IID healthcare data

Abstract

The rapid growth of sensitive healthcare data creates a pressing need for machine learning systems that deliver accurate predictions while safeguarding patient privacy. Current privacy-preserving federated tree models, however, suffer from high computational cost, inadequate noise-allocation strategies, and accuracy losses when balancing privacy and utility in both IID and non-IID scenarios. To address these challenges, we develop FedXGB-OptDP, a privacy-focused extension of the Federated XGBoost architecture that integrates three components: Depth-Adaptive Differential Privacy (DAD), Noise-Aware Regularisation (NAR), and a hybrid optimisation scheme combining a Genetic Algorithm (GA) with Bayesian Tree-structured Parzen Estimator (TPE) search. DAD-NAR adaptively allocates the privacy budget across tree depths, injects calibrated Laplace noise, and applies noise-aware node dropout, ensuring model stability throughout training while safeguarding privacy. Each client performs GA-driven federated feature selection combined with TPE-based hyperparameter optimisation, enabling efficient learning without exposing raw data. Global aggregation is achieved through consensus-driven feature voting and weighted averaging of hyperparameters, eliminating the need for complex cryptographic techniques such as Homomorphic Encryption (HE) or Secure Multi-Party Computation (SMPC). Experiments on five datasets under both IID and non-IID configurations show that the model consistently achieves high accuracy (up to 95–96%) while providing robust privacy guarantees, exceeding the performance of centralised XGBoost and prominent federated baselines, including PrivaTree, FedXHDP, and FedBoost.
Overall, the results show that adaptive differential privacy, when integrated with optimisation, substantially improves the privacy-utility trade-off as well as the reliability and scalability of federated decision-tree models. The framework thus offers a practical, efficient, and accurate solution for privacy-preserving collaborative learning in real-world environments.
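To make the depth-adaptive idea concrete, the sketch below illustrates one plausible way a per-depth privacy budget could be allocated and used to calibrate Laplace noise. The geometric-decay weighting, the function names, and the sensitivity value are assumptions for illustration only; the paper's exact DAD allocation rule is not specified here.

```python
import math
import random

def depth_budgets(total_epsilon, max_depth, decay=0.5):
    """Split a total privacy budget across tree depths.

    Illustrative scheme (assumed, not the paper's exact rule): shallower
    depths, whose splits affect more samples, receive larger shares via
    geometric decay. By sequential composition the per-depth budgets
    sum to the total epsilon.
    """
    weights = [decay ** d for d in range(max_depth)]
    total = sum(weights)
    return [total_epsilon * w / total for w in weights]

def laplace_noise(sensitivity, epsilon):
    """Sample Laplace(0, sensitivity/epsilon) noise via the inverse CDF."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

# Example: a total budget of 1.0 spread over 4 tree depths.
budgets = depth_budgets(total_epsilon=1.0, max_depth=4)

# A split gain at depth 0 would be perturbed with noise calibrated
# to that depth's budget (sensitivity=1.0 is a placeholder value).
noisy_gain = 0.8 + laplace_noise(sensitivity=1.0, epsilon=budgets[0])
```

Under this assumed scheme, deeper levels receive smaller budgets and therefore larger noise, which is one motivation for pairing the allocation with noise-aware regularisation and node dropout as the abstract describes.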
