Privacy-preserving AUC Computation in Distributed Machine Learning with PHT-meDIC

Abstract

Ensuring privacy in distributed machine learning while computing the Area Under the Curve (AUC) is a significant challenge because pooling sensitive test data is often not allowed. Although cryptographic methods can address some of these concerns, they may compromise either scalability or accuracy. In this paper, we present two privacy-preserving solutions for secure AUC computation across multiple institutions: (1) an exact global AUC method that handles ties in prediction scores and scales linearly with the number of samples, and (2) an approximation method that substantially reduces runtime while maintaining acceptable accuracy. Our protocols leverage a combination of homomorphic encryption (modified Paillier), symmetric and asymmetric cryptography, and randomized encoding to preserve the confidentiality of true labels and model predictions. We integrate these methods into the Personal Health Train (PHT)-meDIC platform, a distributed machine learning environment designed for healthcare, to demonstrate their correctness and feasibility. Results using both real-world and synthetic datasets confirm the accuracy of our approach: the exact method computes the true AUC without revealing private inputs, and the approximation provides a balanced trade-off between computational efficiency and precision. All relevant code is publicly available at https://github.com/PHT-meDIC/PP-AUC, facilitating straightforward adoption and further development within broader distributed learning ecosystems.
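The exact method's two key properties, correct tie handling and linear scaling over sorted scores, can be illustrated in the clear (the plaintext statistic the protocol computes, not the secure protocol itself; the function name `exact_auc` is ours, not from the paper). After sorting, a single sweep counts, for each group of tied scores, how many strictly lower negatives each positive beats, with tied positive-negative pairs contributing 1/2, matching the Mann-Whitney U formulation of AUC:

```python
def exact_auc(labels, scores):
    """Plaintext AUC with tie handling via one sorted sweep.

    labels: iterable of 0/1 true labels; scores: model prediction scores.
    Equivalent to the pairwise rule: a positive scoring above a negative
    counts 1, a tied pair counts 1/2, divided by n_pos * n_neg.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    wins = 0.0      # (weighted) positive-over-negative pair count
    neg_seen = 0    # negatives with strictly lower score so far
    i, n = 0, len(order)
    while i < n:
        # collect the group of samples tied at this score
        j, tied_pos, tied_neg = i, 0, 0
        while j < n and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tied_pos += 1
            else:
                tied_neg += 1
            j += 1
        # positives here beat all strictly lower negatives,
        # and half-count against negatives tied at the same score
        wins += tied_pos * neg_seen + 0.5 * tied_pos * tied_neg
        neg_seen += tied_neg
        i = j
    return wins / (n_pos * n_neg)
```

The sweep after sorting is linear in the number of samples, consistent with the scaling the abstract claims for the exact protocol.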

Author summary

A commonly used metric to evaluate the performance of machine learning models is the Area Under the Curve (AUC). Calculating the AUC in distributed machine learning settings is challenging because data cannot be shared between institutions due to privacy concerns. To address this, we developed two privacy-preserving methods: one that calculates the exact AUC securely and another that provides faster approximations with high accuracy. These methods use advanced encryption techniques to protect sensitive data while enabling secure collaboration. We tested them in a real-world healthcare platform called PHT-meDIC and demonstrated their effectiveness. The code is publicly available at https://github.com/PHT-meDIC/PP-AUC to support wider adoption.
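The "advanced encryption techniques" mentioned above center on the additively homomorphic Paillier cryptosystem: anyone can multiply two ciphertexts to obtain an encryption of the sum of the plaintexts, so institutions can contribute encrypted counts that only a key holder can decrypt in aggregate. The following is a minimal textbook Paillier sketch, not the paper's modified variant, and it uses toy primes solely for illustration (real deployments require moduli of 2048 bits or more):

```python
import random
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def paillier_keygen(p=1789, q=1879):
    # toy primes for illustration only
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1  # standard generator choice
    # mu = L(g^lam mod n^2)^{-1} mod n, where L(u) = (u - 1) // n
    L = (pow(g, lam, n * n) - 1) // n
    mu = pow(L, -1, n)
    return (n, g), (lam, mu)


def encrypt(pk, m):
    n, g = pk
    n2 = n * n
    while True:  # random blinding factor r coprime to n
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    L = (pow(c, lam, n * n) - 1) // n
    return (L * mu) % n
```

The homomorphic property is what enables secure aggregation: `decrypt(pk, sk, encrypt(pk, a) * encrypt(pk, b) % n**2)` yields `a + b` without either addend ever being revealed in the clear.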