A Data Privacy Protection Method for Infectious Disease Prediction Models with Balanced Training Speed and Accuracy
Abstract
Deep learning has recently been applied to the construction of infectious disease prediction models, making them considerably more useful as aids in formulating prevention and control strategies. Researchers typically rely on extensive datasets to train such models thoroughly, aiming for high predictive accuracy in forecasting the trends of emerging infectious diseases. However, because medical data requires privacy protection, many institutions are reluctant to share their data owing to compliance and security concerns. This limits the comprehensiveness and diversity of the training data and, in turn, reduces the predictive accuracy of infectious disease prediction models. To address these issues, we first propose a Random Transmission Hybrid Homomorphic Algorithm, which improves model efficiency by using a random transmission sequence and a hybrid approach that combines semi-homomorphic and fully homomorphic algorithms. Second, we develop the DS-DSSGD (Data Select-Distributed Selective Stochastic Gradient Descent) algorithm to balance training speed and predictive accuracy once privacy-preserving computation has been incorporated into the model. Finally, we establish a scientific research collaboration platform, XDP, which integrates data from multiple users and provides end-to-end data lifecycle management.
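
The selective gradient sharing that distributed selective SGD generally builds on can be sketched as follows. This is a minimal illustration of the general idea only, in which each participant uploads just its largest-magnitude gradient entries; it is not the DS-DSSGD algorithm itself, whose data-selection step is not specified in the abstract. The function names and the `fraction` and `learning_rate` parameters are hypothetical, chosen for illustration.

```python
from typing import Dict, List


def select_top_gradients(gradients: List[float], fraction: float) -> Dict[int, float]:
    """Keep only the largest-magnitude share of gradient entries.

    Assumption: participants share a fixed fraction of their gradients,
    ranked by absolute value. Returns a sparse {index: gradient} mapping.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Always share at least one entry, even for very small fractions.
    k = max(1, int(len(gradients) * fraction))
    ranked = sorted(
        range(len(gradients)),
        key=lambda i: abs(gradients[i]),
        reverse=True,
    )
    return {i: gradients[i] for i in ranked[:k]}


def apply_sparse_update(
    global_params: List[float],
    update: Dict[int, float],
    learning_rate: float,
) -> List[float]:
    """Apply a participant's sparse gradient upload to a copy of the shared parameters."""
    new_params = list(global_params)
    for index, grad in update.items():
        new_params[index] -= learning_rate * grad
    return new_params


# Hypothetical usage: one participant shares half of its gradient entries.
local_gradients = [0.1, -0.9, 0.05, 0.4]
shared = select_top_gradients(local_gradients, fraction=0.5)
updated = apply_sparse_update([1.0, 1.0, 1.0, 1.0], shared, learning_rate=0.1)
```

Sharing fewer entries reduces the amount of data that must be transmitted (and, in a privacy-preserving setting, encrypted) per round, at the possible cost of slower convergence; that is the kind of speed and accuracy trade-off the abstract refers to.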