Machine Learning-Based Prediction of TPPA Confirmation Results in Blood Donor Syphilis Screening: A Large-Scale Multi-Algorithm Comparative Study

Xuelong Ge
Mingming Qian
Xiaohua Yang
Liwei Zhang


Abstract

BACKGROUND Syphilis screening in blood banks relies on enzyme immunoassays (EIAs), with reactive results confirmed by Treponema pallidum particle agglutination (TPPA). This study aimed to develop and compare machine learning models that predict TPPA confirmation results in order to optimize screening workflows.

METHODS This retrospective cohort study analyzed 762,655 blood donor specimens collected from December 2020 to July 2025. Signal-to-cutoff (s/co) ratios and dual-reagent screening results were evaluated. Logistic regression, random forest, and gradient boosting models were compared using receiver operating characteristic (ROC) curves and decision curve analysis. Hyperparameter optimization was performed by grid search with cross-validation, and feature importance was assessed with SHapley Additive exPlanations (SHAP) values.

RESULTS The overall positive rate was 0.157% (1,196/762,655). TPPA confirmation rates were 89.6% for dual-reagent-positive samples versus 26.0% for single-reagent-positive samples (relative risk, 3.45; p < 0.0001). The s/co ratio alone showed excellent predictive value (area under the curve [AUC], 0.909); at the optimal threshold of 7.75, sensitivity was 81.9%, specificity 96.7%, and positive predictive value 98.2%. The gradient boosting model performed best (AUC, 0.946), outperforming both logistic regression (AUC, 0.933) and random forest (AUC, 0.928). Decision curve analysis showed higher net benefit for the gradient boosting model across clinically relevant threshold probabilities.

CONCLUSION Machine learning models, particularly gradient boosting, significantly improve prediction of TPPA confirmation results. Implementing s/co-stratified management and machine learning-assisted decision systems can improve the efficiency of blood safety screening while reducing unnecessary confirmatory testing costs.
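The s/co analysis summarized above reduces to three computations: a rank-based AUC, the confusion-matrix metrics (sensitivity, specificity, PPV) at a cut-off, and a rule for choosing that cut-off. The abstract does not say how the optimal threshold of 7.75 was selected, so this minimal sketch assumes the Youden index (sensitivity + specificity - 1), a common choice. The function names and the toy s/co values are hypothetical and are not data or code from the study.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random TPPA-positive sample has a
    higher s/co ratio than a random TPPA-negative one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def metrics_at(threshold: float, scores: Sequence[float],
               labels: Sequence[int]) -> Tuple[float, float, float]:
    """Sensitivity, specificity and PPV when s/co >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, spec, ppv


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed selection rule: the observed s/co value maximizing Youden's J."""
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):
        sens, spec, _ = metrics_at(t, scores, labels)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Hypothetical s/co ratios for EIA-reactive samples and their TPPA results (1 = confirmed).
scores = [0.9, 1.5, 2.0, 3.1, 5.0, 8.2, 9.5, 12.0, 15.3, 20.1]
labels = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

cut = youden_threshold(scores, labels)
print("AUC:", auc(scores, labels))
print("cut-off:", cut, "sens/spec/PPV:", metrics_at(cut, scores, labels))
```

On these toy values the Youden rule selects 8.2, giving sensitivity 5/6, specificity 1.0 and PPV 1.0, with a pairwise AUC of 22/24. The quadratic pairwise loop keeps the AUC definition explicit; a rank-sum formulation would scale better to large specimen counts.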
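Decision curve analysis compares models by their net benefit at a threshold probability p_t: true positives per specimen minus false positives per specimen, with the false positives weighted by the odds p_t / (1 - p_t). The minimal sketch below implements that standard formula together with the usual "treat all" reference strategy (sending every reactive sample for TPPA); the "treat none" strategy always has net benefit 0. The function names and predicted probabilities are hypothetical illustrations, not outputs of the study's models.

```python
from typing import Sequence


def net_benefit(probs: Sequence[float], labels: Sequence[int], pt: float) -> float:
    """Net benefit of flagging samples whose predicted probability of TPPA
    confirmation is >= pt: TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(labels: Sequence[int], pt: float) -> float:
    """Reference strategy: every reactive sample is treated as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Hypothetical model outputs (probability of TPPA confirmation) and observed TPPA results.
probs = [0.05, 0.10, 0.20, 0.35, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

# A decision curve is these values traced over a range of threshold probabilities.
for pt in (0.2, 0.5, 0.8):
    print(pt, round(net_benefit(probs, labels, pt), 4),
          round(net_benefit_treat_all(labels, pt), 4))
```

At p_t = 0.5 the toy model's net benefit is 0.25 versus 0 for "treat all"; a model is useful at a given threshold when its curve lies above both reference strategies.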
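The reported relative risk is the ratio of the two TPPA confirmation rates, and the overall positive rate follows directly from the specimen counts:

```latex
\mathrm{RR} = \frac{P(\mathrm{TPPA}^{+} \mid \text{dual-reagent positive})}{P(\mathrm{TPPA}^{+} \mid \text{single-reagent positive})}
            = \frac{0.896}{0.260} \approx 3.45,
\qquad
\frac{1196}{762\,655} \approx 0.157\%
```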

Version published on Research Square (DOI 10.21203/rs.3.rs-8926817/v1), Mar 20, 2026

Related articles

Development and Validation of a Machine Learning Model for Hepatitis C Virus Exposure: A Demographic Screening Approach for the US Population
Authors: Dorian G Ding, Taoyi Chen, Yu Sheng, Jeffrey S.H. Lin, Ye Yuan
Latest version Apr 15, 2026

Predicting Iron Deficiencies Using Routine Complete Blood Cell Count Parameters: A Machine Learning Approach and Evaluation
Authors: Davide Negrini, Laura Pighi, Simone Mignolli, Gian Luca Salvagno, Giuseppe Lippi
Latest version Apr 2, 2026

Predicting Mortality Risk in Sepsis-Induced Early Coagulopathy: A Multicenter Comparison of Machine Learning and Nomogram Approaches
Authors: Hongwei Duan, Yan Huang
Latest version Apr 12, 2026
