Development and Validation of a Machine Learning Model for Hepatitis C Virus Exposure: A Demographic Screening Approach for the US Population

Dorian G Ding
Taoyi Chen
Yu Sheng
Jeffrey S.H. Lin
Ye Yuan

Abstract

Background. Hepatitis C virus (HCV) remains underdiagnosed in the United States despite recommendations for universal screening. A simple approach based on readily available demographic information may help target screening in settings where implementation remains incomplete.

Methods. We analyzed 10 NHANES cycles (1999–2014 and 2017–2023) and defined HCV exposure as a positive HCV antibody or HCV RNA result. Using sex, birth year, race/ethnicity, birthplace, and income-to-poverty ratio as predictors, we trained and compared logistic regression (LR) and machine learning models, including XGBoost, in training and validation cohorts (48,434 and 20,762 participants, respectively). Model performance was evaluated by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the area under the receiver operating characteristic curve (AUROC). A web-based calculator was developed to facilitate bedside HCV screening.

Results. A total of 69,196 participants were included, 967 of whom showed evidence of HCV exposure. Weighted HCV prevalence remained relatively stable across cycles, ranging from 1.22% to 1.93%, and did not change significantly after the COVID-19 pandemic. Earlier birth year, male sex, non-Hispanic Black race, US birth, and lower income-to-poverty ratio were independently associated with HCV exposure. In the validation cohort, XGBoost outperformed LR (AUROC 0.860 vs 0.762, p < 0.001). Predicted risk clearly stratified the population: observed HCV prevalence rose from 0.05% in the lowest-risk decile to 7.95% in the highest, with the top decile containing 58.3% of participants with HCV exposure and the top three deciles containing 85.5%.

Conclusions. Five demographic variables were sufficient to build a useful HCV risk model in a nationally representative US sample. Most HCV-exposed individuals were concentrated in the highest predicted-risk groups, suggesting that this approach could help prioritize and optimize testing where uptake of universal screening remains incomplete. Because no laboratory data are required, the model may also be practical in data-limited settings and adaptable to other health systems.
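
The evaluation metrics listed in the Methods can all be computed from observed outcomes and predicted risks. The following is a minimal, unweighted sketch in Python, not the preprint's code. It assumes a single hypothetical decision threshold for the confusion-matrix metrics, since the abstract does not state which threshold was used. AUROC is computed with the rank-based (Mann–Whitney) formulation. The sketch ignores NHANES survey weights and does not reproduce the significance test behind the reported p-value. The function names `confusion_metrics` and `auroc` are illustrative.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV, calling a participant
    test-positive when the predicted risk is >= threshold (hypothetical cut-off)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of dividing by zero when a margin is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen exposed participant
    receives a higher predicted risk than a randomly chosen unexposed one
    (ties count one half), via the Mann-Whitney rank-sum statistic."""
    n = len(y_score)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC requires both exposed and unexposed participants")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75. AUROC is threshold-free, whereas sensitivity, specificity, PPV and NPV all change with the chosen cut-off. PPV and NPV also depend on the underlying prevalence, which is low (about 1–2%) in this population.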

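The abstract reports weighted HCV prevalence for each survey cycle. The following is a minimal sketch of a weighted point estimate, assuming each participant has an exposure indicator, a survey weight and a cycle label. It does not implement the preprint's method. The abstract does not say which NHANES weight variable was used or how weights were combined across cycles. Standard errors would also require the survey design (strata and primary sampling units), which this sketch omits. The function names and cycle labels are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def weighted_prevalence(exposed: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted share of participants with exposed == 1 (point estimate only)."""
    if len(exposed) != len(weights):
        raise ValueError("exposed and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for e, w in zip(exposed, weights) if e == 1) / total


def prevalence_by_cycle(cycles: Sequence[str], exposed: Sequence[int],
                        weights: Sequence[float]) -> Dict[str, float]:
    """Group participants by cycle label and estimate weighted prevalence in each."""
    by_cycle: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for cycle, e, w in zip(cycles, exposed, weights):
        by_cycle[cycle][0].append(e)
        by_cycle[cycle][1].append(w)
    return {cycle: weighted_prevalence(e, w)
            for cycle, (e, w) in sorted(by_cycle.items())}
```

For instance, `prevalence_by_cycle(["1999-2000", "1999-2000", "2001-2002", "2001-2002"], [1, 0, 0, 1], [1, 3, 2, 2])` gives 0.25 for the first hypothetical cycle and 0.5 for the second. Each participant counts in proportion to the population they represent, not as one person.
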
Version published to 10.21203/rs.3.rs-9361729/v1 on Research Square
Apr 15, 2026
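
The decile results in the abstract rank participants by predicted risk, report observed prevalence within each tenth, and count how many exposed participants fall in the top groups. The following is a minimal sketch of that calculation, not the preprint's code. It assumes equal-count groups, no survey weighting, and ties in predicted risk broken by sort order. All of these are assumptions, since the abstract does not describe how the deciles were formed. The function name `decile_summary` is illustrative.

```python
from typing import Dict, List, Sequence


def decile_summary(y_true: Sequence[int], y_score: Sequence[float],
                   n_groups: int = 10) -> List[Dict[str, float]]:
    """Split participants into equal-size groups by predicted risk (group 1 is
    the lowest risk). Report observed prevalence per group and the cumulative
    share of all cases captured from the highest-risk group downward."""
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    n = len(y_true)
    order = sorted(range(n), key=lambda i: y_score[i])  # ascending predicted risk
    total_cases = sum(y_true)
    groups: List[Dict[str, float]] = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        cases = sum(y_true[i] for i in idx)
        groups.append({
            "group": g + 1,
            "n": len(idx),
            "cases": cases,
            "prevalence": cases / len(idx) if idx else float("nan"),
        })
    # Accumulate cases from the top (highest-risk) group down.
    captured = 0
    for grp in reversed(groups):
        captured += grp["cases"]
        grp["cum_share_from_top"] = (captured / total_cases
                                     if total_cases else float("nan"))
    return groups
```

In this layout, the `cum_share_from_top` value of group 10 corresponds to the reported share in the top decile (58.3%). The value for group 8 corresponds to the share captured by the top three deciles (85.5%).
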

Related articles

Construction and Validation of an Interpretable Machine Learning Model with SHAP for Identifying Infectious Diseases in Fever of Unknown Origin

This article has 5 authors:
1. Fei Li
2. Xu Zhang
3. Juan Zhang
4. Yang Yu
5. Jie Yang
This article has no evaluations. Latest version: Apr 9, 2026.

Assessment of risk prediction models for chronic kidney disease: a global perspective

This article has 7 authors:
1. Xiaoqi Chen
2. Rong Fu
3. Jiamin Chen
4. Zhimin Huang
5. Jinglin Dong
6. Zheng Lin
7. Zhijian Hu
This article has no evaluations. Latest version: Apr 7, 2026.

Development and Validation of an Interpretable Machine Learning Model to Identify Coexisting Type 2 Diabetes Mellitus in Patients with Metabolic dysfunction-associated fatty liver disease

This article has 14 authors:
1. Hui Zhu
2. Jia Zhang
3. Xi Xu
4. Yi Lv
5. Chenxia Lu
6. Qi Hao
7. Jingjing Huang
8. Miao Peng
9. Jingzhi Wang
10. Ouyang Kani
11. Zixin Shu
12. Shujie Song
13. Xiaodong Li
14. Mingzhong Xiao
This article has no evaluations. Latest version: Apr 17, 2026.
