Predicting infection risk in rheumatoid arthritis patients receiving biological or targeted synthetic disease-modifying anti-rheumatic drugs: an application of machine learning and healthcare big data
Abstract
Background: Patients with rheumatoid arthritis (RA) initiating biologic or targeted synthetic disease-modifying antirheumatic drugs (b/ts DMARDs) face an elevated risk of serious infections, necessitating tools for individualized risk stratification.

Objectives: The primary objective was to develop and validate a clinically interpretable machine learning (ML) model to predict the 1-year risk of serious infection after b/ts DMARD initiation. Secondary objectives were to estimate infection incidence and to identify key predictors associated with risk.

Methods: We performed a retrospective cohort study using territory-wide electronic health records (EHR) from Hong Kong's Clinical Data Analysis and Reporting System (CDARS) for model development and internal validation, with external validation in the U.S. All of Us database. The outcome was the first serious infection requiring hospitalization within 1 year. Candidate predictors included demographics, comorbidities, prior infections, medications, and laboratory markers. Multiple ML algorithms were trained; model selection was based on the area under the receiver operating characteristic curve (AUROC), and interpretability was assessed using SHAP.

Results: A total of 3,159 patients from CDARS (8.8% with serious infections) and 1,845 from All of Us (2.8% with serious infections) were included. The selected model achieved the highest AUROC in internal validation (0.840, 95% CI: 0.793–0.888) and maintained acceptable discrimination in external validation (AUROC: 0.729, 95% CI: 0.665–0.793). Key predictors included prior infections, diabetes, b/ts DMARD type, and inflammatory markers. Rituximab was linked to the highest infection risk, while tofacitinib and upadacitinib were linked to the lowest.

Conclusion: This study developed and validated an ML model using routine clinical data to predict serious infection risk in RA patients, supporting personalised treatment and proactive infection management.
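
As an illustration of the evaluation metric named in the Methods, the following is a minimal sketch of how an AUROC and a 95% confidence interval can be computed for a binary outcome such as 1-year serious infection. The abstract does not state how its intervals were derived, so the percentile bootstrap used here is an assumption, as are the function names `auroc` and `bootstrap_ci`. The data are synthetic and unrelated to the CDARS or All of Us cohorts.

```python
import random


def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: roughly 10% event rate, scores shifted upward for events.
rng = random.Random(42)
y = [1 if rng.random() < 0.1 else 0 for _ in range(400)]
s = [rng.gauss(1.0 if yi else 0.0, 1.0) for yi in y]
point = auroc(y, s)
lower, upper = bootstrap_ci(y, s)
print(f"AUROC = {point:.3f} (95% CI: {lower:.3f} to {upper:.3f})")
```

With a low event rate, as in the external cohort, the interval is driven mainly by the number of events, which is one reason a wider interval in external validation is unsurprising.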