Accuracy of diagnostic codes and algorithms used to identify rheumatoid arthritis and juvenile idiopathic arthritis in electronic health records: systematic review and meta-analysis
Abstract
Objective
This systematic review aimed to assess the diagnostic accuracy of algorithms used to identify rheumatoid arthritis (RA) and juvenile idiopathic arthritis (JIA) in electronic health records (EHRs).
Methods
We searched the MEDLINE, Embase, and CENTRAL databases and included studies that validated case definitions against a reference standard, such as a rheumatologist-confirmed diagnosis or the ACR/EULAR classification criteria. Title/abstract screening, full-text review, data extraction, and quality assessment were all completed in duplicate. Results were synthesised narratively and quantitatively, using a bivariate random-effects meta-analysis of sensitivity and specificity.
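The accuracy measures in this review come from 2x2 tables that cross-classify each algorithm's result against the reference standard. The Python sketch below shows how sensitivity, specificity, and PPV are computed from such a table. It then pools logit-transformed sensitivities across studies. As a simplifying assumption it uses a univariate DerSimonian-Laird random-effects model rather than the bivariate model named above, which jointly models logit sensitivity and specificity and their correlation. All class and function names and the study counts are hypothetical illustrations, not data from the included studies.

```python
import math
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """2x2 table for one validation study: algorithm result vs reference standard."""
    tp: int  # algorithm positive, reference standard positive
    fp: int  # algorithm positive, reference standard negative
    fn: int  # algorithm negative, reference standard positive
    tn: int  # algorithm negative, reference standard negative


def sensitivity(c: ValidationCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: ValidationCounts) -> float:
    return c.tn / (c.tn + c.fp)


def ppv(c: ValidationCounts) -> float:
    return c.tp / (c.tp + c.fp)


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance (0.5 continuity correction)."""
    e = events + 0.5
    ne = total - events + 0.5
    return math.log(e / ne), 1 / e + 1 / ne


def dersimonian_laird(y: list[float], v: list[float]) -> tuple[float, float, float]:
    """Univariate random-effects pooling; returns (pooled logit, standard error, tau^2)."""
    w = [1 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pooled_sensitivity(studies: list[ValidationCounts]) -> tuple[float, float, float]:
    """Pooled sensitivity with a 95% CI, back-transformed from the logit scale."""
    pairs = [logit_and_variance(s.tp, s.tp + s.fn) for s in studies]
    est, se, _ = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
    return inv_logit(est), inv_logit(est - 1.96 * se), inv_logit(est + 1.96 * se)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    studies = [
        ValidationCounts(tp=90, fp=20, fn=10, tn=880),
        ValidationCounts(tp=70, fp=15, fn=30, tn=885),
        ValidationCounts(tp=160, fp=60, fn=40, tn=740),
    ]
    for s in studies:
        print(f"sens={sensitivity(s):.2f} spec={specificity(s):.2f} PPV={ppv(s):.2f}")
    est, lo, hi = pooled_sensitivity(studies)
    print(f"pooled sensitivity={est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Pooling on the logit scale keeps the back-transformed estimate and confidence limits inside 0-1. This is also why intervals near the boundary, such as a specificity CI reaching 1.00, are asymmetric.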
Results
A total of 35 studies were included. Algorithms varied widely in complexity, ranging from single ICD codes to combinations incorporating disease-modifying antirheumatic drug (DMARD) prescriptions, hospitalisation records, and specialist diagnosis. Algorithms combining ICD codes with DMARD prescriptions (pooled sensitivity 0.79, 95% CI 0.61-0.90; specificity 0.96, 95% CI 0.72-1.00; positive predictive value [PPV] 0.78, 95% CI 0.63-0.88) or requiring an ICD code assigned by a rheumatologist (pooled sensitivity 0.91, 95% CI 0.70-0.98; specificity 0.94, 95% CI 0.49-1.00; PPV 0.70, 95% CI 0.64-0.75) showed the highest accuracy, with a balance of sensitivity, specificity, and PPV. Less restrictive algorithms demonstrated high sensitivity but lower PPV. Substantial heterogeneity was observed across studies, likely reflecting differences in algorithm structure, data sources, and validation methods. Despite this variability, we grouped algorithms into conceptually coherent categories to allow meaningful synthesis, prioritising clinical interpretability.
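The following minimal Python sketch makes the algorithm categories concrete. It shows three case definitions of the kind compared here: any RA ICD code, an RA ICD code plus a DMARD prescription, and an RA ICD code recorded at a rheumatology encounter. The ICD-10 prefixes (M05, M06), the drug list, the record structure, and all function names are hypothetical illustrations, not case definitions taken from any included study.

```python
from dataclasses import dataclass, field

# Hypothetical code and drug lists for illustration only.
RA_ICD10_PREFIXES = ("M05", "M06")  # seropositive and other rheumatoid arthritis
CSDMARDS = {"methotrexate", "sulfasalazine", "hydroxychloroquine", "leflunomide"}


@dataclass
class Encounter:
    icd10: str
    specialty: str  # e.g. "rheumatology" or "general_practice"


@dataclass
class Patient:
    encounters: list = field(default_factory=list)
    prescriptions: set = field(default_factory=set)


def has_ra_code(enc: Encounter) -> bool:
    return enc.icd10.upper().startswith(RA_ICD10_PREFIXES)


def icd_only(p: Patient, min_codes: int = 1) -> bool:
    """Least restrictive rule: at least `min_codes` encounters carrying an RA code."""
    return sum(has_ra_code(e) for e in p.encounters) >= min_codes


def icd_plus_dmard(p: Patient, min_codes: int = 1) -> bool:
    """RA code(s) plus at least one prescription from the DMARD list."""
    return icd_only(p, min_codes) and bool(p.prescriptions & CSDMARDS)


def rheumatologist_icd(p: Patient) -> bool:
    """RA code recorded at a rheumatology encounter."""
    return any(has_ra_code(e) and e.specialty == "rheumatology" for e in p.encounters)


if __name__ == "__main__":
    gp_only = Patient(encounters=[Encounter("M06.9", "general_practice")])
    rheum_mtx = Patient(encounters=[Encounter("M05.8", "rheumatology")],
                        prescriptions={"methotrexate"})
    for name, p in [("gp_only", gp_only), ("rheum_mtx", rheum_mtx)]:
        print(name, icd_only(p), icd_plus_dmard(p), rheumatologist_icd(p))
```

The two more restrictive rules flag only patients who also meet the ICD-only rule. Adding components can therefore remove false positives (raising PPV) but can never recover missed cases (it cannot raise sensitivity), which matches the trade-off described above.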
Conclusions
These findings support the use of more specific algorithms when diagnostic certainty is essential and highlight the need for further validation of high-performing algorithms across diverse healthcare systems.
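PPV depends on the prevalence of disease in the population where an algorithm is applied, not only on its sensitivity and specificity. This is one reason specificity matters when diagnostic certainty is essential. The sketch below applies Bayes' theorem, PPV = sens × prev / (sens × prev + (1 − spec)(1 − prev)), to the pooled ICD + DMARD sensitivity reported above. The prevalence values and the alternative specificity of 0.99 are illustrative assumptions. The pooled PPVs in the Results were estimated directly from the validation studies rather than derived this way.

```python
def ppv_from_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value implied by Bayes' theorem at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Sensitivity 0.79 is the pooled ICD + DMARD estimate; other inputs are assumptions.
    for prevalence in (0.01, 0.05, 0.20):
        for spec in (0.96, 0.99):
            value = ppv_from_accuracy(0.79, spec, prevalence)
            print(f"prevalence={prevalence:.2f} specificity={spec:.2f} PPV={value:.2f}")
```

At low prevalence, as in a general population sample, a few percentage points of specificity change PPV far more than the same change in sensitivity would.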
Significance and Innovations
▪ This is the first comprehensive systematic review to evaluate and synthesize the accuracy of algorithms used to identify rheumatoid arthritis and juvenile idiopathic arthritis in electronic health records (EHRs), addressing a growing need as real-world data become increasingly central in rheumatology research.
▪ The findings provide critical guidance for researchers and clinicians on the strengths and limitations of commonly used case definitions, helping to improve the validity of studies that use administrative or EHR data.
▪ By categorizing algorithms based on their components and reference standards, this review offers a practical framework for selecting the most appropriate algorithm depending on the study purpose and data source.
▪ The review highlights gaps in validation efforts and emphasizes the need to validate high-performing algorithms across diverse healthcare settings and evolving coding systems, ensuring accurate disease identification in current and future research.