Inferring rheumatoid arthritis disease activity status from the electronic health records across health systems to enable real-world data studies

Abstract

Objective

Disease activity plays a central role in rheumatoid arthritis (RA) clinical studies. However, RA disease activity is inconsistently recorded in real-world electronic health record (EHR) data, limiting the generation of real-world evidence (RWE). This study aimed to develop and validate scalable machine learning (ML) models to infer RA disease activity from EHR data.

Methods

We conducted studies using EHR data from Mass General Brigham (MGB) and the Veterans Affairs (VA) health system; both have RA registries with prospectively collected Disease Activity Score in 28 joints (DAS28). Algorithm features were extracted from the EHR, including structured data (e.g., ICD codes) and narrative data processed with natural language processing (NLP). Machine learning models were trained on the registry-collected DAS28. We tested the performance of models trained and evaluated within each institution, as well as their transportability across systems. To assess face validity, we tested the association between inferred disease activity and major adverse cardiovascular events (MACE) using stratified Cox models.
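The training target above is the registry DAS28. As a point of reference, the sketch below computes the standard DAS28-ESR score and turns it into a binary activity label. The abstract does not state which DAS28 variant the registries used (ESR or CRP) or how the score was dichotomized for the reported AUCs, so the ESR formula, the 3.2 cut-off (the conventional boundary between low and moderate activity), and the helper names `das28_esr` and `activity_label` are assumptions for illustration only, not the study's definitions.

```python
import math

# Assumption: the abstract does not say how DAS28 was dichotomized.
# 3.2 is the conventional cut-off between low and moderate activity.
MODERATE_THRESHOLD = 3.2


def das28_esr(tender28: int, swollen28: int, esr_mm_hr: float,
              patient_global: float) -> float:
    """Standard DAS28-ESR: tender/swollen counts over 28 joints, ESR in mm/h,
    and patient global health on a 0-100 mm visual analogue scale."""
    if esr_mm_hr <= 0:
        raise ValueError("ESR must be positive for the log term")
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr_mm_hr)
            + 0.014 * patient_global)


def activity_label(score: float, threshold: float = MODERATE_THRESHOLD) -> int:
    """Hypothetical binarization: 1 = moderate/high activity, 0 = remission/low."""
    return int(score > threshold)


if __name__ == "__main__":
    visit = das28_esr(tender28=4, swollen28=2, esr_mm_hr=25, patient_global=50)
    print(round(visit, 2), activity_label(visit))  # 4.47 1
```

In a setup like this, each registry visit's label would be paired with EHR features from around the same date; how the study aligned visits and features is not described in the abstract.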

Results

We studied 1105 MGB and 2631 VA RA patients. Models using structured data alone achieved an AUC of 0.68-0.70; models incorporating both structured and NLP features achieved higher performance (AUC=0.843, MGB; 0.833, VA). Cross-site validation demonstrated reduced transportability (AUC=0.679, MGB→VA; 0.718, VA→MGB), due to differences in the important features between sites. Within each institution, inferred disease activity was significantly associated with an increased risk of incident MACE (MGB: HR=1.12; VA: HR=1.14).
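To make the drop in cross-site AUC concrete, the toy sketch below computes AUC as the Mann-Whitney statistic (the probability that a randomly chosen positive patient scores above a randomly chosen negative one) and applies a fixed linear model, assumed to have been trained at one site, to two cohorts. Everything here is hypothetical: the feature names `nlp_synovitis`, `nlp_joint_swelling` and `icd_ra_visits`, the weights, and the four-patient cohorts are invented to illustrate how a model that leans on a site-specific NLP concept can lose discrimination when that concept is documented differently elsewhere. It does not reproduce the study's models or numbers.

```python
from typing import Dict, List, Tuple

Patient = Tuple[int, Dict[str, float]]  # (binary label, feature name -> value)


def auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as the Mann-Whitney statistic; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def score(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Linear score: features unknown to the model are ignored, and model
    features missing from a patient's record count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def evaluate(weights: Dict[str, float], cohort: List[Patient]) -> float:
    labels = [y for y, _ in cohort]
    return auc(labels, [score(weights, x) for _, x in cohort])


# Hypothetical model "trained" at site A; it relies heavily on an NLP concept.
site_a_model = {"nlp_synovitis": 2.0, "icd_ra_visits": 0.5}

site_a: List[Patient] = [
    (1, {"nlp_synovitis": 1, "icd_ra_visits": 2}),
    (1, {"nlp_synovitis": 1, "icd_ra_visits": 1}),
    (0, {"nlp_synovitis": 0, "icd_ra_visits": 3}),
    (0, {"nlp_synovitis": 0, "icd_ra_visits": 1}),
]

# Hypothetical site B: the same finding is documented with a different term,
# so the site-A NLP feature is never populated.
site_b: List[Patient] = [
    (1, {"nlp_joint_swelling": 1, "icd_ra_visits": 2}),
    (1, {"nlp_joint_swelling": 1, "icd_ra_visits": 1}),
    (0, {"icd_ra_visits": 3}),
    (0, {"icd_ra_visits": 1}),
]

if __name__ == "__main__":
    print(evaluate(site_a_model, site_a))  # 1.0
    print(evaluate(site_a_model, site_b))  # 0.375
```

Within site A the NLP feature separates the classes perfectly; at site B only the weaker structured feature remains, so discrimination falls. Retraining or harmonizing features per site is one common response, but the abstract does not say which approach, if any, the authors took.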

Conclusion

RA disease activity can be inferred at scale from EHR data within an institution, though cross-institution performance is limited. The inferred disease activity replicated the association between RA disease activity and MACE, supporting its use in future studies to generate RWE.