Machine Learning Analysis of Electronic Health Records Identifies Interstitial Lung Disease and Predicts Mortality in Patients with Systemic Sclerosis
Abstract
Background
Interstitial lung disease (ILD) is the leading cause of death in patients with systemic sclerosis (SSc), affecting more than 40% of this population. Despite the availability of effective treatments to stabilize or improve lung function, survival for patients with SSc-ILD remains poor. Poor outcomes have been attributed to delayed diagnosis and initiation of treatment for SSc-ILD. Although recent guidelines have provided conditional recommendations for early screening, pulmonary function tests (PFTs) are insensitive for early diagnosis, and computed tomography (CT)—the current gold standard—often detects disease after irreversible lung injury has occurred. A single sensitive biomarker that can accurately predict the risk of SSc-ILD development and mortality is lacking. We hypothesized that applying machine learning (ML) methods to multiple features from readily available electronic health records (EHR) could construct a model to detect ILD and predict mortality in patients with SSc.
Methods
We retrospectively analyzed EHR data from participants enrolled in a single-center registry of patients with SSc over a period of twenty-eight years (1995-2024). We applied a combination of ML models to seventy-four clinical features encompassing demographics, clinical history, PFTs, and laboratory results. The resultant models were tasked with detecting ILD and predicting mortality in participants with SSc.
Results
A total of 1,169 participants with SSc were included in this study, spanning 15,494 person-years of observation. Models detecting ILD achieved an area under the receiver operating characteristic curve (AUC) of 0.818 and confirmed the importance of known biomarkers, such as autoantibodies and PFTs, as risk factors for SSc-ILD. Unexpected clinical values, including white blood cell count and mean corpuscular volume, were also important for model prediction of SSc-ILD. For prediction of one-year all-cause mortality, models reached an AUC of 0.903. In a subgroup analysis of those with prevalent radiographic SSc-ILD, three-year all-cause mortality prediction reached an AUC of 0.831. These models identified features strongly associated with mortality that are routinely collected during clinical assessment of patients with SSc, including unexpected associations with values such as red cell distribution width and serum chloride concentration.
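For readers less familiar with the AUC values reported above, the following is a minimal sketch of how the metric can be computed. It is not the study's code: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration. The AUC is computed here as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = event, 0 = no event).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study:
# 1 = participant had the outcome, 0 = participant did not.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported values of 0.818, 0.903, and 0.831. The pairwise loop is quadratic in the number of cases; a rank-based implementation would be preferable for a cohort-sized dataset.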
Conclusions
ML-based analysis of clinical features and laboratory tests collected as part of routine clinical care detects ILD and predicts mortality in patients with SSc.