Construction of Predictive Models for Interstitial Lung Disease Risk in Sjögren’s Syndrome via Multiple Machine Learning Algorithms

Abstract

Objective: To develop and interpret machine learning (ML) models for predicting interstitial lung disease (ILD) in patients with Sjögren’s Syndrome (SS), using the SHAP (SHapley Additive exPlanations) algorithm to identify clinical risk factors.

Methods: Clinical data from 196 SS patients (2014–2024) were retrospectively analyzed. After LASSO regression for feature selection, eight ML models, including Random Forest, XGBoost, and support vector machine (SVM), were developed using a 70/30 train-test split and 10-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC) and the F1 score. SHAP analysis was applied to quantify the contribution of key variables.

Results: Seven core predictors were selected from 94 candidate features: age, anti-cyclic citrullinated peptide antibody (anti-CCP), C-reactive protein (CRP), dry eyes, and three CT findings (nodules, pleural effusion, and emphysema). The Random Forest model performed best among the eight models, with an AUC of 0.8298. SHAP analysis ranked CT_nodules, anti-CCP, CT_emphysema, and age as the most influential features.

Conclusion: Beyond radiographic findings such as nodules, inflammatory markers such as CRP are critical risk factors for ILD in SS patients.
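The Methods describe a pipeline of LASSO feature selection, a 70/30 split, 10-fold cross-validation, and ROC-AUC/F1 evaluation of a Random Forest. The sketch below shows one way such a pipeline can be wired up with scikit-learn. It is not the authors' code, and several choices are assumptions: the data are synthetic (`make_classification` with 196 rows and 94 columns, standing in for the clinical table), "LASSO" is implemented as L1-penalized logistic regression (a common variant for a binary outcome), feature selection is fit on the training split only, and the forest size, `Cs` grid, and 0.5 decision threshold are illustrative. The helper name `run_pipeline` is hypothetical. The SHAP step is not shown; the `shap` package's `TreeExplainer` is the usual tool for explaining a fitted tree ensemble.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def run_pipeline(seed: int = 0) -> dict:
    """Hypothetical helper: LASSO-style selection, then a Random Forest, on synthetic data."""
    # Synthetic stand-in for the clinical table (assumption): 196 patients x 94 features.
    X, y = make_classification(
        n_samples=196, n_features=94, n_informative=7,
        n_redundant=0, weights=[0.7, 0.3], random_state=seed,
    )

    # 70/30 train-test split, stratified so both classes appear in each part.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed,
    )

    # 10-fold stratified cross-validation, reused for penalty tuning and model assessment.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)

    # LASSO-style selection (assumption): L1-penalized logistic regression,
    # with the penalty strength chosen by cross-validated ROC-AUC.
    lasso = make_pipeline(
        StandardScaler(),
        LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=cv, scoring="roc_auc"),
    )
    lasso.fit(X_tr, y_tr)
    coefs = lasso[-1].coef_.ravel()
    selected = np.flatnonzero(coefs != 0)
    if selected.size == 0:  # guard: keep at least one feature if the penalty zeroes all
        selected = np.array([int(np.argmax(np.abs(coefs)))])
    X_tr_sel, X_te_sel = X_tr[:, selected], X_te[:, selected]

    # Random Forest: cross-validated AUC on the training set, then one held-out evaluation.
    rf = RandomForestClassifier(n_estimators=300, random_state=seed)
    cv_auc = cross_val_score(rf, X_tr_sel, y_tr, cv=cv, scoring="roc_auc")
    rf.fit(X_tr_sel, y_tr)
    proba = rf.predict_proba(X_te_sel)[:, 1]

    return {
        "n_selected": int(selected.size),
        "cv_auc_mean": float(cv_auc.mean()),
        "test_auc": float(roc_auc_score(y_te, proba)),
        "test_f1": float(f1_score(y_te, (proba >= 0.5).astype(int))),
    }


if __name__ == "__main__":
    print(run_pipeline())
```

Fitting the selector on the training split only keeps the held-out 30% untouched by feature selection, which avoids optimistic bias in the test AUC; the abstract does not state whether the study did this. The numbers this sketch prints describe synthetic data only and say nothing about the study's reported AUC of 0.8298.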

Article activity feed