Construction of Predictive Models for Interstitial Lung Disease Risk in Sjögren’s Syndrome via Multiple Machine Learning Algorithms
Abstract
Objective: To develop and interpret machine learning (ML) models for predicting interstitial lung disease (ILD) in patients with Sjögren’s Syndrome (SS), using the SHAP (SHapley Additive exPlanations) algorithm to identify clinical risk factors.
Methods: Clinical data from 196 SS patients (2014–2024) were analyzed retrospectively. After feature selection by LASSO regression, eight ML models (including Random Forest, XGBoost, and support vector machine [SVM]) were developed using a 70/30 train-test split and 10-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC) and the F1 score. SHAP analysis was applied to explain the contribution of key variables to the model predictions.
Results: Seven core predictors were selected from 94 candidate features: age, anti-CCP, C-reactive protein (CRP), dry eyes, and three CT findings (nodules, pleural effusion, and emphysema). The Random Forest model outperformed the other models, with an AUC of 0.8298. SHAP analysis identified CT_nodules, anti-CCP, CT_emphysema, and age as the most influential features.
Conclusion: In addition to radiographic findings such as nodules, inflammatory markers, notably CRP, are critical risk factors for ILD in SS patients.
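The two evaluation metrics named in the Methods (ROC-AUC and F1) can be sketched in plain Python. This is a minimal illustration, not the authors' code. It assumes binary labels (1 = ILD, 0 = no ILD), predicted probabilities from any of the fitted models, and a hypothetical decision threshold of 0.5 for F1. The function names and the toy arrays are hypothetical and are not study data.

```python
from typing import Sequence


def roc_auc_pairwise(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive (ILD) case
    receives a higher score than a randomly chosen negative case.
    Ties count as 1/2; this equals the area under the empirical ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) after thresholding predicted probabilities.
    The 0.5 default is an assumption, not a value reported by the study."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up toy labels and predicted probabilities (not study data).
    y = [1, 0, 1, 1, 0, 0]
    p = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55]
    print(round(roc_auc_pairwise(y, p), 4))  # 0.7778 (7 of 9 pairs ranked correctly)
    print(round(f1_at_threshold(y, p), 4))   # 0.6667 (TP=2, FP=1, FN=1)
```

In practice, `roc_auc_score` and `f1_score` in scikit-learn's `sklearn.metrics` compute the same quantities. The pairwise form is shown because it makes an AUC such as 0.8298 concrete: roughly an 83% chance that a randomly chosen patient with ILD receives a higher predicted risk than a randomly chosen patient without ILD.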