Sjögren’s syndrome-associated interstitial lung disease: classification model development, risk factor analysis
Abstract
Objective: To identify factors contributing to the development of interstitial lung disease (ILD) in patients with Sjögren's syndrome (SS) using retrospective clinical data, and to develop a machine learning (ML) framework in Python, integrated with the SHAP (SHapley Additive exPlanations) algorithm, that can provide diagnostic and therapeutic guidance for clinical practice.

Methods: A retrospective analysis was conducted on clinical data from 609 patients with SS treated at the Department of Rheumatology, Nanjing Municipal Hospital of Chinese Medicine between 2014 and 2025. All data underwent de-identification, cleaning, deduplication, and missing value imputation. Potential risk factors for SS-ILD were predefined based on clinical guidelines. Feature selection was first performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression. The dataset was then randomly partitioned into a 70% training set and a 30% testing set. Seven ML models were constructed: Logistic Regression, Support Vector Machine (SVM), Random Forest, XGBoost, K-Nearest Neighbors (KNN), Decision Tree, and Elastic Net. After hyperparameter optimization, model performance was evaluated using ROC curves and metrics including specificity, sensitivity, accuracy, and F1-score. A strict cross-validation strategy was implemented to ensure robustness. Finally, the SHAP algorithm was used to quantify the contribution of individual clinical features to the development of SS-ILD.

Results: LASSO regression selected 10 key features from an initial pool of 31 variables as the inputs for modeling. Among the evaluated algorithms, the XGBoost model performed best, achieving a nested cross-validation AUC of 0.7507 (95% CI: 0.7251–0.7763). SHAP analysis identified the top six clinical predictors: age, anti-cyclic citrullinated peptide (anti-CCP) antibody positivity, elevated C-reactive protein (CRP), female gender, rheumatoid factor (RF) positivity, and coronary heart disease.

Conclusion: The development of interstitial lung disease in patients with Sjögren's syndrome is closely associated with factors such as advanced age, anti-CCP positivity, and elevated CRP levels.
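
To illustrate how SHAP attributes a prediction to individual features, the following minimal Python sketch computes exact Shapley values for a single patient against a baseline patient. It is not the authors' code and does not use the `shap` library, which relies on faster model-specific algorithms. The function `toy_score`, its coefficients, and the example patient values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance relative to a baseline.

    Enumerates every feature subset, so it is only practical for a
    handful of features.
    """
    n = len(instance)
    features = range(n)

    def value(subset):
        # Features in `subset` take the instance's values;
        # all other features stay at their baseline values.
        mixed = [instance[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(x):
    # Hypothetical risk score with made-up coefficients (NOT from the study).
    age, anti_ccp, crp = x
    interaction = 0.3 * anti_ccp * (crp > 10)
    return 0.02 * age + 0.5 * anti_ccp + 0.01 * crp + interaction


# Hypothetical patient and reference values: [age, anti-CCP positive, CRP]
instance = [70, 1, 20]
baseline = [50, 0, 5]

phi = shapley_values(toy_score, instance, baseline)
for name, contribution in zip(["age", "anti_ccp", "crp"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The per-feature contributions sum to the difference between the patient's score and the baseline score, which is the property that lets SHAP rank features by their contribution to a model's output.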