Diagnostic Classification of Mild Cognitive Impairment in Parkinson’s Disease Using Subject-Level Stratified Machine-Learning Analysis



Abstract

Background: Timely identification of mild cognitive impairment (MCI) in Parkinson’s disease (PD) is essential for early intervention and clinical management, yet it remains a challenge in practice.

Methods: We analyzed 3,154 clinical visits from 896 participants in the Parkinson’s Progression Markers Initiative (PPMI) cohort. Participants were divided into two groups: cognitively normal (PD-NC, MoCA ≥ 26) and MCI (PD-MCI, 21 ≤ MoCA ≤ 25). To prevent information leakage between visits from the same participant, subject-level stratified sampling was used to split the data into training (70%) and hold-out test (30%) sets. From an initial set of twelve routinely assessed clinical features, seven were selected using LASSO logistic regression: Age, Sex, Education Years, Disease Duration, UPDRS-I, UPDRS-III, and Geriatric Depression Scale (GDS). Four machine learning models, namely logistic regression (LR), support vector machine (SVM), random forest (RF), and XGBoost, were trained using subject-level stratified 10-fold cross-validation with Bayesian optimization. Probabilistic outputs were dichotomized using three thresholding strategies: (i) default 0.5, (ii) F1-score maximization, and (iii) Youden index maximization.

Results: On the independent test set, SVM achieved the highest overall performance, with an AUC-ROC of 0.7252 and an AUC-PR of 0.5008. LR also performed competitively despite its simplicity. RF achieved the highest recall (0.8150). Feature importance analysis consistently highlighted Age, Education Years, and Disease Duration as the most informative predictors for distinguishing PD-MCI.

Conclusion: This study developed and validated robust machine learning models for PD-MCI classification using only standard clinical assessments. The subject-level stratified design and Bayesian optimization enabled rigorous model evaluation and reduced the risk of overfitting. The results support the potential for data-driven, interpretable tools to enhance early cognitive impairment screening in PD care.
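
Two of the methodological steps named in the abstract, the subject-level stratified split and the choice of a decision threshold by F1-score or Youden index maximization, can be illustrated with a minimal sketch. This is not the study's implementation, which the abstract does not describe. All function names (`subject_level_split`, `best_threshold`, and the helpers) are hypothetical, the code uses only the Python standard library, and it assumes each subject is stratified by the label of their first listed visit. The abstract also does not say which data the thresholds were tuned on; in a design like this they would normally be chosen on training or validation data rather than on the hold-out set.

```python
import random
from collections import defaultdict


def subject_level_split(visits, test_frac=0.3, seed=0):
    """Split (subject_id, label) visit records so no subject is in both sets.

    Assumption for this sketch: a subject's stratum is the label of their
    first listed visit.
    """
    subject_label = {}
    for subject_id, label in visits:
        subject_label.setdefault(subject_id, label)

    by_label = defaultdict(list)
    for subject_id, label in subject_label.items():
        by_label[label].append(subject_id)

    rng = random.Random(seed)
    test_subjects = set()
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        # Take the same fraction of subjects from every stratum.
        test_subjects.update(ids[:round(len(ids) * test_frac)])

    train = [v for v in visits if v[0] not in test_subjects]
    test = [v for v in visits if v[0] in test_subjects]
    return train, test


def confusion(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1(tp, fp, tn, fn):
    """F1 score; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def youden_j(tp, fp, tn, fn):
    """Youden index J = sensitivity + specificity - 1."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens + spec - 1.0


def best_threshold(y_true, y_prob, criterion):
    """Return the observed probability that maximizes the criterion.

    Ties are resolved in favor of the smallest threshold.
    """
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: criterion(*confusion(y_true, y_prob, t)))
```

Calling `best_threshold(y_true, y_prob, f1)` or `best_threshold(y_true, y_prob, youden_j)` gives the two data-driven cut-offs; the default strategy simply uses 0.5. Splitting by subject rather than by visit matters because repeated visits from one participant are correlated, so a visit-level split would let the model see the same person in both training and test data.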
