Identifying Mild Cognitive Impairment Using Decision Tree–Based Machine Learning with Physical, Functional, and Psychosocial Measures in Community-Dwelling Older Adults: Evidence from the Northern Japanese ORANGE Registry
Abstract
Background: Mild cognitive impairment (MCI) is common in later life and represents a key target for early identification and prevention. Scalable, non-imaging approaches using routinely collected community health data may support risk stratification and guide follow-up assessments.

Methods: We analyzed community health check-up data from Akita Prefecture, Japan. The outcome was binary MCI classification (0 = non-MCI; 1 = MCI). Candidate predictors included demographics, medical history, physical function, and psychosocial measures. Data were split into training (70%) and test (30%) sets using stratification. We trained a decision tree, random forest, and gradient-boosted trees with five-fold cross-validation and hyperparameter tuning. Model discrimination and classification metrics were evaluated on the independent test set. Permutation importance was computed for the best-performing model, and a shallow decision tree was derived using the top-ranked predictors for interpretability.

Results: The analytic sample included 2,650 participants (non-MCI: n = 1,893; MCI: n = 757). On the test set, the random forest model achieved the highest ROC AUC (0.719). At a 0.5 threshold, accuracy was 0.753, with sensitivity 0.254 and specificity 0.952. Using the Youden threshold (approximately 0.256) increased sensitivity to 0.794 while reducing specificity to 0.537. Permutation importance ranked GDS-15 score, osteoporosis, social frailty, and living alone among the top predictors.

Conclusions: A random forest model demonstrated moderate discrimination for classifying MCI using routinely collected community health variables. The choice of operating threshold had a substantial impact on the sensitivity–specificity trade-off, underscoring the importance of clearly defining intended use and decision thresholds. External validation and prospective evaluation are required before clinical deployment.
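The threshold effect reported in the Results can be illustrated with a minimal sketch of Youden-based threshold selection, where Youden's J is sensitivity + specificity - 1. This is not the authors' code: the function names `sens_spec` and `youden_threshold` are hypothetical, and the ten labels and scores are invented toy values, not registry data. The final lines only re-check arithmetic on the figures quoted in the abstract, under the assumption that the stratified test set has the same MCI prevalence as the full sample (757 / 2,650).

```python
def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when MCI is predicted for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(y_true, scores):
    """Return the observed score that maximizes J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sens_spec(y_true, scores, t)
        j = sens + spec - 1.0
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (invented for illustration): 6 non-MCI (0) and 4 MCI (1) cases
# with hypothetical predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.25, 0.40, 0.55, 0.70]

best_t, best_j = youden_threshold(y_true, scores)
sens_default, spec_default = sens_spec(y_true, scores, 0.5)
sens_best, spec_best = sens_spec(y_true, scores, best_t)

# Arithmetic check on the abstract's reported figures.
# Assumption: test-set prevalence equals the overall prevalence.
prevalence = 757 / 2650
accuracy_check = 0.254 * prevalence + 0.952 * (1 - prevalence)
j_at_default = round(0.254 + 0.952 - 1, 3)  # Youden's J at the 0.5 threshold
j_at_youden = round(0.794 + 0.537 - 1, 3)   # Youden's J at the ~0.256 threshold

print(best_t, round(best_j, 3), sens_best, round(spec_best, 3))
print(round(accuracy_check, 3), j_at_default, j_at_youden)
```

On the toy data the selected threshold falls below 0.5 and raises sensitivity at the cost of specificity, which is the same direction of trade-off the abstract reports. Under the stated prevalence assumption, the reported sensitivity and specificity at 0.5 reproduce the reported accuracy to within rounding, and J rises from 0.206 to 0.331 when moving to the Youden threshold.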