Machine learning models for dementia risk prediction: Evidence from the Sydney Memory and Ageing Study

Rebecca A. Chalmers
Matti Cervin
Carol Choo
Katya Numbers
Karen A. Mather
Henry Brodaty
Nicole A. Kochan
Perminder S. Sachdev
Oleg N. Medvedev

Abstract

Early dementia risk stratification remains challenging despite advances in biomarker development. We evaluated machine learning approaches for predicting incident dementia using routinely available clinical measures from the Sydney Memory and Ageing Study. Of 1037 community-dwelling Australians aged ≥ 70 years at baseline, 119 developed dementia and 313 remained dementia-free at 10-year follow-up. We compared logistic regression, LASSO-penalized regression, random forest, and XGBoost algorithms using baseline demographic, cognitive, cardiovascular, metabolic, and inflammatory markers. Models were trained on 70% of participants and evaluated on a 30% held-out test set. LASSO regression achieved the highest discrimination (AUC = 0.752), ahead of logistic regression (AUC = 0.707), random forest (AUC = 0.657), and XGBoost (AUC = 0.589). The LASSO model retained only four predictors: age, global cognition score, glucose level, and cardiovascular disease risk score. At the Youden-optimal threshold, LASSO showed balanced sensitivity (0.698) and specificity (0.736), with favourable positive and negative predictive values. Decision-curve analysis confirmed that the LASSO model offered the greatest net clinical benefit across relevant risk thresholds. Notably, incorporating APOE ε4 carrier status did not improve prediction (AUC = 0.704), suggesting that current genetic testing may be unnecessary for initial risk stratification. The final model equation can be implemented directly in clinical settings using a standard Excel calculator, can be recalibrated for different populations and age groups, and can support prediction at the individual level. These findings demonstrate that parsimonious machine learning models using four routinely collected variables can meaningfully predict dementia risk a decade before onset, offering a pragmatic approach for population-level screening without requiring specialized biomarkers or genetic testing.
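
The abstract reports a Youden-optimal threshold and a decision-curve analysis without showing how either is computed. The following is a minimal Python sketch of both calculations, not the authors' code: the function names are hypothetical and the toy labels and risks are invented for illustration, not taken from the study. It assumes a predicted risk at or above the threshold counts as a positive call, and that both outcome classes are present.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], risk: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), calling risk >= threshold positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, risk):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true: List[int],
                     risk: List[float]) -> Tuple[float, float]:
    """Scan observed risks for the cut-off maximising
    J = sensitivity + specificity - 1. Returns (threshold, J)."""
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(risk)):
        tp, fp, tn, fn = confusion_counts(y_true, risk, t)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def net_benefit(y_true: List[int], risk: List[float],
                threshold: float) -> float:
    """Decision-curve net benefit at one risk threshold pt:
    TP/n - (FP/n) * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, risk, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Invented toy data (NOT study data): 1 = developed dementia, 0 = did not.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_risks = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]

cutoff, j_stat = youden_threshold(toy_labels, toy_risks)
benefit = net_benefit(toy_labels, toy_risks, cutoff)
print(cutoff, j_stat, benefit)
```

A decision curve repeats the `net_benefit` calculation over a range of thresholds and compares each model against the treat-all and treat-none strategies. The Youden index, by contrast, selects a single cut-off that weights sensitivity and specificity equally.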

Version published to 10.21203/rs.3.rs-7872424/v1 on Research Square
Nov 11, 2025

Related articles

Machine learning-based prediction of future dementia using routine clinical MRI brain scans and healthcare data

This article has 12 authors:
1. Parminder Singh Reel
2. Salim Al-Wasity
3. Craig Edwards
4. Smarti Reel
5. Esma Mansouri-Benssassi
6. Szabolcs Suveges
7. Muthu Rama Krishnan Mookiah
8. Susan Krueger
9. Emanuele Trucco
10. Emily Jefferson
11. Alexander Doney
12. J. Douglas Steele
This article has no evaluations. Latest version: Nov 14, 2025

Predicting Alzheimer’s Disease Diagnosis, a Decade or more Years before Onset using the Electronic Health Record and Random Forest Machine Learning Models

This article has 14 authors:
1. Sanya B. Taneja
2. Richard D. Boyce
3. Scott A. Malec
4. Steven M. Albert
5. C. Elizabeth Shaaban
6. Arthur S. Levine
7. Paul Munro
8. Jiang Bian
9. Jie Xu
10. Demetrius Maraganore
11. Karen Schliep
12. Jonathan C. Silverstein
13. Michelle Kienholz
14. Helmet T. Karim
This article has no evaluations. Latest version: Nov 6, 2025

Prediction of Alzheimer’s Disease Progression from Mild Cognitive Impairment Using Polygenic Risk Scores

This article has 4 authors:
1. Yalu Wen
2. Shuyao Wang
3. Yu Chen
4. Hongmei Yu
This article has no evaluations. Latest version: Oct 26, 2025
