Integrative Machine Learning Approach to Risk Prediction for Dementia and Alzheimer’s Disease

Abstract

Background

Dementia, especially Alzheimer’s disease (AD), is a major global health challenge marked by progressive cognitive impairment, behavioral changes, and loss of autonomy. As global life expectancy rises, there is a growing urgency for earlier diagnosis and better clinical guidance for AD and other dementia subtypes.

Objectives

This study aimed to develop and evaluate machine learning (ML) models for predicting AD risk using comprehensive health, genetic, and lifestyle data. It further sought to examine sex-specific model performance and the predictive potential for vascular dementia (VaD) versus non-vascular dementia.

Methods

Data from the UK Biobank (UKB) cohort were analyzed, comprising 2,878 individuals diagnosed with AD and 72,366 age-matched controls. Multiple ML algorithms were evaluated, with CatBoost achieving the highest performance (ROC-AUC = 0.773). Input features included ICD-10 medical codes reported five years prior to diagnosis, lifestyle and environmental factors, and genetic variants including the ApoE-ε4 alleles.
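The ROC-AUC figure quoted above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The following is a minimal sketch of that computation, plus a per-group breakdown of the kind a sex-specific evaluation would need. The function names, the toy labels, scores, and group codes are all illustrative assumptions; they are not the study's data, code, or pipeline.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) ROC-AUC: fraction of case/control pairs
    in which the case has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_auc_by_group(labels, scores, groups):
    """Compute ROC-AUC separately within each group (e.g. by sex)."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in buckets.items()}


# Hypothetical toy data: 1 = case, 0 = control; scores are predicted risks.
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1, 0.4, 0.3]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

overall = roc_auc(labels, scores)
per_group = roc_auc_by_group(labels, scores, groups)
print(overall, per_group)
```

The pairwise loop is quadratic in sample size, so it is only suitable for illustration; at the scale of a cohort like the one described here, a rank-based implementation from an established library would be the practical choice.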

Results

CatBoost outperformed other ML approaches, with stronger predictive accuracy in females. VaD prediction was more accurate despite the smaller sample size. ApoE-ε4 was confirmed as a key genetic risk factor, while other genetic markers had limited predictive value. Significant non-genetic predictors included comorbid conditions (e.g., type 2 diabetes, hypertension), educational attainment, physical activity, diet, and cardiovascular health, highlighting the multifactorial nature of dementia risk.

Conclusions

The integration of genetic, clinical, and lifestyle data enhances the accuracy of AD and dementia risk prediction. Findings underscore the importance of sex-specific analysis and the influence of comorbidities and modifiable risk factors. This approach supports more precise, personalized early interventions and diagnostic strategies for AD and dementia subtypes.

Highlights

  • CatBoost predicted Alzheimer’s disease risk with a ROC-AUC of 0.773, with higher performance in females.

  • The ApoE-ε4 allele was the strongest genetic risk factor across AD and all dementia subtypes.

  • Comorbidities (e.g., heart disease, diabetes), together with lifestyle factors and education, were significant contributors to the risk prediction models.

  • Vascular dementia showed strong predictive signals despite smaller sample size.
