Predicting Alzheimer’s Disease Diagnosis a Decade or More Before Onset Using the Electronic Health Record and Random Forest Machine Learning Models


Abstract

INTRODUCTION

There is a need to detect and intervene in the pre-clinical phases of Alzheimer’s disease (AD). Electronic health records (EHRs) may help predict AD using machine learning methods.

METHODS

We identified EHRs for 19,473 cases with AD and 111,922 controls. Records spanned 10 or more years prior to AD diagnosis. Using a 75%/25% train/test split, we trained a random forest model (with 5-fold cross-validation and 2,499 features) to predict AD 10 years prior to its onset, and then computed permutation feature importance.
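The general idea behind a 75%/25% split followed by permutation feature importance can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the `predict` function is a hypothetical threshold rule standing in for a trained random forest, accuracy is used as the score (the abstract does not say which metric was permuted against), and the 5-fold cross-validation step is omitted. All function names are invented for this sketch.

```python
import random


def train_test_split_75_25(rows, labels, seed=0):
    """Shuffle record indices and hold out 25% of them for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.75 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def accuracy(predict, rows, labels):
    """Fraction of records whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data (assumption): feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(400)]
labels = [int(r[0] > 0.5) for r in rows]


def predict(row):
    """Hypothetical stand-in for a trained model; it only looks at feature 0."""
    return int(row[0] > 0.5)


X_train, y_train, X_test, y_test = train_test_split_75_25(rows, labels)
importances = permutation_importance(predict, X_test, y_test)
print(importances)
```

Shuffling the informative feature degrades the score, while shuffling the ignored feature leaves it unchanged, so the first importance is clearly positive and the second is zero. Libraries such as scikit-learn offer comparable utilities for real models and data.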

RESULTS

We achieved an area under the ROC curve of 0.80. Feature importance identified factors associated with AD, including age, sex, race, ethnicity, BMI, cardiovascular diseases, inflammation, pain, sleep and mood disorders, trauma, other neurodegenerative disorders, diuretics, colon-related disorders and procedures, seizures, and vitamin B12.
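For readers unfamiliar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way in plain Python. The labels and scores are made-up toy values, not study data, and `roc_auc` is a name chosen for this illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case score and a control score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (assumption): 3 cases, 5 controls, hypothetical model scores.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.5, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 12 of the 15 case/control pairs are ranked correctly
```

The pairwise loop is quadratic in the number of records, which is fine for an illustration; at the scale of a large EHR cohort a rank-based implementation would be the practical choice.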

DISCUSSION

This is the first EHR-based model to predict AD 10 years prior to onset; such predictions could inform prevention and early intervention.
