Machine Learning for Missing Data Imputation in Alzheimer’s Research: Predicting Medial Temporal Lobe Flexibility

Soodeh Moallemian
Abolfazl Saghafi
Rutvik Deshpande
Jose M. Perez
Miray Budak
Bernadette A. Fausto
Fanny Elahi
Mark A. Gluck

Abstract

BACKGROUND

Alzheimer’s disease (AD) begins years before symptoms appear, making early detection essential. The medial temporal lobe (MTL) is one of the earliest regions affected, and its network flexibility, a dynamic measure of brain connectivity, may serve as a sensitive biomarker of early decline. Cognitive (acquisition, generalization), genetic (APOE, ABCA7), and biochemical (P-tau217) markers may predict MTL dynamic flexibility. Given the high rate of missing data in AD research, this study uses machine learning with advanced imputation methods to predict MTL dynamic flexibility from multimodal predictors in an aging cohort.

METHODS

We used data from 656 participants in an ongoing study at Rutgers' Aging and Brain Health Alliance, including cognitive assessments, genetic and blood-derived biomarkers, and demographics. Due to MRI-related constraints, only 34.15% of participants had measurable MTL dynamic flexibility from resting-state fMRI. To estimate MTL dynamic flexibility from the available data, we evaluated four methods for handling missing data (case deletion, MICE, MissForest, and GAIN) and trained five families of regression models: Ridge, k-NN, SVR, regression trees (bagging, random forest, boosting), and ANN. Hyperparameters were optimized via grid search with 3-fold cross-validation. Model performance was assessed using mean absolute error (MAE), root mean squared error (RMSE), and runtime, estimated through 5-fold cross-validation repeated 25 times to ensure robustness in clinical data settings.
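The evaluation protocol described above (5-fold cross-validation repeated 25 times, scored by MAE and RMSE) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the outcome values are synthetic, and `fit_mean_baseline` is a hypothetical stand-in for the regression models named in the abstract.

```python
import math
import random
from statistics import mean


def repeated_kfold_indices(n, k=5, repeats=25, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for every repeat
        folds = [order[i::k] for i in range(k)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, test_idx


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def fit_mean_baseline(y_train):
    """Hypothetical stand-in model: always predicts the training mean."""
    m = mean(y_train)
    return lambda n_samples: [m] * n_samples


def evaluate(y, k=5, repeats=25, seed=0):
    """Average MAE and RMSE over all k * repeats held-out folds."""
    maes, rmses = [], []
    for train_idx, test_idx in repeated_kfold_indices(len(y), k, repeats, seed):
        predict = fit_mean_baseline([y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        y_hat = predict(len(y_test))
        maes.append(mae(y_test, y_hat))
        rmses.append(rmse(y_test, y_hat))
    return mean(maes), mean(rmses), len(maes)


# Synthetic outcome values (assumption: not the study's data).
data_rng = random.Random(42)
y = [data_rng.gauss(0.5, 0.1) for _ in range(100)]

mean_mae, mean_rmse, n_scores = evaluate(y)
print(f"folds scored: {n_scores}, MAE: {mean_mae:.3f}, RMSE: {mean_rmse:.3f}")
```

Reshuffling before each repeat gives 125 held-out scores (5 folds times 25 repeats), which makes the averaged error less sensitive to any single partition of a small sample.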

RESULTS

A total of 1,866 missing values (25.86% of all data values) were identified in the dataset, and only 42 complete cases (6.40%) remained after listwise deletion, highlighting the need for effective imputation. In the initial analysis using only complete cases, support vector regression (SVR) achieved the lowest MAE (0.184), though overall performance was limited by the small sample size. In the second phase, three imputation techniques were applied, significantly improving model accuracy. MissForest combined with Random Forest produced the best results (MAE = 0.083), representing a 54.7% improvement over case deletion. Statistical analysis confirmed significant differences in performance across imputation methods (p < 0.001), with MissForest outperforming GAIN and MICE. GAIN was the fastest imputation method.
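The missingness figures and the relative improvement reported above follow from simple ratios, sketched here in plain Python. The toy table and the function names are illustrative assumptions, not the study's data or code. With the rounded MAE values from the abstract, the improvement comes out at about 54.9%; the reported 54.7% was presumably computed from unrounded values.

```python
def missingness_summary(rows):
    """Count missing cells (None) and the rows that survive listwise deletion."""
    n_cells = len(rows) * len(rows[0])
    n_missing = sum(value is None for row in rows for value in row)
    n_complete = sum(all(value is not None for value in row) for row in rows)
    return {
        "n_missing": n_missing,
        "missing_pct": 100.0 * n_missing / n_cells,
        "n_complete": n_complete,
        "complete_pct": 100.0 * n_complete / len(rows),
    }


def relative_improvement(baseline_mae, new_mae):
    """Percentage reduction in MAE relative to the baseline."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Toy table (assumption): 4 participants, 3 variables, None marks a missing value.
toy = [
    [0.42, 1, 71.0],
    [None, 0, 68.0],
    [0.39, None, None],
    [0.51, 1, 75.0],
]
summary = missingness_summary(toy)
print(summary)  # 3 of 12 cells missing; 2 of 4 rows complete

# Share of complete cases reported in the abstract: 42 of 656 participants.
print(round(100.0 * 42 / 656, 2))

# Improvement of MissForest + Random Forest over case deletion, rounded MAEs.
print(round(relative_improvement(0.184, 0.083), 1))
```

Listwise deletion discards a row if any single variable is missing, which is why a quarter of missing cells can leave only a small fraction of complete cases.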

DISCUSSION

The findings underscore the importance of using robust imputation strategies to maximize data utility and model reliability in studies with high missingness. Further research is needed, particularly incorporating additional neuroimaging measures, to localize the brain regions most affected by biomarker-driven changes and to refine predictive models for clinical applications.

Version published to bioRxiv on May 27, 2025 (DOI: 10.1101/2025.05.22.655574)

Related articles

Beyond the Hippocampus: Objective Memory Stages Capture Widespread Brain Aging in a Cross-Sectional Analysis of Baseline RCT Data

This article has 5 authors:
1. Birthe Kristin Flo
2. Stavros Skouras
3. Anna Maria Matziorinis
4. Christian Gaser
5. Stefan Koelsch
This article has no evaluations. Latest version: Jan 22, 2026

Defining the natural history of Alzheimer’s disease by longitudinal cerebrospinal fluid proteomics.

This article has 17 authors:
1. Betty Tijms
2. Diederick de Leeuw
3. Calvin Trieu
4. Martí Jimenéz-Mausbach
5. Katarina Fritz-Wallace
6. Olav Mjaavatten
7. Elena-Raluca Bludjea
8. Roos Jutten
9. Argonde van Harten
10. Flora Duits
11. Anouk den Braber
12. Henne Holstege
13. Marissa Zwan
14. Everard Vijverberg
15. Frode Berven
16. Charlotte Teunissen
17. Pieter Jelle Visser
This article has no evaluations. Latest version: Jan 16, 2026

Predicting Cognitive Decline in Early Alzheimer’s: Insights from East Asian Cohorts

This article has 16 authors:
1. Yao-Hwei Fang
2. Yung-Shuan Lin
3. Kazuaki Uchida
4. Wei-Ju Lee
5. Chih-Cheng Hsu
6. Tsung-Jen Hsieh
7. Tzu-Yu Chen
8. Yi-Chu Liao
9. Yujiro Kuroda
10. Taiki Sugimoto
11. I-Shou Chang
12. Takashi Sakurai
13. Hidenori Arai
14. Shuu-Jiun Wang
15. Chao A. Hsiung
16. Jong-Ling Fuh
This article has no evaluations. Latest version: Feb 3, 2026
