The dangers of data double dipping in assessing the classification accuracies of blood biomarkers in Alzheimer’s disease and related disorder research

Tianshu Liu
Xuemei Zeng
Beth E. Snitz
Thomas K. Karikari
Rebecca A. Deek

Abstract

Blood biomarker models are increasingly used in Alzheimer’s disease and related dementia translational research, but predictive performance can be inflated when the same dataset is used for both model development and evaluation. We assess the effect of data double dipping using simulations and NULISA proteomic data from the MYHAT-NI community-based cohort to predict brain amyloid-beta neuroimaging status. In both settings, training AUC increased as more biomarkers were added, while testing AUC peaked earlier and then declined. These findings show that data double dipping can inflate model performance and highlight the need for external validation or internal validation with data partitioning.
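The pattern described in the abstract can be reproduced in a few lines. The sketch below is a simplified illustration, not the authors' simulation or their NULISA analysis: it assumes pure-noise markers with no true signal, and it replaces a multi-marker model with the simplest form of data-driven fitting, picking the single best-looking marker from a candidate panel. The function names (`auc`, `simulate`) and all parameter values are hypothetical. The "apparent" AUC is computed on the same samples used to pick the marker (double dipping); the "held-out" AUC uses an independent partition.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(n_train=40, n_test=40, panel_sizes=(1, 5, 25, 100),
             n_reps=30, seed=0):
    """Average apparent and held-out AUC for each candidate panel size.

    Assumption: every marker is standard normal noise, unrelated to the
    label, so the true AUC of any marker is 0.5.
    """
    rng = random.Random(seed)
    max_p = max(panel_sizes)
    apparent = {k: 0.0 for k in panel_sizes}
    held_out = {k: 0.0 for k in panel_sizes}

    for _ in range(n_reps):
        # Balanced binary labels (e.g. amyloid-positive vs. negative).
        y_train = [i % 2 for i in range(n_train)]
        y_test = [i % 2 for i in range(n_test)]
        # One list of values per marker, for each partition.
        x_train = [[rng.gauss(0, 1) for _ in range(n_train)]
                   for _ in range(max_p)]
        x_test = [[rng.gauss(0, 1) for _ in range(n_test)]
                  for _ in range(max_p)]
        train_auc = [auc(x, y_train) for x in x_train]

        for k in panel_sizes:
            # "Model development": among the first k markers, keep the one
            # whose training AUC is farthest from chance, and orient it.
            best = max(range(k), key=lambda j: abs(train_auc[j] - 0.5))
            sign = 1.0 if train_auc[best] >= 0.5 else -1.0
            # Double dipping: evaluate on the samples used for selection.
            apparent[k] += auc([sign * v for v in x_train[best]],
                               y_train) / n_reps
            # Data partitioning: evaluate on samples never seen before.
            held_out[k] += auc([sign * v for v in x_test[best]],
                               y_test) / n_reps
    return apparent, held_out


if __name__ == "__main__":
    apparent, held_out = simulate()
    for k in sorted(apparent):
        print(f"panel size {k:>3}: apparent AUC {apparent[k]:.2f}, "
              f"held-out AUC {held_out[k]:.2f}")
```

Because the best marker among a larger panel can never look worse than the best among a subset of it, the apparent AUC cannot decrease as candidates are added, even though no marker carries signal. The held-out AUC should stay near 0.5 on average. This sketch only shows the inflation; the peak-then-decline in testing AUC reported in the abstract additionally requires some true signal and a multi-marker model.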

Version published to 10.64898/2026.05.22.26353848 on medRxiv
Jun 1, 2026

Related articles

Deep Learning and Machine Learning for Early Detection of Alzheimer’s Disease: A Systematic Review and Meta-Analysis

This article has 1 author:
1. Saketh Machiraju
This article has no evaluations
Latest version May 22, 2026

Automated quantification of cerebral microbleeds for ARIA-H monitoring in Aging and Alzheimer’s Disease: A multicenter deep learning validation

This article has 10 authors:
1. Zhen Xuen Brandon Low
2. Ella Rowsthorn
3. Mohamad-Reza Nazem-Zadeh
4. Michelle Francis
5. Catherine Robb
6. Robert Whiriskey
7. Maxwell Howcroft
8. Amy Brodtmann
9. John J. McNeil
10. Meng Law
This article has no evaluations
Latest version May 26, 2026

A diagnostic plasma omics-biomarker for Alzheimer’s disease informed by microglial single-cell transcriptomics: A pilot study

This article has 5 authors:
1. Michael W. Lutz
2. Zhaohui Man
3. Yifei Zheng
4. Srilakshmi Venkatesan
5. Ornit Chiba-Falek
This article has no evaluations
Latest version May 5, 2026
