The dangers of data double dipping in assessing the classification accuracies of blood biomarkers in Alzheimer’s disease and related disorder research
Abstract
Blood biomarker models are increasingly used in Alzheimer’s disease and related dementia translational research, but predictive performance can be inflated when the same dataset is used for both model development and evaluation. We assess the effect of data double dipping using simulations and NULISA proteomic data from the MYHAT-NI community-based cohort to predict brain amyloid-beta neuroimaging status. In both settings, training AUC increased as more biomarkers were added, while testing AUC peaked earlier and then declined. These findings show that data double dipping can inflate model performance and highlight the need for external validation or internal validation with data partitioning.
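The pattern described in the abstract can be reproduced on synthetic data. The following is a minimal sketch, not the authors' simulation design and not the NULISA/MYHAT-NI analysis. Every name and setting in it is an assumption chosen for illustration: the function names, the sample sizes, the 3 informative features followed by 97 pure-noise features, and the effect size. The model is a simple mean-difference linear score, used here as a stand-in for whatever classifier the study fitted. Features are added one at a time, and AUC is computed both on the training data (double dipping) and on an independent test set.

```python
import random


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, i in enumerate(order, start=1) if labels[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def make_data(rng, n, n_features, n_signal, shift):
    """Balanced labels; only the first n_signal features differ by class."""
    labels = [i % 2 for i in range(n)]
    rows = [
        [rng.gauss(shift * y if j < n_signal else 0.0, 1.0)
         for j in range(n_features)]
        for y in labels
    ]
    return rows, labels


def simulate_double_dipping(seed=0, n_train=60, n_test=1000,
                            n_features=100, n_signal=3, shift=1.0):
    """Return (train_aucs, test_aucs) as features are added one by one.

    All defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    x_tr, y_tr = make_data(rng, n_train, n_features, n_signal, shift)
    x_te, y_te = make_data(rng, n_test, n_features, n_signal, shift)
    n_pos = sum(y_tr)
    n_neg = n_train - n_pos

    train_scores = [0.0] * n_train
    test_scores = [0.0] * n_test
    train_aucs, test_aucs = [], []
    for j in range(n_features):
        # Weight for feature j is estimated from the TRAINING data only.
        mean_pos = sum(r[j] for r, y in zip(x_tr, y_tr) if y == 1) / n_pos
        mean_neg = sum(r[j] for r, y in zip(x_tr, y_tr) if y == 0) / n_neg
        w = mean_pos - mean_neg
        train_scores = [s + w * r[j] for s, r in zip(train_scores, x_tr)]
        test_scores = [s + w * r[j] for s, r in zip(test_scores, x_te)]
        # Training AUC reuses the data that produced the weights.
        train_aucs.append(auc(train_scores, y_tr))
        # Test AUC uses data the weights have never seen.
        test_aucs.append(auc(test_scores, y_te))
    return train_aucs, test_aucs


train_aucs, test_aucs = simulate_double_dipping(seed=0)
for k in (3, 20, 100):
    print(f"{k:3d} features: train AUC = {train_aucs[k - 1]:.3f}, "
          f"test AUC = {test_aucs[k - 1]:.3f}")
```

Under these assumptions, the noise features receive nonzero weights purely from sampling variation in the small training set. Those weights help separate the training samples but add only noise to the test scores, which is why the two curves diverge. Holding out a test set, or using cross-validation, keeps the evaluation data out of the weight estimation step.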