Assessing the feasibility of machine learning for ancient DNA age prediction: limitations and insights

Abstract

We investigated whether the age of ancient biological samples can be estimated directly from their DNA damage profiles using supervised machine learning. Traditional dating methods such as radiocarbon dating and dendrochronology rely on either isotope composition or material context, whereas our approach exploits intrinsic molecular degradation signatures. Using damage statistics obtained from ancient DNA sequencing data, we trained several regression models to predict sample age over a temporal range of up to 10,000 years. Although specific damage features initially correlated with age, cross-validation and external testing revealed no statistically significant predictive signal beyond mean-based baselines. These findings indicate that, in the current formulation, DNA damage information alone is insufficient for reliable age estimation. This negative result nonetheless provides an important methodological insight: environmental and biochemical factors appear to dominate damage variation, effectively masking the chronological signal. We suggest that integrating contextual data, expanding labeled datasets, and incorporating physical models of DNA decay may improve future attempts. Our study thus contributes to a transparent assessment of the limitations and prospects of DNA-based fossil dating.
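
The comparison against a mean-based baseline mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline. It assumes a single hypothetical damage feature, uses a one-variable least-squares line in place of the several regression models the abstract refers to, and runs on synthetic data generated inside the script. All function names and parameters (`fit_line`, `cv_mae`, `synthetic_samples`, `signal`) are invented for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant feature: fall back to predicting the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cv_mae(xs, ys, k=5, seed=0):
    """k-fold cross-validated mean absolute error.

    Returns (model_mae, baseline_mae), where the baseline always predicts
    the mean age of the training folds.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    model_err, base_err = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        slope, intercept = fit_line(train_x, train_y)
        baseline = statistics.fmean(train_y)
        for i in fold:
            model_err.append(abs(slope * xs[i] + intercept - ys[i]))
            base_err.append(abs(baseline - ys[i]))
    return statistics.fmean(model_err), statistics.fmean(base_err)


def synthetic_samples(n, signal, seed=1):
    """Hypothetical data: damage = signal * age / 10000 + Gaussian noise.

    signal=0 means the damage feature carries no age information at all.
    """
    rng = random.Random(seed)
    ages = [rng.uniform(0, 10_000) for _ in range(n)]
    damage = [signal * a / 10_000 + rng.gauss(0, 0.05) for a in ages]
    return damage, ages


if __name__ == "__main__":
    for signal in (0.5, 0.0):
        damage, ages = synthetic_samples(200, signal)
        model_mae, base_mae = cv_mae(damage, ages)
        print(f"signal={signal}: model MAE={model_mae:.0f}, baseline MAE={base_mae:.0f}")
```

When the synthetic feature carries no age information (`signal=0.0`), the cross-validated error of the fitted model should be about the same as that of the mean baseline. That is the pattern the abstract reports for real damage statistics.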
