AI-based prediction of herbarium sequencing success across the plant tree of life
Abstract
DNA recovered from herbarium specimens is a vital asset in botanical research, playing a pivotal role in unravelling the evolution, diversity, and ecological dynamics of plants. Despite its importance, challenges such as fragmented DNA and insufficient sequencing yields make molecular data retrieval a high-risk and costly endeavour that consumes irreplaceable herbarium material. Here, we propose a framework based on Artificial Intelligence (AI) to forecast whether genomic DNA suitable for sequencing can be successfully extracted from herbarium samples. Our model integrates morphological characteristics and sample colour derived from scanned herbarium images, metadata including sample age and locality, and DNA quantity measurements of samples. We train a deep learning algorithm with ca. 2,000 specimens that have been digitized and sequenced in the framework of the Plant and Fungal Trees of Life (PAFTOL) Project, spanning from 1832 to the present. As training datasets grow with ongoing digitization and genomic sequencing efforts, our AI predictive model can support researchers in selecting the herbarium samples with the highest likelihood of yielding high-quality genomic DNA from amongst a vast array of globally distributed candidate specimens. Our approach enhances the contribution of herbarium-derived DNA to large-scale studies and facilitates the use of historical collections for a deeper understanding of plant evolution and ecology, with implications for conservation.
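The selection step described above, choosing the candidates most likely to yield usable DNA, can be illustrated with a minimal sketch. This is not the authors' model: the `Specimen` fields, the weights, the direction of each effect, and the logistic scoring function are all hypothetical placeholders standing in for a trained deep learning model, chosen only to show how image-derived features, metadata, and DNA quantity might be combined into one score and used for ranking.

```python
import math
from dataclasses import dataclass


@dataclass
class Specimen:
    barcode: str      # hypothetical specimen identifier
    age_years: float  # metadata: years since collection
    greenness: float  # assumed image-derived colour feature in [0, 1]
    dna_ng: float     # assumed DNA quantity measurement, in nanograms


# Hypothetical weights for illustration only; a real model would learn
# its parameters from digitized and sequenced training specimens.
WEIGHTS = {"bias": 0.5, "age_years": -0.01, "greenness": 2.0, "log_dna": 0.8}


def success_probability(s: Specimen) -> float:
    """Combine the features linearly and squash the result into (0, 1)."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["age_years"] * s.age_years
        + WEIGHTS["greenness"] * s.greenness
        + WEIGHTS["log_dna"] * math.log1p(s.dna_ng)
    )
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(specimens: list[Specimen]) -> list[Specimen]:
    """Return specimens ordered from highest to lowest predicted success."""
    return sorted(specimens, key=success_probability, reverse=True)


# Two made-up candidate specimens of the same taxon.
candidates = [
    Specimen("A", age_years=190, greenness=0.2, dna_ng=5.0),
    Specimen("B", age_years=20, greenness=0.7, dna_ng=50.0),
]
ranked = rank_candidates(candidates)
```

Under these assumed weights, `ranked[0]` is the specimen a researcher would sample first. In practice the scoring function would be replaced by the trained network, while the ranking step would stay the same.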