DIANA: Deep Learning Identification and Assessment of Ancient DNA

Camila Duitama González
Maria Lopopolo
Luca Nishimura
Roland Faure
Sebastian Duchene

Abstract

The field of ancient metagenomics provides insights into past microbiomes, but as datasets grow, methods that rely on reference databases are limited in scope. Here, we introduce DIANA, a multi-task neural network that predicts key metadata categories from unitig abundances. Trained on 2,597 run accessions (1.72 Tbp of assembled unitig sequences), DIANA accurately identifies sample host (94.6%), community type (90.0%), and material (88.9%) on held-out test data and demonstrates robust generalisation on an independent validation set. A key innovation is DIANA’s capacity for semantic generalisation: samples whose labels were unseen during training (such as novel subspecies) are correctly assigned to their appropriate parent categories. By leveraging both known and uncharacterised genomic sequences, DIANA provides a rapid, data-driven system for metadata validation and quality control, accelerating discovery in ancient metagenomics research.
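
The multi-task idea in the abstract can be illustrated with a minimal sketch: one shared representation computed from a unitig abundance vector, feeding a separate softmax head for each metadata category (host, community type, material). This is a generic pattern under stated assumptions, not DIANA's actual architecture, which the abstract does not describe. The label sets, layer sizes, class name `MultiTaskClassifier`, and the random toy input are all hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical label sets for illustration only (not taken from the preprint).
TASKS = {
    "host": ["Homo sapiens", "Canis lupus", "environmental"],
    "community_type": ["oral", "gut", "soil"],
    "material": ["dental calculus", "coprolite", "sediment"],
}


def make_matrix(rows, cols, rng):
    """Return a small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Convert logits to probabilities; subtract the max for stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiTaskClassifier:
    """Shared hidden layer feeding one softmax head per metadata task."""

    def __init__(self, n_features, n_hidden, tasks, seed=0):
        rng = random.Random(seed)
        self.tasks = tasks
        self.shared = make_matrix(n_hidden, n_features, rng)
        self.heads = {
            name: make_matrix(len(labels), n_hidden, rng)
            for name, labels in tasks.items()
        }

    def forward(self, abundances):
        # Shared representation with a ReLU non-linearity.
        hidden = [max(0.0, h) for h in matvec(self.shared, abundances)]
        # Each task head reads the same hidden vector.
        return {name: softmax(matvec(head, hidden))
                for name, head in self.heads.items()}

    def predict(self, abundances):
        # Pick the most probable label independently for each task.
        probs = self.forward(abundances)
        return {
            name: self.tasks[name][max(range(len(p)), key=p.__getitem__)]
            for name, p in probs.items()
        }


def multitask_loss(probs, targets, tasks):
    """Sum of per-task cross-entropy losses for a single sample."""
    return sum(
        -math.log(probs[name][tasks[name].index(targets[name])])
        for name in tasks
    )


rng = random.Random(1)
sample = [rng.random() for _ in range(8)]  # toy unitig abundance vector
model = MultiTaskClassifier(n_features=8, n_hidden=4, tasks=TASKS)
probs = model.forward(sample)
prediction = model.predict(sample)
targets = {"host": "Homo sapiens", "community_type": "oral",
           "material": "dental calculus"}
loss = multitask_loss(probs, targets, TASKS)
```

Summing the per-task losses is the simplest way to train all heads jointly. A real model would learn the weights by gradient descent and might weight the tasks differently; here the weights are untrained, so the predictions are arbitrary.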

Version published to 10.64898/2026.04.09.717429 on bioRxiv
Apr 10, 2026

Related articles

A Systematic Approach Toward Implementing Machine Learning Techniques to Analyze Gut Microbiome Data

This article has 9 authors:
1. Anvi Taada
2. Ava George
3. Dhatrisri Biruduraju
4. Emily Lu
5. Isha Singh
6. Khushi Chhajer
7. Madeline Wang
8. Tanvi Pentela
9. Sahar Jahanikia
This article has no evaluations. Latest version: Apr 26, 2026

Metaxa: A Transformer-Based Deep Learning Model for Taxonomic Classification of Long Nanopore Reads

This article has 4 authors:
1. Krešimir Friganović
2. Dominik Stanojević
3. Poshen B. Chen
4. Mile Šikić
This article has no evaluations. Latest version: Apr 23, 2026

A Bioinformatic Pipeline for Consensus Taxonomic Classification of Long-Read Amplicons

This article has 5 authors:
1. Ashley A. Paulsen
2. Breah LaSarre
3. Drew Delp
4. Gwyn A. Beattie
5. Larry J. Halverson
This article has no evaluations. Latest version: Apr 30, 2026
