Systematic comparative benchmarking of computational methods for the detection of transposable elements in long-read sequencing data

Nogayhan Seymen
Renato Santos
Ramya Lakshmanan
Simon Topp
Ammar Al-Chalabi
Ahmad Al Khleifat
Gerome Breen
Richard JB Dobson
John P Quinn
Mohammad M. Karimi
Alfredo Iacoangeli

Abstract

Background

Mobile element insertions, particularly those of transposable elements (TEs) such as Alu, LINE-1 (L1), SVA, and endogenous retroviruses (ERVs), represent a major source of human genetic variation and have been implicated in evolution, genomic instability, and disease. Although long-read sequencing generally outperforms short-read sequencing for characterising such elements, their accurate detection from long reads remains challenging: computational tools adopt different approaches and produce divergent call sets. Because no gold standard currently exists for TE detection, benchmarking these methods is essential to understand their strengths, limitations, and biases. Here, we systematically evaluate the performance of available state-of-the-art TE detection tools on both simulated and real human genome data, using highly characterised samples from the Genome in a Bottle consortium, population-level reference databases, and an in-house collection for which matching short-read sequencing data are available.

Results

Our results show significant differences in calling strategies, leading to substantial variation in precision, recall, and the spectrum of TE families detected across tools. Our benchmark also reveals differences between short-read and long-read calls, highlighting the importance of appropriate method selection.
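
For readers unfamiliar with how precision and recall are typically computed for insertion calls, the following is a minimal illustrative sketch and not the evaluation procedure used in the preprint. It assumes a call matches a truth-set insertion when the chromosome and TE family agree and the positions lie within a tolerance window; the `Insertion` class, the `evaluate` function, the 50 bp window, and the toy coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    chrom: str
    pos: int
    family: str  # e.g. "Alu", "L1", "SVA", "ERV"


def evaluate(calls, truth, window=50):
    """Return (precision, recall, f1) using greedy one-to-one matching.

    A call matches a truth insertion when chromosome and family agree and the
    positions differ by at most `window` bp (an assumed tolerance).
    """
    unmatched = list(truth)
    true_pos = 0
    for call in calls:
        for i, ref in enumerate(unmatched):
            if (call.chrom == ref.chrom and call.family == ref.family
                    and abs(call.pos - ref.pos) <= window):
                true_pos += 1
                del unmatched[i]  # each truth site can be matched only once
                break
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (hypothetical): three true insertions, four calls from one tool.
truth = [Insertion("chr1", 1000, "Alu"), Insertion("chr1", 5000, "L1"),
         Insertion("chr2", 300, "SVA")]
calls = [Insertion("chr1", 1010, "Alu"), Insertion("chr1", 5200, "L1"),
         Insertion("chr2", 290, "SVA"), Insertion("chr3", 42, "Alu")]
precision, recall, f1 = evaluate(calls, truth)
```

In this toy case two of the four calls fall within the window of a true site, so precision is 2/4 and recall is 2/3. The choice of window and of whether family labels must agree can change these figures noticeably, which is one reason call sets from different tools are hard to compare without a common benchmark.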

Conclusions

The benchmarking results presented here will help TE researchers make better-informed decisions about which tool to use in their long-read TE analyses. We highlight in depth the strengths and limitations of the different tools, as well as their computational requirements, which should reduce the time spent finding the best tool for the job and promote faster TE research.

Version published to 10.1101/2025.09.29.679192 on bioRxiv
Sep 30, 2025
