panHiTE: a comprehensive and accurate pipeline for TE detection in large-scale population genomes

Abstract

Transposable elements (TEs) are key drivers of genomic variation and species evolution. Advances in high-throughput sequencing have enabled whole-genome sequencing of individuals or subspecies, facilitating the identification of population-specific variations. Detecting population-specific TE insertions at scale is crucial for understanding species-specific phenotypic traits. However, tools for constructing comprehensive pan-TE databases remain limited. To address this gap, we develop panHiTE, a population-scale TE detection and annotation tool with several core innovations. panHiTE features a deep learning-based long terminal repeat retrotransposon (LTR-RT) detection algorithm that outperforms existing tools in both sensitivity and precision. It also introduces a novel de-redundancy algorithm that eliminates highly divergent redundant TE instances, significantly reducing the size of the TE library. Additionally, panHiTE can detect low-copy TEs, which are overlooked in individual genome analyses and absent from existing databases due to their rarity. Furthermore, panHiTE supports TE-gene association analysis, providing insight into TE-driven phenotypic variation. Implemented as a Nextflow pipeline, panHiTE enables efficient TE detection in large plant genomes and has been successfully applied to hundreds of plant population genomes, demonstrating its scalability.
