Manual versus automatic annotation of transposable elements: case studies in Drosophila melanogaster and Aedes albopictus, balancing accuracy and biological relevance
Abstract
Transposable elements (TEs) play a pivotal role in genome evolution, yet their detection and annotation remain challenging due to the limitations of current methods. Manual curation is considered the gold standard for generating TE libraries, particularly for TE-focused studies, although it requires extensive training and time. With the rapid increase in the number of genome assembly publications and the growing need for large-scale comparative analyses, automated software for TE annotation has become indispensable. This study compares manual and automated approaches to TE detection and annotation, focusing on two species: Drosophila melanogaster and Aedes albopictus. In D. melanogaster, a species with a well-annotated TE repertoire and a smaller genome, the differences between manual curation (MCTE) and automated annotation (ATTE) are relatively minor. However, significant differences arise when analysing Ae. albopictus, a species with a relatively large genome and high TE diversity. While automated methods identified a greater number of TEs, including many smaller and fragmented elements, manual curation provided more detailed classifications and, on average, longer consensus sequences. Automated pipelines offer a viable alternative for genome-wide analyses such as TE content estimation, particularly when time and resources are limited. However, caution is advised when interpreting results, as finer details of TE dynamics may be overlooked. This study highlights that the choice of annotation method depends on the intended analysis. Manual curation is more suitable for TE population genomics and studies focusing on recent TE activity, whereas automated methods are appropriate for larger comparative analyses or genome assembly projects. Both methods have strengths and limitations, and understanding the specific features of the genome and repeatome under study is essential for selecting the appropriate approach.