Annotation of 200 Insect Genomes with BRAKER for Consistent Comparisons across Species
Abstract
The annotation of genomes lags behind their sequencing and assembly. For example, of the 5,092 insect species in GenBank, one of the most widely used databases, only 375 currently have annotated genomes there.
Additionally, species that were previously annotated can benefit from reannotation using RNA-Seq and protein data that have been added since their last annotation, as well as from state-of-the-art annotation methods, whose accuracy has improved. Heterogeneous annotations performed with different tools and protein databases can introduce artifactual differences when comparing gene sets or gene structures between species.
The recently introduced BRAKER3 annotation pipeline integrates evidence from RNA-Seq and from a protein database, and it was benchmarked as one of the most accurate annotation methods. Here, we introduce an automated genome annotation workflow that annotates a list of species with minimal manual intervention, using BRAKER3 with VARUS for RNA-Seq retrieval or, in the absence of transcriptome data, BRAKER2. We selected a diverse set of 200 insect species from different families, including 85 species previously lacking annotations in GenBank. Using currently available RNA-Seq and protein sequence data, we applied our automated workflow to annotate these genomes and conducted downstream analyses typically performed in comparative genomics studies.
We present the resulting gene structures, protein sequences, gene ontology terms, orthologous gene groups and a species tree.