OMAnnotator: a novel approach to building an annotated consensus genome sequence
Abstract
Motivation
Advances in sequencing technologies have enabled researchers to sequence whole genomes rapidly and cheaply. However, despite improvements in genome assembly, genome annotation (i.e. the identification of protein-coding genes) remains challenging, particularly for eukaryotic genomes: it requires combining several approaches (typically ab initio, transcriptomics, and homology search), each with its own pros and cons. Deciding which gene models to retain in a consensus is far from trivial, and automated approaches tend to lag behind laborious manual curation efforts in accuracy.
Results
Here, we present OMAnnotator, a novel approach to building a consensus annotation. OMAnnotator repurposes the OMA algorithm, originally designed to elucidate evolutionary relationships among genes across species, to integrate predictions from different annotation sources, using evolutionary information as a tie-breaker. We validated OMAnnotator by reannotating the Drosophila melanogaster reference genome and comparing the result with the expert-curated reference and with the output of the automated pipelines BRAKER2 and EVidenceModeler. OMAnnotator produced a consensus annotation that outperformed each individual input and surpassed the existing pipelines. Finally, when applied to three recently published genomes, OMAnnotator gave substantial improvements in two cases, and mixed results in the third, which had already benefited from extensive expert curation.
Conclusion
We introduce an original, flexible, and effective approach to annotating genomes by integrating multiple lines of evidence. The method’s robustness is underlined by its successful application to reannotating recently published genomes, opening up new avenues in eukaryotic genome annotation.