OMAnnotator: a novel approach to building an annotated consensus genome sequence

Abstract

Motivation

Advances in sequencing technologies have enabled researchers to sequence whole genomes rapidly and cheaply. However, despite improvements in genome assembly, genome annotation (i.e. the identification of protein-coding genes) remains challenging, particularly for eukaryotic genomes: it requires combining several approaches (typically ab initio prediction, transcriptomics, and homology search), each with its own strengths and weaknesses. Deciding which gene models to retain in a consensus is far from trivial, and automated approaches tend to lag behind laborious manual curation efforts in accuracy.

Results

Here, we present OMAnnotator, a novel approach to building a consensus annotation. OMAnnotator repurposes the OMA algorithm, originally designed to elucidate evolutionary relationships among genes across species, to integrate predictions from different annotation sources, using evolutionary information as a tie-breaker. We validated OMAnnotator by reannotating the Drosophila melanogaster reference genome and comparing the result with the expert-curated reference and with the output of the automated pipelines BRAKER2 and EVidenceModeler. OMAnnotator produced a consensus annotation that outperformed each individual input and surpassed the existing pipelines. Finally, when applied to three recently published genomes, OMAnnotator gave substantial improvements in two cases and mixed results in the third, which had already benefited from extensive expert curation.

Conclusion

We introduce an original, flexible, and effective approach to annotating genomes by integrating multiple lines of evidence. The method’s robustness is underlined by its successful application to reannotating recently published genomes, opening up new avenues in eukaryotic genome annotation.
