EffectorGeneP: accurate gene annotation in pathogen genomes from infection transcriptomes

Abstract

Accurate gene annotation is crucial for inferring biological knowledge from genomes. However, non-canonical genes, such as orphan or single-exon genes and genes residing in rapidly evolving regions, are routinely dismissed by annotation pipelines. In filamentous pathogen genomes, this disproportionately affects the annotation of genes encoding disease-promoting effector proteins. We introduce EffectorGeneP, a machine learning tool that self-trains on transcript data, predicts the most likely coding sequence from each transcript and effectively separates bona fide genes from transcriptional noise. EffectorGeneP correctly annotates over 95% of known effectors, whereas other state-of-the-art methods correctly annotate 15% to 78%. We show that EffectorGeneP expands the predicted secretome of pathogens by over 50%. We also show that high-throughput screening of an effector library in plant protoplasts uncovers the previously poorly annotated AvrSr26 gene family in the wheat stem rust fungus. EffectorGeneP decodes genomes at unprecedented resolution and will enable the study of biological processes in important pathogen species.
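The abstract states that EffectorGeneP predicts the most likely coding sequence from each transcript. It does not say how candidate coding sequences are generated or scored. The sketch below is a rough illustration only. It enumerates ATG-to-stop open reading frames (ORFs) on the forward strand of a transcript and picks the longest one, which is a common naive baseline and not EffectorGeneP's machine learning model. It assumes strand-specific transcripts written in A/C/G/T, the standard genetic code, and a minimum length of 30 codons. The names `candidate_orfs` and `longest_orf` and the 30-codon default are hypothetical.

```python
from typing import List, Optional, Tuple

# Standard-code stop codons (assumption: no alternative genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_orfs(transcript: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Hypothetical helper: return half-open (start, end) coordinates of
    ATG-initiated ORFs on the forward strand only, each ending with (and
    including) the first in-frame stop codon. ORFs shorter than
    `min_codons` codons (stop included) are dropped."""
    seq = transcript.upper()
    orfs: List[Tuple[int, int]] = []
    for frame in range(3):
        start: Optional[int] = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                # Keep the most upstream ATG; later in-frame ATGs are ignored.
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None
    return orfs


def longest_orf(transcript: str, min_codons: int = 30) -> Optional[Tuple[int, int]]:
    """Naive baseline: choose the longest candidate ORF, or None if there is none."""
    orfs = candidate_orfs(transcript, min_codons)
    return max(orfs, key=lambda se: se[1] - se[0], default=None)


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 5 + "TAA" + "G"
    print(longest_orf(toy, min_codons=3))  # (2, 23)
```

A fixed length cutoff combined with a longest-ORF rule is simple but blunt. It has no way to weigh shorter candidates against longer ones. A learned scorer of the kind the abstract describes could instead rank every candidate.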