PlasmoFP: leveraging deep learning to predict protein function of uncharacterized proteins across the malaria parasite genus
Abstract
The publication of the first malaria parasite genome, that of Plasmodium falciparum, in 2002 jump-started functional studies, but a large fraction of all predicted proteins remains only partially annotated or of ‘unknown function’. Here, we introduce Plasmodium Function Predictor (PlasmoFP), a set of deep learning models designed specifically for species of the genus Plasmodium. PlasmoFP models are trained on structure-function relationships of proteins from the phylogenetically relevant SAR (Stramenopiles, Alveolates, and Rhizaria) supergroup, which addresses the difficulty of annotating Plasmodium proteins caused by their low sequence similarity to well-characterized model organism proteins. PlasmoFP models estimate epistemic uncertainty, control false discovery rates in model predictions, and are validated using proteins with manually curated Gene Ontology (GO) terms and experimentally characterized proteins. By integrating PlasmoFP predictions with current protein annotations, we reduced the proportion of proteins without GO terms from 15-59% to 3-28% across 19 Plasmodium species, and increased the proportion of fully annotated proteins from 7-42% to 36-68%. PlasmoFP predictions advance basic Plasmodium research, an important component of global malaria R&D.