Predictions from Deep Learning Propose Substantial Protein-Carbohydrate Interplay

Abstract

It is a grand challenge to identify all the protein-carbohydrate interactions in an organism. Direct experiments would require extensive libraries of glycans to definitively distinguish binding from non-binding proteins. Computational screening of proteins for carbohydrate binding provides an attractive and ultimately testable alternative. Recent computational techniques have focused primarily on which protein residues interact with carbohydrates or which carbohydrate species a protein binds to. Current estimates label 1.5 to 5% of proteins as carbohydrate-binding proteins; however, 50-70% of proteins are known to be glycosylated, suggesting a potential wealth of proteins that bind to carbohydrates. We therefore developed a novel dataset and neural network architecture, named Protein interaction of Carbohydrates Predictor (PiCAP), to predict whether a protein non-covalently binds to a carbohydrate. We trained PiCAP on a dataset of known carbohydrate binders together with proteins that we identified as unlikely to bind carbohydrates, including DNA-binding transcription factors, cytoskeletal components, selected antibodies, and selected small-molecule-binding proteins. PiCAP achieves a 90% balanced accuracy on protein-level predictions of carbohydrate binding versus non-binding. Using the same dataset, we developed a model named Carbohydrate Protein Site Identifier 2 (CAPSIF2) to predict protein residues that interact non-covalently with carbohydrates. CAPSIF2 achieves a Dice coefficient of 0.57 on residue-level predictions on our independent test dataset, outperforming all previous models for this task. To demonstrate the biological applicability of PiCAP and CAPSIF2, we investigated cell surface proteins of human neural cells and further predicted carbohydrate binding across three proteomes: E. coli, M. musculus, and H. sapiens. PiCAP predicts that approximately 35-40% of proteins in these proteomes bind carbohydrates, indicating a substantial interplay of protein-carbohydrate interactions for cellular functionality.
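
The two metrics quoted above, balanced accuracy for protein-level predictions and the Dice coefficient for residue-level predictions, can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not taken from the PiCAP or CAPSIF2 code; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 means 'binds'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity, so each class counts equally."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of true binders recovered
    specificity = tn / (tn + fp)  # fraction of true non-binders recovered
    return 0.5 * (sensitivity + specificity)


def dice_coefficient(true_sites: Sequence[int], pred_sites: Sequence[int]) -> float:
    """Dice = 2*TP / (2*TP + FP + FN) over per-residue binary labels."""
    tp, _, fp, fn = confusion_counts(true_sites, pred_sites)
    denom = 2 * tp + fp + fn
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy protein-level labels (hypothetical): 4 binders, 6 non-binders.
protein_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
protein_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(protein_true, protein_pred))

# Toy residue-level labels (hypothetical) for one six-residue protein.
residue_true = [0, 1, 1, 1, 0, 0]
residue_pred = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(residue_true, residue_pred))
```

Balanced accuracy is a natural choice when binders and non-binders are unequally represented, because a model that always predicts the majority class scores only 0.5. The Dice coefficient ignores true negatives, which suits residue-level prediction, where most residues do not contact a carbohydrate.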

Significance Statement

The totality of carbohydrate-protein interactions remains elusive, in part because proteomes cannot yet be tested against glycomes in a high-throughput manner. Here we present the first high-throughput methodology to predict protein-carbohydrate interactions at proteomic scale using structural and sequence information. These predictions will allow scientists to target specific protein-carbohydrate interactions and better determine the roles that carbohydrates play in cellular functions.
