A joint embedding of protein sequence and structure enables robust variant effect predictions
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
The ability to predict how amino acid changes may affect protein function has a wide range of applications, including disease variant classification and protein engineering. Many existing methods focus on learning from patterns found in either protein sequences or protein structures. Here, we present a method for integrating information from protein sequences and structures in a single model that we term SSEmb (Sequence Structure Embedding). SSEmb combines a graph representation for the protein structure with a transformer model for processing multiple sequence alignments, and we show that by integrating both types of information we obtain a variant effect prediction model that is more robust to cases where sequence information is scarce. Furthermore, we find that SSEmb learns embeddings of the sequence and structural properties that are useful for other downstream tasks. We exemplify this by training a downstream model to predict protein-protein binding sites with high accuracy using only the SSEmb embeddings as input. We envisage that SSEmb may be useful both for zero-shot predictions of variant effects and as a representation for predicting protein properties that depend on protein sequence and structure.
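The abstract does not spell out how zero-shot variant scores are read off the model. A common convention for models trained to predict masked residues is the masked-marginal log-odds: mask the site, take the predicted amino-acid distribution, and score a substitution as log p(mutant) minus log p(wild type). The sketch below assumes that convention and assumes the model emits 20 logits per position; `zero_shot_score`, `score_variants`, and the logit layout are hypothetical illustrations, not SSEmb's published interface.

```python
import math

# Canonical amino-acid order; the ordering a given model uses is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def zero_shot_score(position_logits, wt, mt):
    """Hypothetical helper: masked-marginal log-odds for one substitution.

    position_logits: 20 logits (AMINO_ACIDS order) that a model is assumed to
    output for a masked position, given MSA and structure context.
    Returns log p(mt) - log p(wt); more negative means the model considers the
    mutant less likely than the wild type at that site.
    """
    if len(position_logits) != len(AMINO_ACIDS):
        raise ValueError("expected one logit per amino acid")
    log_probs = log_softmax(position_logits)
    return log_probs[AMINO_ACIDS.index(mt)] - log_probs[AMINO_ACIDS.index(wt)]


def score_variants(logits_by_position, variants):
    """Score variants written like 'A23V' (wild type, 1-based position, mutant)."""
    scores = {}
    for v in variants:
        wt, pos, mt = v[0], int(v[1:-1]), v[-1]
        scores[v] = zero_shot_score(logits_by_position[pos - 1], wt, mt)
    return scores


if __name__ == "__main__":
    # Toy logits for a one-residue "protein": alanine favoured, valine second.
    toy = [0.0] * 20
    toy[AMINO_ACIDS.index("A")] = 2.0
    toy[AMINO_ACIDS.index("V")] = 1.0
    print(score_variants([toy], ["A1V", "A1W"]))  # approximately -1.0 and -2.0
```

Because the log-softmax normalisation cancels in the difference, the score equals the difference of the two raw logits; the normalisation is kept only so intermediate values read as log-probabilities. Whether SSEmb applies exactly this scoring rule should be checked against the full paper.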
Article activity feed
Prediction of protein-protein binding sites using embeddings
It would be interesting to see more detail here about which PPIs SSEmb predicts well or poorly compared with other models. Is a particular type of PPI consistently missed? It would also be interesting to identify what causes false positives.