Residue conservation and solvent accessibility are (almost) all you need for predicting mutational effects in proteins

Abstract

Motivation

Predicting how mutations impact protein biophysical properties remains a significant challenge in computational biology. In recent years, numerous predictors, primarily deep learning models, have been developed to address this problem; however, issues such as their lack of interpretability and limited accuracy persist.

Results

We show that a simple evolutionary score, based on the log-odds ratio (LOR) of wild-type and mutated residue frequencies in evolutionarily related proteins, when scaled by the residue’s relative solvent accessibility (RSA), performs on par with or slightly outperforms most of the benchmarked predictors, many of which are considerably more complex. The evaluation is performed on mutations from the ProteinGym collection of deep mutational scanning datasets, which cover measurements of various properties such as stability, activity or fitness. This raises further questions about what these complex models actually learn and highlights their limitations in predicting mutational landscapes.
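To make the idea of an RSA-scaled log-odds ratio concrete, the following is a minimal Python sketch, not the RSALOR package itself. It assumes that residue frequencies are estimated from a single alignment column with a uniform pseudocount, that the LOR is the log of the wild-type odds over the mutant odds, and that the scaling factor is (1 - RSA) with RSA clipped to [0, 1]. The function names, the pseudocount and the exact form of the weighting are illustrative assumptions; the abstract does not specify them.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=1.0):
    """Estimate residue frequencies at one alignment position.

    `column` is a string of residues observed at that position in related
    sequences. Gaps and unknown symbols are ignored. The uniform pseudocount
    is an assumption that keeps every frequency strictly between 0 and 1.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log_odds_ratio(freqs, wt, mt):
    """Log of (odds of the wild-type residue) / (odds of the mutant residue)."""
    odds_wt = freqs[wt] / (1.0 - freqs[wt])
    odds_mt = freqs[mt] / (1.0 - freqs[mt])
    return math.log(odds_wt / odds_mt)


def rsa_scaled_lor(freqs, wt, mt, rsa):
    """Scale the LOR by residue burial.

    Assumed weighting: (1 - RSA), with RSA clipped to [0, 1], so that
    substitutions at buried positions (low RSA) receive larger scores.
    """
    burial = 1.0 - min(max(rsa, 0.0), 1.0)
    return burial * log_odds_ratio(freqs, wt, mt)


# Toy alignment column dominated by leucine (hypothetical data).
column = "LLLLLLLIVA"
freqs = column_frequencies(column)
lor = log_odds_ratio(freqs, "L", "A")
buried_score = rsa_scaled_lor(freqs, "L", "A", rsa=0.05)
exposed_score = rsa_scaled_lor(freqs, "L", "A", rsa=0.90)
```

Under these assumptions, replacing a conserved residue with a rarely observed one gives a positive score, and the same substitution scores higher at a buried position than at an exposed one.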

Availability

The RSALOR model is available as a user-friendly Python package that can be installed from the PyPI repository. The code is freely available at https://github.com/3BioCompBio/RSALOR.

Contact

Matsvei.Tsishyn@ulb.be, Fabrizio.Pucci@ulb.be
