Validating folding energy estimates as a method for variant interpretation

Abstract

Interpretation of variants of uncertain significance remains a major problem in genomic analysis. Whilst statistical models can be used to predict pathogenicity, they offer no insight into the biophysical mechanism by which a variant acts, and the genomic data available for training are biased towards the subpopulations who have access to genomic testing. Protein misfolding has been found to be a frequent mechanism for loss of gene or domain activity, typically accounting for ∼2/3 of disease-causing variants and somatic mutations. However, the accuracy of folding energy predictions has repeatedly been questioned because of the highly variable correlation coefficients reported for different proteins, and because of the unknown impact of alternative structures where these are available. Here we address this directly through a systematic analysis of results from mega-scale folding experiments, enabled by a fully automated predictive pipeline based on FoldX. We find that whilst absolute correlation coefficients are mediocre for three highly studied proteins (PIN1, Spg, and FYN, ranging from 0.29 to 0.43), the correlation coefficient alone does not capture the full predictive power of the estimates. Specifically, we find a clear linear relationship between experimental and predicted results, with a small number of outlier residues responsible for reducing the correlation. We show that the quantitative accuracy of predictions can be improved by aggregating estimates taken from different structures, and that the problematic outlier residues can be identified both empirically and theoretically, allowing low-confidence values to be flagged. Our findings not only provide a framework for identifying problematic mutations in advance but also offer new insights into potential improvements to the FoldX protocol for more accurate protein stability predictions. Overall, our results support the use of FoldX in computational saturation screens to aid variant analysis.
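The analysis described above (comparing predicted with experimental stability changes, pooling estimates over several structures, and flagging outliers that depress the correlation) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes per-mutation ΔΔG values (for example, FoldX estimates) have already been computed for each structure and aligned to the experimental measurements. It uses the mean as the aggregation rule and a median-absolute-deviation (MAD) cutoff on least-squares residuals as the outlier test; both are illustrative choices. The function names, structure IDs, and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def aggregate(predictions_by_structure):
    """Average per-mutation ddG estimates across structures (assumed rule: mean).

    predictions_by_structure maps a structure ID to a list of ddG values,
    with every list aligned to the same mutation order.
    """
    per_mutation = zip(*predictions_by_structure.values())
    return [mean(values) for values in per_mutation]


def flag_outliers(experimental, predicted, k=3.0):
    """Flag mutations whose residual from a least-squares line is unusually large.

    A MAD-based cutoff is used so the outliers themselves do not inflate
    the threshold (note: if the MAD is 0, any nonzero deviation is flagged).
    """
    mx, my = mean(experimental), mean(predicted)
    sxy = sum((a - mx) * (b - my) for a, b in zip(experimental, predicted))
    sxx = sum((a - mx) ** 2 for a in experimental)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (slope * a + intercept) for a, b in zip(experimental, predicted)]
    centre = median(residuals)
    mad = median(abs(r - centre) for r in residuals)
    # 1.4826 rescales the MAD to a standard deviation for normally distributed residuals.
    return [abs(r - centre) > k * 1.4826 * mad for r in residuals]


# Made-up toy values (kcal/mol) for illustration only; not data from the study.
# Structure-specific errors are chosen to partly cancel on averaging, and
# mutation 5 carries a large error shared by both structures.
experimental = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
predictions = {
    "structure_A": [0.2, 0.4, 1.2, 1.4, 2.2, 6.0, 3.2, 3.4],
    "structure_B": [-0.2, 0.6, 0.8, 1.6, 1.8, 6.0, 2.8, 3.6],
}

consensus = aggregate(predictions)
flags = flag_outliers(experimental, consensus)
keep = [i for i, flagged in enumerate(flags) if not flagged]

r_all = pearson(experimental, consensus)
r_clean = pearson([experimental[i] for i in keep], [consensus[i] for i in keep])
r_single_clean = pearson(
    [experimental[i] for i in keep],
    [predictions["structure_A"][i] for i in keep],
)

print("flagged mutations:", [i for i, f in enumerate(flags) if f])
print(f"r, all mutations, aggregated:   {r_all:.2f}")
print(f"r, unflagged, aggregated:       {r_clean:.2f}")
print(f"r, unflagged, single structure: {r_single_clean:.2f}")
```

On these contrived values, only the mutation with a large shared error is flagged. Removing it raises the correlation even though the remaining points follow a linear trend, and the aggregated estimates correlate better than a single structure's. Since the abstract flags residues rather than individual mutations, a fuller version would presumably group flags by residue position.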