Missense mutation knowledge can decrease prediction inaccuracies on protein secondary structure
Abstract
Protein tertiary structure prediction models like AlphaFold2 have revolutionized the field with unprecedented accuracy. Yet predicting the structural changes that arise from single amino acid mutations remains a challenge. The complexity introduced by these mutations calls for models that can incorporate mutational information into their predictions. We propose a novel refinement strategy for protein secondary structure prediction that leverages missense mutational data. As part of this strategy, we introduce Mut2Dens, a model that not only improves the consistency of predictions on mutational data, but also maintains robust predictive performance on non-mutational datasets. Mut2Dens takes multiple predicted secondary structures as input and generates a mutation-aware secondary structure. This awareness comes from training on our mutational dataset, from which the model learns to avoid the mistakes that prediction methods commonly make after a missense mutation occurs. In particular, Mut2Dens employs the extremely randomized trees (ExtraTree) algorithm to avoid overfitting and to make effective use of the limited mutational data available from experimentally determined three-dimensional structures. By combining predictions from highly accurate structure prediction models, we create an ensemble that integrates their strengths while enhancing mutational capabilities. This refinement strategy also improves the non-mutational performance of state-of-the-art methods by addressing their most inaccurate and least confident predictions. Moreover, it reduces improbable outcomes in mutated protein structures, such as the transformation of π-helices into β-sheets, that can still occur in current prediction models.
Finally, by using interpretable machine learning algorithms (e.g., ExtraTree), we can reveal the underlying biological knowledge captured by the refinement model. The insights gained from Mut2Dens can be corroborated with known mutational outcomes, helping users pinpoint discrepancies across structure prediction models and make more informed decisions regarding the predicted structures. The data used here are available at https://github.com/ivanpmartell/sam-models.
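To make the refinement idea concrete, the following is a minimal sketch of stacking several predicted secondary structures into per-residue features and fitting an extremely randomized trees classifier with scikit-learn. It is not the Mut2Dens implementation: the three-state alphabet (`HEC`), the window size, the one-hot encoding, the toy strings, and the helper names `encode_window`, `build_dataset`, and `train_refiner` are all assumptions made for illustration, and the sketch does not model how mutational information is encoded.

```python
from sklearn.ensemble import ExtraTreesClassifier

# Assumed 3-state alphabet (helix, strand, coil); the paper may use more states.
STATES = "HEC"

# Toy predictions from three hypothetical predictors for one 16-residue protein.
TOY_PREDICTIONS = [
    "CCHHHHHHCCEEEECC",
    "CCHHHHHCCCEEEECC",
    "CCCHHHHHCCEEECCC",
]
# Toy reference labels, standing in for an experimentally derived assignment.
TOY_REFERENCE = "CCHHHHHHCCEEEECC"


def encode_window(predictions, pos, half_window=2):
    """One-hot encode each predictor's labels in a window centred on `pos`.

    Positions that fall outside the sequence are encoded as all zeros.
    """
    features = []
    for pred in predictions:
        for offset in range(-half_window, half_window + 1):
            i = pos + offset
            label = pred[i] if 0 <= i < len(pred) else None
            features.extend(1 if label == s else 0 for s in STATES)
    return features


def build_dataset(predictions, reference, half_window=2):
    """Return one feature row and one target label per residue."""
    X = [encode_window(predictions, i, half_window) for i in range(len(reference))]
    y = list(reference)
    return X, y


def train_refiner(X, y, n_estimators=50, seed=0):
    """Fit an extremely randomized trees ensemble on the stacked predictions."""
    model = ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed)
    return model.fit(X, y)


if __name__ == "__main__":
    X, y = build_dataset(TOY_PREDICTIONS, TOY_REFERENCE)
    model = train_refiner(X, y)
    refined = "".join(model.predict(X))
    print("refined:", refined)
```

In a sketch like this, the fitted model's `feature_importances_` attribute gives one score per encoded feature, which is one way a tree ensemble can be inspected to see which predictor and window position drive the refined label.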