protPheMut: An Interpretable Machine Learning Tool for Classification of Cancer and Neurodevelopmental Disorders in Human Missense Variants
Abstract
Motivation
Recent advances in human genomics have revealed that missense mutations in a single protein can lead to distinctly different phenotypes. In particular, some mutations in oncoproteins such as Ras, MEK, PI3K, PTEN, and SHP2 are linked to various cancers and neurodevelopmental disorders (NDDs). While numerous tools exist for predicting the pathogenicity of missense mutations, linking these variants to specific phenotypes remains a major challenge, particularly in the context of personalized medicine.
Results
To fill this gap, we developed protPheMut (Protein Phenotypic Mutations Analyzer), which leverages multiple interpretable machine learning methods and integrates diverse biophysical and network dynamics-based signatures to predict whether mutations of the same protein promote cancer or NDDs. We illustrate the utility of protPheMut for phenotype (cancer/NDD) prediction through mutation analysis of two proteins, PI3Kα and PTEN. Benchmarked against seven other predictive tools, protPheMut predicted phenotypic effects with high accuracy, achieving an AUROC of 0.8501 for PI3Kα mutations related to cancer and Cowden syndrome. For multi-phenotype prediction of PTEN mutations related to cancer, PHTS, and HCPS, protPheMut achieved a micro-averaged AUC of 0.9349. Using SHAP model explanations, we gained insights into the mechanisms driving phenotype formation. A user-friendly website deployment is also provided.
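For readers unfamiliar with the micro-averaged AUC reported for the three-phenotype PTEN task, the following is a minimal sketch of how such a score is commonly computed: each (variant, class) pair is treated as one binary decision, all pairs are pooled, and a single AUROC is taken over the pool. This is not the protPheMut implementation; the function names `auroc` and `micro_average_auroc`, the class indices, and the probabilities are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the rank-sum (Mann-Whitney U) identity; tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def micro_average_auroc(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    """Pool one-vs-rest indicators and class probabilities, then compute one AUROC."""
    n_classes = len(proba[0])
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for true_class, row in zip(y_true, proba):
        for c in range(n_classes):
            flat_labels.append(1 if c == true_class else 0)
            flat_scores.append(row[c])
    return auroc(flat_labels, flat_scores)


# Made-up toy data (NOT from the paper): 4 variants, 3 hypothetical phenotype classes.
y_true = [0, 1, 2, 1]
proba = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
]
micro_auc = micro_average_auroc(y_true, proba)
print(f"micro-averaged AUROC on toy data: {micro_auc:.4f}")
```

Because micro-averaging pools every decision, classes with more variants contribute more to the score; a macro average (the unweighted mean of per-class AUROCs) would weight each phenotype equally instead.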
Availability
Source code and data are available at https://github.com/Spencer-JRWang/protPheMut . We also provide a user-friendly website at http://netprotlab.com/protPheMut .
Supplementary information
Supplementary data are available at Bioinformatics online.