Prediction of physical characteristics of disordered proteins using molecular simulation and physics-informed multiple machine learning strategies

Abstract

We introduce a novel hybrid machine learning (ML) framework to predict the radius of gyration and other conformational properties of intrinsically disordered proteins (IDPs). Our model integrates sequence information with physical features derived from a coarse-grained (CG) model validated by experimental data. Specifically, we combine hidden states from sequence-based models with 23 physical features projected into a shared latent space, and apply an attention mechanism that assigns a weight to each residue to highlight the most informative regions of the sequence. This attention-guided fusion significantly improves predictive accuracy across multiple metrics, including MAPE and MSE, while also enhancing confidence in the predictions. We trained and evaluated our models on Brownian dynamics (BD) simulation results for approximately 7,000 IDPs from the MobiDB database (each with a disorder score > 99%). We find that sequence-based models consistently outperform feature-only models, with the GRU achieving the best performance among sequence-only approaches. Moreover, combining sequence and feature information further improves accuracy across all architectures, with the hybrid biGRU model delivering the best overall predictive performance. SHAP analysis reveals the relative importance of the physical features, offering model explainability and guiding feature selection. Notably, using a small number of top features often reduces model complexity and improves generalization. Furthermore, an integrated-gradients analysis reveals that, apart from the length of the IDP, three parameters (SCD, SHD, and f*) play a key role in the ML predictions. Our framework provides a fast, interpretable, and scalable tool for predicting IDP behavior, enabling efficient initial screening prior to costly molecular simulations.
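
The attention-guided fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact architecture, so everything here is an assumption made for illustration: the additive (tanh) scoring function, the concatenation used for fusion, the function and variable names, and the random placeholder values standing in for trained weights and for the per-residue hidden states of a sequence model such as a biGRU.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def attention_fuse(hidden_states, features, w_proj, v_score):
    """Hypothetical fusion step (not the authors' implementation).

    hidden_states: L x d per-residue hidden states from a sequence model
    features:      n_feat physical features for the whole chain
    w_proj:        d x n_feat projection into the shared latent space
    v_score:       d-dimensional scoring vector (assumed additive attention)
    """
    # Project the physical features into the same d-dimensional space
    z = matvec(w_proj, features)
    # One scalar score per residue, conditioned on the projected features
    scores = [
        sum(v * math.tanh(h_k + z_k) for v, h_k, z_k in zip(v_score, h, z))
        for h in hidden_states
    ]
    weights = softmax(scores)  # per-residue attention weights, sum to 1
    d = len(z)
    # Attention-weighted average of the hidden states
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(d)]
    # Fuse by concatenation (an assumption); a regression head would follow
    return weights, pooled + z

# Placeholder inputs: 5 residues, latent size 4, 23 physical features
random.seed(0)
L, d, n_feat = 5, 4, 23
hidden = [[random.gauss(0, 1) for _ in range(d)] for _ in range(L)]
features = [random.random() for _ in range(n_feat)]
w_proj = [[random.gauss(0, 0.1) for _ in range(n_feat)] for _ in range(d)]
v_score = [random.gauss(0, 1) for _ in range(d)]

weights, fused = attention_fuse(hidden, features, w_proj, v_score)
```

In this sketch, `weights` plays the role of the per-residue importance scores mentioned in the abstract, and `fused` is the vector a downstream regression layer would map to a predicted radius of gyration.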
