Inferring Dynamic Information from Protein Structures by Gaussian Integrals and Deep Learning
Abstract
Protein conformational flexibility underlies a wide range of biological functions, yet experimentally probing dynamics at atomic resolution remains costly and low-throughput. Here, we present a deep learning framework that predicts protein flexibility directly from static structural descriptors, bypassing the need for molecular dynamics (MD) simulations. Using the ATLAS database of standardized all-atom MD trajectories, we encoded 1,374 protein chains as 30-dimensional Gaussian integral (GI) vectors, which are global shape and topology invariants of the protein backbone. Principal component analysis of GI profiles revealed four structural clusters with distinct secondary structure compositions and flexibility distributions. We trained an attention-based one-dimensional convolutional neural network (1D-CNN) to classify proteins as flexible or non-flexible based on their root-mean-square fluctuation (RMSF) relative to the dataset-wide mean. The classifier achieved an AUC of 0.772 (95% CI: 0.712–0.826) on an independent test set, with balanced sensitivity and specificity, and identified a small subset of GI components as the most predictive. In a regression setting, a recurrent neural network outperformed other architectures, attaining an R² of 0.537, though high-flexibility values were systematically underestimated. Cluster-specific analyses indicated that coil-rich and β-sheet–dominated proteins were more amenable to flexibility prediction than α-helical proteins, likely due to greater structural heterogeneity. Our results demonstrate that compact GI descriptors preserve sufficient information to recover MD-derived flexibility trends, offering a computationally efficient complement to simulation-based approaches. This framework enables large-scale screening of protein dynamics from structural data alone, with potential applications in structural bioinformatics, drug design, and functional annotation.
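To make the classification setup concrete, the following is a minimal sketch of how a flexible/non-flexible label and an AUC could be computed. It is not the authors' pipeline. It assumes each protein is summarized by a single RMSF value (for example, a per-chain average) and that "flexible" means that value exceeds the dataset-wide mean. The function names `label_flexible` and `roc_auc`, the toy RMSF values, and the classifier scores are all hypothetical.

```python
from statistics import mean


def label_flexible(rmsf_values):
    """Return (labels, threshold): label 1 if a protein's RMSF summary
    exceeds the dataset-wide mean, else 0.
    Assumption: one RMSF summary value per protein."""
    threshold = mean(rmsf_values)
    labels = [1 if v > threshold else 0 for v in rmsf_values]
    return labels, threshold


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): per-protein RMSF summaries, in angstroms.
rmsf = [0.8, 1.2, 2.5, 0.9, 3.1, 1.0]
labels, threshold = label_flexible(rmsf)

# Hypothetical classifier outputs (predicted probability of "flexible").
scores = [0.2, 0.8, 0.7, 0.1, 0.9, 0.4]
auc = roc_auc(labels, scores)
```

In this toy example, two of the six proteins sit above the mean RMSF, and the pairwise-ranking definition of AUC gives 7 correctly ordered pairs out of 8. A confidence interval like the one reported above would typically require resampling the test set (for example, by bootstrapping), which this sketch omits.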