Protein Retrieval via Integrative Molecular Ensembles (PRIME) through extended similarity indices

Lexin Chen
Arup Mondal
Alberto Perez
Ramón Alain Miranda-Quintana

Abstract

Molecular dynamics (MD) simulations are ideally suited to describe conformational ensembles of biomolecules such as proteins and nucleic acids. Microsecond-long simulations are now routine, facilitated by the emergence of graphics processing units. Processing such ensembles on the basis of statistical mechanics can bring insights about different biologically relevant states, their representative structures, and even the dynamics between states. Clustering, which groups objects based on structural similarity, is typically used to process ensembles, yielding different states, their populations, and representative structures. For some purposes, such as protein structure prediction, we are interested in identifying the representative structure that is most similar to the native state of the protein. The traditional pipeline applies hierarchical clustering and selects the cluster centroid as the representative of each cluster. However, even when the first cluster represents the native basin, the centroid can be several angstroms away in RMSD from the native state, and many other structures inside this cluster could be better choices of representative structure, reducing the need for protein structure refinement. In this study, we developed a module, Protein Retrieval via Integrative Molecular Ensembles (PRIME), that consists of tools to determine the most prevalent states in an ensemble using extended continuous similarity. PRIME is integrated with our Molecular Dynamics Analysis with N-ary Clustering Ensembles (MDANCE) package and can be used as a post-processing tool for arbitrary clustering algorithms, compatible with several MD suites. PRIME was validated on ensembles of different protein and protein-complex systems for its ability to reliably identify the most native-like state, which we compare to the experimental structure and to the traditional approach. Systems were chosen to represent different degrees of difficulty, such as folding processes and binding that require large conformational changes. PRIME predictions produced structures that superposed better on the experimental structure (lower RMSD). A further benefit of PRIME is its linear scaling, in contrast to the O(N²) scaling traditionally associated with comparisons of elements in a set.
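To make the linear-scaling claim concrete, the sketch below shows one generic way a set-wide comparison can avoid enumerating all pairs. It is not the PRIME or MDANCE implementation and does not use the extended continuous similarity indices themselves. It assumes frames are already aligned and flattened to coordinate lists, and it uses the sum of squared Euclidean distances as a stand-in measure. The function names are hypothetical. The idea is that the sum of squared distances from one frame to all others can be obtained from per-coordinate column sums accumulated in a single pass.

```python
def sum_sq_dists_linear(frames):
    """Sum of squared distances from each frame to all frames, in O(N * D).

    Uses the identity
        sum_j |x_i - x_j|^2 = N * |x_i|^2 - 2 * x_i . S + Q,
    where S is the per-coordinate sum over all frames and Q is the sum of
    squared norms. Illustrative only; not the PRIME algorithm.
    """
    n = len(frames)
    d = len(frames[0])
    col_sum = [0.0] * d   # S: per-coordinate sum over all frames
    total_sq = 0.0        # Q: sum of squared norms over all frames
    for frame in frames:
        for k, x in enumerate(frame):
            col_sum[k] += x
            total_sq += x * x

    scores = []
    for frame in frames:
        norm_sq = sum(x * x for x in frame)
        dot = sum(x * s for x, s in zip(frame, col_sum))
        scores.append(n * norm_sq - 2.0 * dot + total_sq)
    return scores


def sum_sq_dists_pairwise(frames):
    """Reference O(N^2 * D) version that compares every pair explicitly."""
    return [
        sum(sum((a - b) ** 2 for a, b in zip(f, g)) for g in frames)
        for f in frames
    ]


def most_representative(frames):
    """Index of the frame with the smallest summed squared distance."""
    scores = sum_sq_dists_linear(frames)
    return min(range(len(frames)), key=scores.__getitem__)


# Toy "ensemble": four frames with two coordinates each (made-up values).
toy_frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
linear_scores = sum_sq_dists_linear(toy_frames)
pairwise_scores = sum_sq_dists_pairwise(toy_frames)
best_index = most_representative(toy_frames)
```

On the toy data both functions return the same scores, while the linear version touches each frame only twice. In this sketch the selected frame is an actual member of the set, not an averaged centroid, which mirrors the motivation above for choosing a representative structure from within a cluster.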

Version published to 10.1101/2024.03.19.585783 on bioRxiv
Mar 21, 2024

