Unfolded to Folded: Unraveling the Secrets of Protein Folding with ProteusFold


Abstract

Protein folding has long been regarded as the “holy grail” of biology, typically demanding large models and massive GPU clusters. This study introduces ProteusFold, a compact and interpretable model with only 993,408 parameters that achieves state-of-the-art accuracy on modest hardware with an inference time of 0.0011 seconds. By framing folding as an unfolded-to-folded sequence transformation using a novel structural tokenization, ProteusFold reduces regression complexity while preserving bond connectivity through the concept of “Synapses.” It achieves near-atomic fidelity (RMSD 0.24 Å, GDT-TS 99.85) and excels in protein–protein docking with a mean DockQ of 0.7675, with 95.5% of complexes above the 0.23 threshold. Where AlphaFold2 has a Predicted Aligned Error of 6.28, ProteusFold attains 0.396, an order-of-magnitude gain in positional accuracy. Beyond accuracy and efficiency, the model provides residue-level attribution analyses that highlight biologically significant residues, serving as a preliminary guide for experiments. Furthermore, ProteusFold is the first to provide atomic-level attribution of key electronic and thermal properties, offering deeper insight into folding mechanisms and pinpointing the specific atoms responsible for distinct scenarios. Moreover, a meta-analysis suggests the presence of folding hotspots, where critical residues cluster, revealing new avenues for discovery. Thus, ProteusFold delivers accuracy, interpretability, and efficiency, broadening access to protein-folding research.
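
For readers unfamiliar with the structural metrics quoted above, the following is a minimal sketch of how RMSD and GDT-TS are commonly computed from two sets of matched atom coordinates. It is not the authors' evaluation code. The function names (`kabsch_superpose`, `rmsd`, `gdt_ts`) are hypothetical, and the GDT-TS here is a simplified variant that scores a single Kabsch superposition, whereas standard tools search over many superpositions to maximize the score.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Rigidly superpose `mobile` onto `target` (both N x 3 arrays of
    matched coordinates) with the Kabsch algorithm; return moved coords."""
    mobile_c = mobile - mobile.mean(axis=0)
    target_c = target - target.mean(axis=0)
    # Cross-covariance matrix and its SVD.
    h = mobile_c.T @ target_c
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mobile_c @ rot.T + target.mean(axis=0)


def rmsd(a, b):
    """Root-mean-square deviation between already superposed coordinates."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def gdt_ts(a, b, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS on a 0-100 scale: the mean, over the distance
    cutoffs (in angstroms), of the fraction of atoms within each cutoff.
    Assumes `a` and `b` are already superposed (single superposition)."""
    dist = np.linalg.norm(a - b, axis=1)
    fractions = [np.mean(dist <= c) for c in cutoffs]
    return float(100.0 * np.mean(fractions))


if __name__ == "__main__":
    # Synthetic example: a "model" that is a rotated, shifted, slightly
    # noisy copy of a random "native" structure (illustrative data only).
    rng = np.random.default_rng(0)
    native = rng.normal(scale=10.0, size=(50, 3))
    theta = 0.7
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    model = native @ rot_z.T + np.array([1.0, -2.0, 3.0])
    model += rng.normal(scale=0.2, size=model.shape)

    fitted = kabsch_superpose(model, native)
    print(f"RMSD   = {rmsd(fitted, native):.3f} A")
    print(f"GDT-TS = {gdt_ts(fitted, native):.2f}")
```

Under these definitions, an RMSD near 0.24 Å together with a GDT-TS near 100 would mean that essentially every scored atom lies within 1 Å of its reference position after superposition.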
