Discovery of β-Sheet Peptide Assembly Codes via an Experimentally Validated Predictive Computational Platform
Abstract
Deciphering the sequence codes governing ordered peptide assemblies remains challenging due to the need to explore vast sequence space with atomic resolution. Here, we present an experimentally validated computational framework combining hybrid-resolution molecular dynamics and machine learning for the discovery of β-sheet-rich amyloid-forming peptides. Through exhaustive simulations of all 8,000 tripeptides, we demonstrate that the widely used aggregation propensity (AP) is not effective in predicting β-sheet assembly. We introduce Amyloid-Like Tendency (ALT), a metric enabled by our hybrid-resolution simulations that effectively identifies cross-β architectures. Leveraging this physics-informed dataset, we further fine-tuned the Uni-Mol model to efficiently screen 160,000 tetrapeptides. Experimental validation of 46 candidates confirmed a predictive accuracy of ~85%, yielding 26 novel amyloid-forming peptides, including multiple hydrogelators. Mechanistic analysis reveals that specific sidechain stacking and central amino acid identity, beyond generic hydrophobicity, dictate ordered assembly. This establishes a scalable pipeline for the targeted design of functional peptide materials.
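The sequence-space sizes quoted above follow from simple counting: assuming the 20 canonical amino acids at each position, there are 20^3 = 8,000 tripeptides and 20^4 = 160,000 tetrapeptides. The minimal Python sketch below enumerates both sets to make the scale concrete. The helper name `enumerate_peptides` and the one-letter alphabet string are illustrative assumptions, not part of the authors' pipeline.

```python
from itertools import product

# The 20 canonical amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_peptides(length):
    """Return every sequence of the given length over the 20 canonical residues."""
    return ["".join(residues) for residues in product(AMINO_ACIDS, repeat=length)]


tripeptides = enumerate_peptides(3)
tetrapeptides = enumerate_peptides(4)

print(len(tripeptides))    # 8000
print(len(tetrapeptides))  # 160000
```

The twentyfold growth per added residue is why exhaustive simulation is practical for tripeptides, while the tetrapeptide space is screened with a learned model instead.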