Enhancing protein structure prediction accuracy by prioritizing important residues using protein language models

Yu Liu
Boming Kang
Qinghua Cui

Abstract

Accurate prediction of protein tertiary structures from amino acid sequences remains a fundamental challenge in computational biology. Although AlphaFold2 represents a major advance, systematic discrepancies persist between its predictions and experimentally determined structures. Given that individual residues contribute differentially to protein function, we hypothesized that incorporating residue-specific importance metrics could improve prediction accuracy. Here, we develop i-Fold (importance Fold), a neural architecture that enhances AlphaFold2 by integrating residue importance scores (RIS), derived from the protein language model ESM, as dynamic positional weights during training. Our approach dynamically weights amino acids using RIS during structure prediction, thereby directing computational attention toward functionally critical residues and regions. Evaluation on a benchmark test set of 3,559 protein structures reveals that i-Fold significantly improves accuracy (reduction in r.m.s.d., p = 0) and achieves a higher prediction success rate (an improvement of 7.6 percentage points: 55.1% → 62.7%). Notably, i-Fold shows particular improvements for targets that are typically challenging for AlphaFold2, including ribosomal proteins, membrane proteins, and orphan proteins. Consistent results were obtained on a completely independent test set of 167 recently released protein structures, where i-Fold again exhibited a higher prediction success rate than AlphaFold2 (an improvement of 6.0 percentage points: 43.7% → 49.7%). Our findings indicate that explicit integration of RIS can advance the state of the art in protein structure prediction, producing more accurate and generalizable models without substantially increasing computational cost.
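The abstract does not say how RIS values are computed from ESM or how they enter the training objective, so the following is only a minimal sketch of the general idea of per-residue weighting, not the authors' method. It assumes RIS is a list of non-negative numbers, one per residue, and uses a plain squared coordinate deviation as a stand-in for a real structural loss. The function names `normalize_weights` and `weighted_squared_deviation` and the mean-one normalization are hypothetical choices made for illustration.

```python
def normalize_weights(ris):
    """Rescale importance scores so they average to 1.0.

    Assumption: scores are non-negative. Mean-one scaling is an illustrative
    choice that keeps the weighted loss on the same scale as the unweighted one.
    """
    if not ris:
        raise ValueError("ris must contain at least one score")
    if any(score < 0 for score in ris):
        raise ValueError("importance scores must be non-negative")
    total = sum(ris)
    if total == 0:
        # No residue is marked as important: fall back to uniform weights.
        return [1.0] * len(ris)
    n = len(ris)
    return [score * n / total for score in ris]


def weighted_squared_deviation(predicted, reference, ris):
    """Mean of per-residue squared coordinate deviations, weighted by RIS.

    predicted, reference: lists of (x, y, z) tuples, one per residue.
    ris: one importance score per residue (hypothetical input format).
    """
    if not (len(predicted) == len(reference) == len(ris)):
        raise ValueError("all inputs must have one entry per residue")
    weights = normalize_weights(ris)
    per_residue = [
        sum((p - r) ** 2 for p, r in zip(pred_xyz, ref_xyz))
        for pred_xyz, ref_xyz in zip(predicted, reference)
    ]
    return sum(w * d for w, d in zip(weights, per_residue)) / len(per_residue)


# Toy example: only the third residue is mispredicted.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]

loss_uniform = weighted_squared_deviation(predicted, reference, [1.0, 1.0, 1.0])
loss_weighted = weighted_squared_deviation(predicted, reference, [1.0, 1.0, 2.0])
```

In this toy case, giving the mispredicted residue a higher score raises the loss relative to uniform weighting, which is the sense in which weighting directs optimization toward residues marked as important.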

Version published to 10.1101/2025.09.28.679101 on bioRxiv
Sep 30, 2025

Related articles

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations
Latest version Feb 4, 2026

A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluations
Latest version Dec 24, 2025

The Evolution of the AlphaFold Architecture

This article has 1 author:
1. Y.C.B.J. Dissanayaka
This article has no evaluations
Latest version Jan 9, 2026
