AF2Rank Revisited: Reproducing AlphaFold-Based Structure Evaluation and a Hypothesis for Context-Aware Refinement (CAR-AF)

Priyanshu Kumar

Abstract

Protein structure prediction has advanced dramatically, with AlphaFold2 achieving near-experimental accuracy across large protein datasets. This study reproduces and validates the AF2Rank pipeline, which uses AlphaFold's intrinsic confidence metrics, the predicted Local Distance Difference Test (pLDDT) and the predicted Template Modeling score (pTM), to evaluate and rank decoy protein structures without relying on multiple sequence alignments (MSAs). We implemented the pipeline locally on the Rosetta decoy dataset, resolving reproducibility obstacles such as software dependency conflicts, residue indexing inconsistencies, and system-level execution issues. The resulting framework evaluated more than 1,000 decoys for a benchmark target (1a32), and work is under way to extend it to 133 protein targets.
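The AF2Rank ranking step can be sketched as follows. This is an illustration, not the pipeline's actual code: it assumes each decoy's AlphaFold outputs have already been extracted into a hypothetical `DecoyResult` record, that pLDDT is on AlphaFold's 0-100 scale, and it uses what we understand to be the composite from the original AF2Rank work: mean pLDDT (rescaled to 0-1) multiplied by pTM and by the TM-score between the input decoy and AlphaFold's output model.

```python
from dataclasses import dataclass


@dataclass
class DecoyResult:
    """Hypothetical container for one decoy's AlphaFold outputs."""
    name: str
    mean_plddt: float  # mean pLDDT on AlphaFold's 0-100 scale
    ptm: float         # predicted TM-score, 0-1
    tm_in_out: float   # TM-score between input decoy and AlphaFold output, 0-1


def composite_score(d: DecoyResult) -> float:
    """AF2Rank-style composite (assumed): pLDDT/100 x pTM x TM(input, output)."""
    return (d.mean_plddt / 100.0) * d.ptm * d.tm_in_out


def rank_decoys(decoys: list[DecoyResult]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least confident."""
    scored = [(d.name, composite_score(d)) for d in decoys]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from the 1a32 benchmark.
    decoys = [
        DecoyResult("decoy_A", mean_plddt=85.0, ptm=0.80, tm_in_out=0.90),
        DecoyResult("decoy_B", mean_plddt=60.0, ptm=0.55, tm_in_out=0.70),
        DecoyResult("decoy_C", mean_plddt=90.0, ptm=0.85, tm_in_out=0.40),
    ]
    for name, score in rank_decoys(decoys):
        print(f"{name}\t{score:.3f}")
```

In this toy input, decoy_C has the highest pLDDT and pTM but ranks below decoy_A, because AlphaFold's output moved far from the input decoy (low input-output TM-score). Multiplying in that term is what lets the composite penalize decoys that AlphaFold effectively refolds rather than accepts.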

Notably, we found that AlphaFold confidence metrics encode protein-specific "fingerprints" that allow structures to be classified back to their source proteins. An XGBoost-based classifier achieved 61.5% accuracy across 133 distinct protein targets, far above the random-chance baseline of 0.75% (1/133). Feature importance analysis showed that the traditional energy term (Rosetta normalized score, 26%) and a derived interaction feature (pLDDT-pTM ratio, 16%) carried the most discriminative power, suggesting that AlphaFold metrics capture meaningful biological and structural information beyond generic quality assessment.

Building on this foundation, we introduce a new hypothesis: Context-Aware Refinement of AlphaFold Predictions (CAR-AF). CAR-AF postulates that AlphaFold-predicted structures may be further improved by refining them in the presence of their native binding partners, such as receptors or ligands, yielding conformations that are both structurally and functionally enhanced. The proposed methodology has four stages: (1) generating initial predictions with AlphaFold, (2) modeling the structures of relevant binding partners, (3) docking the predicted proteins into these biological contexts, and (4) refining the resulting complexes with molecular modeling tools such as Rosetta or HADDOCK. Structural and energetic metrics, including TM-score, RMSD, and binding energy, will be used to assess whether predictive quality improves.
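The feature side of the fingerprint classifier can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the `DecoyMetrics` record and its fields are hypothetical, "Rosetta normalized score" is assumed here to mean the Rosetta total score divided by the number of residues, and the pLDDT-pTM ratio is taken to be mean pLDDT divided by pTM. The sketch also computes the uniform-guessing baseline: with 133 targets it is 1/133 ≈ 0.75%, so the reported 61.5% accuracy is roughly 82 times chance.

```python
from dataclasses import dataclass


@dataclass
class DecoyMetrics:
    """Hypothetical per-decoy record; field names are illustrative."""
    target: str           # source protein ID (the class label)
    mean_plddt: float     # 0-100
    ptm: float            # 0-1
    rosetta_score: float  # Rosetta total score (REU)
    n_residues: int


def feature_vector(m: DecoyMetrics) -> list[float]:
    """Build one feature row; per-residue normalization is an ASSUMPTION."""
    rosetta_norm = m.rosetta_score / m.n_residues
    # Derived interaction feature; guard against a zero pTM.
    plddt_ptm_ratio = m.mean_plddt / m.ptm if m.ptm > 0 else 0.0
    return [m.mean_plddt, m.ptm, rosetta_norm, plddt_ptm_ratio]


def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over n_classes targets."""
    return 1.0 / n_classes


if __name__ == "__main__":
    # Illustrative values only; not data from the study.
    rows = [
        DecoyMetrics("target_01", mean_plddt=82.0, ptm=0.71, rosetta_score=-180.0, n_residues=90),
        DecoyMetrics("target_02", mean_plddt=64.0, ptm=0.48, rosetta_score=-95.0, n_residues=120),
    ]
    X = [feature_vector(m) for m in rows]  # feature matrix
    y = [m.target for m in rows]           # class labels
    print(X, y)
    print(f"chance over 133 targets: {chance_accuracy(133):.2%}")
    print(f"reported accuracy vs chance: {0.615 / chance_accuracy(133):.1f}x")
```

The classifier itself is omitted. The rows would be passed to a multi-class model such as `xgboost.XGBClassifier`, which expects the target IDs to be encoded as integer class labels first.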

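To make the proposed CAR-AF evaluation concrete, the sketch below computes two of the structural metrics named above for a model before and after refinement. It is a simplified illustration with explicit assumptions: coordinates are Cα positions that are already residue-matched and superposed onto the reference (no Kabsch alignment is performed), and the TM-score is evaluated for that fixed superposition using the standard length-dependent scale d0 = 1.24(L - 15)^(1/3) - 1.8, floored at 0.5 Å. The real TM-score searches for the superposition that maximizes the score, so the value here is at most the true TM-score. Binding energy would come from Rosetta or HADDOCK and is not modeled. The `compare` helper is hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD of residue-matched coordinates that are ALREADY superposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def tm_score_fixed(model: list[Coord], ref: list[Coord]) -> float:
    """TM-score for a FIXED superposition (no optimization), normalized by len(ref)."""
    if len(model) != len(ref) or not ref:
        raise ValueError("coordinate lists must be non-empty and equal length")
    L = len(ref)
    d0 = 1.24 * (L - 15) ** (1 / 3) - 1.8 if L > 15 else 0.5
    d0 = max(d0, 0.5)  # floor used for short chains
    return sum(1.0 / (1.0 + (math.dist(p, q) / d0) ** 2) for p, q in zip(model, ref)) / L


def compare(before: list[Coord], after: list[Coord], ref: list[Coord]) -> dict[str, float]:
    """Hypothetical helper: change in RMSD and TM-score after refinement."""
    return {
        "delta_rmsd": rmsd(after, ref) - rmsd(before, ref),
        "delta_tm": tm_score_fixed(after, ref) - tm_score_fixed(before, ref),
    }


if __name__ == "__main__":
    # Toy example: a straight Cα trace, with "before" and "after" models
    # displaced by 2.0 Å and 1.0 Å respectively (illustrative, not real data).
    ref = [(3.8 * i, 0.0, 0.0) for i in range(30)]
    before = [(x, y + 2.0, z) for x, y, z in ref]
    after = [(x, y + 1.0, z) for x, y, z in ref]
    print(compare(before, after, ref))
```

A negative `delta_rmsd` together with a positive `delta_tm` would indicate that refinement moved the model closer to the reference structure.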
Version published as 10.1101/2025.04.30.651434 on bioRxiv, May 7, 2025.