GhostFold: Accurate protein structure prediction using structure-constrained synthetic coevolutionary signals

Nitesh Mishra
Bryan Briney

Abstract

The accuracy of protein structure prediction models such as AlphaFold2 is tightly coupled to the depth and quality of multiple sequence alignments (MSAs), posing a persistent challenge for proteins with few or no identifiable homologs. We present GhostFold, a method for conjuring structure-constrained synthetic MSAs from a single amino acid sequence, bypassing the need for traditional homology searches. Leveraging the ProstT5 protein language model and the 3Di structural alphabet, GhostFold projects a query sequence into a tokenized structural representation and iteratively back-translates to generate an ensemble of diverse, fold-consistent sequences. These synthetic alignments (pseudoMSAs) encode emergent coevolutionary constraints that are sufficient for high-accuracy structure prediction of difficult targets such as orphan proteins and hypervariable antibody loops. GhostFold consistently matches or exceeds the performance of MSA-based and language model-based structure predictors while being computationally lightweight and independent of large sequence databases. Notably, we observe a decoupling of confidence metrics (e.g., pLDDT) from prediction accuracy when using pseudoMSAs, suggesting that AlphaFold2’s internal confidence calibration is strongly influenced by the statistical properties of natural sequence alignments. These results establish that structure-guided synthetic MSAs can functionally substitute for evolutionary data, offering a scalable and generalizable solution to one of the central limitations in computational structural biology. GhostFold represents a shift from passive data mining to intelligent sequence synthesis, redefining how structural priors are encoded in deep learning-based protein folding.
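
The translate-and-back-translate loop described in the abstract can be illustrated with a minimal sketch. This is not the GhostFold implementation: `seq_to_3di` and `threedi_to_seq` are hypothetical stand-ins for the ProstT5 translation steps, the structural alphabet is a toy placeholder, and the round and sample counts are arbitrary. In particular, the toy forward mapping is a plain one-to-one relabeling, so it shows only the control flow and the shape of the output alignment, not any real structural constraint.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Toy structural alphabet (assumption): 20 lowercase placeholder states.
TOY_3DI = AMINO_ACIDS.lower()


def seq_to_3di(seq):
    """Hypothetical stand-in for the sequence -> structure-token step.

    A real pipeline would call a model here; this toy version just
    relabels each residue with a placeholder token.
    """
    return "".join(TOY_3DI[AMINO_ACIDS.index(aa)] for aa in seq)


def threedi_to_seq(tokens, rng, resample_rate=0.3):
    """Hypothetical stand-in for sampled back-translation to amino acids.

    Each position is either decoded back to its residue or, with
    probability resample_rate, replaced by a random residue.
    """
    residues = []
    for tok in tokens:
        if rng.random() < resample_rate:
            residues.append(rng.choice(AMINO_ACIDS))
        else:
            residues.append(AMINO_ACIDS[TOY_3DI.index(tok)])
    return "".join(residues)


def build_pseudo_msa(query, n_rounds=3, samples_per_seed=4, seed=0):
    """Iteratively translate and back-translate to collect unique sequences."""
    rng = random.Random(seed)
    msa = [query]          # the query is always the first row
    seen = {query}
    seeds = [query]
    for _round in range(n_rounds):
        new_seqs = []
        for seed_seq in seeds:
            tokens = seq_to_3di(seed_seq)
            for _sample in range(samples_per_seed):
                candidate = threedi_to_seq(tokens, rng)
                if candidate not in seen:
                    seen.add(candidate)
                    msa.append(candidate)
                    new_seqs.append(candidate)
        # Cap the seeds carried into the next round so the pool stays small.
        seeds = new_seqs[:samples_per_seed] or seeds
    return msa


def to_a3m(msa):
    """Serialize equal-length sequences as A3M-style text, query first."""
    lines = [">query", msa[0]]
    for i, seq in enumerate(msa[1:], start=1):
        lines.append(f">pseudo_{i}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pseudo_msa = build_pseudo_msa("MKTAYIAKQRQISFVKSHFSRQ")
    print(to_a3m(pseudo_msa))
```

Because every generated sequence has the same length as the query, the output needs no gap or insertion handling before it is passed to a structure predictor that accepts a custom alignment.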

Version published to 10.1101/2025.10.13.682177 on bioRxiv
Oct 14, 2025

