Residue burial encodes a protein’s fold

Alex T. Grigas
Jacob Sumner
Corey S. O’Hern

Abstract

Protein structure is controlled by a high-dimensional energy landscape, which is a function of all of the atomic coordinates of the protein. Can this landscape be accurately described by a low-dimensional representation? We find that residue core identity, a binary N-dimensional encoding indicating whether each of the N amino acids in a protein is buried in the core or not, can predict the protein’s backbone conformation more efficiently than all other representations that we tested. Core identity is 4 times more efficient than previous estimates of the bits per residue needed to encode a protein’s native fold, 2 times more efficient than the Cα contact map, and 1.5 times more efficient than the machine-learned embeddings from FoldSeek’s 3Di. Even when the folded structure is unavailable, predicting each residue’s burial from sequence yields a more accurate estimate of fold quality than predicting pairwise contacts from the same sequence information. Thus, this work emphasizes that the problem of determining a protein’s native fold can be re-framed as predicting each residue’s core identity.
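To make the idea of a binary N-dimensional encoding concrete, the following is a minimal sketch, not the authors' method. It assumes burial is decided by thresholding a per-residue relative solvent-accessible surface area (rSASA); the abstract does not state how burial is defined, so the function names, the 0.1 cutoff, and the example values are all hypothetical. The entropy function is only a simple independent-residue estimate that illustrates why such an encoding costs at most one bit per residue; it is not the efficiency measure reported in the abstract.

```python
import math
from typing import List, Sequence


def core_identity(rsasa: Sequence[float], threshold: float = 0.1) -> List[int]:
    """Return 1 for residues treated as buried (rSASA below threshold), else 0.

    The threshold is an illustrative assumption, not a value from the article.
    """
    return [1 if value < threshold else 0 for value in rsasa]


def entropy_bits_per_residue(bits: Sequence[int]) -> float:
    """Shannon entropy (bits per residue) of a Bernoulli model fit to the bits.

    Treats residues as independent, so the result can never exceed 1 bit.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one residue")
    p = sum(bits) / n  # fraction of residues labeled as core
    if p in (0.0, 1.0):
        return 0.0  # a constant string carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Hypothetical rSASA values for an 8-residue fragment.
rsasa = [0.62, 0.05, 0.00, 0.41, 0.08, 0.73, 0.02, 0.55]
bits = core_identity(rsasa)
bits_per_residue = entropy_bits_per_residue(bits)
```

In this toy input half of the residues fall below the cutoff, so the independent-residue estimate reaches its maximum of one bit per residue; a protein with a smaller core fraction would give a lower value.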

Version published to 10.64898/2026.03.28.714986 on bioRxiv
Mar 31, 2026

Related articles

Principles for the encoding of molecular information in DNA, RNA and protein motifs

This article has 4 authors:
1. Ezequiel Alejandro Galpern
2. Inés Bauer
3. Diego Ulises Ferreiro
4. Ignacio Enrique Sanchez
This article has no evaluations. Latest version Apr 15, 2026

Learning the structural diversity in random protein sequence space

This article has 9 authors:
1. Filip Buchel
2. Tereza Neuwirthova
3. Theodora Tureckiova
4. Gustavo Fuertes
5. Ales Benda
6. Dalibor Panek
7. Matus Fricek
8. Mohammed AlQuraishi
9. Klara Hlouchova
This article has no evaluations. Latest version May 5, 2026

AlphaInterp: Mechanistic Interpretability of AlphaFold 3 Reveals How Evolutionary Information Shapes Protein Structure Prediction

This article has 2 authors:
1. Jonathan Feldman
2. Jeffrey Skolnick
This article has no evaluations. Latest version Apr 23, 2026
