Just Add Structure: Protein Language Models Combined with Structural Equivariance Excel at Protein Tasks

Qurat-ul-ain
Carlos Outeiral
Matteo Cagiada
Yee Whye Teh
Charlotte M. Deane


Abstract

Accurate in silico prediction of protein properties, functional fitness, and mutational effects remains a central challenge in protein engineering and therapeutic design. While Protein Language Models (PLMs) successfully capture rich evolutionary and functional constraints from sequence data, they only indirectly encode the spatial and geometric information that fundamentally governs protein function. Consequently, state-of-the-art approaches typically rely on extensive fine-tuning, ensembling, or the incorporation of handcrafted structural features to achieve competitive accuracy, making them computationally expensive and difficult to scale. In this work, we demonstrate that explicit geometric modeling can substitute for, and in most cases outperform, large-scale PLM fine-tuning, with much higher parameter efficiency. Our approach, ProtEGNN, pairs PLM residue representations with a lightweight E(3)-equivariant graph neural network, competing with or achieving state-of-the-art performance across eight different benchmarks in protein property, mutational effect, and function prediction, while needing 100–1000× fewer parameters than competing methods. Even when protein structure is combined with representations from ESM2-T6, a small 8M-parameter PLM, ProtEGNN matches fine-tuned sequence-only approaches based on substantially larger PLM backbones, while training orders of magnitude fewer parameters. Together, these results highlight geometric inductive bias as a powerful and scalable alternative to task-specific fine-tuning of large PLMs for protein modeling.
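To make the idea of pairing per-residue embeddings with an E(3)-equivariant graph layer concrete, the following is a minimal NumPy sketch of a single message-passing step in the general style of E(n)-equivariant GNNs. It is an illustration under stated assumptions, not the ProtEGNN implementation: the function names (`radius_graph`, `init_params`, `egnn_layer`), the tanh single-layer "MLPs", the 8 Å cutoff, and the feature sizes are all hypothetical, and random vectors stand in for PLM residue embeddings and for residue coordinates.

```python
import numpy as np

def radius_graph(x, cutoff):
    """Directed edges (src, dst) between distinct residues closer than `cutoff`."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    src, dst = np.nonzero((d < cutoff) & ~np.eye(len(x), dtype=bool))
    return src, dst

def init_params(rng, feat_dim, hidden_dim, scale=0.1):
    """Random weights for one layer (illustrative initialisation only)."""
    return {
        "W_e": rng.normal(0, scale, (2 * feat_dim + 1, hidden_dim)),    # edge message
        "w_x": rng.normal(0, scale, (hidden_dim,)),                     # coordinate gate
        "W_h": rng.normal(0, scale, (feat_dim + hidden_dim, feat_dim)), # node update
    }

def egnn_layer(h, x, edges, params):
    """One equivariant step: h (N, F) invariant features, x (N, 3) coordinates."""
    src, dst = edges
    rel = x[src] - x[dst]                          # relative vectors, rotate with the input
    d2 = np.sum(rel ** 2, axis=1, keepdims=True)   # squared distances, E(3)-invariant
    # Messages depend only on invariant quantities (features and distances).
    m = np.tanh(np.concatenate([h[src], h[dst], d2], axis=1) @ params["W_e"])
    # Coordinates move along relative vectors, scaled by an invariant scalar gate.
    gate = np.tanh(m @ params["w_x"])[:, None]
    x_new = x.copy()
    np.add.at(x_new, src, rel * gate)
    # Features are updated from summed incoming messages, with a residual connection.
    agg = np.zeros((h.shape[0], m.shape[1]))
    np.add.at(agg, src, m)
    h_new = h + np.tanh(np.concatenate([h, agg], axis=1) @ params["W_h"])
    return h_new, x_new

# Toy usage: 12 "residues" with 16-dimensional placeholder embeddings.
rng = np.random.default_rng(0)
h = rng.normal(size=(12, 16))        # stand-in for frozen PLM residue embeddings
x = rng.normal(size=(12, 3)) * 5.0   # stand-in for residue coordinates (in Å)
edges = radius_graph(x, cutoff=8.0)
params = init_params(rng, feat_dim=16, hidden_dim=32)
h_out, x_out = egnn_layer(h, x, edges, params)
print(h_out.shape, x_out.shape)
```

Because messages are built only from features and inter-residue distances, rotating or translating the input structure leaves the updated features unchanged and transforms the updated coordinates in the same way. In a setup like the one the abstract describes, only a small graph network of this kind would need training on top of the PLM representations, which is where the parameter savings would come from.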

Version published to 10.64898/2026.05.28.728196 on bioRxiv
May 29, 2026

