Just Add Structure: Protein Language Models Combined with Structural Equivariance Excel at Protein Tasks
Abstract
Accurate in silico prediction of protein properties, functional fitness, and mutational effects remains a central challenge in protein engineering and therapeutic design. While Protein Language Models (PLMs) successfully capture rich evolutionary and functional constraints from sequence data, they only indirectly encode the spatial and geometric information that fundamentally governs protein function. Consequently, state-of-the-art approaches typically rely on extensive fine-tuning, ensembling, or the incorporation of handcrafted structural features to achieve competitive accuracy, making them computationally expensive and difficult to scale. In this work, we demonstrate that explicit geometric modeling can substitute for, and in most cases outperform, large-scale PLM fine-tuning, with much higher parameter efficiency. Our approach, ProtEGNN, pairs PLM residue representations with a lightweight E(3)-equivariant Graph Neural Network, competing with or achieving state-of-the-art performance across eight different benchmarks in protein property, mutational effect, and function prediction, while needing 100–1000× fewer parameters than competing methods. Even when protein structure is combined with representations from ESM2-T6, a small 8M-parameter PLM, ProtEGNN matches fine-tuned sequence-only approaches based on substantially larger PLM backbones, while training orders of magnitude fewer parameters. Together, these results highlight geometric inductive bias as a powerful and scalable alternative to task-specific fine-tuning of large PLMs for protein modeling.
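To make the idea of pairing residue embeddings with an E(3)-equivariant graph network concrete, the following is a minimal NumPy sketch of one generic equivariant message-passing step over a k-nearest-neighbour residue graph. It is an illustration only and not the ProtEGNN implementation: the abstract does not specify the layer design, so the function names (`knn_edges`, `init_params`, `egnn_layer`), the weight shapes, the `tanh` nonlinearities, and the use of random arrays in place of real PLM embeddings and residue coordinates are all assumptions.

```python
import numpy as np


def knn_edges(x, k):
    """Directed edges (i, j) linking each node i to its k nearest neighbours j."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self-loops
    nbrs = np.argsort(dist, axis=1)[:, :k]    # (N, k) neighbour indices
    src = np.repeat(np.arange(x.shape[0]), k)
    return np.stack([src, nbrs.ravel()], axis=1)


def init_params(d, m, rng, scale=0.1):
    """Hypothetical single-matrix 'MLPs'; real models would use deeper networks."""
    return {
        "W_e": scale * rng.normal(size=(2 * d + 1, m)),  # edge message weights
        "W_x": scale * rng.normal(size=(m, 1)),          # coordinate gate weights
        "W_h": scale * rng.normal(size=(d + m, d)),      # node update weights
    }


def egnn_layer(h, x, edges, params):
    """One simplified E(3)-equivariant message-passing step.

    h: (N, D) invariant node features (stand-ins for PLM residue embeddings)
    x: (N, 3) node coordinates (stand-ins for residue positions)
    edges: (E, 2) integer array of directed (i, j) pairs
    Returns updated (h, x); h is invariant and x is equivariant under
    rotations, reflections, and translations of the input coordinates.
    """
    i, j = edges[:, 0], edges[:, 1]
    diff = x[i] - x[j]                                   # (E, 3), rotates with x
    d2 = np.sum(diff ** 2, axis=1, keepdims=True)        # (E, 1), invariant

    # Messages depend only on invariant quantities.
    m = np.tanh(np.concatenate([h[i], h[j], d2], axis=1) @ params["W_e"])

    # Coordinate update: move along relative vectors, scaled by invariant gates.
    gate = m @ params["W_x"]                             # (E, 1)
    x_new = x.copy()
    np.add.at(x_new, i, diff * gate)

    # Feature update: sum incoming messages per node, then residual update.
    agg = np.zeros((h.shape[0], m.shape[1]))
    np.add.at(agg, i, m)
    h_new = h + np.tanh(np.concatenate([h, agg], axis=1) @ params["W_h"])
    return h_new, x_new
```

As a usage check under these assumptions, applying the layer to rotated and translated coordinates (with the same edge list) should leave the updated features unchanged and transform the updated coordinates by the same rotation and translation. That property is the geometric inductive bias the abstract refers to; the PLM itself stays a fixed feature extractor in this sketch, so only the small layer weights would be trained.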