Protein language models reveal hierarchical specialization in duplicated EF-hand calcium-binding motifs

Abstract

Gene duplication generates repeated protein architectures that combine structural symmetry with functional specialization. How evolutionary divergence is distributed across duplicated protein repeats remains poorly understood. EF-hand calcium-binding proteins provide an ideal model system because calmodulin contains four homologous motifs that share a conserved structural scaffold yet display functional asymmetries in calcium binding. Here, we apply protein language model–derived measures of contextual sequence constraint to investigate evolutionary specialization across the four EF-hand motifs of calmodulin using orthologs sampled across diverse eukaryotes. Representation analyses show that EF-hand motifs cluster according to duplication architecture rather than structural lobe identity, indicating that protein language model embeddings preserve the symmetry of the duplicated scaffold. Despite this global architectural similarity, regression and positional analyses reveal localized evolutionary divergence concentrated at canonical Ca²⁺-coordinating residues within the EF-hand loop. Language-model constraint correlates only partially with classical conservation metrics such as Shannon entropy and phylogenetic substitution rate, indicating that contextual sequence probabilities capture complementary evolutionary signals beyond alignment-based statistics. These results support a hierarchical evolutionary model in which duplicated EF-hand motifs retain architectural symmetry while undergoing residue-level functional specialization. The findings demonstrate that protein language models encode multi-scale evolutionary organization within modular proteins and provide a framework for identifying functionally specialized residues in duplicated protein architectures.
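
To make the alignment-based baseline concrete, the following is a minimal sketch of per-column Shannon entropy, one of the classical conservation metrics named in the abstract. The abstract does not say how entropy was computed in the study, so the gap handling, the function names (`column_entropy`, `alignment_entropy`), and the toy sequences are illustrative assumptions, not the authors' pipeline.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-"):
    """Shannon entropy in bits of one alignment column.

    Gap characters are skipped (an assumption; other conventions exist).
    """
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # H = -sum(p * log2(p)) over the observed residue frequencies
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment: made-up six-residue strings, NOT real calmodulin orthologs.
toy_alignment = ["DKDGDG", "DKDGNG", "DKNGDG", "DADGSG"]
entropies = alignment_entropy(toy_alignment)
```

Fully conserved columns score 0 bits and more variable columns score higher. Entropy treats each column independently, whereas a language-model constraint score conditions on the surrounding sequence. That difference is consistent with the partial correlation between the two measures reported above.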
