Generating Hybrid Proteins with the MSA-Transformer

Abstract

Protein superfamilies display extensive sequence and functional divergence, providing a rich landscape for engineering hybrid or functionally enhanced proteins. We present a stochastic, iterative framework that leverages the MSA-Transformer to generate intermediate sequences that define a pathway between a user-defined homologous 'source' and 'target' protein in sequence space. Targeted site selection for masking is guided either by embedding-based dissimilarity or by row-attention information, while beam search concurrently explores multiple mutational pathways. Pretrained sparse autoencoders combined with sequence and structural analyses are used to trace the inheritance and exchange of features across the mutational pathways, revealing hybrid sequences that integrate properties of both source and target proteins. Applied across diverse protein families, the framework produces sequences that occupy biologically meaningful regions of sequence space and achieve higher consistency and plausibility scores than random baselines according to in silico metrics. In the B1/B2 metallo-beta-lactamase family, the generated hybrids largely retain their core fold while recombining structural and active-site motifs from both subclasses, demonstrating the model's capacity to preserve catalytic features while exploring novel structural permutations. Code availability: The implementation is available via GitHub at https://github.com/santule/protmixy.
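
The idea of exploring several mutational pathways at once with beam search can be illustrated with a toy sketch. It is not the authors' implementation. It assumes two pre-aligned sequences of equal length. The MSA-Transformer fill-in step is replaced by simply copying the target residue at the selected sites, and the plausibility score is replaced by negative Hamming distance to the target. All names (`beam_search_path`, `toy_score`, `beam_width`, `sites_per_step`) are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_score(seq, target):
    # Assumption: stand-in for a model-based plausibility score.
    # Higher is better; here it only rewards closeness to the target.
    return -hamming(seq, target)


def beam_search_path(source, target, beam_width=3, sites_per_step=2, seed=0):
    """Return one pathway of intermediate sequences from source to target."""
    if len(source) != len(target):
        raise ValueError("source and target must be aligned to equal length")
    rng = random.Random(seed)
    beams = [[source]]  # each beam is a list of sequences (a pathway)
    while beams[0][-1] != target:
        candidates = []
        for path in beams:
            current = path[-1]
            differing = [i for i, (a, b) in enumerate(zip(current, target)) if a != b]
            if not differing:
                candidates.append(path)  # this pathway has already arrived
                continue
            for _ in range(beam_width):
                # Assumption: random choice among differing sites replaces
                # embedding- or attention-guided site selection.
                sites = rng.sample(differing, min(sites_per_step, len(differing)))
                new = list(current)
                for i in sites:
                    new[i] = target[i]  # stand-in for a model-proposed residue
                candidates.append(path + ["".join(new)])
        # Keep the best-scoring pathways for the next iteration.
        candidates.sort(key=lambda p: toy_score(p[-1], target), reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


path = beam_search_path("ACDEFG", "ACDKLM")
for step, seq in enumerate(path):
    print(step, seq)
```

In this sketch every intermediate sequence is a hybrid that mixes source and target residues. In the framework described above, a language model would instead propose residues at masked sites, so intermediates need not be restricted to residues found in either parent.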
