Inferring Local Protein Structural Similarity from Sequence Alone
Abstract
Detecting local structural similarity between proteins is central to understanding function and evolution, yet most approaches require 3D models. In this work, we show that protein language models (pLMs), using only sequence data as input, implicitly capture fine-grained structural signals that can be leveraged to identify such similarities. By mean-pooling residue embeddings over sliding windows and comparing the pooled windows across proteins with cosine similarity, we find diagonal patterns in the resulting similarity matrices that reflect locally aligned regions even without sequence identity. Building on this insight, we introduce a framework for detecting locally aligned structural regions directly from sequences, supporting the development of scalable methods for structural annotation and comparison.
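The windowed comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-residue embeddings have already been computed by some pLM and are available as NumPy arrays of shape (L, D). The function names, the window size, the stride, and the diagonal scoring rule (mean similarity along each diagonal, with a hypothetical `min_len` cutoff) are all illustrative assumptions.

```python
import numpy as np


def window_embeddings(residue_emb, window=16, stride=1):
    """Mean-pool per-residue embeddings (L, D) over sliding windows.

    Returns an array of shape (n_windows, D). Window size and stride are
    illustrative defaults, not values taken from the preprint.
    """
    residue_emb = np.asarray(residue_emb, dtype=float)
    length = residue_emb.shape[0]
    if length < window:
        raise ValueError("sequence is shorter than the window size")
    starts = range(0, length - window + 1, stride)
    return np.stack([residue_emb[s:s + window].mean(axis=0) for s in starts])


def cosine_similarity_matrix(a, b, eps=1e-12):
    """Cosine similarity between every row of a (n, D) and every row of b (m, D)."""
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return a_unit @ b_unit.T


def best_diagonal(sim, min_len=3):
    """Return (offset, mean similarity) of the highest-scoring diagonal.

    Assumed scoring rule: mean similarity along each diagonal of the
    window-by-window matrix, ignoring diagonals shorter than min_len.
    An offset k pairs window i of the first protein with window i + k
    of the second.
    """
    n, m = sim.shape
    best = None
    for k in range(-(n - 1), m):
        diag = np.diagonal(sim, offset=k)
        if diag.size < min_len:
            continue
        score = float(diag.mean())
        if best is None or score > best[1]:
            best = (k, score)
    if best is None:
        raise ValueError("no diagonal is at least min_len long")
    return best
```

In this sketch, a high-scoring diagonal at offset k suggests that a stretch of the first protein is locally similar to a stretch of the second protein shifted by k windows. A practical detector would likely score diagonal segments rather than whole diagonals, which is beyond what the abstract specifies.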