Inferring Local Protein Structural Similarity from Sequence Alone

Abstract

Detecting local structural similarity between proteins is central to understanding function and evolution, yet most approaches require 3D models. In this work, we show that protein language models (pLMs), using only sequence data as input, implicitly capture fine-grained structural signals that can be leveraged to identify such similarities. By mean-pooling residue embeddings over sliding windows and comparing the pooled windows across proteins with cosine similarity, we find diagonal patterns that reflect locally aligned regions even in the absence of sequence identity. Building on this insight, we introduce a framework for detecting locally aligned structural regions directly from sequences, supporting the development of scalable methods for structural annotation and comparison.
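
The windowed comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-residue embeddings have already been produced by some pLM and are supplied as plain lists of floats, and the function names (`window_means`, `window_similarity_matrix`, `best_diagonal`) and the `min_len` parameter are hypothetical, introduced here for illustration only. The diagonal scoring by mean similarity is a simple stand-in for whatever detection step the framework actually uses.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def window_means(embeddings: Sequence[Sequence[float]], window: int) -> List[Vector]:
    """Mean-pool per-residue embeddings over every sliding window of `window` residues."""
    if window < 1 or window > len(embeddings):
        raise ValueError("window must be between 1 and the sequence length")
    dim = len(embeddings[0])
    pooled = []
    for start in range(len(embeddings) - window + 1):
        chunk = embeddings[start:start + window]
        pooled.append([sum(row[d] for row in chunk) / window for d in range(dim)])
    return pooled


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; defined as 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def window_similarity_matrix(emb_a, emb_b, window: int) -> List[List[float]]:
    """Entry [i][j] compares window i of protein A with window j of protein B."""
    pooled_a = window_means(emb_a, window)
    pooled_b = window_means(emb_b, window)
    return [[cosine(u, v) for v in pooled_b] for u in pooled_a]


def best_diagonal(matrix: List[List[float]], min_len: int = 1) -> Tuple[int, float]:
    """Return (offset, mean similarity) of the highest-scoring diagonal.

    offset = j - i. Diagonals with fewer than `min_len` cells are skipped
    (hypothetical guard against short diagonals dominating the score).
    """
    rows, cols = len(matrix), len(matrix[0])
    best = None
    for offset in range(-(rows - 1), cols):
        cells = [matrix[i][i + offset] for i in range(rows) if 0 <= i + offset < cols]
        if len(cells) < min_len:
            continue
        score = sum(cells) / len(cells)
        if best is None or score > best[1]:
            best = (offset, score)
    if best is None:
        raise ValueError("no diagonal has at least min_len cells")
    return best


# Toy usage with made-up 2-dimensional "embeddings" (real pLM embeddings
# have hundreds or thousands of dimensions).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
sim = window_similarity_matrix(toy, toy, window=2)
offset, score = best_diagonal(sim, min_len=2)
```

In this sketch a run of high values along a diagonal of `sim` corresponds to consecutive windows in one protein matching consecutive windows in the other, which is the kind of diagonal pattern the abstract refers to. Comparing a sequence with itself, as in the toy usage, places that pattern on the main diagonal (offset 0).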
