ConTP: Reshaping transporter functional space to resolve substrate specificity beyond evolutionary proximity
Abstract
Membrane transporter annotation has long relied on a homology-centric paradigm that treats evolutionary proximity as a proxy for substrate specificity. Yet transporter functional space is intrinsically long-tailed, partially multi-label, and often decoupled from phylogeny, creating systematic blind spots in substrate-level inference. We reformulate substrate annotation as a chemically coherent multi-label problem and construct benchmarks spanning 70 fine-grained substrate types and 1,352 TC families. We introduce ConTP, an evolution-informed contrastive framework that realigns pretrained protein language model embeddings around substrate semantics rather than sequence similarity, enabling taxon-agnostic, prototype-based inference. In this aligned manifold, cross-family convergence, exemplified by sodium transport across distinct TC superfamilies, and authentic multi-substrate specificity in NRAMP transporters are faithfully recovered. Furthermore, projection of generated sequences exposes substrate-fidelity violations in contemporary design models. Together, these findings support a geometry-aware view of transporter specificity beyond raw evolutionary similarity.
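As a rough illustration of what prototype-based, multi-label inference in an aligned embedding space can look like, the following minimal sketch averages unit-normalized embeddings per substrate label and assigns every label whose prototype is sufficiently close to a query. It is not the ConTP implementation: the mean-of-embeddings prototype, the cosine similarity score, the fixed threshold, the function names, and the three-dimensional toy vectors and labels are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_prototypes(embeddings, label_sets):
    """Average unit-normalized embeddings per label.

    A protein annotated with several substrates contributes to each of its
    labels, which is one simple way to handle multi-label training data.
    """
    sums = {}
    counts = defaultdict(int)
    for emb, labels in zip(embeddings, label_sets):
        unit = normalize(emb)
        for label in labels:
            if label not in sums:
                sums[label] = [0.0] * len(unit)
            sums[label] = [s + u for s, u in zip(sums[label], unit)]
            counts[label] += 1
    return {
        label: normalize([s / counts[label] for s in vec])
        for label, vec in sums.items()
    }


def predict_substrates(embedding, prototypes, threshold=0.8):
    """Return every label whose prototype has cosine similarity >= threshold."""
    unit = normalize(embedding)
    scores = {
        label: sum(a * b for a, b in zip(unit, proto))
        for label, proto in prototypes.items()
    }
    predicted = sorted(label for label, s in scores.items() if s >= threshold)
    return predicted, scores


# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
train_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
train_labels = [
    {"sodium"},
    {"sodium"},
    {"manganese"},
    {"sodium", "manganese"},  # a multi-substrate example
]

prototypes = build_prototypes(train_embeddings, train_labels)
predicted, scores = predict_substrates([0.6, 0.6, 0.05], prototypes)
print(predicted)  # the query lies between both prototypes and receives both labels
```

Because prediction depends only on distance to label prototypes, nothing in this scheme refers to the family or taxon of the query sequence. In practice the threshold would need to be tuned, possibly per label, given the long-tailed label distribution the abstract describes.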