Mapping the diverse topologies of protein-protein interaction fitness landscapes
Abstract
De novo binder discovery is unpredictable and inefficient because we lack a quantitative understanding of protein-protein interaction (PPI) sequence-function landscapes. Here, we use our PANCS-Binder technology to perform >1,300 independent selections, spanning various library sizes and compositions of a randomized small protein, to identify binders to a panel of 96 distinct target proteins. For successful selections, we discovered reproducible fitness landscapes that group into a few target-specific clusters. Each cluster defines a minimal binding motif whose frequency is inversely proportional to the number of specified amino acids (∼2–8). This frequency determines selection success, which is quantifiable by the density of binders to the target within a theoretical sequence space. We leverage these data to develop a supervised contrastive learning approach that discriminates binders from non-binders and generalizes once the training data exceed a threshold amount. Together, this framework renders PPI landscapes measurable and predictive, accelerating de novo binder discovery and optimization.
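To make the link between motif size, binder density, and selection success concrete, here is a minimal back-of-the-envelope sketch. It is not the authors' model. It assumes each specified residue sits at a fixed position and that every position is randomized uniformly and independently over the 20 canonical amino acids, so a motif with k specified residues occurs at frequency 20^-k. The function names and the example library sizes are hypothetical, chosen only for illustration.

```python
from math import expm1, log1p

# Assumption: uniform, independent randomization over 20 amino acids.
ALPHABET_SIZE = 20


def motif_frequency(num_specified: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Fraction of random sequences matching a motif with fixed specified positions."""
    if num_specified < 0:
        raise ValueError("num_specified must be non-negative")
    return float(alphabet_size) ** -num_specified


def expected_hits(library_size: int, num_specified: int) -> float:
    """Expected number of motif-containing sequences in a library (density x size)."""
    return library_size * motif_frequency(num_specified)


def prob_at_least_one(library_size: int, num_specified: int) -> float:
    """Probability that a library contains at least one motif-matching sequence."""
    if library_size <= 0:
        return 0.0
    p = motif_frequency(num_specified)
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)^N, computed in a numerically stable way for tiny p.
    return -expm1(library_size * log1p(-p))


if __name__ == "__main__":
    # Hypothetical library sizes; the motif sizes span the ~2-8 range from the abstract.
    for k in (2, 4, 6, 8):
        for n in (10**6, 10**8):
            print(f"k={k} N={n:.0e} expected={expected_hits(n, k):.3g} "
                  f"P(>=1)={prob_at_least_one(n, k):.3g}")
```

Under these assumptions, each additional specified residue lowers the binder density twentyfold, which is why a small motif can be found in a modest library while a large one may require a library many orders of magnitude bigger.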