Towards a comprehensive view of the pocketome universe – biological implications and algorithmic challenges
Abstract
With reliably predicted 3D structures now available for essentially all known proteins, characterizing the entirety of protein–small-molecule interaction sites (binding pockets) has become possible. The aim of this study was to identify and analyze all compound-binding sites, i.e. the pocketomes, of eleven species from different kingdoms of life in order to discern evolutionary trends and to arrive at a global, cross-species view of the pocketome universe. All protein structures available in the AlphaFold database for each species were subjected to computational binding site prediction. The resulting set of potential binding sites was inspected for overlaps with known pockets and annotated with regard to the protein domains in which the pockets are located. 2D projections of all pockets, embedded in a 128-dimensional feature space and characterized with regard to selected physicochemical properties, provide informative global pocketome maps that reveal differentiating features between pockets. Our study revealed a sub-linear scaling law of the number of unique binding sites relative to the number of unique protein structures per species. Thus, species with larger proteomes harbor less than proportionally more distinct binding sites than species with smaller proteomes. However, this relationship is mainly driven by large numbers of singletons, i.e. binding sites that are not similar to any other binding site in the same species. We discuss the significance of this finding and identify critical, unmet algorithmic challenges.
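
To make the notion of a sub-linear scaling law concrete, the following minimal sketch fits a power law S = a · P^b to per-species counts by least squares in log-log space, where P is the number of unique protein structures and S the number of unique binding sites; an exponent b < 1 indicates sub-linear scaling. The power-law form, the function name `fit_power_law`, and all numbers below are illustrative assumptions; the data are synthetic and are not results of this study.

```python
import math


def fit_power_law(n_proteins, n_sites):
    """Fit S = a * P**b by ordinary least squares on log(S) vs. log(P).

    Returns the prefactor a and the exponent b.
    """
    xs = [math.log(p) for p in n_proteins]
    ys = [math.log(s) for s in n_sites]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # The intercept gives log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, hypothetical counts generated from an exact power law
# (a = 2.0, b = 0.8); these are NOT values from the study.
proteins = [4000, 6000, 20000, 40000]
sites = [2.0 * p ** 0.8 for p in proteins]

a, b = fit_power_law(proteins, sites)
print(f"prefactor a = {a:.3f}, exponent b = {b:.3f}")
print("sub-linear" if b < 1 else "linear or super-linear")
```

With real per-species counts, the same fit could be repeated after excluding singletons to check how strongly they drive the exponent.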