Annotating protein function is a major goal in molecular biology, yet experimentally determined knowledge is often limited to a few model organisms. In non-model species, sequence-based prediction of gene orthology can be used to infer function; however, this approach loses predictive power over longer evolutionary distances. Here we propose a pipeline for the functional annotation of proteins using structural similarity, exploiting the fact that protein structures are directly linked to function and can be more conserved than protein sequences.
We implement this pipeline for the functional annotation of proteins via structural similarity (CoFFE: ColabFold-FoldSeek-EggNOG) and use it to annotate the complete proteome of a sponge. Sponges are uniquely positioned for inferring the early history of animals, yet their proteomes remain sparsely annotated. CoFFE accurately identifies proteins with known homology in more than 90% of cases and annotates an additional 50% of the proteome beyond sequence-based methods. Using CoFFE, we uncover new functions for sponge cell types, including extensive FGF, TGF and Ephrin signalling in sponge epithelia, and redox metabolism and control in myopeptidocytes. Notably, we also annotate genes that arose via horizontal gene transfer; these are specific to the enigmatic sponge mesocytes and likely participate in digesting algal and plant cell walls.
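The central idea, transferring a functional annotation from a structurally similar protein, can be illustrated with a minimal sketch. It is not CoFFE's code. It assumes that predicted query structures were searched with Foldseek using its default 12-column tabular output, and that a mapping from target structure IDs to functional descriptions (for example, eggNOG-derived terms) already exists. The helper `transfer_annotations`, the `max_evalue` cutoff and all identifiers in the toy data are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

# Assumed column order of Foldseek's default tabular output
# (equivalent to --format-output "query,target,fident,alnlen,mismatch,
# gapopen,qstart,qend,tstart,tend,evalue,bits"); verify against your run.
COLUMNS = ["query", "target", "fident", "alnlen", "mismatch", "gapopen",
           "qstart", "qend", "tstart", "tend", "evalue", "bits"]


def transfer_annotations(hits_tsv: str,
                         target_annotations: Dict[str, str],
                         max_evalue: float = 1e-3) -> Dict[str, str]:
    """Give each query the annotation of its highest-scoring annotated hit.

    Hypothetical helper, not CoFFE's code: `target_annotations` maps target
    structure IDs to functional descriptions, and `max_evalue` is an
    illustrative cutoff.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for row in csv.reader(io.StringIO(hits_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        evalue, bits = float(rec["evalue"]), float(rec["bits"])
        # Ignore weak hits and targets with no known function
        if evalue > max_evalue or rec["target"] not in target_annotations:
            continue
        query = rec["query"]
        # Keep only the best hit (highest bit score) per query protein
        if query not in best or bits > best[query][0]:
            best[query] = (bits, target_annotations[rec["target"]])
    return {query: ann for query, (_, ann) in best.items()}


# Toy example with made-up identifiers and annotations
hits = (
    "sponge_p1\ttarget_A\t0.21\t250\t180\t10\t1\t250\t5\t255\t1e-20\t310\n"
    "sponge_p1\ttarget_B\t0.18\t240\t190\t12\t1\t240\t3\t243\t1e-10\t150\n"
    "sponge_p2\ttarget_A\t0.10\t80\t70\t5\t1\t80\t1\t80\t0.5\t20\n"
)
annotations = {"target_A": "annotation_A", "target_B": "annotation_B"}
print(transfer_annotations(hits, annotations))  # {'sponge_p1': 'annotation_A'}
```

Taking the single best hit is only one possible transfer rule; combining several hits or weighting by alignment coverage are equally plausible, and the text above does not specify which rule CoFFE applies.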
Our work demonstrates that structural similarity is a powerful approach that complements and extends sequence-based annotations to bridge long evolutionary distances. We anticipate this approach will expand functional annotations across the tree of life and boost discovery in numerous -omics datasets.