Birth of new protein folds and functions in the virome

Abstract

Rapid virus evolution generates proteins that are essential to infectivity and replication but whose functions remain unknown because of extreme sequence divergence [1]. Using a database of 67,715 newly predicted protein structures from 4,463 eukaryotic viral species, we found that 62% of viral proteins are evolutionarily young and lack homologs in the AlphaFold database [2,3]. Among the 38% of more ancient viral proteins, many have non-viral structural homologs that revealed surprising similarities between human pathogens and their eukaryotic hosts. Structural comparisons suggested putative functions for >25% of unannotated viral proteins, including those with roles in the evasion of innate immunity. In particular, RNA ligase T (ligT)-like phosphodiesterases were found to resemble phage-encoded proteins that hydrolyze the host immune-activating cyclic dinucleotides 3’3’ and 2’3’ cyclic G-A monophosphate (cGAMP). Experimental analysis showed that ligT homologs encoded by avian poxviruses likewise hydrolyze 2’3’ cGAMP, indicating that ligT-mediated targeting of cGAMP is an evolutionarily conserved mechanism of immune evasion present in both bacteriophages and eukaryotic viruses. Together, the viral protein structural database and analytics presented here afford new opportunities to identify mechanisms of virus-host interactions that are common across the virome.

Article activity feed

  1. A caveat of our study is the use of a stringent 70% coverage threshold. This means that some proteins with similar function but differences in domain configuration will be split into separate protein clusters, underestimating their taxonomic diversity.

    This is a good point, and something I also wonder about. I'm curious whether you all played around with using a domain parser to try to separate out viral domains for individual comparison (either cath resolve for sequence, or DPAM for structure)?
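    The splitting effect described in the caveat above can be illustrated with a minimal sketch. It assumes the 70% threshold is applied to both aligned proteins (bidirectional coverage); the excerpt does not state which coverage mode the authors used, and the function names and protein lengths below are hypothetical.

```python
def coverage(aligned_len: int, protein_len: int) -> float:
    """Fraction of a protein's residues covered by the alignment."""
    if protein_len <= 0:
        raise ValueError("protein_len must be positive")
    return aligned_len / protein_len


def same_cluster(aligned_len: int, query_len: int, target_len: int,
                 min_cov: float = 0.70) -> bool:
    """Assumed rule: both proteins must be covered at or above min_cov."""
    return (coverage(aligned_len, query_len) >= min_cov
            and coverage(aligned_len, target_len) >= min_cov)


# Hypothetical case: a 200-residue single-domain protein aligns over its
# full length to one domain of a 500-residue two-domain protein.
# Query coverage is 1.0, but target coverage is only 0.4, so the pair
# ends up in separate clusters despite the shared domain.
print(same_cluster(aligned_len=200, query_len=200, target_len=500))

# Two proteins of similar length with a near-complete alignment
# (coverage 0.95 and about 0.83) pass the threshold.
print(same_cluster(aligned_len=190, query_len=200, target_len=230))
```

    Under this assumption, comparing parsed domains individually, as the comment suggests, would make each domain its own query, so a shared domain would no longer be penalized by the length of its neighbors.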