A Structural Proteome Screen Identifies Protein Mimicry in Host-Microbe Systems
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Host-microbe systems are evolutionary niches that produce coevolved biological interactions and are a key component of global health. However, these systems have historically been a difficult field of biological research due to their experimental intractability. Impactful advances in global health will be obtained by leveraging in silico screens to identify genes involved in mediating interspecific interactions. These predictions will advance our understanding of these systems and lay the groundwork for future in vitro and in vivo experiments and bioengineering projects. A driver of host manipulation and intracellular survival utilized by host-associated microbes is molecular mimicry, a critical mechanism that can occur at any level from DNA to protein structures. We applied protein structure prediction and alignment tools to explore host-associated bacterial structural proteomes for examples of protein structure mimicry. By leveraging the Legionella pneumophila proteome and its many known structural mimics, we developed and validated a screen that can be applied to virtually any host-microbe system to uncover signals of protein mimicry. These mimics represent candidate proteins that mediate host interactions in microbial proteomes. We successfully applied this screen to other microbes with demonstrated effects on global health, Helicobacter pylori and Wolbachia, identifying protein mimic candidates in each proteome. We discuss the roles these candidates may play in important Wolbachia-induced phenotypes and show that Wolbachia infection can partially rescue the loss of one of these factors. This work demonstrates how a genome-wide screen for candidates of host manipulation and intracellular survival offers an opportunity to identify functionally important genes in host-microbe systems.
Article activity feed
-
Scripts used for generating datasets and performing analysis are available at: https://github.com/gabepen/mimic_screen
It would be great if you could add some clarification to the README for scripts that aren't currently mentioned. For example, when is parse_hyphy_output.py used?
-
We performed structural alignments between proteomes with the tool Foldseek
Thanks for putting your code on GitHub! I noticed on your GitHub page that you masked low-confidence ends of protein structures prior to alignment. This is an interesting consideration, and I think it is worth mentioning in the methods here.
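To make the masking step in this comment concrete, here is a minimal sketch of trimming low-confidence termini from a predicted structure using per-residue pLDDT scores. The function name, the threshold of 70, and the example scores are all assumptions for illustration; this is not the logic from the authors' repository, and their cutoff may differ.

```python
def trim_low_confidence_ends(plddt, threshold=70.0):
    """Return (start, end) slice bounds that drop low-confidence termini.

    Only the ends are trimmed: residues below the threshold that sit
    between two confident residues are kept. The threshold is a
    hypothetical default, not a value taken from the paper.
    """
    confident = [i for i, score in enumerate(plddt) if score >= threshold]
    if not confident:
        return (0, 0)  # nothing confident: the whole chain is masked
    return (confident[0], confident[-1] + 1)


# Made-up per-residue pLDDT values for a 7-residue toy chain.
scores = [35.2, 48.9, 81.0, 92.3, 66.0, 88.1, 41.7]
start, end = trim_low_confidence_ends(scores)
trimmed = scores[start:end]
```

Keeping internal low-confidence residues preserves loops inside otherwise well-predicted domains; whether that is desirable depends on how the downstream alignment treats them.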
-
is important to note that bacterial queries were not limited to alignments with a single host target structure and single query structures contributed multiple targets to the protein IDs used in the GO analysis
Did you do any analysis of queries that had strong hits (confident alignments) to multiple targets? I am curious about the distribution of these matches. Were they all equally good matches? Were they matches to proteins in the same family?
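One way to approach the question above is to group alignments by query and summarize the spread of scores across targets. The sketch below assumes hits are available as (query, target, score) tuples, where the score could be a TM-score; the function name and all identifiers in the example are hypothetical placeholders.

```python
from collections import defaultdict


def summarize_multi_target_queries(hits):
    """Summarize queries that align to more than one distinct target.

    hits: iterable of (query_id, target_id, score) tuples.
    Returns {query_id: (n_targets, best_score, worst_score)}.
    """
    by_query = defaultdict(dict)
    for query, target, score in hits:
        # Keep only the best score for each (query, target) pair.
        previous = by_query[query].get(target)
        if previous is None or score > previous:
            by_query[query][target] = score

    summary = {}
    for query, targets in by_query.items():
        if len(targets) > 1:
            scores = list(targets.values())
            summary[query] = (len(targets), max(scores), min(scores))
    return summary


# Placeholder identifiers and scores, for illustration only.
example_hits = [
    ("microbe_A", "host_1", 0.82),
    ("microbe_A", "host_2", 0.55),
    ("microbe_A", "host_1", 0.60),
    ("microbe_B", "host_3", 0.71),
]
multi = summarize_multi_target_queries(example_hits)
```

A large gap between the best and worst score for a query would suggest one dominant match, while a narrow gap would point toward a shared fold or protein family among the targets.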
-
5,227 unique microbe proteins
Can you clarify what this value represents? Looking at Supplementary Table 3, it looks like there are 1,669 unique microbe UniProt IDs. This would also make sense if Legionella has only around 3,000 proteins.
-
conservation of critical residues and domains within the structural alignment.
Could you elaborate here on how you determined critical domains? Was this something you did manually, by eye, for only a very small set of proteins? Or did you do this systematically?
-
We selected an e-value cutoff of 0.01 for these alignments
I've noticed that the Foldseek e-value can be strongly affected by short query proteins that have low target coverage. As you note below, these could still be biologically meaningful, as pathogens may only need to do a good job mimicking a certain functional domain, for example, rather than the full protein. Did you notice this in your data? Is it possible that with this approach you are missing out on finding more partial mimics even though you are using a lenient target coverage cutoff?
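The concern raised here can be illustrated with a small filter that separates hits passing the strict e-value cutoff from "near misses" that cover most of a short query but little of the target. Only the 0.01 cutoff comes from the manuscript; the relaxed e-value and the coverage thresholds are assumptions, and the field names are modeled on the `qcov` and `tcov` coverage columns, assuming those were requested in the alignment output.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    target: str
    evalue: float
    qcov: float  # fraction of the query covered by the alignment
    tcov: float  # fraction of the target covered by the alignment


def classify_hit(hit, max_evalue=0.01, relaxed_evalue=1.0,
                 min_qcov=0.8, max_partial_tcov=0.5):
    """Label a hit as 'pass', 'partial_candidate', or 'reject'.

    All thresholds except max_evalue=0.01 are hypothetical.
    """
    if hit.evalue <= max_evalue:
        return "pass"
    # A short query aligned almost end to end against a small part of a
    # long target: a possible domain-level mimic with a weak e-value.
    if (hit.evalue <= relaxed_evalue and hit.qcov >= min_qcov
            and hit.tcov <= max_partial_tcov):
        return "partial_candidate"
    return "reject"


# Placeholder hits, for illustration only.
example = [
    Hit("q1", "t1", 1e-5, 0.90, 0.90),
    Hit("q2", "t2", 0.2, 0.95, 0.20),
    Hit("q3", "t3", 0.2, 0.40, 0.20),
    Hit("q4", "t4", 5.0, 0.95, 0.20),
]
labels = [classify_hit(h) for h in example]
```

Counting how many hits land in the `partial_candidate` bin would give a rough sense of how many partial mimics the strict cutoff might exclude.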
-
Free-living proteomes that contained at least 900 structures were selected for use in the control dataset.
I'm curious about your decision to use a cutoff of 900 here. I would expect free-living bacteria to have on the order of 3,000 to 4,000 protein-coding genes. It might be useful to note the distribution you saw and why you chose this threshold.
-
Phylogenetic inference of HtpG and Hsp83 evolutionary histories reveal that the structural similarity of these proteins is due to deep structural conservation, and not to recent horizontal gene transfer (HGT)
This problem is a really interesting one! Did you consider more formal tests for structural convergence between the mimic and the host protein, to test whether the mimic has a higher TM-score than you would expect given the phylogenetic distance? In the absence of a formal test, it could be interesting even just to plot the TM-score of Drosophila Hsp83 against the other proteins on the outside of the tree, to see whether there is a big jump in TM-score when you reach HtpG.
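A simple version of the comparison suggested in this comment is to fit a line of TM-score against phylogenetic distance for background proteins and check how far a candidate sits above that trend. The sketch below uses ordinary least squares as a stand-in for a formal convergence test; the function names are hypothetical and every number is an invented toy value, not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tm_score_excess(background, candidate):
    """Observed minus expected TM-score for a candidate.

    background: list of (phylogenetic_distance, tm_score) pairs.
    candidate: a single (phylogenetic_distance, tm_score) pair.
    """
    slope, intercept = fit_line([d for d, _ in background],
                                [t for _, t in background])
    distance, observed = candidate
    return observed - (slope * distance + intercept)


# Toy values only: TM-score falls off linearly with distance here.
background = [(0.5, 0.9), (1.0, 0.8), (1.5, 0.7), (2.0, 0.6)]
excess = tm_score_excess(background, (2.5, 0.75))
```

A clearly positive excess would flag the candidate as more structurally similar than its distance predicts. A real analysis would also need to account for non-independence among lineages on the tree, which this sketch ignores.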
-