A Structural Proteome Screen Identifies Protein Mimicry in Host-Microbe Systems

Gabriel Penunuri
Pingting Wang
Russell Corbett-Detig
Shelbi L Russell

This article has been reviewed by the following groups


Listed in

Evaluated articles (Arcadia Science)

Abstract

Host-microbe systems are evolutionary niches that produce coevolved biological interactions and are a key component of global health. However, these systems have historically been a difficult field of biological research due to their experimental intractability. Impactful advances in global health will be obtained by leveraging in silico screens to identify genes involved in mediating interspecific interactions. These predictions will progress our understanding of these systems and lay the groundwork for future in vitro and in vivo experiments and bioengineering projects. A driver of host-manipulation and intracellular survival utilized by host-associated microbes is molecular mimicry, a critical mechanism that can occur at any level from DNA to protein structures. We applied protein structure prediction and alignment tools to explore host-associated bacterial structural proteomes for examples of protein structure mimicry. By leveraging the Legionella pneumophila proteome and its many known structural mimics, we developed and validated a screen that can be applied to virtually any host-microbe system to uncover signals of protein mimicry. These mimics represent candidate proteins that mediate host interactions in microbial proteomes. We successfully applied this screen to other microbes with demonstrated effects on global health, Helicobacter pylori and Wolbachia, identifying protein mimic candidates in each proteome. We discuss the roles these candidates may play in important Wolbachia-induced phenotypes and show that Wolbachia infection can partially rescue the loss of one of these factors. This work demonstrates how a genome-wide screen for candidates of host-manipulation and intracellular survival offers an opportunity to identify functionally important genes in host-microbe systems.

Arcadia Science
Dec 10, 2024

Scripts used for generating datasets and performing analysis are available at: https://github.com/gabepen/mimic_screen

It would be great if you could add some clarification to the README for scripts that aren't currently mentioned. For example, when is parse_hyphy_output.py used?

Arcadia Science
Dec 10, 2024

We performed structural alignments between proteomes with the tool Foldseek

Thanks for putting your code on GitHub! I noticed on your GitHub page that you masked low-confidence ends of protein structures prior to alignment. This is an interesting consideration, and I think it is worth mentioning in the methods here.
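To make the masking step concrete for readers, here is a minimal sketch of trimming low-confidence termini before alignment. It assumes each residue carries a confidence score such as pLDDT. The function name `trim_low_confidence_ends` and the threshold of 70 are hypothetical and are not taken from the authors' repository, which may implement this step differently.

```python
def trim_low_confidence_ends(plddt, threshold=70.0):
    """Return (start, end) slice bounds that drop low-confidence termini.

    plddt: per-residue confidence scores, ordered N- to C-terminus.
    threshold: hypothetical cutoff; the authors' actual value is not stated here.
    Internal low-confidence residues are kept; only the two ends are trimmed.
    Returns None if no residue reaches the threshold.
    """
    confident = [i for i, score in enumerate(plddt) if score >= threshold]
    if not confident:
        return None
    return confident[0], confident[-1] + 1


# Hypothetical scores: two weak N-terminal residues, one weak C-terminal residue.
scores = [35.2, 48.9, 81.0, 92.4, 65.0, 88.1, 44.3]
bounds = trim_low_confidence_ends(scores)
if bounds is not None:
    start, end = bounds
    trimmed = scores[start:end]  # residues retained for alignment
```

Trimming only the termini, rather than every low-scoring residue, keeps the chain contiguous, which is generally what a structural aligner expects.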

Arcadia Science
Dec 10, 2024

is important to note that bacterial queries were not limited to alignments with a single host target structure and single query structures contributed multiple targets to the protein IDs used in the GO analysis

Did you do any analysis of queries that had strong hits (confident alignments) to multiple targets? I am curious about the distribution of these matches. Were they all equally good matches? Were they matches to proteins in the same family?
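One way to look at this would be to group confident hits by query and inspect how many targets each query has and how their e-values compare. The sketch below assumes hits are available as `(query, target, evalue)` tuples; the identifiers and values are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def targets_per_query(hits, evalue_cutoff=0.01):
    """Group hits passing the e-value cutoff by query, best hit first.

    hits: iterable of (query_id, target_id, evalue) tuples.
    evalue_cutoff: 0.01 as quoted from the preprint; treating the cutoff as
    inclusive (<=) is an assumption.
    """
    by_query = defaultdict(list)
    for query, target, evalue in hits:
        if evalue <= evalue_cutoff:
            by_query[query].append((target, evalue))
    return {q: sorted(ts, key=lambda pair: pair[1]) for q, ts in by_query.items()}


def multi_target_queries(grouped):
    """Return only the queries that have more than one confident target."""
    return {q: ts for q, ts in grouped.items() if len(ts) > 1}


# Hypothetical hits, not data from the preprint.
example_hits = [
    ("Q1", "T1", 1e-10),
    ("Q1", "T2", 1e-3),
    ("Q1", "T3", 0.5),  # fails the cutoff
    ("Q2", "T4", 1e-5),
]
grouped = targets_per_query(example_hits)
multi = multi_target_queries(grouped)
```

From `multi`, the gap between the best and worst e-value per query gives a quick sense of whether the matches are comparably strong.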

Arcadia Science
Dec 10, 2024

5,227 unique microbe proteins

Can you clarify what this value represents? Looking at Supplementary Table 3, there appear to be 1669 unique microbe UniProt IDs. That would also make sense if Legionella has only around 3,000 proteins.

Arcadia Science
Dec 10, 2024

conservation of critical residues and domains within the structural alignment.

Could you elaborate here on how you determined critical domains? Was this something you did manually, by eye, for only a very small set of proteins? Or did you do this systematically?

Arcadia Science
Dec 10, 2024

We selected an e-value cutoff of 0.01 for these alignments

I've noticed that the Foldseek e-value can be strongly affected by short query proteins that have low target coverage. As you note below, these could still be biologically meaningful, as pathogens may only need to do a good job mimicking a certain functional domain, for example, rather than the full protein. Did you notice this in your data? Is it possible that with this approach you are missing out on finding more partial mimics even though you are using a lenient target coverage cutoff?
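To illustrate the concern, the sketch below separates hits that fail the e-value cutoff from hits that pass it but cover only a small part of the target, which are the candidate partial mimics. It assumes each hit provides the aligned target start and end positions, the target length, and an e-value. The field names, the function names, and the 0.25 coverage threshold are hypothetical; the preprint's actual coverage cutoff is not given in this excerpt.

```python
def target_coverage(tstart, tend, tlen):
    """Fraction of the target covered by the alignment (1-based, inclusive)."""
    return (tend - tstart + 1) / tlen


def classify_hit(hit, evalue_cutoff=0.01, min_tcov=0.25):
    """Label a hit as 'fails_evalue', 'low_coverage', or 'pass'.

    evalue_cutoff: 0.01 as quoted from the preprint.
    min_tcov: hypothetical coverage threshold used only for illustration.
    """
    if hit["evalue"] > evalue_cutoff:
        return "fails_evalue"
    if target_coverage(hit["tstart"], hit["tend"], hit["tlen"]) < min_tcov:
        return "low_coverage"
    return "pass"


# Hypothetical hits: a broad match, a short domain-level match, and a weak match.
broad = {"tstart": 1, "tend": 100, "tlen": 200, "evalue": 1e-5}
domain_only = {"tstart": 10, "tend": 49, "tlen": 400, "evalue": 1e-3}
weak = {"tstart": 1, "tend": 30, "tlen": 400, "evalue": 0.5}
labels = [classify_hit(h) for h in (broad, domain_only, weak)]
```

Tabulating the `fails_evalue` hits by query length would show directly whether short queries are being lost at the e-value step.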

Arcadia Science
Dec 10, 2024

Free-living proteomes that contained at least 900 structures were selected for use in the control dataset.

I'm curious about your decision to use a cutoff of 900 here. I would expect free-living bacteria to have on the order of 3,000-4,000 protein-coding genes. It might be useful to note the distribution you saw and why you chose this threshold.

Arcadia Science
Nov 13, 2024

Phylogenetic inference of HtpG and Hsp83 evolutionary histories reveal that the structural similarity of these proteins is due to deep structural conservation, and not to recent horizontal gene transfer (HGT)

This problem is a really interesting one! I'm wondering if you considered more formal tests for structural convergence of the mimic and the host protein, to test whether the mimic has a higher TM-score than you would expect given the phylogenetic distance? In the absence of a formal test, it could be interesting to even just plot the TM-score of Drosophila Hsp83 vs. the other proteins on the outside of the tree to see if there is a big jump in TM-score when you get around to HtpG.
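An informal version of this comparison could fit a simple trend of TM-score against phylogenetic distance for the non-mimic proteins, then ask how far the candidate mimic sits above that trend. The sketch below uses an ordinary least-squares line, which is a simplification: it ignores the non-independence of related taxa, so it is an exploratory check, not a formal test. All names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tm_residual(distances, tm_scores, mimic_distance, mimic_tm):
    """Observed TM-score of the mimic minus the TM-score predicted by the trend.

    distances, tm_scores: phylogenetic distance from the host protein and
    TM-score against it, for each comparator protein.
    A large positive residual suggests more structural similarity than the
    distance alone would predict.
    """
    slope, intercept = fit_line(distances, tm_scores)
    expected = intercept + slope * mimic_distance
    return mimic_tm - expected


# Hypothetical comparator data; not values from the preprint.
comparator_distances = [0.0, 1.0, 2.0]
comparator_tm = [0.9, 0.7, 0.5]
residual = tm_residual(comparator_distances, comparator_tm, 3.0, 0.6)
```

A residual near zero would be consistent with deep structural conservation, while a large positive value would hint at convergence worth testing more rigorously.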

Version published to 10.1101/2024.04.10.588793 on bioRxiv
Apr 13, 2024
