Structural assembly of the bacterial essential interactome


Abstract

The study of protein interactions in living organisms is fundamental for understanding biological processes and central metabolic pathways. Yet, our knowledge of the bacterial interactome remains limited. Here, we combined gene deletion mutant analysis with deep-learning protein folding using AlphaFold2 to predict the core bacterial essential interactome. We predicted and modeled 1402 interactions between essential proteins in bacteria and generated 146 high-accuracy models. Our analysis reveals previously unknown details about the assembly mechanisms of these complexes, highlighting the importance of specific structural features in their stability and function. Our work provides a framework for predicting the essential interactomes of bacteria and highlights the potential of deep-learning algorithms in advancing our understanding of the complex biology of living organisms. Also, the results presented here offer a promising approach to identify novel antibiotic targets.


  1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



    Reply to the reviewers

    Reviewer #1

    The paper provides models of essential complexes formed in bacteria. These models have been predicted by AlphaFold2 and in some of the models, information from existing experimental structures is utilized. The predicted models have been calculated based on standard workflow procedures which are explained in detail and can be reproduced by others. The figures are informative and clear.

    We are grateful for the reviewer's insightful comments, which have significantly contributed to improving our manuscript.

    Suggestions for improvement:

    The PDB accession codes of the experimental structures should be provided. A comparison of the predicted models with the experimental structures should also be provided (e.g., same orientation, superposition). In Fig. 6, for example, a superposition or the use of the same orientation would be more informative.

    As suggested by the reviewer, we have included a new table (Table 1) listing all experimental structures discussed in the main text, with the corresponding PDB codes. All predictions are listed in Supplementary File 1. Where PDB codes were available, we compared the predicted structures to the experimental ones (new Supplementary Figure 3). In Fig. 6, the structures were difficult to superimpose because the subunits in the complexes have different relative orientations. To help compare both models, we have added a schematic representation (new Fig. 6c).

    The paper will certainly generate many hypotheses based on the predicted models. In this respect, it would be useful for a wide audience in the bioscience field. However, the discussed models will need experimental verification by various techniques, such as X-ray crystallography, cryo-EM, SAXS, and structural proteomics. A more thorough analysis of the literature may help to improve the paper in this respect.

    We acknowledge the reviewer's emphasis on the importance of experimental verification of the predicted models. We have conducted a thorough analysis of the literature to identify instances where experimental data could complement our predictions. We identified several mutations in BirA, documented in the literature, that affect its interaction with AccB. In BirA, the mutations M310L and P143T were found to induce a superrepressor phenotype (BirA loses the capacity to biotinylate AccB). These mutations do not significantly affect the BirA active site but can destabilize the BirA-AccB interface. We have added this information to the main text. We also investigated whether our complexes have known crosslinks in the xlinkdb database (https://xlinkdb.gs.washington.edu/xlinkdb/). We found information for five of our predicted complexes. In all cases, the distance restraints identified by crosslinking (crosslinked lysines ~15 Å apart) are compatible with our models. We have incorporated these references into a new table in Supplementary File 1. Unfortunately, we could not find further information in xlinkdb that could be used to validate our complexes.
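    The compatibility check between a model and crosslinking data can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes the coordinates of the crosslinked residues (e.g., lysine C-alpha atoms) have already been extracted from the model, and `max_distance` is a hypothetical parameter that should be set according to the crosslinker and atoms used.

```python
import math

Coord = tuple[float, float, float]


def check_crosslinks(
    pairs: list[tuple[str, Coord, Coord]], max_distance: float
) -> list[tuple[str, float, bool]]:
    """For each crosslinked residue pair, report its distance in the model
    (angstroms) and whether it lies within the crosslinker's maximum reach."""
    results = []
    for label, a, b in pairs:
        d = math.dist(a, b)  # Euclidean distance between the two atoms
        results.append((label, round(d, 1), d <= max_distance))
    return results


# Hypothetical residue labels and coordinates, for illustration only.
links = [
    ("K12-K40", (0.0, 0.0, 0.0), (9.0, 12.0, 0.0)),
    ("K5-K88", (0.0, 0.0, 0.0), (20.0, 21.0, 0.0)),
]
for label, dist, ok in check_crosslinks(links, max_distance=20.0):
    print(label, dist, "compatible" if ok else "violated")
```

    A model is considered compatible with the crosslinking data when every reported pair passes the check; the illustrative cutoff of 20 Å is not a recommendation.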

    Supplementary table. Selected binary complexes modeled by AF2 whose structure is supported by cross-linking mass spectrometry data.

    Protein 1 | Protein 2 | Peptide 1 | Peptide 2 | Species
    acca | accd | VNMLQYSTYSVISPEGCASILWKSADK | IKSNITPTR | E. coli
    dnak | grpe | DDDVVDAEFEEVKDKK | VKAEMENLR | E. coli
    rpob | rpoc | GKTHSSGK | KGLADTALK | E. coli
    bama | bamd | TVDIKPAR | DVSYLKVAYQNFVDLIR | A. baumannii
    secd | secf | ILGKTANLEFR | MPSEDPELGKK | P. aeruginosa

    Reviewer #2

    This study attempts to identify the 'essential interactome' through combining information in presence/absence genomics across bacteria, information in the STRING database, and predictions from alpha-fold. Overall, the strategy is clear, and I do not have concerns about reproducibility and clarity.

    We value the reviewer's constructive evaluation of our manuscript and thank the reviewer for the feedback, which has significantly helped us improve it.

    Strengths: Clever approach to get at the essential interactome.

    Weaknesses: Putative impact. It is clear why understanding which interactions are present is important. But even as the authors suggest, interactions are dynamic and there are plenty of other tools that people could use to find interactions (including AA Coev, which the authors themselves cite). The counterargument the authors bring up is the high false positive rate of interactions, which this method solves. While true, the stringency criteria for what constitutes an interaction in this paper are remarkably high: each protein within the interaction needs to be essential and needs to have a high confidence score in STRING, and then there is a hyperparameter that dictates the level at which AlphaFold 2 is providing confident answers. In this sense, this is less about an 'essential' interactome and more about an interactome that is present with the highest true positive rate (trading off with the ability to discover new interactions at a reasonable breadth).

    We appreciate the reviewer's insights concerning the stringency criteria for defining interactions. Here, we provide a detailed justification for our selection criteria and show how they align with our goal of identifying high-confidence interactions.

    1. Protein essentiality: In our model, interactions are considered essential if, and only if, both proteins involved are essential, which provides a conservative estimate of the essential interactome. In the revised manuscript, we explored the potential for two non-essential proteins to form an essential interaction by investigating synthetically lethal interactions. Among all synthetic lethal interactions in E. coli, only 28 correspond to protein-protein interactions, and only two of these could be modeled with an ipTM score > 0.6. These non-essential proteins likely operate in parallel or compensatory pathways instead of interacting directly. These findings support our hypothesis and suggest that our interactome encompasses most essential interactions.
    2. Conditional essentiality: While we agree with the reviewer that our study does not address conditional essentiality, we note that exploring conditional essential interactions across all the bacterial species discussed in our manuscript is currently not feasible. As an example, we checked the overlap in essential genes between Acinetobacter baumannii and Pseudomonas aeruginosa in the lung environment (Wang et al., 2014; Potvin et al., 2003) and found minimal overlap between the two species, suggesting that conditional interactions might also be species-dependent. In our manuscript, we aimed to describe the core essential interactions for Gram-negative and Gram-positive bacteria under standard laboratory growth conditions. We agree that further research is needed to incorporate specific, context-dependent interactions and provide a complete view of the interactome. Nonetheless, we define here the first bacterial essential interactome, which, in our opinion, marks a significant step towards understanding bacterial cell metabolism and is relevant to applications such as developing broad-spectrum antibiotics.
    3. Confidence of the interaction: All existing methods to predict protein-protein interactions, including those based on coevolution, suffer from poor performance. Most of them generate many false positive interactions while missing important ones. Without aiming to be exhaustive, we reproduce below the performance of some of the latest computational methods for predicting PPIs.

    Table 1. Performance of state-of-the-art PPI prediction methods (Huang et al., 2023).

    Method | AUPRC (a)
    SGPPI | 0.422
    Profppikernel (b) | 0.359
    PIPR (c) | 0.342
    PIPE2 (b) | 0.220
    SigProd (b) | 0.264

    (a) AUPRC denotes the average AUPRC value over 10-fold cross-validation.

    It is clear from these data that such methods are not yet mature enough to be used as confident predictors. Hence, we decided to rely on validated interactions in the STRING database, one of the most comprehensive PPI databases. In this revised version, we have expanded our dataset to include all experimentally labeled interactions in STRING, even those with a low probability (experimental score > 0.15). Adding these interactions increased the total number of interactions tested from 1089 to 1402 and generated 38 new models for Gram-negative species (13 with high accuracy) and 275 new models for Gram-positive bacteria (18 with high accuracy). All interactions are now included in Supplementary File 1, and high-accuracy models will be deposited in ModelArchive after acceptance.

    4. AlphaFold2 (AF2) criterion for complex prediction: Although AF2 has its limitations, its accuracy in predicting bacterial complexes is consistently high. In benchmarking studies, AF2-Multimer accurately predicted 70-75% of the tested complexes, almost 90% of them of medium-to-high quality (Evans et al., 2021; Yin et al., 2022). Although there might be some minor deviations, AF2 can largely capture the bacterial essential interactome accurately. In the revised version, we compare the pDockQ and pDockQ2 metrics with our ipTM criterion for defining confident models. We observed that both pDockQ and pDockQ2 identified highly reliable complexes but also discarded actual complexes (Supplementary Figure 1). Thus, we decided to retain our initial criterion based on ipTM scores, which is consistent with other authors who used similar ipTM thresholds to model bacterial interactions (e.g., O'Reilly et al., 2023).

    In summary, although our methodology has inherent limitations, we believe that our approach is sound and can give a comprehensive and realistic view of the bacterial essential interactome. We hope that these new insights further substantiate our approach.
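    The selection logic described above (both partners essential, a STRING score threshold, and an ipTM cutoff for confident models) can be summarized in a short sketch. It is illustrative only: the function names, protein identifiers, scores and essentiality labels are hypothetical, and it assumes scores have been rescaled to 0-1 (STRING flat files report them multiplied by 1000, so 0.15 corresponds to 150 there).

```python
from typing import Iterable

Pair = frozenset[str]


def essential_pairs(
    edges: Iterable[tuple[str, str, float]],
    essential: set[str],
    min_score: float = 0.15,
) -> set[Pair]:
    """Keep undirected pairs in which BOTH proteins are essential and the
    interaction score (rescaled to 0-1) exceeds min_score."""
    kept: set[Pair] = set()
    for a, b, score in edges:
        if a != b and a in essential and b in essential and score > min_score:
            # frozenset makes (a, b) and (b, a) the same pair
            kept.add(frozenset((a, b)))
    return kept


def confident_models(iptm: dict[Pair, float], threshold: float = 0.6) -> set[Pair]:
    """Pairs whose predicted complex has an ipTM score above the threshold."""
    return {pair for pair, score in iptm.items() if score > threshold}


# Hypothetical input; STRING lists each edge in both directions.
edges = [
    ("protA", "protB", 0.90),
    ("protB", "protA", 0.90),
    ("protA", "protC", 0.80),  # protC is not essential
    ("protD", "protE", 0.10),  # below the score threshold
]
candidates = essential_pairs(edges, essential={"protA", "protB", "protD", "protE"})
models = confident_models({frozenset(("protA", "protB")): 0.82})
print(len(candidates), len(models))  # 1 1
```

    Representing pairs as frozensets avoids modeling the same complex twice when a database lists an interaction in both directions.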

    I don't know of too many studies that use AlphaFold 2 in this way. This was clever. However, there are plenty of studies that use phylogenomic information to infer interactions. In this sense, the core idea of the paper is not intrinsically novel.

    We thank the reviewer for valuing our approach. Although other methods have been used to predict interactomes, our study, to the best of our knowledge, provides the first high-quality essential interactome for bacteria. We used experimental data (analysis of single deletion mutants) to define the essential interactions in bacteria. Other methods, whether based on phylogenomic information, deep learning tools, or both, perform poorly, as illustrated in the preceding table. Often, these methods yield a high number of interactions and, in many cases, show a bias towards overrepresented entries in the positive databases used to train the predictors (Macho Rendón et al., 2022). Also, while other methods lack detailed structural insight into the interactions, we offer structural models for every interaction tested.

    Overall, I do feel this would be worth publishing as an exposé of what AF2 is capable of. I'm not sure of the impact it will have on researchers, however.

    We appreciate the reviewer's positive feedback on our manuscript. Using AF2, we identified key interactions starting only from gene deletion mutant data. This manuscript reveals new insights into the assembly of essential bacterial complexes, providing specific structural details to understand their stability and function. Additionally, our work seeks to establish a methodology applicable to all bacterial species, guiding future research in this field. The approach taken in this study may expand drug-targeting opportunities and accelerate the development of more effective antibiotics aimed at disrupting these essential interactions. In conclusion, the impact of the paper lies in its novel use of AlphaFold2 to understand essential bacterial protein interactions, providing key insights into assembly mechanisms and identifying new potential drug targets.

    Reviewer #3

    The selection of "essential" interactions is a bit arbitrary, given that their main criterion for selection is that both proteins are essential. Unfortunately, it's not always clear where the essential protein data is coming from. Authors cite Mateus et al. (ref 15) as source for E. coli, but I don't see an explicit list of essential genes in this paper (nor its supplement). For Pseudomonas the citation doesn't contain author information and for Acinetobacter essentiality only seems to refer to "essentiality" in the lung.

    As a minimum, the authors should provide a table with summary statistics for the essential proteins they are using, as this is the basis for the whole paper. Such a table should include the names of the species, the number of genes that are considered as essential, a very brief characterization of how essentiality was determined and the source for this information. For instance, are the genes listed in the Supplementary File congruent with the genes in the Database of Essential Genes (DEG) for these organisms? Finally, authors should indicate in that table which (essential) protein pairs are conserved across species, as this is another one of their selection criteria. Conservation is not necessary for an essential interaction, but it certainly makes it more likely.

    We understand the reviewer's concerns regarding the selection of essential interactions and the need for a more thorough description of the sources of essential protein data. To address these concerns in the revised manuscript:

    1. We included a clear explanation of the sources of essential protein data, with proper citations for each organism, in Supplementary File 1. The selected studies were primarily sourced from the DEG database, whose most recent update was on September 1, 2020. Where data were unavailable, we reviewed the literature for relevant studies. A graphical summary of the datasets, showing the overlap between the different studies, has been included in Supplementary Figure 12.
    2. We included comprehensive information on the essential proteins used in our study in Supplementary File 1. The file provides two tables detailing the genes in the Gram-positive and Gram-negative datasets. Each table lists the gene names and their corresponding UniProt IDs for every species in our study, as well as their orthologues in other organisms. Also, the reviewer rightly pointed out that the Acinetobacter baumannii study was conducted in the lung, which may bias the results, as all other studies were performed in vitro. To solve this, we replaced it with Bai et al., 2021, which was performed in rich medium.
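    An overlap summary between essential-gene datasets, of the kind summarized graphically in Supplementary Figure 12, could be computed along the lines of the sketch below. It assumes genes from each dataset have first been mapped to shared orthologue identifiers; the Jaccard index is an illustrative choice of overlap measure, not necessarily the one used for the figure, and the dataset names and identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared genes divided by all genes in either set (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dataset_overlap(datasets: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Pairwise Jaccard overlap between essential-gene sets keyed by dataset name."""
    return {
        (x, y): jaccard(datasets[x], datasets[y])
        for x, y in combinations(sorted(datasets), 2)
    }


def core_genes(datasets: dict[str, set[str]]) -> set[str]:
    """Orthologue groups that are essential in every dataset."""
    sets = list(datasets.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical orthologue-group identifiers.
data = {
    "species1": {"OG1", "OG2", "OG3"},
    "species2": {"OG2", "OG3", "OG4"},
    "species3": {"OG3", "OG4", "OG5"},
}
print(dataset_overlap(data)[("species1", "species2")])  # 0.5
print(core_genes(data))  # {'OG3'}
```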

    Authors should also state whether they have verified that none of the random pairs are in the positive set.

    We thank the reviewer for this comment. We certainly checked that none of the random pairs was present in the positive dataset. This clarification has now been added to the methods section.
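    The check described above can be built into the sampling of random pairs itself, so that no negative pair can overlap the positive set. This is a minimal sketch, assuming pairs are stored as unordered frozensets; the function name and protein identifiers are hypothetical and it is not the study's code.

```python
import itertools
import random


def sample_negative_pairs(
    proteins: set[str], positives: set[frozenset[str]], n: int, seed: int = 0
) -> list[frozenset[str]]:
    """Draw n distinct random protein pairs, none of which is in the positive set."""
    candidates = [
        frozenset(pair)
        for pair in itertools.combinations(sorted(proteins), 2)
        if frozenset(pair) not in positives  # exclude known interactions
    ]
    if n > len(candidates):
        raise ValueError("not enough non-positive pairs to sample from")
    # Sampling without replacement, reproducible through the seed.
    return random.Random(seed).sample(candidates, n)


positives = {frozenset(("protA", "protB"))}
negatives = sample_negative_pairs({"protA", "protB", "protC", "protD"}, positives, n=3)
print(len(negatives), any(p in positives for p in negatives))  # 3 False
```

    Enumerating all pairs grows quadratically with the number of proteins; that is manageable for a few hundred essential proteins, whereas much larger sets would call for rejection sampling instead.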

    This is also relevant because authors "retrieved all high-confidence PPIs between these proteins from the STRING database" which provides compound scores for interactions but that has often little to do with physical interactions (given that the scores factor in co-expression and several other criteria). In fact, I find STRING scores difficult to interpret for that very reason.

    We appreciate the reviewer's comment on the use of combined interaction scores from the STRING database. We agree with the reviewer that STRING combined scores are somewhat difficult to interpret because they combine different lines of evidence. We used the STRING combined scores to include interactions that may lack direct experimental evidence but are likely according to other information (e.g., co-expression). However, to complete our interactome, the revised version also includes all interactions with experimental evidence in STRING. As mentioned in the response to Reviewer #2, we expanded the tested interactions from 1089 to 1402. This resulted in 38 new models for Gram-negative species, 13 of them highly accurate, and 275 for Gram-positive bacteria, 18 of them highly accurate. All interactions are now included in Supplementary File 1, and high-accuracy models will be deposited in ModelArchive after acceptance.

    The authors "reasoned that a given interaction would only be essential if and only if both proteins forming the complex are essential" - this sounds reasonable but doesn't capture synthetically lethal (genetic) interactions, that is, interactions between two proteins that are both non-essential but are essential in combination. Admittedly, I don't have a number of how many such cases exist, but there are such cases in the literature (e.g. Hannum et al. 2009, PLoS Genet 5[12]: e1000782, for yeast).

    We thank the reviewer for raising this point. We acknowledge that our reasoning does not capture synthetic lethality, which occurs when the loss of either of two genes individually has no effect on cell survival, but the simultaneous loss of both leads to cell death. In this case, the two genes or proteins are non-essential individually but become essential in combination. To cover synthetic lethality, we retrieved all synthetically lethal interactions found in Escherichia coli strain K12-BW25113 from the Mlsar database and included them in our pipeline. We identified 28 synthetically lethal PPIs (involving 45 proteins) and modeled them with AF2. Only two interactions displayed an ipTM score > 0.6 (nadA-pncB and nuoG-purA). Hence, synthetic lethality seems to contribute little to the overall interactome. We believe that synthetic lethal partners often function in parallel or compensatory pathways rather than interacting directly with each other. For example, in yeast, the genes RAD9 and RAD24 are synthetic lethal: RAD9 is involved in cell cycle checkpoints, while RAD24 is involved in the DNA damage response. They function in related pathways but do not encode proteins that directly interact with each other. Hence, finding proteins that are both synthetic lethal and directly interacting might be challenging, as synthetic lethal relationships often reveal functional rather than physical interactions.
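    The screening of synthetic lethal pairs amounts to a set intersection followed by the same ipTM filter used for the essential interactome. The sketch below is illustrative: it assumes synthetic lethal pairs, PPIs and ipTM scores already share one identifier namespace, and the gene names and scores are hypothetical.

```python
def synthetic_lethal_ppis(
    sl_pairs: list[tuple[str, str]],
    ppis: set[frozenset[str]],
    iptm: dict[frozenset[str], float],
    threshold: float = 0.6,
) -> tuple[set[frozenset[str]], set[frozenset[str]]]:
    """Return the synthetic lethal pairs that are also known PPIs, and the
    subset whose AF2 model exceeds the ipTM threshold."""
    sl = {frozenset(pair) for pair in sl_pairs}  # order-independent pairs
    physical = sl & ppis
    # Pairs without a model score are treated as not confident.
    confident = {p for p in physical if iptm.get(p, 0.0) > threshold}
    return physical, confident


# Hypothetical data, for illustration only.
sl_pairs = [("geneA", "geneB"), ("geneC", "geneD"), ("geneE", "geneF")]
ppis = {frozenset(("geneA", "geneB")), frozenset(("geneC", "geneD"))}
iptm = {frozenset(("geneA", "geneB")): 0.71, frozenset(("geneC", "geneD")): 0.32}
physical, confident = synthetic_lethal_ppis(sl_pairs, ppis, iptm)
print(len(physical), len(confident))  # 2 1
```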

    Apart from that, one could question the selection method more generally, given that essential and non-essential proteins always work together in a biological process, so I wonder why the authors didn't include additional proteins known to be involved in specific processes, as this could make their predictions much more biologically meaningful.

    We agree with the reviewer that accessory proteins are important to understand the biological context of interactions. In fact, in several sections of our manuscript, we included accessory proteins to fully describe the essential complexes. For example, in the cell division complex, we incorporated proteins like MreCD-RodZ from the elongasome to enhance the structural context of the interactions. However, a comprehensive explanation of all identified interactions and accessory proteins would extend beyond the scope of this manuscript and further lengthen an already extensive document. In our study, we sought to describe the fundamental interactions for both Gram-negative and Gram-positive bacteria. We anticipate that our findings will prompt additional research to confirm our hypotheses and enhance knowledge of these protein complexes within the proper cellular context.

    In any case, to understand their choice better, authors should provide a table (in the main text) summarizing the proteins they actually analyze and discuss in more detail in their models. This would allow a reader to see which proteins are considered essential and which ones are missing. I would organize this by function / pathway / process, so these proteins are listed in a functional context.

    We added Table 1 in the main text, listing all interactions described in the text. Table 1 includes the proteins involved in each complex, the ipTM score of the interaction, whether a PDB code is available for comparison and the functional classification of the interaction.
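    Organizing interactions by functional class and ranking them by ipTM, in the spirit of Table 1 and the reviewer's request, is a simple grouping operation. A minimal sketch, with hypothetical complex names, classes and scores:

```python
from collections import defaultdict


def rank_by_class(
    rows: list[tuple[str, str, float]],
) -> dict[str, list[tuple[str, float]]]:
    """Group (complex, functional class, ipTM) rows by class, best-scoring first."""
    groups: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for name, functional_class, iptm in rows:
        groups[functional_class].append((name, iptm))
    return {
        cls: sorted(items, key=lambda item: item[1], reverse=True)
        for cls, items in groups.items()
    }


# Hypothetical rows, for illustration only.
rows = [
    ("complex1", "cell division", 0.71),
    ("complex2", "cell division", 0.88),
    ("complex3", "lipid transport", 0.65),
]
for cls, items in rank_by_class(rows).items():
    print(cls, items)
```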

    With regard to docking, please also discuss why you focus on ipTM, as there are other metrics derived from AF2 scores, such as pDockQ based on if_plddt (e.g., Bryant et al., 2022), as well as metrics external to AF2 (physics-based methods such as Rosetta). Another option may be modified versions of AF2-Multimer, such as AFSample, which produces a greater diversity of models, allowing for more "shots on goal" and ultimately a higher success rate, assuming one has a reliable QC filter (I wonder how those compare to ipTM).

    We did not use AFsample because it is a computationally very expensive approach that would require too many resources for the batch prediction of more than 1,400 complexes. AFsample generates 240 times as many models as the baseline and, including the extra recycles, is overall about 1,000 times more costly. However, we acknowledge that other metrics can be useful to further evaluate our models. Hence, we investigated how the pDockQ and pDockQ2 metrics compare with the ipTM score. We observed that pDockQ correlates only weakly with ipTM (R = 0.328), whereas the improved pDockQ2 metric correlates much better (R = 0.649). All complexes described in the manuscript with an ipTM score above our threshold (0.6) also have a pDockQ2 score above 0.23, except for six interactions with lower pDockQ2 scores. However, these scores improve when the interactions are modeled with accessory proteins in the complex. This suggests that the ipTM metric better captures binary interactions when they are excluded from their context. It is possible, however, that pDockQ scores are better than ipTM scores at discriminating false positive interactions. Based on the correlation between ipTM and pDockQ2 and the observation that ipTM may better capture binary interactions, we decided to keep our method in the manuscript. Other authors have employed analogous ipTM thresholds to model bacterial interactions (e.g., O'Reilly et al., 2023). Nonetheless, we also included the pDockQ and pDockQ2 scores in Supplementary File 1 so that readers can evaluate complexes based on these metrics.
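    The comparison between ipTM and pDockQ2 described above can be reproduced with a short sketch: a Pearson correlation over per-model scores, plus a list of models that pass the ipTM cutoff (0.6) but not the pDockQ2 cutoff (0.23). Computing pDockQ or pDockQ2 themselves is outside its scope; the sketch assumes those scores were already obtained, and the model names and values are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def discordant(
    models: dict[str, tuple[float, float]],
    iptm_cut: float = 0.6,
    pdockq2_cut: float = 0.23,
) -> list[str]:
    """Models accepted by the ipTM threshold but not by the pDockQ2 threshold."""
    return sorted(
        name
        for name, (iptm, pdockq2) in models.items()
        if iptm > iptm_cut and pdockq2 <= pdockq2_cut
    )


# Hypothetical (ipTM, pDockQ2) scores per model.
models = {"c1": (0.80, 0.45), "c2": (0.70, 0.12), "c3": (0.30, 0.05)}
iptm_scores = [iptm for iptm, _ in models.values()]
pdockq2_scores = [score for _, score in models.values()]
print(round(pearson_r(iptm_scores, pdockq2_scores), 3))
print(discordant(models))  # ['c2']
```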

    Minor comments:

    Page 1, third-to-last line: "the essential interactome is a potentially powerful strategy to [...] identify new targets for discovering new antibiotics"

    Figures and figure legends need to be explicit which species is represented (ideally with a Uniprot ID) and which structure was predicted by alphafold and which one has an experimental structure. Known structures should be indicated in a table, as suggested above.

    Figure 5: LptF is too dark when printed, so a lighter color may be better.

    Figure 6: The cryo-EM and AlphaFold structures look quite different, so please discuss the discrepancies between them (in terms of prediction or cryo-EM modeling). A schematic may help illustrate the differences more clearly.

    Figure 7: LolC is also too dark when printed. Make lighter.

    In some cases, it may be worthwhile looking at ConSurf structures to see whether the predicted interfaces are indeed more conserved than other parts of the protein.

    We thank the reviewer for his/her insightful feedback on our manuscript. We have addressed all these comments as follows:

    1. The statement on page 1 was revised as suggested.
    2. We revised all figure legends to include the Uniprot IDs, and distinguish between predicted and experimental structures. We also included Table 1 and Supplementary File 1 for known structures.
    3. We adjusted the colors in Figures 5 and 7 to enhance print visibility.
    4. We provided a schematic to illustrate the discrepancies between the cryo-EM and AlphaFold structures in Figure 6c.
    5. We used Vespa to highlight conserved interfaces in the complexes described in the manuscript, as suggested. The figures displaying the conservation of interfaces in the complexes are now depicted in Supplementary Figure 2. A comparison between interface and surface conservation can be found in Figure 1f.
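    The interface-versus-surface comparison can be expressed as a difference of mean per-residue conservation scores. This is a sketch under stated assumptions: residue indices, scores and the interface definition are hypothetical, and it assumes higher scores mean higher conservation, which must be checked for the tool used, since different tools report conservation on different scales.

```python
from statistics import mean


def conservation_contrast(
    scores: dict[int, float], interface: set[int], surface: set[int]
) -> float:
    """Mean conservation at interface residues minus the mean at the remaining
    surface residues (positive = interface more conserved, assuming higher
    scores mean higher conservation)."""
    iface = [scores[i] for i in interface]
    rest = [scores[i] for i in surface - interface]
    if not iface or not rest:
        raise ValueError("need residues both at and outside the interface")
    return mean(iface) - mean(rest)


# Hypothetical per-residue scores keyed by residue number.
scores = {1: 0.9, 2: 0.8, 3: 0.2, 4: 0.4}
print(round(conservation_contrast(scores, interface={1, 2}, surface={1, 2, 3, 4}), 2))  # 0.55
```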

    The main significance of this study is its potential use for a better understanding of the protein complexes described in more detail (and the fact that AlphaFold can be applied in a similar fashion to many other complexes). This is why the individual sections need to be evaluated by process-specific experts (disclaimer: I have only worked on some of the complexes, but I am not an expert on any of them). I wonder if it would make more sense to break out some of the sections on individual complexes into separate papers, and then discuss them in more detail and with more context from previous studies. Complexes such as the divisome have a huge body of literature, and it may be worth reviewing which structures are known and which ones are not. However, the dynamic and labile nature of these complexes has made it difficult for both crystallography and modeling to achieve a good structural understanding, but some of the models proposed here may help overcome some of these hurdles.

    We appreciate the reviewer's suggestion. While we acknowledge the complexity of some of the individual complexes, such as the divisome, and the wealth of existing literature, we believe that the current manuscript provides a valuable, comprehensive view of how AF2 can be used to predict essential protein complexes in bacteria. In our opinion, dividing the manuscript into separate papers might dilute its scope. Nonetheless, we are exploring the interactions detailed in the manuscript in our laboratory, aiming to further expand knowledge of these important complexes and their potential as targets for new antimicrobials.

    References:

    Bai J, Dai Y, Farinha A, et al. Essential Gene Analysis in Acinetobacter baumannii by High-Density Transposon Mutagenesis and CRISPR Interference. J Bacteriol. 2021; 203(12):e0056520.

    Evans R, O'Neill M, Pritzel A, et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv. 2021; 2021.10.04.463034.

    Huang Y, Wuchty S, Zhou Y, Zhang Z. SGPPI: structure-aware prediction of protein-protein interactions in rigorous conditions with graph convolutional network. Brief Bioinform. 2023; 24(2):bbad020.

    Macho Rendón J, Rebollido-Ríos R, Torrent Burgas M. HPIPred: Host-pathogen interactome prediction with phenotypic scoring. Comput Struct Biotechnol J. 2022; 20:6534-6542.

    O'Reilly FJ, Graziadei A, Forbrig C, et al. Protein complexes in cells by AI-assisted structural proteomics. Mol Syst Biol. 2023; 19(4):e11544.

    Potvin E, Lehoux DE, Kukavica-Ibrulj I, et al. In vivo functional genomics of Pseudomonas aeruginosa for high-throughput screening of new virulence factors and antibacterial targets. Environ Microbiol. 2003; 5:1294-1308.

    Wang N, Ozer EA, Mandel MJ, Hauser AR. Genome-wide identification of Acinetobacter baumannii genes necessary for persistence in the lung. mBio. 2014; 5(3):e01163-14.

    Yin R, Feng BY, Varshney A, Pierce BG. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci. 2022; 31(8):e4379.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #3

    Evidence, reproducibility and clarity

    Summary:

    Gómez-Borrego & Torrent-Burgas selected and modelled 1089 interactions between "essential" proteins in bacteria and generated 115 what they call "high-accuracy" models (using alphafold2). Some of the models potentially provide new insight into structure-function relationships of various biological processes and thus may serve as basis for further exploration.

    Major comments

    Methods

    The selection of "essential" interactions is a bit arbitrary, given that their main criterion for selection is that both proteins are essential. Unfortunately, it's not always clear where the essential protein data is coming from. Authors cite Mateus et al. (ref 15) as source for E. coli, but I don't see an explicit list of essential genes in this paper (nor its supplement). For Pseudomonas the citation doesn't contain author information and for Acinetobacter essentiality only seems to refer to "essentiality" in the lung.

    As a minimum, the authors should provide a table with summary statistics for the essential proteins they are using, as this is the basis for the whole paper. Such a table should include the names of the species, the number of genes that are considered as essential, a very brief characterization of how essentiality was determined and the source for this information. For instance, are the genes listed in the Supplementary File congruent with the genes in the Database of Essential Genes (DEG) for these organisms? Finally, authors should indicate in that table which (essential) protein pairs are conserved across species, as this is another one of their selection criteria. Conservation is not necessary for an essential interaction, but it certainly makes it more likely.

    Authors should also state whether they have verified that none of the random pairs are in the positive set.

    This is also relevant because authors "retrieved all high-confidence PPIs between these proteins from the STRING database" which provides compound scores for interactions but that has often little to do with physical interactions (given that the scores factor in co-expression and several other criteria). In fact, I find STRING scores difficult to interpret for that very reason.

    The authors "reasoned that a given interaction would only be essential if and only if both proteins forming the complex are essential" - this sounds reasonable but doesn't capture synthetically lethal (genetic) interactions, that is, interactions between two proteins that are both non-essential but are essential in combination. Admittedly, I don't have a number of how many such cases exist, but there are such cases in the literature (e.g. Hannum et al. 2009, PLoS Genet 5[12]: e1000782, for yeast, or Babu et al. 2014 PLoS Genet 10[2]: e1004120, for E. coli).

    Apart from that, one could question the selection method more generally, given that essential and non-essential proteins always work together in any biological process. I therefore wonder why the authors did not include additional proteins known to be involved in specific processes, as this could make their predictions much more biologically meaningful.

    In any case, to understand their choice better, authors should provide a table (in the main text) summarizing the proteins they actually analyze and discuss in more detail in their models. This would allow a reader to see which proteins are considered essential and which ones are missing. I would organize this by function / pathway / process, so these proteins are listed in a functional context.

    With regard to docking, please also discuss why you focus on ipTM, as there are other metrics derived from AF2 scores, such as pDockQ based on if_plddt (e.g. Bryant et al., 2022), as well as metrics external to AF2 (physics-based methods such as Rosetta).
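    For readers unfamiliar with pDockQ: as we understand Bryant et al. (2022), it is a sigmoid of x = (mean pLDDT of interface residues) × log10(number of interface contacts), where contacts are pairs of Cβ atoms (Cα for glycine) from the two chains within 8 Å. The sketch below follows that definition; the fitted constants are those we recall from the paper and should be checked against the original before reuse, and the input format (plain coordinate and pLDDT lists) is an assumption.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pdockq(cb_a: Sequence[Coord], plddt_a: Sequence[float],
           cb_b: Sequence[Coord], plddt_b: Sequence[float],
           cutoff: float = 8.0) -> float:
    """pDockQ for a two-chain model from per-residue C-beta coordinates and pLDDT."""
    # Inter-chain residue pairs whose C-beta atoms lie within the cutoff distance.
    contacts = [(i, j) for i, a in enumerate(cb_a) for j, b in enumerate(cb_b)
                if math.dist(a, b) <= cutoff]
    if not contacts:
        return 0.0  # no interface at all
    iface_a = {i for i, _ in contacts}
    iface_b = {j for _, j in contacts}
    # Average pLDDT over the unique interface residues of both chains.
    iface_plddt = [plddt_a[i] for i in iface_a] + [plddt_b[j] for j in iface_b]
    x = (sum(iface_plddt) / len(iface_plddt)) * math.log10(len(contacts))
    # Sigmoid with constants as reported (to our understanding) by Bryant et al. 2022.
    return 0.724 / (1.0 + math.exp(-0.052 * (x - 152.611))) + 0.018
```

    Unlike ipTM, this score depends explicitly on interface size, so the two metrics can rank the same set of models differently.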

    Another option may be a modified version of AF2-Multimer, such as AFsample, which produces a greater diversity of models, allowing for more "shots on goal" and ultimately a higher success rate, assuming one has a reliable QC filter (I wonder how such filters compare to ipTM).
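    The "shots on goal" strategy only pays off if the selection step is sound: among many sampled models, one keeps the best according to some confidence score. The sketch below uses AlphaFold-Multimer's default ranking confidence (0.8·ipTM + 0.2·pTM); the ipTM filter threshold is a hypothetical parameter, and the model scores are made-up numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Model:
    name: str
    iptm: float
    ptm: float


def ranking_confidence(m: Model) -> float:
    """AlphaFold-Multimer's default model ranking score."""
    return 0.8 * m.iptm + 0.2 * m.ptm


def best_model(models: Iterable[Model], min_iptm: float = 0.0) -> Optional[Model]:
    """Top-ranked model among those passing a (hypothetical) ipTM filter, or None."""
    passing = [m for m in models if m.iptm >= min_iptm]
    return max(passing, key=ranking_confidence) if passing else None


samples = [Model("s1", 0.55, 0.80), Model("s2", 0.82, 0.70), Model("s3", 0.60, 0.90)]
print(best_model(samples).name)  # s2 (0.796 vs 0.60 and 0.66)
```

    Swapping `ranking_confidence` for another score (for example pDockQ) is exactly the comparison the reviewer asks about.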

    These details are required to make the study truly transparent and reproducible.

    Results

    Despite the methodological caveats above, some of the results are certainly convincing and of interest to a broader readership.

    However, since their models are predictions, it would be important to provide some guidance on which interactions score highest and are thus the most promising for further validation. I would therefore include a list of interactions for each functional group together with their scores. This would be more useful than Figure 2, which is rather difficult to interpret (even though it looks nice; alternatively, add a table and keep Figure 2). Such a table could (and should) also include other data, such as references that support the top-ranking (but still unknown) interactions, or which structures are already known.

    Minor comments

    P. 1, 3rd last line: "the essential interactome is a potentially powerful strategy to [...] identify new targets for discovering new antibiotics"

    Figures and figure legends need to state explicitly which species is represented (ideally with a UniProt ID), which structures were predicted by AlphaFold, and which have an experimental structure. Known structures should be indicated in a table, as suggested above.

    Figure 5: LptF is too dark when printed, so a lighter color may be better.

    Figure 6: The cryo-EM and AlphaFold structures look quite different, so please discuss the discrepancies between them (in terms of prediction or cryo-EM modeling). A schematic may help to illustrate the differences more clearly.

    Figure 7: LolC is also too dark when printed. Make lighter.

    In some cases it may be worthwhile looking at ConSurf structures to see whether the predicted interfaces are indeed more conserved than the non-interface regions.
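    A simple version of this check compares the mean ConSurf grade (1 = most variable, 9 = most conserved) of predicted interface residues with that of all other residues, and asks how often a random residue set of the same size does at least as well. This is a minimal sketch; the permutation test is one possible choice among several, and the grade dictionary below is toy data, not output from ConSurf.

```python
import random
from statistics import mean
from typing import Dict, Set


def interface_conservation_gap(grades: Dict[int, int], interface: Set[int]) -> float:
    """Mean ConSurf grade at the interface minus the mean over all other residues."""
    iface = [g for r, g in grades.items() if r in interface]
    rest = [g for r, g in grades.items() if r not in interface]
    if not iface or not rest:
        raise ValueError("need residues both inside and outside the interface")
    return mean(iface) - mean(rest)


def permutation_p_value(grades: Dict[int, int], interface: Set[int],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value: share of random same-size residue sets with a gap >= observed."""
    rng = random.Random(seed)
    residues = list(grades)
    k = sum(1 for r in residues if r in interface)
    observed = interface_conservation_gap(grades, interface)
    hits = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(residues, k))
        if interface_conservation_gap(grades, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


toy_grades = {1: 9, 2: 9, 3: 8, 4: 2, 5: 1, 6: 3}  # residue -> ConSurf grade (toy data)
print(interface_conservation_gap(toy_grades, {1, 2, 3}))
print(permutation_p_value(toy_grades, {1, 2, 3}, n_perm=2000))
```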

    Significance

    The main significance of this study is its potential to improve our understanding of the protein complexes described in more detail (and the fact that AlphaFold can be applied in a similar fashion to many other complexes).

    This is why the individual sections need to be evaluated by process-specific experts (disclaimer: I have worked on some of these complexes, but I am not an expert on any of them).

    I wonder whether it would make more sense to split some of the sections on individual complexes into separate papers and discuss them in more detail and with more context from previous studies. Complexes such as the divisome have a huge body of literature, and it may be worth reviewing which structures are known and which are not. The dynamic and labile nature of these complexes has made it difficult for both crystallography and modeling to achieve a good structural understanding, but some of the models proposed here may help to overcome some of these hurdles.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #2

    Evidence, reproducibility and clarity

    This study attempts to identify the 'essential interactome' through combining information in presence/absence genomics across bacteria, information in the STRING database, and predictions from alpha-fold. Overall, the strategy is clear, and I do not have concerns about reproducibility and clarity.

    Significance

    General Assessment:

    Strengths: Clever approach to get at the essential interactome.

    Weaknesses: Putative impact. It is clear why understanding which interactions are present is important. But, as the authors themselves suggest, interactions are dynamic, and there are plenty of other tools that people could use to find interactions (including AA Coev, which the authors themselves cite). The authors' counterargument is that their method addresses the high false-positive rate of interaction predictions. While true, the criteria for what constitutes an interaction in this paper are remarkably stringent: each protein in the interaction needs to be essential and needs to have a high confidence score in STRING, and a further hyperparameter sets the level at which AlphaFold 2's answers are considered confident. In this sense, this is less an 'essential' interactome and more an interactome selected for the highest true-positive rate (at the cost of discovering new interactions at a reasonable breadth).

    Advance: I don't know of too many studies that use AlphaFold 2 in this way. This was clever. However, there are plenty of studies that use phylogenomic information to infer interactions. In this sense, the core idea of the paper is not intrinsically novel.

    Audience: specialized. Overall, I do feel this would be worth publishing as an exposé of what AF2 is capable of. I'm not sure of the impact it will have on researchers, however.

    Field of expertise: Statistical genomics.

  4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #1

    Evidence, reproducibility and clarity

    The paper provides models of essential complexes formed in bacteria. These models have been predicted by AlphaFold2 and in some of the models, information from existing experimental structures is utilized. The predicted models have been calculated based on standard workflow procedures which are explained in detail and can be reproduced by others. The figures are informative and clear.

    Suggestions for improvement:

    • a. The PDB accession codes of the experimental structures should be provided
    • b. A comparison of the predicted models with the experimental structures should be provided (e.g. same orientation, superposition). In Fig. 6 for example, a figure with superposition or use of the same orientation would be more informative.

    Significance

    The paper will certainly generate many hypotheses based on the predicted models. In this respect, it would be useful for a wide audience in the bioscience field. However, the discussed models will need experimental verification by various techniques, such as X-ray crystallography, cryo-EM, SAXS, and structural proteomics. A more thorough analysis of the literature may help to improve the paper in this respect.