Coincidence analysis of efflux pumps in the Escherichia coli pangenome
Abstract
Global rates of multi-drug resistance continue to rise, posing a significant challenge in the fight against antimicrobial resistance worldwide. Multi-drug efflux pumps are a significant contributor to this burden. Understanding the connections within the diverse network of efflux pump genes, and their interactions with other genes, is important for understanding the landscape of multi-drug resistance and devising new interventions to mitigate it. To address this issue, we investigated how efflux pumps and accessory genes covary across genomes. Using genomic data from 200 publicly available Escherichia coli isolates, we constructed a pangenome and performed coincidence analyses using the bioinformatic tool Coinfinder. On an aggregate level, efflux pumps formed more coincident gene pairs within the accessory genome than avoidant gene pairs. Genes with a functional annotation associated with ‘transport’ had the highest representation among both coincident and avoidant gene pairs. In contrast, when specifically considering gene pairs where both genes were related to efflux pumps, avoidant pairs were more frequent, though some coincident pairs were also found, primarily between subcomponents that make up tripartite pumps, as well as between subcomponents and single-unit efflux pumps. Further exploration of the role that efflux pumps from the accessory genome play, and of how that role manifests, could inform novel strategies to target increasingly resistant bacteria.
Article activity feed
-
The work presented here is clear, the arguments are well formed, and it is a useful addition to existing work on prokaryote pangenomics. The reviewers raise several points that I would like you to address. Please pay particular attention to Reviewer 1's points regarding the clarity of the methods. Whilst extra work is suggested by both reviewers, completion of this is not in my opinion necessary for publication. Rather, please consider the points and include a discussion of the constraints and limitations at the relevant points in the manuscript.
-
Comments to Author
1. Methodological rigour, reproducibility and availability of underlying data
Overall, the authors have produced a thorough investigation using coincidence analysis to interrogate the E. coli pangenome for efflux pump associations. The authors have used appropriate tools for the analysis and have included sufficient information to facilitate reproduction of the analysis. I have some reservations regarding some of the packages used, which may affect the outcome of this analysis.
PROKKA: The authors have used this tool to re-annotate the genomes in the study. This is a sensible choice. The methods state that default parameters were used, but not which annotation database was used. This needs to be included, as it affects the annotation; much better results can be obtained using bespoke species-specific databases. The genomes were obtained from RefSeq, which should already include annotation. It would be good to show a comparison of the original annotations and those made by PROKKA, as a more complete annotation may be obtained, which would influence the results greatly.
ROARY: ROARY is a useful tool for the construction of pangenomes; however, I have noticed certain problems with the interpretation of the ROARY output which I think may be relevant here, or are at least worth checking. When ROARY generates the pangenome, it clusters genes into orthologs. In doing so it uses a similarity threshold, which can (and often does) cause very similar genes to be put into different clusters. The presence/absence matrix produced by ROARY, used here for the downstream analysis, is based on cluster membership, which can erroneously suggest a much larger pangenome, with much more variation in the accessory genome, than is actually the case. To test this, I would like the authors to produce a pangenome presence/absence matrix by similarity (e.g. BLAST or similar) using the pangenome reference file output by ROARY. This creates a presence/absence result for each of the genes, which can then be compared to the ROARY output to confirm the results. The use of the ROARY output alone is not sufficient to perform this analysis.
2. Presentation of results
The figures are mostly very well presented and explained. Figure 5 requires more explanation, both in the legend and in the text of the article. It is a very nice figure, but I am unsure of its relevance and implications; these need to be stated and discussed.
3. How the style and organization of the paper communicates and represents key findings
This paper is very well presented and follows a logical order, introducing concepts, expanding upon knowledge and explaining findings where necessary. The paper would benefit from tighter editing, but this is not vital. Also, either add more discussion of the impact of the different types of pumps, or reduce the amount of the introduction and figures devoted to them.
4. Literature analysis or discussion
The introduction is very full and well planned, but could benefit from trimming unnecessary topics. The discussion is logical in its conclusions and is well thought out, but again could benefit from an edit for brevity.
5. Any other relevant comments
This is a very nice study and could demonstrate some very nice results. I am very concerned about the limitations mentioned with the software used, but if the authors can show that these have been addressed, this would be suitable for publication.
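As a rough illustration of the comparison requested under the ROARY point in section 1, the sketch below builds a similarity-based presence/absence call and lists the cases where it disagrees with cluster membership. It is only a sketch under stated assumptions: the hit tuples are a hypothetical pre-parsed form of similarity search results (for example from BLAST), the identity and coverage thresholds are illustrative values rather than recommendations, and the gene and genome names are invented.

```python
from collections import defaultdict


def presence_from_hits(hits, min_identity=95.0, min_coverage=0.8):
    """Build {gene: set of genomes} from similarity hits.

    Each hit is a hypothetical tuple:
    (gene, genome, percent_identity, aligned_length, gene_length).
    A gene is called present in a genome if at least one hit passes
    both the identity and the coverage threshold (illustrative values).
    """
    presence = defaultdict(set)
    for gene, genome, pident, aln_len, gene_len in hits:
        if pident >= min_identity and aln_len / gene_len >= min_coverage:
            presence[gene].add(genome)
    return dict(presence)


def discordant_calls(cluster_presence, similarity_presence):
    """List (gene, genome) pairs present by similarity but absent by cluster membership."""
    discordant = []
    for gene, genomes in similarity_presence.items():
        missing = genomes - cluster_presence.get(gene, set())
        discordant.extend((gene, genome) for genome in missing)
    return sorted(discordant)


# Invented example data: one gene searched against three genomes.
example_hits = [
    ("acrB_1", "genome1", 99.0, 3000, 3100),
    ("acrB_1", "genome2", 96.0, 3000, 3100),
    ("acrB_1", "genome3", 80.0, 3000, 3100),  # fails the identity threshold
]
# Invented cluster-based calls: the gene is only assigned to genome1.
example_clusters = {"acrB_1": {"genome1"}}

similarity_presence = presence_from_hits(example_hits)
conflicts = discordant_calls(example_clusters, similarity_presence)
```

A large number of discordant calls would suggest that near-identical genes have been split across clusters, which is the situation the reviewer asks the authors to rule out.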
Please rate the manuscript for methodological rigour
Satisfactory
Please rate the quality of the presentation and structure of the manuscript
Good
To what extent are the conclusions supported by the data?
Strongly support
Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?
No
Is there a potential financial or other conflict of interest between yourself and the author(s)?
No
If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?
Yes
-
Comments to Author
Shahabuddin et al. present an analysis of 200 E. coli genomes in which they examine the associations and dissociations of efflux pumps within the pangenome. They found that efflux pumps tend to have a larger number of dissociative gene interactions, particularly with other genes involved in 'transport'. The authors observe a collection of efflux pumps in their dataset, predominantly found in P. aeruginosa, which display association and dissociation with several metabolite transport genes.
1. Methodological rigour, reproducibility and availability of underlying data
The methodology is sound. The analysis is understandably limited by computational resources, leading the authors to use a random sampling approach. However, the relatively small sample size used (given the diversity of E. coli) limits the strength of the conclusions that can be drawn from this analysis. I would like to see the authors make this limitation more pronounced in their discussion and conclusion. To counteract this limitation, the authors could repeat their analysis on repeated random samples of 200 genomes, allowing them to measure the reproducibility of their results. However, this is a significant amount of work and not necessary for the publication of this article.
I am unclear on how the authors functionally grouped their genes. What methodology was used? Where did the functional category definitions come from? How can this analysis be reproduced? A tool such as eggNOG-mapper (http://eggnog-mapper.embl.de/ and https://github.com/eggnogdb/eggnog-mapper ) would be useful for reproducible functional gene classification.
2. Presentation of results
The results are well presented.
Figure 2. Could the authors comment on the misplacement of some phylogroup A strains?
Figure 5. Nodes are sized based on lineage independence; do larger nodes equate to more independence, or less? Is it possible to highlight where efflux pumps are in these networks?
3. How the style and organization of the paper communicates and represents key findings
The paper is well organised with a clear and logical flow. Key findings are communicated well.
4. Literature analysis or discussion
The introductory literature analysis is well written and explains the context. However, there are a couple of instances in the introduction and discussion that I would like the authors to address.
L74-76: "Overexpression of efflux pumps may render bacteria less adaptable to changing environments, limiting their ability to evolve and survive in challenging environments (12)". I do not follow the logic here. Do the authors mean that increased expression of efflux pumps poses a fitness cost that in turn reduces bacterial survival? Could the authors please clarify?
L274-275: "our coincidence analysis concerns accessory genes, finding a larger accessory genome potentially gives us more statistical power". I am not convinced this statement is accurate. The power to detect coincident genes is dependent on the number of times they can occur together, i.e. the number of genomes analysed.
L300-302: "mexA/ttgA, mexB/ttgB and oprM/ttgC, which encode parts of the MexAB-OprM/TtgABC RND-family efflux pump (34,35). Synteny of these genes within our genomes suggests they were likely acquired during a single horizontal gene transfer event". An interesting observation and suggestion. A single horizontal gene transfer event implies that all isolates in this collection have acquired these genes via vertical inheritance. Coinfinder should correct for genes that co-occur due to vertical inheritance and have a strong phylogenetic signal. This implies that the phylogenetic signal for these genes is not strong enough for Coinfinder to remove the association; could the authors comment on this? Have the authors examined the pattern of these genes in their phylogenetic tree? Have the authors investigated whether these genes are located on chromosomal or plasmid contigs?
L324-325: The authors state that acrF is not active in E. coli ST131; however, the citation demonstrates that acrF is not active in E. coli ST11 O157.
5. Any other relevant comments
L143: typo, "Insure" should be "ensure".
L271: missing "could": "The differences in the core and accessory genes found here COULD be attributed".
L279: extra 's' in sentence.
L296: missing "to": "Tend TO associate".
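The repeated-sampling check suggested under point 1 could be sketched roughly as follows. This is a minimal sketch, not the authors' pipeline: it only draws the replicate samples and scores how consistently gene pairs are recovered across replicates using the Jaccard index. The coincidence analysis itself is not implemented here; the sets of gene pairs are assumed to come from running the existing workflow once per replicate, and the function names, seed and overlap measure are illustrative choices.

```python
import random
from itertools import combinations


def draw_replicates(genome_ids, sample_size, n_replicates, seed=0):
    """Draw n_replicates random samples (without replacement) of genome IDs."""
    rng = random.Random(seed)  # fixed seed so the draws can be reproduced
    return [sorted(rng.sample(genome_ids, sample_size)) for _ in range(n_replicates)]


def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty results are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(pair_sets):
    """Average Jaccard index over all pairs of replicate results."""
    scores = [jaccard(a, b) for a, b in combinations(pair_sets, 2)]
    return sum(scores) / len(scores)


# Invented genome identifiers; in practice these would be the available assemblies.
all_genomes = [f"genome{i}" for i in range(1000)]
replicates = draw_replicates(all_genomes, sample_size=200, n_replicates=5, seed=1)

# Invented results: sets of coincident gene pairs found in three replicates.
replicate_results = [
    {("mexA", "mexB"), ("mexB", "oprM")},
    {("mexA", "mexB"), ("mexB", "oprM")},
    {("mexA", "mexB")},
]
stability = mean_pairwise_jaccard(replicate_results)
```

A stability value close to 1 would indicate that the same gene pairs are recovered regardless of which 200 genomes are sampled; lower values would quantify the limitation the reviewer asks the authors to emphasise.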
Please rate the manuscript for methodological rigour
Good
Please rate the quality of the presentation and structure of the manuscript
Very good
To what extent are the conclusions supported by the data?
Partially support
Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?
No
Is there a potential financial or other conflict of interest between yourself and the author(s)?
No
If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?
Yes
-
