Genomic diversity of novel strains of mammalian gut microbiome derived Clostridium XIVa strains is driven by mobile genetic element acquisition

Abstract

Despite advances in sequencing technologies that enable a greater understanding of mammalian gut microbiome composition, our ability to determine a role for individual strains is hampered by our inability to isolate, culture and study such microbes. Here we describe highly unusual Clostridium XIVa group strains isolated from the murine gut. Genome sequencing indicates that these strains, Clostridium symbiosum LM19B and LM19R and Clostridium clostridioforme LM41 and LM42, have significantly larger genomes than most closely related strains. Genomic evidence indicates that the isolated LM41 and LM42 strains diverge from most other Clostridium XIVa strains and supports reassignment of these groups at the genus level. We attribute the increased genome size of C. clostridioforme LM41 and LM42 to the acquisition of mobile genetic elements, including dozens of prophages, integrative elements, putative group II introns, numerous transposons (among them 29 identical copies of the IS66 transposase) and a very large 192 kb plasmid. antiSMASH analysis identifies more biosynthetic gene clusters within LM41 and LM42 than in related strains, encoding a diverse array of potential novel antimicrobial compounds. Together, these strains highlight the potentially untapped microbial diversity that remains to be discovered within the gut microbiome and indicate that, despite our ability to obtain a top-down view of microbial diversity, we remain largely blind to microbial capabilities at the strain level.

Article activity feed

  1. Thank you for your submission; we have now received two comprehensive reviews. Although the reviewers' comments are significant and extensive, I have decided to offer you the option of a major revision, as I feel that these issues can be addressed given additional time. The reviewers highlight significant concerns about taxonomy (Clostridium vs. Enterocloster), data analysis, and the availability and deposition of the correct data. I recommend that you work with an experienced bioinformatician to address the data-analysis comments.

  2. Comments to Author

    The authors have isolated four strains (two from each of two known species) with interesting genomic properties. The genomic features described (which constitute the bulk of the manuscript, from line 210 to the end) are of interest and novelty. I have focused my review on the taxonomic aspects, which I am better placed to judge. The phylogenomic analysis is likely sound, although the PDF image quality makes the labels in Figure 2 nearly impossible to read, so I cannot properly judge or interpret this figure. It would also be helpful if the genomes derived from type strains were labelled in Figs 1 and 2.

    However, the taxonomic interpretation and argument is a mess: there are multiple problems with the imprecise application of outdated or inappropriate classifications in the manuscript (or, at the very least, a failure to explain the relevant taxonomic history). Indeed, the authors appear to be claiming novelty for the taxonomy of the four isolated strains by assigning species from Clostridium XIVa to novel genera when this has already been done. For example, the manuscript does not discuss the extensive classification and reclassification of Clostridium XIVa genomes that GTDB has already addressed. Specific problems include the following.

    Lines 27-29, "Genomic evidence indicates… supports reassignment of these groups [Clostridium XIVa] at genus-level": this work has already begun, but it is not adequately reviewed or incorporated into the analysis presented.

    Line 64, "Clostridium (a member of the Lachnospiraceae family)": currently accepted classifications (e.g. LPSN, NCBI, GTDB) place Clostridium in the family Clostridiaceae. Perhaps the sentence should read "Clostridium XIVa (a member of…)", as this would make more sense and could lead into an explanation that Clostridium XIVa encompasses members of multiple Lachnospiraceae genera, not all formally named yet, but including Enterocloster Haas and Blanchard 2020.

    Lines 71 and 72: why refer to 'Clostridium bolteae' and 'Clostridium clostridioforme' by their old names throughout, rather than basing the analysis on the accepted, improved classification of these species as members of Enterocloster? Using the old names makes sentences such as "vary significantly among Clostridia with the larger genomes of certain Lachnospiraceae species" imprecise and confusing: the intended comparison seems to be of Enterocloster genomes with "certain Lachnospiraceae species" (which ones?). Why not explain that these species, previously placed in Clostridium XIVa, are currently classified in Enterocloster (family Lachnospiraceae), and then base the rest of the text on that classification?

    Line 79, "increases in C. symbiosum": this is the first reference to this species in the main text. It would be clearer to first explain that this species is better named EITHER 'Lachnoclostridium' symbiosum (family Lachnospiraceae), following the classification of Yutin and Galperin 2013 (even though the genus name is not validly published, this is the taxonomy used by NCBI, https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1512), OR 'Otoolea symbiosa' (family Lachnospiraceae), following the GTDB taxonomy. Although this reclassification does not seem to have been published yet, it is still notable, and the genus name itself is validly published. Whichever classification is preferred, it is clear and well established that the species does not belong in Clostridium; the close relationship of 'C. symbiosum' to Otoolea is even mentioned in the genus description (https://doi.org/10.1016/j.chom.2022.09.011).

    Line 172, "implying that the existing Lachnoclostridium genus classification may benefit from genome-informed taxonomy revision": why not discuss the extensive reclassifications already evidenced in GTDB and in studies such as Haas and Blanchard 2020? See the next comment.

    Lines 179-181, "We refer to these as the (1) C. bolteae/C. clostridioforme and (2) C. symbiosum groups respectively but note that this genomic evidence supports nomenclature reassignment of these groups at genus-level (Fig. 2).": why not discuss that C. bolteae and C. clostridioforme have already been proposed to belong to Enterocloster (Haas and Blanchard 2020), and that C. symbiosum has already been placed in 'Lachnoclostridium' or Otoolea (see the comments above on line 79)? Note also that lines 202-204 state that 'C. bolteae'/'C. clostridioforme' and 'C. symbiosum' "constitute distinct species groups and belong to the same genus". Are the authors therefore claiming that 'C. symbiosum' belongs with the other two species in Enterocloster, in contrast to the GTDB classification and other analyses? If so, this aspect needs a far more extensive analysis and a full revision as appropriate, taking into account the already proposed reclassifications referred to above. However, this may simply be a lapse of clarity, given the text in lines 633-634: "C. bolteae/C. clostridioforme are likely to correspond to the same genus, but that they are genomically quite distinct from C. symbiosum".

    Line 187, "The C. bolteae/C. clostridioforme grouping is divisible into four major subgroups": to what extent do these correlate with expected or known Enterocloster species-level clusters based on established thresholds (e.g. ANI for species delineation)? Note that there are currently seven Enterocloster species with validly published names (https://lpsn.dsmz.de/genus/enterocloster). Similarly, do the groups within the "C. symbiosum group" represent a single species or multiple species?

    Line 196, "a considerable amount of material that is not homologous to the other Clostridia in this study": do you mean comparison with Clostridia, Clostridium XIVa sensu stricto or Lachnospiraceae? The latter two are the appropriate comparisons.

    Line 633, "C. bolteae/C. clostridioforme are likely to correspond to the same genus": as above, that genus is Enterocloster.
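    The line 187 comment above asks whether the subgroups correspond to species-level clusters under an established ANI threshold. Once pairwise ANI values are available, that check can be done mechanically. The sketch below is a minimal illustration, not the authors' or the reviewer's method: it assumes pairwise ANI percentages have already been computed (for example with fastANI, averaging the two query/reference directions), uses 95% as the cut-off (the lower end of the 95-96% range recommended by Chun et al. 2018), and groups genomes by single linkage with a union-find structure. The function `species_clusters`, the genome labels and the ANI values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

# Assumed species-delineation cut-off (lower end of the 95-96 % ANI range).
ANI_SPECIES_THRESHOLD = 95.0


def species_clusters(
    genomes: Iterable[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = ANI_SPECIES_THRESHOLD,
) -> List[List[str]]:
    """Single-linkage clustering: genomes joined by any chain of pairs with
    ANI >= threshold end up in the same putative species cluster.

    `ani` maps frozenset({a, b}) -> ANI (%); pairs missing from the dict are
    treated as below threshold (e.g. too divergent for fastANI to report).
    """
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(list(parent), 2):
        value = ani.get(frozenset((a, b)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for g in parent:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genomes and ANI values, for illustration only.
genomes = ["isolate_1", "isolate_2", "type_strain_X", "type_strain_Y"]
ani = {
    frozenset(("isolate_1", "type_strain_X")): 97.4,
    frozenset(("isolate_2", "type_strain_X")): 96.1,
    frozenset(("isolate_1", "isolate_2")): 95.8,
    frozenset(("isolate_1", "type_strain_Y")): 88.2,
}
print(species_clusters(genomes, ani))
# → [['isolate_1', 'isolate_2', 'type_strain_X'], ['type_strain_Y']]
```

    Single linkage can chain together genomes that are individually below the threshold, so borderline clusters are better confirmed directly against type-strain genomes (and with dDDH, for example via TYGS) than read off the clustering alone.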

    Please rate the manuscript for methodological rigour

    Satisfactory

    Please rate the quality of the presentation and structure of the manuscript

    Very poor

    To what extent are the conclusions supported by the data?

    Partially support

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    Yes

  3. Comments to Author

    The manuscript describes the genomes of four strains of clostridia-like bacteria isolated from the murine gut, some of which have unusually large genomes compared with related microbes. It will be of interest to the research community, but there are several issues that need addressing.

    General comments

    When discussing commensal microbes, use the term microbiota rather than microbiome, and modify the title and text throughout the manuscript accordingly.

    Strain names are used inconsistently throughout the manuscript (LM41 and LM41A; LM42 and LM42D).

    One of the major issues with this manuscript is that the taxonomy is not up to date; this must be addressed during revision. Clostridium group classifications (e.g. Clostridium XIVa) were useful when identification of bacteria relied on 16S rRNA gene sequencing and phenotypic tests, but the move to genome-centric analyses of bacteria and the advent of GTDB have greatly changed the taxonomic landscape. Hence, while I acknowledge that the work of Collins et al. was useful for a very long time, it is redundant in the genomic era. If Clostridium XIVa is referred to in the manuscript, it should be only in passing, with all taxa described using current nomenclature.

    The second major issue with the manuscript is the lack of detail provided on methods, results and comparative analyses. Guidance is provided below on how the authors can address these issues without (where possible) needing knowledge of command-line bioinformatics. However, I have serious concerns about the genome size quoted for LM41, and these may extend to the other de novo-assembled genomes described in the manuscript (see my comments below).

    It is nice to see the sequence read data already publicly available, but why have the assemblies not also been made publicly available? Lines 129 and 149 indicate that the draft genomes, not just the sequence reads, are available. As a reviewer, I should not have to download and assemble reads myself to determine whether the analyses reported in the manuscript are correct. Also, it is not possible to clone the GitHub repository for data associated with this publication. In lines 129 and 149, make clear that the full read data are available only from NCBI, as the fast5/fasta files for the MinION sequencing are not available from EBI.

    I downloaded the read files for LM41 from NCBI (Illumina, SRR23875580; MinION, SRR23875579) and assembled the genome using Unicycler (default settings). This genome represents a single circular chromosome of 5,297,804 bp. Analysis with sourmash against the GTDB release 214 database shows the genome to belong to [Clostridium] symbiosum:

        LM41.fasta,found,d__Bacteria,p__Bacillota_A,c__Clostridia,o__Lachnospirales,f__Lachnospiraceae,g__Clostridium_Q,s__Clostridium_Q symbiosum

    fastANI comparison with the genome of the type strain of [Clostridium] symbiosum (ATCC 14940; GCF_000466485.1) shows 98.69% ANI; that is, LM41 is an authentic strain of [Clostridium] symbiosum. In the manuscript, however, it is consistently referred to as C. clostridioforme. I therefore suggest that the authors 1) check that they have submitted the correct files to NCBI for each strain and/or 2) work with a bioinformatician familiar with microbial genomics and taxonomy to determine whether the sequence assemblies provided by MicrobesNG are correct.

    How were completeness and contamination of the genomes assessed? There is no indication that CheckM2 was used to confirm that the genomes are of high quality.

    Title

    Change to "Genomic diversity of novel murine gut-microbiota-derived Lachnospiraceae strains is driven by acquisition of mobile genetic elements".

    Abstract

    Abstract and rest of manuscript: write C. symbiosum as [Clostridium] symbiosum, as it is affiliated with Lachnoclostridium (Lachnospiraceae) in the NCBI taxonomy and with a novel genus (family Lachnospiraceae) in the GTDB taxonomy, but has not yet been formally reclassified. Clostridium clostridioforme should be named Enterocloster clostridioformis throughout the abstract and manuscript.

    Line 26: avoid the word "significantly" in the absence of P values; change to "much".

    Main text

    Line 63 and associated paragraph: the genus Clostridium belongs to the family Clostridiaceae, not Lachnospiraceae (List of Prokaryotic Names with Standing in Nomenclature and GTDB). Reframe the paragraph in the context of the Clostridia, under which the Lachnospirales and Lachnospiraceae fall. Microbes relevant to intestinal health would be Roseburia, Blautia, Anaerobutyricum, Anaerostipes and Coprococcus; my point is that not all Lachnospiraceae are "detrimental" to host or intestinal health, which is what the opening of the Discussion implies. Enterocloster is affiliated with the Lachnospiraceae, and C. bolteae is now Enterocloster bolteae. Use the most up-to-date names in the text to avoid conflating true Clostridium spp. with members of different, genomically heterogeneous families.

    Line 96: change "significantly" to "greatly".

    Methods

    Lines 107-108: give the exact composition of the anaerobic atmosphere if a cabinet was used, or provide details of the gas packs used to produce the anaerobic environment.

    Line 109: change "media" to "medium".

    Lines 109-111, 120: change "16S rDNA" to "16S rRNA gene"; 16S rDNA does not exist.

    Line 111: which database was used to confirm the identity of the isolates from 16S rRNA gene sequence analysis?

    Lines 119-122: change to "Genomes of the four strains (LM19B, LM19R, LM41A and LM42D), …".

    Lines 126-128: is this correct? MicrobesNG requires cell pellets, not DNA, to be sent for MinION/Illumina sequencing.

    Line 128: link to details of the polishing and assembly approaches used by MicrobesNG to generate the hybrid assembly, either via a relevant citation or a link provided by MicrobesNG. This is particularly important given how unusual the genomes reported in the manuscript potentially are.

    Line 135: indicate whether default or user-defined settings were used for the bioinformatics tools listed.

    Line 136: why was Glimmer used to define ORFs, when Prodigal would have done this during genome annotation?

    Line 140: the analyses of Lachnospiraceae genomes need to be updated. This work was done over four years ago, and there have been many changes since then. Also, the GTDB dataset (though not perfect) is far more reliable than NCBI.

    Line 145: BLAST analysis (BLASTN or BLASTP), presumably via NCBI, is the worst means of manually annotating genomes. BLASTP analysis should be done using UniProt, or the genomes should simply be annotated using PGAP. A quick way of updating the taxonomic affiliations and functional information for the strains described in the manuscript would be to run them through https://tygs.dsmz.de/ and http://protologger.de/. The Results would need to be rewritten accordingly.

    Results

    Lines 156-220: rewrite in the context of an up-to-date taxonomic analysis, considering the recommendations for describing taxa based on genomic data. See the specific comments below.

    Based on 16S rRNA gene sequence data, how similar were the strains to existing members of the family Lachnospiraceae, particularly the type strains of the most closely related species? The easiest way to determine this with any confidence would be to use EzBioCloud (https://www.ezbiocloud.net/).

    Lines 171-173: the authors really do need to consult the literature. Taxonomic frameworks, largely driven by GTDB, have changed greatly since the work described in the manuscript was done. Indeed, Enterocloster was named in 2020, a year after the comparative genomics work was undertaken. The sentence as written is something of a cop-out.

    Figure 1: trees should be labelled with species names, as accession IDs give no indication of species affiliation. The following information should be added to the legend: how many orthologous single-copy genes are included in the analysis, what kind of phylogenetic tree is shown, and what the scale bar represents.

    Table 1: genome accessions, CDS, N50, rRNA and tRNA information should be added for all genomes included in the analyses. If the number of known plasmids is available for each genome, this should also be added to the table.

    Paragraph beginning at line 166: what was the ANIm of the genomes of the four strains with the genomes of the type strains of Enterocloster clostridioformis and [Clostridium] symbiosum? Add this information to the text. Does ANIm confirm species affiliation according to the recommendations of Chun et al. (https://pubmed.ncbi.nlm.nih.gov/29292687/)? Digital DNA-DNA hybridization data for the closest relatives can be generated via TYGS (link provided above) to support the findings.

    Figure 2: the information in the images (and in the image shown on the GitHub webpage) is unreadable; improve the image quality. Also increase the font size of the colour legend to make the scale range easier to determine. Remove the discussion of results from the Figure 2 legend.

    Figure 3: remove the analysis of Enterocloster bolteae from the figure and the rest of the manuscript (including Table 1), and focus only on the four strains of interest. Relabel the tree to give full genome accessions, which are truncated in the figures presented in the paper. The images are clearly Phandango-generated, but nothing in the Methods or figure legend indicates this. What are the blue lines in the figure showing? This needs to be explained in the legend for readers unfamiliar with the field.

    Lines 237-238: you cannot talk about the evolution of prophage tolerance. This is not an evolutionary study, and you have no empirical evidence to demonstrate prophage tolerance. You can only state that the genomes encode more predicted complete prophages than other clostridia.

    Lines 238, 240, 261, 267, 363: remove "significantly" and "significant". You have no statistical analyses on which to base claims of significance.

    Lines 262-264: how was this done? The Methods contain no information about assessing plasmid stability: neither growth conditions, frequency of passage nor PCR assessment of plasmid retention (or some other means of determining retention).

    Lines 266-269: how similar (% identity, % coverage, e-value) was the 3824-aa protein of your strain to SpaA of Corynebacterium diphtheriae? Also give the name of your gene in the Prokka-annotated plasmid sequence, so that interested readers know which gene to look at.

    Line 275: what is the relevance of finding an enniatin-like antimicrobial encoded on the plasmid with respect to intestinal colonization? Could this just be an undescribed bacteriocin? Why was gutSMASH not used to look at the predicted primary metabolites of the strains? At a minimum, this would show whether they have the potential to produce butyrate (beneficial to intestinal health).

    There is no consideration of how similar the prophages identified in this study are to known prophages. In addition, are they represented in the PhageClouds database (https://phageclouds.ku.dk/; it includes NCBI and metagenome-assembled viruses and would allow the prevalence of the prophages in intestinal datasets to be determined)?

    Table 3: indicate which (if known) and how many prophages are plasmid-associated.

    Tables 4 and 5: the predictions shown should be accompanied by % similarity to known elements.

    Discussion

    Lines 370-393: the discussion of prophages lacks citations to the relevant literature. Overall, the Discussion is limited and requires full consideration of the data generated for the four novel strains.
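    The Table 1 comment above asks for N50 to be reported for every genome, and the reviewer also questions the genome size quoted for LM41. As a minimal sketch of how total assembly length and N50 can be recomputed from a FASTA assembly, the hypothetical helpers `contig_lengths_from_fasta` and `assembly_stats` below use only the Python standard library; they are an illustration, not the pipeline used by MicrobesNG, the authors or the reviewer. N50 is taken here as the length of the contig at which the running sum of contig lengths, sorted longest first, first reaches half of the total assembly length.

```python
from typing import Dict, Iterable, List


def contig_lengths_from_fasta(path: str) -> List[int]:
    """Return the sequence length of each record in a (multi-)FASTA file."""
    lengths: List[int] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append(0)  # a header line starts a new record
            elif line and lengths:
                lengths[-1] += len(line)  # sequence lines extend the current record
    return lengths


def assembly_stats(contig_lengths: Iterable[int]) -> Dict[str, int]:
    """Total length, contig count and N50 of an assembly.

    N50 is the length of the contig at which the running sum of contig
    lengths (longest first) first reaches at least half of the total.
    """
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    if not lengths:
        raise ValueError("assembly contains no non-empty contigs")
    total = sum(lengths)
    running = 0
    for n in lengths:
        running += n
        if 2 * running >= total:  # integer comparison avoids float rounding
            return {"total_bp": total, "contigs": len(lengths), "n50": n}
    raise AssertionError("unreachable: the running sum always reaches the total")


# Hypothetical fragmented draft with four contigs, for illustration only.
print(assembly_stats([100, 80, 50, 20]))
# → {'total_bp': 250, 'contigs': 4, 'n50': 80}
```

    For the single circular chromosome of 5297804 bp that the reviewer reports for LM41, `assembly_stats([5297804])` gives a total length and N50 of 5,297,804 bp, because one contig holds the entire assembly; N50 only becomes informative for fragmented drafts, which is why it is worth tabulating alongside the other genomes.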

    Please rate the manuscript for methodological rigour

    Poor

    Please rate the quality of the presentation and structure of the manuscript

    Satisfactory

    To what extent are the conclusions supported by the data?

    Partially support

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    Yes