A metagenomics pipeline reveals insertion sequence-driven evolution of the microbiota

Article activity feed

  1. Interesting question! We plotted the number of insertions found in Prevotella (https://github.com/joshuakirsch/arcadia-images/blob/main/prevotella%20insertions%20per%20fraction%20reads.png) and Escherichia (https://github.com/joshuakirsch/arcadia-images/blob/main/escherichia%20insertions%20per%20fraction%20reads.png) against the fraction of total reads in the metagenome for the respective genus. Both graphs show a significant relationship between the number of insertions and the fraction of total reads (p < 1e-8), suggesting that the higher rates of Prevotella and Escherichia insertions found in MDG samples are due to the higher prevalence of these genera.

  2. Thanks for the insightful comment. We went back through one of the clades with intermixed IS family transposases (circled in red: https://github.com/joshuakirsch/arcadia-images/blob/main/messy%20node.png) and investigated some of the mixed elements. The matches to IS family transposases from ISfinder are at times of low confidence for the IS630 and IS1182 transposases, so these might be “close calls” or the best possible family annotation for these elements. We also double-checked the alignment between one of the IS21 transposases and the IS630 and IS1182 transposases. This alignment is of low confidence as well, suggesting that this clade may not be the best representation of the sequence diversity. A different clustering method, such as MMSeqs, may provide better resolution of the transposases. We will consider performing this type of analysis in an updated version of our preprint.

  3. We didn’t see any strong correlation in the inactivation of a tonB receptor during our crassvirales challenge, but this is a really interesting point to consider in the future.

  4. There are many more IS elements present in the genomes of these individuals that do not get picked up by our pipeline. The data we present here are heterogeneous IS element alleles, where we can find evidence of both an IS-inserted allele and a wild-type allele for the same gene. We are likely missing many heterogeneous alleles because of low evidence and detection power, as well as IS element insertions that assemble into contigs and reach 100% coverage at the insertion position.

  5. Thank you for this great idea! This could certainly provide better resolution of the strain variation in contigs and we will consider this for a future version of pseudoR.

  6. pseudoR and OASIS are designed for different purposes. OASIS finds new IS elements de novo in whole genome sequences or metagenomic contigs. pseudoR finds heterogeneous IS alleles (where genes in a population exist with and without IS element insertions) in previously assembled data.

  7. We estimate there are insertions in 477 plasmids and in 70 lytic phages in ITA, JPN, and MDG individuals combined. We believe that plasmids, phages, and conjugative transposons are key vectors for the spread of IS elements between bacteria.

  8. Specifically, for Bt there was an enrichment of ISOSDB412 insertions in susC-D/tonB and EPS biogenesis genes, whereas Bf acquired an abundance of insertions in susC-D/tonB genes

    this is a super cool validation!

  9. tonB

    TonB is also associated with siderophore uptake and iron acquisition. Iron availability is known to impact the growth of potentially pathogenic bacteria in the gut. I wonder whether this could be related to the fitness effects of IS insertions in these genes.

    Another question: is tonB genomically co-localized with these sus genes?

  10. Mobilome genes

    Did you ever see IS elements on phages (or plasmids) themselves? The phages might be transporting the elements around the community, which could explain some of the community differences in IS element distributions. I know that in the lab phages can pick up IS elements, and wild plasmids sometimes have them too. I'm curious whether this is common enough for you to spot!

  11. A barrier to studying IS elements in such complex environments results from imperfect methodologies for measuring in situ IS element dynamics. This stems from the poor recovery of multi-copy genes with repetitive sequences by short-read assemblers, leading to fragmented assemblies where IS elements are either absent or become break-points between contigs (20).

    This is a really excellent point - I really like your study design and your motivation for studying IS diversity.

    If you are looking to expand your database or your approaches further, one thought I had is that spacegraphcats could be helpful. It's explicitly designed to work on assembly graphs, so it might be able to catch IS elements that are stuck in tangled regions of the graph: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4

  12. d much higher abundance of IS elements associated with pathobionts including Escherichia and Prevotella species (

    Is there any way to normalize this statement relative to the taxonomic abundance of these organisms in the MDG samples vs. the others? Or, if it is the case, could you state that there was no difference in the abundance of Escherichia/Prevotella between MDG samples and the others?

  13. Some individuals had few detectable IS element insertions while others had over 100 unique IS element insertions (Fig. 2B)

    It was surprising to me that some people had so few IS sequences. I would have expected the number of IS sequences across a full metagenome to be much higher. Do you interpret this to mean that there is still a lot of room to grow the IS database? My default expectation would be that a metagenome would have thousands of IS sequences, given their ubiquity as MGEs.

  14. The ISOSDB also has a wide range of transposases representing multiple IS families

    Without knowing a lot about how IS families are designated, I was a little confused by this phylogenetic tree. My naive assumption would have been that all transposases from an insertion sequence family would form a clade together, so you would end up with a tree that resolves the 8 different families as clades. However, there is a lot of intermixing of these families, suggesting that the transposase protein sequences themselves don't form robust clades.

    It might be helpful in the text to explain a bit more about the different families and how they are determined.

    It would also be helpful to explain what is driving the patterns on the tree, since it does not appear to be family-level differences.

  15. pseudoR pipeline that utilizes ISOSDB to identify IS element insertions

    You previously referred to OASIS as being a rigorously tested tool for high-throughput identification of multiple genomes. Since it was mentioned by name earlier, can you specify what advantages pseudoR has over OASIS specifically, rather than just over previous tools in general?

  16. We used this dataset to evaluate whether we could identify similar fitness determinants as iORFs within the natural Bacteroides population within the intestine.

    This is a really clever way to connect fitness data from mutagenesis studies in pure cultures to their role in the natural environment!

  17. This demonstrates that the ISOSDB has an abundance of novel IS element nucleotide sequence diversity.

    This is very exciting! I'm wondering whether this additional diversity corresponds to new families of IS elements, or expanded diversity within known families?

  18. Gene classes targeted by IS elements are primarily metabolic, cell surface, and mobile genetic element accessory genes.

    Are there any gene classes that seem very underrepresented as targets of IS elements and, if so, is there a reasonable hypothesis for why?

  19. The ISOSDB and pseudoR pipeline is freely available at https://github.com/joshuakirsch/pseudoR.

    I really appreciate you putting your code on GitHub and specifying the dependency versions! The documentation looks really nice. So awesome. One suggestion: it might be nice to organize your scripts and databases into folders to make the repo a little cleaner.
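
The normalization question in comment 12, and the insertions-versus-read-fraction comparison described in reply 1, can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names and the per-sample values below are hypothetical, and the sketch only computes a Pearson correlation and a simple insertions-per-read-fraction ratio. A significance test such as the p-value quoted in reply 1 would need an additional step, for example `scipy.stats.linregress`.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def insertions_per_read_fraction(insertions, read_fractions):
    """Insertions divided by genus read fraction; None where the genus is absent."""
    return [i / f if f > 0 else None for i, f in zip(insertions, read_fractions)]


# Hypothetical per-sample values for one genus (illustration only, not study data).
read_fractions = [0.01, 0.02, 0.05, 0.10, 0.0]
insertions = [1, 3, 4, 11, 0]

r = pearson_r(read_fractions, insertions)
normalized = insertions_per_read_fraction(insertions, read_fractions)
print(f"Pearson r = {r:.3f}")
print(normalized)
```

If the normalized ratios are similar across cohorts while the raw counts differ, that would be consistent with the explanation in reply 1 that higher insertion counts track higher genus abundance.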