A metagenomics pipeline reveals insertion sequence-driven evolution of the microbiota

Joshua M. Kirsch
Andrew J. Hryckowian
Breck A. Duerkop

Version published to 10.1016/j.chom.2024.03.005
May 1, 2024
Arcadia Science
Feb 12, 2024

Interesting question! We plotted the number of insertions found in Prevotella (https://github.com/joshuakirsch/arcadia-images/blob/main/prevotella%20insertions%20per%20fraction%20reads.png) and Escherichia (https://github.com/joshuakirsch/arcadia-images/blob/main/escherichia%20insertions%20per%20fraction%20reads.png) against the fraction of total reads in the metagenome for the respective genus. Both graphs show a significant relationship between the number of insertions and the fraction of total reads (p < 1e-8), suggesting that the higher rates of Prevotella and Escherichia insertions found in MDG samples are due to the higher prevalence of these genera.
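For readers who want to try the same kind of check on their own samples, here is a minimal sketch of regressing per-sample insertion counts on the fraction of reads assigned to a genus. It is not the authors' analysis code: the function name `insertions_vs_abundance` and the toy numbers are assumptions made for illustration, and the reply above does not say which statistical test produced the reported p-value.

```python
import math


def insertions_vs_abundance(fraction_reads, insertion_counts):
    """Least-squares fit of insertion count on genus read fraction.

    Returns (slope, intercept, pearson_r). Hypothetical helper, not part of pseudoR.
    """
    n = len(fraction_reads)
    if n != len(insertion_counts):
        raise ValueError("inputs must have the same length")
    if n < 3:
        raise ValueError("need at least three samples")

    mean_x = sum(fraction_reads) / n
    mean_y = sum(insertion_counts) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in fraction_reads)
    syy = sum((y - mean_y) ** 2 for y in insertion_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(fraction_reads, insertion_counts))
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Made-up toy values for illustration only; NOT data from the preprint.
toy_fraction = [0.01, 0.05, 0.10, 0.20]
toy_insertions = [1, 5, 10, 20]
slope, intercept, r = insertions_vs_abundance(toy_fraction, toy_insertions)
print(f"slope={slope:.1f} intercept={intercept:.2f} r={r:.3f}")
```

A strong positive slope with a high correlation would be consistent with insertion counts tracking genus abundance, which is the interpretation given in the reply.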

Arcadia Science
Feb 12, 2024

Thanks for the insightful comment. We went back through one of the clades with intermixed IS family transposases (shown in the red circle: https://github.com/joshuakirsch/arcadia-images/blob/main/messy%20node.png) and investigated some of the mixed elements. The matches to IS family transposases from ISfinder are at times of low confidence for the IS630 and IS1182 transposases, so these might be “close calls” or the best possible family annotation for these elements. We also double-checked the alignment between one of the IS21 transposases and the IS630 and IS1182 transposases. This alignment is of low confidence as well, suggesting that this clade may not be the best representation of the sequence diversity. A different clustering method, such as MMSeqs, may provide better resolution of the transposases. We will consider performing this type of analysis in an updated version of our preprint.

Arcadia Science
Feb 8, 2024

Thank you!

Arcadia Science
Feb 8, 2024

Thank you!

Arcadia Science
Feb 8, 2024

We didn’t see any strong correlation with the inactivation of a tonB receptor during our Crassvirales challenge, but this is a really interesting point to consider in the future.

Arcadia Science
Feb 8, 2024

There are many more IS elements present in the genomes of these individuals that do not get picked up by our pipeline. The data we present here are heterogeneous IS element alleles, where we can find evidence of both an IS-inserted allele and a wild-type allele for the same gene. There are likely many heterogeneous alleles that we are missing due to low evidence and detection power, as well as IS element insertions that assemble with contigs and reach 100% coverage at the insertion position.
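The distinction drawn above can be sketched in a few lines. This is only an illustration of the idea of a heterogeneous allele, not pseudoR's implementation: the `SiteEvidence` fields, the `classify_site` function, and the `min_reads` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    gene: str
    insertion_reads: int  # reads joining the gene to an IS element terminus
    wild_type_reads: int  # reads spanning the same position without an IS element


def classify_site(site: SiteEvidence, min_reads: int = 2) -> str:
    """Label a site by which alleles have read support (threshold is an assumption)."""
    has_insertion = site.insertion_reads >= min_reads
    has_wild_type = site.wild_type_reads >= min_reads
    if has_insertion and has_wild_type:
        # Both alleles are present in the population: the case reported in the reply.
        return "heterogeneous"
    if has_insertion:
        # Insertion at full coverage: assembled into the contig, so not reported.
        return "insertion_only"
    if has_wild_type:
        return "wild_type_only"
    return "insufficient_evidence"


# Illustrative, made-up read counts.
print(classify_site(SiteEvidence("susC", insertion_reads=6, wild_type_reads=14)))
print(classify_site(SiteEvidence("tonB", insertion_reads=9, wild_type_reads=0)))
print(classify_site(SiteEvidence("epsA", insertion_reads=1, wild_type_reads=1)))
```

Under this framing, the two kinds of missed insertions mentioned in the reply fall into the "insufficient_evidence" and "insertion_only" categories.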

Arcadia Science
Feb 8, 2024

We think it is the latter, due to high rates of transposase homology between our database and ISfinder.

Arcadia Science
Feb 8, 2024

We suspect that essential genes are underrepresented in our dataset because loss of these genes is lethal for the host.

Arcadia Science
Feb 8, 2024

Thank you for this great idea! This could certainly provide better resolution of the strain variation in contigs and we will consider this for a future version of pseudoR.

Arcadia Science
Feb 8, 2024

Good point! We will update this.

Arcadia Science
Feb 8, 2024

pseudoR and OASIS are designed for different purposes. OASIS finds new IS elements de novo in whole genome sequences or metagenomic contigs. pseudoR finds heterogeneous IS alleles (where genes in a population exist with and without IS element insertions) in previously assembled data.

Arcadia Science
Feb 8, 2024

We estimate there are insertions in 477 plasmids and in 70 lytic phages in ITA, JPN, and MDG individuals combined. We believe that plasmids, phages, and conjugative transposons are key vectors for the spread of IS elements between bacteria.

Arcadia Science
Feb 8, 2024

We just cleaned up the extra files. In a future version, we will move the IR blast database and other scripts to separate folders. Thanks for the suggestion.

Arcadia Science
Oct 13, 2023

Specifically, for Bt there was an enrichment of ISOSDB412 insertions in susC-D/tonB and EPS biogenesis genes, whereas Bf acquired an abundance of insertions in susC-D/tonB genes

this is a super cool validation!

Arcadia Science
Oct 13, 2023

tonB

TonB is also associated with siderophore uptake and iron acquisition. Iron availability is known to impact the growth of potentially pathogenic bacteria in the gut. I wonder whether this could be related to the fitness effects of IS insertions in these genes.

Another question: is tonB genomically co-localized with these sus genes?

Arcadia Science
Oct 13, 2023

Mobilome genes

Did you ever see IS elements on phages (or plasmids) themselves? The phages might be transporting the elements around the community, which could explain some of the community differences in IS element distributions. I know that in the lab phages can acquire IS elements, and wild plasmids sometimes have them too. I'm curious whether this is common enough for you to spot!

Arcadia Science
Oct 13, 2023

susC, susD, or tonB receptor genes

tonB is a phage receptor - I wonder if the IS mutants have a fitness advantage due to phage resistance

Arcadia Science
Oct 13, 2023

A barrier to studying IS elements in such complex environments results from imperfect methodologies for measuring in situ IS element dynamics. This stems from the poor recovery of multi-copy genes with repetitive sequences by short-read assemblers, leading to fragmented assemblies where IS elements are either absent or become break-points between contigs (20).

This is a really excellent point - I really like your study design and your motivation for studying IS diversity.

If you are looking to expand your database or your approaches further, one thought I had is that spacegraphcats could be helpful. It's explicitly designed to work off assembly graphs and so might be able to catch IS elements that are stuck in tangled regions of the graph: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4

Arcadia Science
Oct 13, 2023

d much higher abundance of IS elements associated with pathobionts including Escherichia and Prevotella species (

is there any way to normalize this statement relative to taxonomic abundance of these organisms in the MDG samples vs the others? Or state that there was no difference in the abundance of Escherichia/Prevotella in MDG samples vs others if that is the case?

Arcadia Science
Oct 13, 2023

Some individuals had few detectable IS element insertions while others had over 100 unique IS element insertions (Fig. 2B)

It was surprising to me that some people had so few IS sequences. I would have expected the number of IS sequences across a full metagenome to be much higher. Do you interpret this to mean that there is still a lot of room to grow the IS database? My default expectation would be that a metagenome would have thousands of IS sequences, given their ubiquity as MGEs.

Arcadia Science
Oct 13, 2023

The ISOSDB also has a wide range of transposases representing multiple IS families

Without knowing a lot about how IS families are designated, I was a little confused by this phylogenetic tree. My naive assumption would have been that all transposases from an insertion sequence family would form a clade together, so you would end up with a tree that resolves the 8 different families as clades. However, there is a lot of intermixing of these families, suggesting that the transposase protein sequences themselves don't form robust clades.

It might be helpful in the text to explain a bit more about the different families and how they are determined.

It would also be helpful to explain what is driving the patterns on the tree, since it does not appear to be family-level differences.

Arcadia Science
Oct 13, 2023

an IS element termini, the termini

should the 'an' be removed here since termini is plural?

Arcadia Science
Oct 13, 2023

pseudoR pipeline that utilizes ISOSDB to identify IS element insertions

You previously referred to OASIS as being a rigorously tested tool for high-throughput identification of multiple genomes. Since it was mentioned by name earlier, can you specify what advantages pseudoR has over OASIS specifically, rather than just over previous tools in general?

Arcadia Science
Oct 13, 2023

We used this dataset to evaluate whether we could identify similar fitness determinants as iORFs within the natural Bacteroides population within the intestine.

This is a really clever way to connect fitness data from mutagenesis studies in pure cultures to their role in the natural environment!

Arcadia Science
Oct 13, 2023

This demonstrates that the ISOSDB has an abundance of novel IS element nucleotide sequence diversity.

This is very exciting! I'm wondering whether this additional diversity corresponds to new families of IS elements, or expanded diversity within known families?

Arcadia Science
Oct 13, 2023

Gene classes targeted by IS elements are primarily metabolic, cell surface, and mobile genetic element accessory genes.

are there any gene classes that seem very underrepresented for being targeted by IS elements and if so is there a reasonable hypothesis for why?

Arcadia Science
Oct 13, 2023

The ISOSDB and pseudoR pipeline is freely available at https://github.com/joshuakirsch/pseudoR.

I really appreciate you putting your code on GitHub and specifying the dependency versions! The documentation looks really nice. So awesome. One suggestion: it might be nice to organize your scripts and databases into folders to make the repo a little cleaner.

Version published to 10.1101/2023.10.06.561241 on bioRxiv
Oct 9, 2023
