Horizontal gene transfer and CRISPR targeting drive phage-bacterial host interactions and coevolution in pink berry marine microbial aggregates

Abstract

Bacteriophages (phages), viruses that infect bacteria, are the most abundant components of microbial communities and play roles in community dynamics and host evolution. The study of phage-host interactions, however, is made difficult by a paucity of model systems from natural environments and of known, cultivable phage-host pairs. Here, we investigate phage-host interactions in the “pink berry” consortia: naturally occurring, low-diversity, macroscopic aggregates of bacteria found in the Sippewissett Salt Marsh (Falmouth, MA, USA). We leverage metagenomic sequence data and a comparative genomics approach to identify eight complete phage genomes, infer their bacterial hosts from host-encoded clustered regularly interspaced short palindromic repeats (CRISPR), and observe the potential evolutionary consequences of these interactions. Seven of the eight phages identified infect the known pink berry symbionts Desulfofustis sp. PB-SRB1, Thiohalocapsa sp. PB-PSB1, and Rhodobacteraceae sp. A2, and belong to entirely novel viral taxa, except for one genome, which represents the second member of the Knuthellervirus genus. We further observed increased nucleotide variation over a region of a conserved phage capsid gene that is commonly targeted by host CRISPR systems, suggesting that CRISPRs may drive phage evolution in pink berries. Finally, we identified a predicted phage lysin gene that was horizontally transferred to its bacterial host, potentially via a transposon intermediary, emphasizing the role of phages in bacterial evolution in pink berries. Taken together, our results demonstrate that pink berry consortia contain diverse and variable phages, and provide evidence for phage-host co-evolution via multiple mechanisms in a natural microbial system.

IMPORTANCE

Phages (viruses that infect bacteria) are important components of all microbial systems, where they drive the turnover of organic matter by lysing host cells, facilitate horizontal gene transfer (HGT), and co-evolve with their bacterial hosts. Bacteria resist phage infection, which is often costly or lethal, through a diversity of mechanisms. One such mechanism is the CRISPR system, which encodes arrays of phage-derived sequences from past infections to block subsequent infection by related phages. Here, we investigate bacteria and phage populations from a simple marine microbial community known as “pink berries”, found in salt marshes of Falmouth, Massachusetts, as a model of phage-host co-evolution. We identify eight novel phages and characterize a case of putative CRISPR-driven phage evolution and an instance of HGT between phage and host, together suggesting that phages have large evolutionary impacts in a naturally occurring microbial community.

Article activity feed

  1. To assess the distribution of pink berry-associated bacteria and phages, genome-wide read coverages were analyzed for individual pink berry metagenomes

    It would be nice to provide a summary of what % of reads from the metagenome map to the genomes in Figure 3, and what % are unmapped. Also, if a large portion are unmapped, do you have an idea what they are (e.g., from running some taxonomic analysis on the unmapped reads)?

  2. NARBL

    Is this a commonly used tool, or are there better tools that might exist? Looking at the GitHub page associated with NARBL, the documentation seems extremely scarce, which makes me concerned about the reproducibility of using it.

  3. Co-assembly of the three pink berry metagenomes yielded 184 contigs totaling 4.35 Mb in length

    Could you add a sentence here explaining how diverse pink berry microbial communities are, and a statement of whether this assembly size seems reasonable? Is it surprising that the total assembly is only 4.35 Mb (about one bacterial genome)? It might be nice to know what % of reads are represented in this assembly.

  4. One such AMG was a darB-like antirestriction gene encoded on the Thiohalocapsa phage MD04 genome (Suppl. Data 1). Interestingly, darB has been shown to methylate phage DNA to resist host restriction modification (RM) systems (Iida et al., 1987; Iyer et al., 2017).

    Because auxiliary metabolic genes (AMGs) more commonly reflect instances where phages enhance bacterial energy metabolism, you might want to consider calling these methylases "anti-defense" genes to more accurately reflect their proposed ecological role.

  5. 48 unique repeat sequences from four reference genomes for known pink berry-associated bacteria:

    This level of repeat diversity is much larger than I would have expected and implies that each reference genome has an extremely large number of independent CRISPR-Cas systems. Alternatively, the repeat finder is erroneously detecting CRISPR repeats. Can you please add more information about what types of CRISPR-Cas systems you are finding in each of these reference genomes to contextualize this repeat diversity?

  6. The remaining five genomes of interest lacked sufficient protein similarity for a connection in the vConTACT network, indicating that these phages represent novel and undescribed diversity.

    RefSeq is not a great representation of total viral diversity, making it difficult to evaluate viral novelty simply from pink berry phages not being closely related to RefSeq viruses.

    Could you compare instead to viral MAGs from ecologically similar samples, like other marshes, to determine their relative novelty?

  7. 2,802 unique CRISPR spacer sequences

    This is a really high number of unique CRISPR spacers. CRISPR arrays in bacteria tend to be smallish (less than 50 or so spacers), while archaea can have larger arrays (less than 100 or so). Since there are only 4 different bacterial strains present, retrieving thousands of spacers would require a huge number of independent CRISPR-Cas systems in these strains, and/or extremely rapid CRISPR adaptation. Can you add more information to contextualize this finding?

  8. Moreover, although most host spacers matched to a single virus, two spacers from the Rhodobacteraceae host aligned to Thiohalocapsa phage MD04 and Desulfofustis phage MD02

    Can you clarify the % identity of these matches?

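The back-of-envelope reasoning behind the comments on the 2,802 unique spacers and the 48 unique repeats can be made explicit with a short calculation. This is only a rough sketch: the cap of 50 spacers per array is the approximate figure quoted in the comment, not a measured value, and splitting spacers and repeats evenly across the four strains is an assumption made purely for illustration.

```python
import math


def min_arrays_needed(total_spacers: int, max_spacers_per_array: int) -> int:
    """Smallest number of arrays that could hold all spacers at the given cap."""
    return math.ceil(total_spacers / max_spacers_per_array)


# Figures quoted in the manuscript and in the comments.
UNIQUE_SPACERS = 2802
UNIQUE_REPEATS = 48
STRAINS = 4

# Assumption: the approximate upper bound for bacterial arrays from the comment.
ARRAY_CAP = 50

# Assumption: an even split across strains, for illustration only.
spacers_per_strain = UNIQUE_SPACERS / STRAINS
repeats_per_genome = UNIQUE_REPEATS / STRAINS

arrays_total = min_arrays_needed(UNIQUE_SPACERS, ARRAY_CAP)
arrays_per_strain = arrays_total / STRAINS

print(f"Spacers per strain (even split): {spacers_per_strain:.1f}")
print(f"Minimum arrays overall at a cap of {ARRAY_CAP}: {arrays_total}")
print(f"Average arrays per strain: {arrays_per_strain:.2f}")
print(f"Unique repeats per genome (even split): {repeats_per_genome:.1f}")
```

Under these assumptions, each strain would need to carry well over ten full-size arrays, which is the gap the comments ask the authors to explain.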