A genome-phenome association study in native microbiomes identifies a mechanism for cytosine modification in DNA and RNA

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This work will interest researchers who want to explore the functional potential in metagenomes. The authors present a new computational method, MetaGPA, for performing enrichment analysis on cohorts of metagenomes. They then use this approach to identify an enzyme that can modify cytosines in DNA from natural bacteriophage populations. Though successful, the approach needs to improve in clarity and methodology to be both reproducible and of broader impact, as claimed.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

Abstract

Shotgun metagenomic sequencing is a powerful approach for studying microbiomes in an unbiased manner and is of increasing relevance for identifying novel enzymatic functions. However, the potential of metagenomics to relate microbiome composition to function has thus far been underutilized. Here, we introduce the Metagenomics Genome-Phenome Association (MetaGPA) study framework, which links genetic information in metagenomes with a dedicated functional phenotype. We applied MetaGPA to identify enzymes associated with cytosine modifications in environmental samples. From the 2365 genes that met our significance criteria, we confirmed known pathways for cytosine modifications and proposed novel cytosine-modifying mechanisms. Specifically, we identified and characterized a novel nucleic acid-modifying enzyme, 5-hydroxymethylcytosine carbamoyltransferase, that catalyzes the formation of a previously unknown cytosine modification, 5-carbamoyloxymethylcytosine, in DNA and RNA. Our work introduces MetaGPA as a novel and versatile tool for advancing functional metagenomics.

Article activity feed

  2. Reviewer #1 (Public Review):

    The authors present a new computational method, named MetaGPA, for performing enrichment analysis on cohorts of metagenomes. The method works similarly to a genome-wide association study (GWAS), in that case and control groups are defined. They apply the method to DNA extracted from multiple environmental metagenomes, split into cohorts based on the presence or absence of cytosine modifications in the DNA. They uncover an enzyme that they show converts 5hmdC to 5cmdC. Much work has gone into enrichment analysis using individual genomes, but here it is performed on metagenomes. While the MetaGPA method shows promise, it was not fully described or characterized, and attempts to reuse the code failed.
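To make the GWAS-like case/control idea concrete, the following is a minimal sketch of a per-domain enrichment test. It is not the MetaGPA implementation: the use of a one-sided Fisher's exact test, the function names, the domain labels, and the counts are all assumptions chosen for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are case/control contigs, columns are with/without the domain.
    Returns P(top-left cell >= a) under the hypergeometric null with
    fixed row and column totals.
    """
    n_case = a + b          # total case contigs
    n_with = a + c          # total contigs carrying the domain
    n = a + b + c + d       # all contigs
    denom = comb(n, n_case)
    p = 0.0
    for k in range(a, min(n_case, n_with) + 1):
        p += comb(n_with, k) * comb(n - n_with, n_case - k) / denom
    return p


def enriched_domains(case_counts, control_counts, n_case, n_control, alpha=0.05):
    """Return {domain: p-value} for domains over-represented in case contigs.

    case_counts / control_counts map a domain name to the number of
    contigs in that group carrying the domain (hypothetical inputs).
    """
    hits = {}
    for domain in set(case_counts) | set(control_counts):
        a = case_counts.get(domain, 0)
        c = control_counts.get(domain, 0)
        p = fisher_exact_greater(a, n_case - a, c, n_control - c)
        if p < alpha:
            hits[domain] = p
    return hits


if __name__ == "__main__":
    # Made-up counts for two hypothetical domains, 10 contigs per group.
    case = {"PF_A": 8}
    control = {"PF_A": 1, "PF_B": 5}
    for name, p in enriched_domains(case, control, 10, 10).items():
        print(name, p)
```

A real analysis over thousands of domains would also need a correction for multiple testing, which this sketch omits.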

  3. Reviewer #2 (Public Review):

    This manuscript describes a methodology in which bacteriophages are isolated from natural sources such as sewage and their DNA is extracted. The DNA is divided into two parts: the control DNA is treated using an enzyme cocktail to break it at every unmodified cytosine, while in the "case" sample the modified cytosines are protected prior to the treatment that cleaves at the unmodified cytosines. The expectation is that DNA from organisms that modify most of their cytosines will survive the cleavage step and that this unbroken DNA will be enriched in the subsequent amplification and next-generation sequencing steps. The sequencing reads are then compared between the two samples to find genes that are enriched in the case samples, and the Pfam database is used to identify potential DNA base-modifying enzymes. This search revealed many amino acid sequence motifs associated with base-modifying enzymes, and the presence of a nearby thymidylate synthase gene was used to identify enzymes with carbamoyltransferase domains. One such enzyme was cloned, purified, and biochemically characterized. The authors demonstrate that it transfers the carbamoyl moiety to the oxygen in 5-hydroxymethylcytosine in DNA, RNA, and dCMP. The authors suggest that this methodology could be generalized to find other base-modifying enzymes. While it is impressive that the investigators were able to find a new base-modifying enzyme in the absence of any prior sequence information or direct selection for the activity, I have several concerns about the methodology and its potential as a general search tool for base-modifying enzymes.
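The read-comparison step described above can be sketched as a simple coverage ratio between the two libraries. This is an illustrative assumption, not the authors' procedure: the reads-per-million normalization, the pseudocount, the threshold of 3.0, and all names and counts below are hypothetical.

```python
def enrichment_score(case_reads, control_reads, case_total, control_total,
                     pseudocount=1.0):
    """Ratio of library-size-normalized coverage (case over control).

    The pseudocount avoids division by zero for contigs with no control
    reads; its value here is an arbitrary choice.
    """
    case_rpm = (case_reads + pseudocount) / case_total * 1e6
    control_rpm = (control_reads + pseudocount) / control_total * 1e6
    return case_rpm / control_rpm


def classify_contigs(read_counts, case_total, control_total, threshold=3.0):
    """Label each contig by whether its score reaches the threshold.

    read_counts maps contig name -> (case_reads, control_reads).
    """
    labels = {}
    for contig, (case_reads, control_reads) in read_counts.items():
        score = enrichment_score(case_reads, control_reads,
                                 case_total, control_total)
        labels[contig] = "case-enriched" if score >= threshold else "not enriched"
    return labels


if __name__ == "__main__":
    # Made-up read counts for two hypothetical contigs.
    counts = {"contig_A": (99, 9), "contig_B": (9, 99)}
    print(classify_contigs(counts, 1_000_000, 1_000_000))
```

Contigs labeled this way could then serve as the case and control groups for a downstream gene or domain association test.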