Massively multiplex single-molecule oligonucleosome footprinting

Abstract

Our understanding of the beads-on-a-string arrangement of nucleosomes has been built largely on high-resolution sequence-agnostic imaging methods and sequence-resolved bulk biochemical techniques. To bridge the divide between these approaches, we present the single-molecule adenine methylated oligonucleosome sequencing assay (SAMOSA). SAMOSA is a high-throughput single-molecule sequencing method that combines adenine methyltransferase footprinting and single-molecule real-time DNA sequencing to natively and nondestructively measure nucleosome positions on individual chromatin fibres. SAMOSA data allows unbiased classification of single-molecular 'states' of nucleosome occupancy on individual chromatin fibres. We leverage this to estimate nucleosome regularity and spacing on single chromatin fibres genome-wide, at predicted transcription factor binding motifs, and across human epigenomic domains. Our analyses suggest that chromatin is comprised of both regular and irregular single-molecular oligonucleosome patterns that differ subtly in their relative abundance across epigenomic domains. This irregularity is particularly striking in constitutive heterochromatin, which has typically been viewed as a conformationally static entity. Our proof-of-concept study provides a powerful new methodology for studying nucleosome organization at a previously intractable resolution and offers up new avenues for modeling and visualizing higher order chromatin structure.

Article activity feed

  1. ###Reviewer #3:

    The manuscript by Abdulhay, McNally and colleagues combines DNA modification detection with PacBio sequencing, adding to the growing body of methods designed to capture epigenome information at the chromatin fiber level, i.e. beyond the constraints of existing short-read NGS chemistry. They do so by using micrococcal nuclease to cleave and help solubilize DNA, which they then treat with adenine methyltransferases to footprint nucleosomes: the single-molecule adenine methylated oligonucleosome sequencing assay (SAMOSA). Fiber-level epigenetic information will be of great use to the field and is expected to answer many open questions.

    However, many of the claims made about the potential of the method are insufficiently supported by the data provided. Additional data appear to be required to support the conclusions drawn from SAMOSA with respect to existing chromatin information, such as signal differences as a function of transcription factor binding (see below).

    1. The authors should make an attempt to investigate where sequence bias influences a methylation call in their datasets. Clearly the pattern on the in vitro chromatinized template suggests that on average their methylated calls are correct. However, there appear to be clear positions in their template datasets where this is not the case, i.e. lines in Sup. Fig. 5a representing methylation calls in unmethylated template DNA and unmethylated calls on fully methylated template DNA. Upon close examination, this also seems to be the case in the chromatinized template, with certain positions inflexibly methylated/unmethylated and at odds with the surrounding linker/nucleosome patterning (Fig. 1D). The authors should use k-mer analysis of methylated A's genome-wide to detect sequence bias in either the methyltransferase or the sequencing platform.

    2. It seems reasonable that the clustered data by NRL estimate (fig 3) should correlate with existing measurements (i.e. MNase-seq). The authors should identify regions of the genome with strong enrichment for the seven clusters and compare this to nucleosome repeat length as can be estimated using conventional MNase measurements, i.e. the average distance between 5' mapping read positions across the genome (Valouev et al., 2011, Teif et al., 2012). Some agreement (for at least a few of these clusters with very regular nucleosomes) would strengthen the conclusions made by this approach, especially where there are irregular positioning patterns. Additionally, for these clusters the authors should display raw read alignment/methylation calls for SAMOSA at a few representative loci, where a sense of the raw data can be gleaned.

    3. The comparisons of SAMOSA at different TF-bound regions are likely influenced by the fraction of molecules that are actually TF-bound in the original cellular sample. For example, CTCF is known to occupy its strong motifs in the majority of cells, while few other factors have such regular binding/residency (Kelly et al., 2012 NomeSeq data at CTCF sites). It seems reasonable that some cluster fractions should scale with the enrichment for the factor (for at least CTCF and REST, the strong binding/nucleosome positioners), especially those associated with chromatin accessibility at the motif (i.e. A-accessible, HA-hyper-accessible). The authors should try to illustrate this, and should also show representative read alignments/methylation calls at a few loci where these signals are prevalent.

    4. The meta-plotted data seem noisy for most TFs profiled (Fig 4 A-L), and the authors should show that their replicates agree with each other in terms of the relative size of clusters and at the metaplot level. Similarly, the data shown in Figure 5 should be broken into replicates. It is difficult to know to what extent the differences quoted are quantifiable/reproducible. For example, in panel A the reported deviation around the median seems too large to support strong claims: e.g. "In specific cases, we observed small effect shifts in the estimated median NRLs for specific domains-for example, a shift of ~5 bp (180 bp vs. 185 bp) in H3K9me3 chromatin with respect to random molecules..." This should also apply to the analysis done in Figure 5B and C, where it is difficult to get a sense of reproducibility from cluster size and the heatmap of odds ratios and q-values.
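
As a rough illustration of the k-mer analysis requested in point 1, the sketch below compares how often each sequence context occurs around methylated adenines with how often it occurs around all adenines. It is not the authors' pipeline: the input layout (a read sequence paired with a set of methylated positions on that same strand), the function names, and the pseudocount are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def kmer_contexts(seq, positions, k=5):
    """Yield the k-mer centered on each position, skipping positions near the ends."""
    half = k // 2
    for pos in positions:
        if pos - half >= 0 and pos + half < len(seq):
            yield seq[pos - half : pos + half + 1]


def kmer_log2_enrichment(reads, k=5, pseudocount=1.0):
    """Score each A-centered k-mer by log2(freq among methylated A / freq among all A).

    reads: iterable of (sequence, set_of_methylated_positions); this layout is a
    hypothetical stand-in for whatever the modification caller actually emits.
    """
    methylated = Counter()
    background = Counter()
    for seq, meth_positions in reads:
        seq = seq.upper()
        all_a = [i for i, base in enumerate(seq) if base == "A"]
        background.update(kmer_contexts(seq, all_a, k))
        meth_a = [i for i in all_a if i in meth_positions]
        methylated.update(kmer_contexts(seq, meth_a, k))

    n_meth = sum(methylated.values())
    n_bg = sum(background.values())
    n_kmers = len(background)
    scores = {}
    for kmer, bg_count in background.items():
        # Pseudocounts keep rare k-mers from producing log2(0).
        f_meth = (methylated[kmer] + pseudocount) / (n_meth + pseudocount * n_kmers)
        f_bg = (bg_count + pseudocount) / (n_bg + pseudocount * n_kmers)
        scores[kmer] = log2(f_meth / f_bg)
    return scores
```

Contexts with strongly positive or negative scores on the fully methylated and unmethylated naked-DNA controls would be candidates for enzyme or platform bias, since nucleosome protection cannot explain a skew in those samples.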

  2. ###Reviewer #2:

    The authors describe SAMOSA, a novel method for mapping accessibility on single chromatin fibers, using a non-specific adenine methyltransferase and taking advantage of the long-read high-accuracy capability of the PacBio platform. The method allows for chromatin arrays to be precisely mapped for nucleosomal and non-nucleosomal footprints on single chromatin fibers. When combined with light MNase treatment, the method provides two orthogonal readouts of the chromatin landscape for single molecules, with advantages over other single-molecule long-read methods. Proof-of-concept application of this new method to human K562 cells reveals global heterogeneity, with surprisingly little distinction in nucleosome array patterns between regions distinguished by various active or repressive histone modification patterns. The heterogeneity observed using the unbiased approach represented by SAMOSA highlights the fact that the most common chromatin profiling methods favored by both large projects such as ENCODE and individual researchers are dominated by features such as histone modifications and hyper-accessible sites. The method itself and insights into global nucleosomal heterogeneity are of substantial interest to the fields of chromatin and gene regulation. The data are of high quality and the methods are well described. I have only one suggestion and a couple of minor issues.

    In Figure 5, controls are randomly chosen nucleosomes, but it would be interesting to see what unmarked nucleosomes show. For example, unmarked alpha-satellite should be dominated by highly regular arrays with a 171-bp repeat length present in higher-order repeats corresponding to active centromeres, which consist of nucleosomal complexes that lack Histone H3 (CENP-A instead). The authors speculate that satellite irregularity might result from dynamic restructuring by HP1, and this predicts that other (H3-containing) unmarked satellites that lack H3K9me3 and presumably lack HP1 will be in regular arrays.

  3. ###Reviewer #1:

    The authors validate the method on a reconstituted array of 9 nucleosomes, and convincingly show that m6dA is found in linker DNA, and not (or greatly reduced) at positions bound to nucleosomes.

    They then apply the approach to chromatin fibers released from K562 cells. Long read patterns were clustered to identify 7 clusters. The idea is that because the fragments are released by mild MNase digestion, there will be a positioned nucleosome at one end. The 7 clusters differ in nucleosomal spacing. I am not familiar with Leiden clustering; it would be good if the authors could confirm these clusters with alternative clustering methods. These clusters appear differentially represented in domains that differ in histone modifications.
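
To make "nucleosomal spacing" on a single molecule concrete, one common way to summarize it is the lag at which the accessibility signal best correlates with itself. The sketch below is only an illustration under stated assumptions: it takes a hypothetical per-base accessibility track (for example 1 for accessible and 0 for protected), and the search window of 120 to 250 bp is an arbitrary choice, not a value taken from the manuscript.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation of a numeric sequence for lags 0..max_lag (biased estimator)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    var = sum(c * c for c in centered)
    if var == 0:
        # A constant signal carries no periodicity information.
        return [0.0] * (max_lag + 1)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


def estimate_nrl(signal, min_lag=120, max_lag=250):
    """Return the lag within [min_lag, max_lag] that has the highest autocorrelation."""
    if len(signal) <= max_lag:
        raise ValueError("molecule is too short for the requested lag window")
    acf = autocorrelation(signal, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
```

A regular array gives a sharp peak at its repeat length, while an irregular fiber gives a flat profile, so the height of the peak is as informative as its position. Any clustering method applied to such profiles (Leiden or an alternative) would be working from this kind of summary.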

    Aggregation of data around TF binding sites further reveals a range of different states that show variable nucleosome positioning. This section is interesting but seems rather shallow in analysis. The authors have the ability to look at specific sites and determine the variation in nucleosome positioning in the cell population. However, they look only at aggregated data.

    Overall the approach works well and promises to address important questions, but the current work does not yet take full advantage of the single-molecule nature of the assay and as such falls a bit short compared to closely related methods that have recently been published (the works cited in the manuscript, and recently published work from the Stamatoyannopoulos lab). Also, the use of mild MNase is presented as an advantage, but is it really necessary? Adding EcoGII to isolated nuclei may work as well, as shown in the recent Stamatoyannopoulos paper in Science.

  4. ##Preprint Review

    This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    ###Summary:

    This manuscript describes a method, named SAMOSA, to identify nucleosome positions along chromatin segments that can be over 10 kb in size. The approach employs EcoGII-mediated m6dA deposition on accessible non-nucleosomal DNA (linkers, nucleosome-free regions) released from nuclei after mild MNase cleavage. The DNA modification is then read out using PacBio sequencing. Mapping nucleosome positions along longer DNA stretches can provide information on variation in nucleosomal arrays, and how that relates to chromatin state, factor binding, etc. The assay is validated using a reconstituted chromatin template and then applied to K562 cells, revealing significant variation in nucleosome positioning and nucleosome repeat lengths at transcription factor binding sites, and throughout domains with various histone modifications.
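
The footprinting logic summarized above can be sketched in a few lines: on one molecule, any long stretch with no methylated adenine is treated as protected. This is a simplified illustration, not the authors' caller. The function name, the input (a collection of methylated positions plus the molecule length), and the 100 bp minimum are assumptions chosen for the example.

```python
def call_footprints(methylated_positions, molecule_length, min_length=100):
    """Return half-open (start, end) spans that contain no methylated adenine.

    Spans shorter than min_length are ignored. Spans touching either end of the
    molecule may be truncated footprints and should be interpreted with care.
    """
    # Sentinels at -1 and molecule_length let the molecule ends close a span.
    bounds = [-1] + sorted(methylated_positions) + [molecule_length]
    footprints = []
    for left, right in zip(bounds, bounds[1:]):
        start, end = left + 1, right
        if end - start >= min_length:
            footprints.append((start, end))
    return footprints
```

For instance, `call_footprints({10, 20, 30, 180, 190, 200, 350, 360}, 500)` returns `[(31, 180), (201, 350), (361, 500)]`. A caller this simple cannot tell a protected region from an adenine-poor one, which is one reason the sequence-context controls raised by Reviewer #3 matter.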