Mosaic of somatic mutations in one of Earth’s largest organisms, Pando

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This useful study examines patterns of clonal reproduction and somatic mutations in 'Pando', a massive, quaking aspen clone consisting of ~47000 stems. Because the study relies on relatively low-coverage, reduced-representation genomic resequencing data for the detection of somatic mutations, the evidence provided for several of the primary conclusions about clone age and the relationship between mutation accumulation and geographic distance is incomplete.

Abstract

While evolutionary biology traditionally focuses on the spread of mutations within populations, the dynamics of mutational spread within individuals, particularly in long-lived clonally spreading organisms, remain poorly understood. Here we examine the genetic structure of ‘Pando’, Earth’s largest known quaking aspen (Populus tremuloides) clone. We sequenced over 500 samples across Pando and neighboring clones, including multiple tissue types. At fine spatial scales, we detected significant genetic structure, particularly in leaf tissue, but this signal weakened across larger distances, suggesting either rapid root growth homogenizes the system over time or mechanisms exist that prevent widespread mutation transmission. Phylogenetic analyses date Pando between ∼12,000 and 37,000 years old, supported by continuous aspen pollen presence in nearby lake sediments. Tissues accumulated mutations at different rates, with leaves showing significantly higher mutation loads than roots or branches. This work provides the first quantitative age estimate for this remarkable organism and offers initial insights into the spatial dynamics of somatic mutation in a massive clonal plant. While our reduced-representation sequencing approach limits detection of rare variants, these findings establish a foundation for understanding how long-lived modular organisms accumulate and distribute genetic variation, questions that will benefit from future high-coverage whole-genome sequencing across tissues.

Article activity feed

  2. Reviewer #1 (Public review):

    Summary

    The authors use reduced-representation sequencing (GBS) across samples from the quaking aspen clonal stand Pando to identify putative somatic mutations, which they use to estimate clone age and to evaluate whether somatic variation shows spatial structure across the grove. This is a compelling and charismatic system in which to study somatic mutation in plants. They report little sharing of putative somatic mutations as a function of distance and interpret this as evidence for weak mutation transmission or homogenization over time, potentially driven by rapid root growth and clonal spread dynamics. The authors are commendably transparent about limitations in sequencing depth and mutation calling. The paper addresses an interesting research system, but struggles to overcome limitations in the suitability of the data.

    Strengths

    This is a fantastic system and an interesting set of questions. The authors' GBS data does a great job distinguishing Pando from its neighbors, which is an important first step in studying the history of this clone.

    The manuscript is upfront and highlights the need for improved data to refine inference, for example: "Higher-coverage whole-genome sequencing, and ideally single-cell sequencing of defined meristem lineages, will be needed to refine mutational and evolutionary parameter estimates in this iconic organism."

    It also states that "either we are missing roughly 80% of true somatic mutations or only 20% of the mutations we detect are true positives."

    I appreciate that the authors report an age estimate range that considers the breadth of potential false negatives and positives.

    Weaknesses

    I am still not sure whether the paper overcomes issues with the use of GBS for somatic mutation calling.

    I found it difficult to reconcile the manuscript's description of the call set as "conservative" with the reported validation tests (calibrated by looking at retained variants detected in 2 of 8 technical replicates). How was this threshold determined? A mutation detected in only 2 of 8 replicates has quite low reproducibility, which could reflect either substantial false negatives under low depth (true variants frequently dropping out) or false positives that recur sporadically due to library- or sequencing-specific artifacts. Without stronger internal diagnostics or external validation, it is hard to determine which applies here.
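
    The ambiguity in the 2-of-8 criterion can be illustrated with a simple binomial calculation. The sketch below assumes, for illustration only, that replicates are independent and that a variant (true or artifactual) is called in each replicate with the same probability `p_detect`; the function name and the example probabilities are hypothetical and are not taken from the manuscript.

```python
from math import comb


def prob_at_least_k_of_n(p_detect: float, k: int = 2, n: int = 8) -> float:
    """Probability that a variant is called in at least k of n replicates.

    Assumes independent replicates with a shared per-replicate call
    probability p_detect (an illustrative simplification).
    """
    return sum(
        comb(n, i) * p_detect**i * (1.0 - p_detect) ** (n - i)
        for i in range(k, n + 1)
    )


# Hypothetical per-replicate call probabilities: a low value could describe
# either a recurrent artifact or a true variant that often drops out.
for p in (0.05, 0.1, 0.3, 0.6):
    print(f"p_detect={p:.2f} -> P(>=2 of 8) = {prob_at_least_k_of_n(p):.3f}")
```

    Under these assumptions, the pass rate alone cannot separate the two explanations, since a modest per-replicate probability from either source gives a non-trivial chance of passing the filter.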

    The GBS sequence space and genomic distribution could be more clearly explained. According to the methods, "The total number of base pairs sequenced (129,194,577) was estimated using angsd, and reduced following the proportion of base pairs that we filtered out because of low coverage (48%)." What do the 129M base pairs represent? Is that the number of distinct genomic positions covered (129M out of the genome length), or is it the number of aligned base pairs (i.e., 1 Mb of genome covered at 129× depth)? In addition, it would help to summarize where GBS loci fall across the genome (genic vs intergenic vs TE; repetitive vs unique), since these regions can have substantially different somatic mutation rates (Meyer et al. 2025). Without additional summary or descriptive statistics, it is hard to interpret both missingness and "rate".

    Statistical concerns about some results. In the Figure 3 legend, the authors state that the sample-level relationship between shared variants and distance is significant: "Pearson correlation coefficient ... is −0.02, 95% CI = [−0.05, 0.00], which is significantly different from a randomized distribution (P < 0.001) (B)." However, as plotted in Figure 3B, the observed correlation (−0.02) appears to fall well within the bulk of the randomized distribution of correlation coefficients. If the reported P value is intended to be permutation-based (i.e., the tail probability under the randomized null), it is unclear how P could be < 0.001 given that the observed value does not appear extreme relative to the null.
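
    For reference, a permutation P value is the fraction of the null distribution that is at least as extreme as the observed statistic. The following is a minimal sketch, assuming a simple shuffle of one variable (for pairwise-distance data, a Mantel-style permutation of sample labels would be more appropriate). The function names and toy data are hypothetical, and this is not the authors' procedure.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation P value for the Pearson correlation.

    Returns (observed r, P), where P counts permuted |r| values that are
    at least as large as the observed |r|, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data only: distances and shared-variant counts are made up.
toy_rng = random.Random(42)
distance = [toy_rng.uniform(0, 1000) for _ in range(200)]
shared = [toy_rng.randint(0, 10) for _ in range(200)]
r, p = permutation_p_value(distance, shared)
print(f"r = {r:.3f}, permutation P = {p:.3f}")
```

    With 999 permutations the smallest attainable P value is 0.001, and it is reached only when no permuted statistic is as extreme as the observed one, which is not what the histogram in Figure 3B appears to show.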

    The developmental program of plant stem cell layers is essential but receives little discussion. In a root-spreading clone, expectations about mutation sharing depend strongly on how new ramets arise developmentally (root-derived meristem initiation) and how layered meristems partition mutations across tissues (e.g., L1/L2/L3). I was surprised that there was no substantial discussion of the layer specificity of somatic development and mutation accumulation in plants, especially regarding mutations that would be shared between roots and shoots given potential layer-specific growth of roots. The current analysis seems to focus on comparisons within tissue types (e.g., leaves between ramets), but does not report informative tests between tissues and within ramets (e.g., in heavily sampled trees, whether a ramet's root, shoot, and leaves share a subset of variants; whether neighboring ramets share root-lineage variants more than shoot-lineage variants). It would help to articulate expectations and clarify what the data can and cannot test. Relatedly, for "mutation rates" in aging material, it would be good to discuss which meristem layer(s) each tissue is likely sampling and how layer-specific mutation dynamics (e.g., reported differences between L1 vs L2 lineages) could influence rate and therefore age estimates (Goel et al. 2024, Amundson et al. 2025).

    Developmental mosaicism makes expected allele fractions lower than discussed in the paper. The supplement states, "However, because the Pando clone is triploid, it reduces our expectation for fixation of a mutation to 0.33", but this ignores layer-specific stem cells in plant development. It is true that if calls are made against a haploid reference, a new somatic mutation in a triploid background is expected at an allele fraction of about 1/3, but only if it is fixed in 100% of cells. Layer specificity (e.g., L1 vs L2 vs L3 restriction) or polyclonal founding events will push expected allele fractions substantially lower. Therefore, at ~12-14× depth (or a minimum of 4×), these allele fractions (expected to be well below 33%) translate into only a handful of alternate reads, or even none.
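
    The arithmetic behind this concern can be sketched as follows. The example assumes a mutation on one of three chromosome copies, a binomial read-sampling model, and a hypothetical caller that requires at least `min_alt` alternate reads. The cell fractions and the threshold are illustrative assumptions, not values from the manuscript.

```python
from math import comb


def expected_vaf(ploidy: int = 3, cell_fraction: float = 1.0) -> float:
    """Expected variant allele fraction for a mutation on one chromosome
    copy that is present in `cell_fraction` of the sampled cells."""
    return cell_fraction / ploidy


def prob_min_alt_reads(depth: int, vaf: float, min_alt: int = 2) -> float:
    """Binomial probability of observing at least `min_alt` alternate reads
    at a site with the given read depth and variant allele fraction."""
    below = sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i)
        for i in range(min_alt)
    )
    return 1.0 - below


# Illustrative cell fractions: fixed in all cells vs. confined to roughly
# one of three meristem layers (an assumed, simplified partition).
for depth in (4, 12):
    for cell_fraction in (1.0, 1 / 3):
        vaf = expected_vaf(3, cell_fraction)
        print(
            f"depth={depth:2d} cell_fraction={cell_fraction:.2f} "
            f"mean alt reads={depth * vaf:.2f} "
            f"P(>=2 alt reads)={prob_min_alt_reads(depth, vaf):.3f}"
        )
```

    At 12× depth a fully fixed triploid mutation yields four alternate reads on average; if the mutation is confined to a third of the sampled cells, the mean falls to about 1.3 reads, and at 4× it falls below one.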

    Within-tree replicate consistency was unclear. The manuscript hints at multiple samples/replicates per tree (e.g., Figure S2), but it is not clear how often the same putative somatic variants are recovered across samples from the same ramet and tissue. A simple reproducibility summary would be extremely helpful: for variants called in one sample, what fraction are recovered in other samples from the same tree (by tissue), what variant allele fractions, and how do their spectra compare to mutations unique to a single sample?
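
    One possible form for such a summary is sketched below. It assumes that calls are available as one set of variant keys per sample, with all samples drawn from the same ramet and tissue; the function name, sample labels, and variant keys are hypothetical.

```python
from itertools import permutations


def recovery_fractions(calls_by_sample):
    """For each ordered pair (a, b), return the fraction of variants called
    in sample a that are also called in sample b.

    Samples with no calls are skipped as the source sample a, because the
    fraction is undefined for them.
    """
    fractions = {}
    for a, b in permutations(calls_by_sample, 2):
        called_a = calls_by_sample[a]
        if called_a:
            fractions[(a, b)] = len(called_a & calls_by_sample[b]) / len(called_a)
    return fractions


# Toy calls keyed by (chromosome, position, ref, alt); values are made up.
toy_calls = {
    "leaf_rep1": {("chr1", 100, "A", "T"), ("chr2", 250, "G", "C")},
    "leaf_rep2": {("chr1", 100, "A", "T")},
    "leaf_rep3": set(),
}
for pair, fraction in sorted(recovery_fractions(toy_calls).items()):
    print(pair, f"{fraction:.2f}")
```

    The same table could be stratified by tissue and by variant allele fraction to show whether recovered variants differ systematically from singletons.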

    The manuscript did not provide supplemental tables or mutation calls. Supplemental tables containing pre-filter and/or post-filter calls (or some other structured data file with flags indicating various quality metrics, REF vs ALT depths at minimum, REF call, and ALT call) would substantially improve transparency and ability to evaluate the work.

  3. Reviewer #2 (Public review):

    Summary:

    The topic of the paper is intriguing as it sets out to age potentially one of the largest living organisms, a tree clone (Pando), using shallow genome resequencing of a large number of replicate samples. The key result is that the Pando clone is several tens of thousands of years old, which is of high interest to plant genomics and evolutionary ecology.

    Weaknesses:

    Unfortunately, the claims are not matched by the available data and their analysis. The results probably cannot be rescued by modified analyses either, as the available data are not suited to reliably detect somatic genetic variation as a means to age clonal plants.

    In order to reliably age clones, one needs to consider the full process by which clone mates genetically diverge from one another over time, which starts with a plant's shoot apical meristem (SAM). From this, all above-ground tissues such as twigs and branches, as well as leaves, are derived, which has now been beautifully worked out in oaks and many fruit trees (e.g., doi: 10.1101/2023.01.10.523380; 10.1101/2024.01.04.573414). For the accumulation and propagation of fixed somatic genetic variation, only the processes in the SAM matter. Hence, it makes little sense to look at tissue-specific mutations unless one is invoking mutations not induced by cell division, for example through UV light. Those, however, would remain undetected with the present low-coverage sequencing, as they can no longer leave the mosaic state because that tissue is essentially non-dividing.

    Somatic genetic drift (https://www.nature.com/articles/s41559-020-1196-4) is the foundation for the fixation of somatic genetic variation and hence for ageing (plant) clones. It requires quantitative modeling of the processes at the cell-line level when new modules (here, aspen trees) are formed, in particular N (cell population size) and N0 (founder cell size).

    Calibrations have to be made using the mutation and fixation rate at the somatic cell lineage level, ideally also with some empirical data. In trees such as aspen, it would be very easy to obtain calibration points of branch tips that have physically and thus genetically diverged upon a defined TCA to directly determine the rate of accumulation of somatic genetic variation by direct dendrochronology (i.e., counting tree rings).

    Instead, in the present work, a mutation rate from another tree species is taken, which will introduce a lot of uncertainty into the estimates, given that tree SAMs divide at a very different pace (see doi 10.1093/evolut/qpae150). It is clear that a small difference in the assumed mutation rate, e.g., a higher one, would conversely reduce the age estimate considerably.
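
    The sensitivity to the assumed rate can be made explicit with a one-line rescaling. The sketch assumes a strict clock in which the number of accumulated mutations is proportional to rate times time, so that the inferred age scales inversely with the rate. The function name and the twofold rate change are hypothetical, and only the 12,000 and 37,000 year bounds come from the abstract.

```python
def rescale_age(age_years: float, assumed_rate: float, alternative_rate: float) -> float:
    """Rescale an age estimate when the assumed mutation rate is replaced
    by an alternative rate, under a strict-clock assumption (age ~ 1/rate).

    Rates may be given in any common unit, since only their ratio matters.
    """
    if assumed_rate <= 0 or alternative_rate <= 0:
        raise ValueError("rates must be positive")
    return age_years * assumed_rate / alternative_rate


# Reported bounds from the abstract, rescaled for a hypothetical twofold
# higher rate expressed in relative units.
for bound in (12_000, 37_000):
    print(bound, "->", rescale_age(bound, assumed_rate=1.0, alternative_rate=2.0))
```

    Under this assumption, a twofold higher rate would halve both bounds, to 6,000 and 18,500 years.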

    I am doubtful that a conventional phylogenetic model based on coalescence, such as the one employed here, can be utilized, as it assumes a sexually recombining population and hence variable sites. A model simulation on an asexually evolving population would be needed to check this.

    In order to reliably call somatic genetic variation, a decent coverage of short-read sequences is needed, definitely > 15x, which was not achieved in the present dataset. This is particularly relevant as a fixation in one of the three haploid chromosome sets would amount to a read frequency of only 0.33. A coverage of only 4x reads per called site seems very low to me; in other words, the filtering steps do not seem very rigorous. It is also difficult to follow the logic of several ad hoc adjustments that were made to compensate for the low coverage of sequencing, in particular the common panel and the replicate identical samples. Why choose 80% in the latter?

    There are alternative, non-sequencing-based ways to double-check the accuracy of somatic SNP calls (e.g., described here https://www.nature.com/articles/s41559-020-1196-4), which could have been employed at least once to evaluate the error rates for the specific sequencing strategy.

    I also suggest that any future study employ mutation callers developed for cancer somatic mutation detection, which are now increasingly used for that purpose in both clonal plants and trees.

    What worries me is that there is a poor correlation between physical and genetic distance. This lack of correlation between spatial and genetic structure, for example, the star-like phylogeny presented in Figure 6d, indicates a large fraction of false positives rather than some special, as yet unexplained processes of local mutation accumulation that the authors claim to have discovered.

    Finally, the work is not properly embedded into the current literature. For example, recent developments of molecular clocks were not considered, such as the development of a dedicated somatic genetic clock that precisely addresses this question (https://www.nature.com/articles/s41559-024-02439-z). Also, older but nevertheless significant work that aged aspen clones using microsatellite markers is not mentioned (http://dx.doi.org/10.1111/j.1365-294X.2008.03962.x).