The Mutationathon highlights the importance of reaching standardization in estimates of pedigree-based germline mutation rates

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    Bergeron et al. show that mutation rates estimated independently by several teams from the same pedigree dataset can differ because of the methods and approaches used to identify de novo mutations. This result is of primary importance because it shows the need for a standard mutation identification method and the difficulty of comparing mutation rates from different studies.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

Abstract

In the past decade, several studies have estimated the human per-generation germline mutation rate using large pedigrees. More recently, estimates for various nonhuman species have been published. However, methodological differences among studies in detecting germline mutations and estimating mutation rates make direct comparisons difficult. Here, we describe the many different steps involved in estimating pedigree-based mutation rates, including sampling, sequencing, mapping, variant calling, filtering, and appropriately accounting for false-positive and false-negative rates. For each step, we review the different methods and parameter choices that have been used in the recent literature. Additionally, we present the results from a ‘Mutationathon,’ a competition organized among five research labs to compare germline mutation rate estimates for a single pedigree of rhesus macaques. We report almost a twofold variation in the final estimated rate among groups using different post-alignment processing, calling, and filtering criteria, and provide details into the sources of variation across studies. Though the difference among estimates is not statistically significant, this discrepancy emphasizes the need for standardized methods in mutation rate estimations and the difficulty in comparing rates from different studies. Finally, this work aims to provide guidelines for computational and statistical benchmarks for future studies interested in identifying germline mutations from pedigrees.
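The abstract's final step, correcting the candidate count for false positives and false negatives and dividing by the callable genome, can be sketched as follows. This is a minimal illustration of one commonly used form of the estimator, not the procedure of any particular Mutationathon group; the function name, argument names, and the numbers in the usage example are assumptions made for illustration.

```python
def mutation_rate(n_candidates, callable_sites, fp_rate=0.0, fn_rate=0.0):
    """Per-site, per-generation mutation rate for one trio (illustrative sketch).

    n_candidates   -- candidate de novo mutations left after filtering
    callable_sites -- haploid genome positions where a mutation could have been called
    fp_rate        -- assumed fraction of candidates that are false positives
    fn_rate        -- assumed fraction of true mutations lost to calling and filtering
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    if not (0.0 <= fp_rate < 1.0 and 0.0 <= fn_rate < 1.0):
        raise ValueError("fp_rate and fn_rate must lie in [0, 1)")
    # Remove the expected share of false candidates.
    corrected_count = n_candidates * (1.0 - fp_rate)
    # The factor 2 accounts for the two haploid genomes inherited by the offspring;
    # (1 - fn_rate) shrinks the denominator to the part where mutations are detected.
    return corrected_count / (2.0 * callable_sites * (1.0 - fn_rate))


# Hypothetical numbers, chosen only to show the call signature.
example_rate = mutation_rate(30, 2_000_000_000, fp_rate=0.10, fn_rate=0.05)
```

Because both the numerator and the denominator depend on study-specific filters, two groups can start from the same reads and still report different rates, which is the situation the Mutationathon was designed to expose.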

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    The subject of this review can be interesting and in principle helpful to researchers who work on germline mutations. The authors have summarised all the work done in this area over the past 10 years. However, I found parts of the review unsurprising, and I am not sure the authors have convinced readers of what the best practice is for calling germline de novo mutations.

    We agree with the reviewer that a limitation of our study is that we did not agree on a gold standard for calling de novo mutations, but instead focused on the different parameters that should be considered, how to choose them, and how to report them in a consistent way. More studies with much larger datasets are likely to appear soon, and it will then become clearer whether one standard method could fit them. Our study carefully considers all the ingredients of such a future gold standard.

    Bergeron et al. present an interesting paper about the biases and implications of the different methods used to identify de novo mutations from pedigree datasets. The first part is a review of the methods and criteria used for mutation identification in previous studies (coverage, mapping quality, and measurement of the callable genome, among others). The second part is an original approach named the "Mutationathon", in which 5 teams received the same pedigree dataset and each estimated the mutation rate. The objective, which is very interesting for the field, is to understand how the different approaches of several teams affect the mutation rate estimate. The 5 estimates are of the same order, but one value is significantly higher than the others. Moreover, the candidate mutations identified by each team are highly variable: about 20-30 candidate mutations per team, but only 7 true positives in common despite stringent criteria during the identification steps. This study is very interesting and shows the importance of a standard method for mutation identification, and the difficulties or biases in comparing mutation rates from publications that use different approaches.
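    The overlap between teams that the reviewer describes can be quantified with simple set operations. The sketch below assumes each team's candidates are available as (chromosome, position) pairs; the function names and the data layout are hypothetical and are not taken from the study.

```python
from itertools import combinations


def shared_candidates(calls_by_group):
    """Return the candidate sites reported by every group."""
    sets = [set(calls) for calls in calls_by_group.values()]
    return set.intersection(*sets) if sets else set()


def pairwise_jaccard(calls_by_group):
    """Jaccard similarity (intersection size / union size) for each pair of groups."""
    similarities = {}
    for a, b in combinations(sorted(calls_by_group), 2):
        set_a, set_b = set(calls_by_group[a]), set(calls_by_group[b])
        union = set_a | set_b
        # Two empty call sets are treated as identical.
        similarities[(a, b)] = len(set_a & set_b) / len(union) if union else 1.0
    return similarities


# Hypothetical call sets keyed by an arbitrary group label.
example_calls = {
    "group_A": [("chr1", 100), ("chr2", 250), ("chr5", 900)],
    "group_B": [("chr1", 100), ("chr2", 250)],
    "group_C": [("chr1", 100), ("chr7", 42)],
}
common = shared_candidates(example_calls)
```

    Reporting both the shared set and the pairwise similarities makes it clear whether disagreement is spread evenly across groups or driven by a single outlying pipeline.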

    We thank the reviewer for the nice summary of our study and its implications.

    Reviewer #2 (Public Review):

    I commend the authors for their extensive work summarising a large number of germline de novo mutation (DNM) studies of human and non-human trios. They outlined all the methods used by different studies to call DNMs. They pointed out the different stages that may affect DNM calling, including sample size, library preparation, alignment, variant calling, and post-filtering. Finally, by analysing DNMs in a macaque pedigree across five groups, they have demonstrated how different variant calling strategies may lead to different reported mutation rates.

    We thank the reviewer for their comments.

    The authors are correct that the identification of true DNMs is affected by experimental and analytical strategies. This is a long-known issue in the field. However, as the authors also mention, despite all the variation the DNMs reported across different studies are very much in agreement. Indeed, in their Mutationathon exercise on calling DNMs in a three-generation pedigree of rhesus macaques, they have demonstrated that although the five groups reported different numbers of DNMs, the difference in mutation rate is insignificant. Moreover, I am not convinced that variability in calling DNMs is a major issue in this field, at least not in recent years and especially for human germline mutation. More recent studies with large numbers of trios, such as the analysis of ~12k trios by Kaplanis et al., bioRxiv, 2021, eliminate most of the issues due to systematic noise.

    We agree with the reviewer that methodological discrepancies do not seem to be an issue in human studies, as studies over the past decade have found similar rates. The large number of sequenced human trios has helped fine-tune GATK or Graphtyper genotyping such that the false positive rate has become very low, and the callable part of a human genome at 30X coverage is now very well known. However, discrepancies may become an issue when comparing studies of non-human primates, which may differ in overall heterozygosity (most often higher than in humans), repeat organization, and quality of the reference genome. Furthermore, in these species mutation rates are also estimated from smaller pedigrees. This is partly why we chose to run the Mutationathon on a non-human trio for which the rate is unknown, with each group estimating independently of the others. There was thus little prior expectation for the number of real de novo mutations or the callable genome size, but there was an opportunity to check false positives by subsequent Sanger sequencing. We have now added a sentence in the introduction to explain that the problems are more likely to appear in non-human species (page 3 lines 77-82).

    Having said the above, it is very helpful to have some guidelines for taking different factors into consideration in future experiments. However, there are a few issues that I am not sure the authors have addressed in their review:

    The authors have not addressed issues with relatedness: the strategy for calling DNMs in multi-sibling families. This is also very important in non-human studies.

    We briefly mentioned multi-sibling families in the sample size section. Overall, we do not believe that a multi-sibling pedigree should be analyzed differently from a single-offspring trio. Yet, we have now expanded this part to detail the opportunity that multi-sibling samples offer for distinguishing mutations that occur during the postzygotic stage from actual germline ones (page 8 lines 158-167).

    The best practice suggested here is certainly not applicable to different species. Due to differences in selective pressures, the number of DNMs differs between species. This directly affects the detection method. Moreover, the cellular processes causing germline de novo mutations may vary between species. Hence, our mutation calling strategies cannot be generalised across species.

    We agree with the reviewer that some steps of the analysis should be adjusted when working on different species. For instance, we mentioned that the alignment or the variant calling may perform differently on species that are more or less heterozygous. We also agree that different processes may be at play in various species with, for instance, more postzygotic mutations or a larger increase with age. Yet, we do not believe that this will affect the detection method, since we recommend that each trio be called independently for de novo mutations. Indeed, there is no prior hypothesis on the number of mutations expected. We believe that different sample types, sequencing technologies, or genome characteristics could require adjustments to the methods. That is why we proposed various filters and methods to estimate the number of candidate DNMs and the callable genome, as adapting those to each dataset may be necessary. For instance, a highly repetitive genome could make DNMs difficult to detect, but in such cases a method similar to the one proposed could be applied as long as the callable genome also excludes the repetitive regions.
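    As an illustration of the last point, the sketch below counts callable sites on one chromosome while excluding repetitive regions. It is a simplified assumption of how such a count might be made, not the method of any participating group: the per-site depth dictionary, the half-open repeat intervals, and the depth thresholds are all hypothetical.

```python
def count_callable_sites(trio_depths, repeat_intervals, min_depth=10, max_depth=100):
    """Count positions where all three trio members pass the depth filter
    and the position lies outside every repeat interval.

    trio_depths      -- dict mapping position -> (father_dp, mother_dp, offspring_dp)
    repeat_intervals -- list of half-open (start, end) intervals to exclude
    min_depth, max_depth -- illustrative thresholds, to be tuned per dataset
    """

    def in_repeat(position):
        return any(start <= position < end for start, end in repeat_intervals)

    return sum(
        1
        for position, depths in trio_depths.items()
        if not in_repeat(position)
        and all(min_depth <= depth <= max_depth for depth in depths)
    )


# Hypothetical depths for four positions; position 3 falls in a repeat.
example_depths = {
    1: (30, 30, 30),
    2: (5, 30, 30),    # father below min_depth
    3: (30, 30, 30),   # inside the repeat interval
    4: (30, 30, 200),  # offspring above max_depth
}
example_callable = count_callable_sites(example_depths, [(3, 4)])
```

    The key design point is that the same exclusions applied to candidate mutations are applied to the denominator, so that masking repeats lowers both counts rather than biasing the rate.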

    Issues with somatic mutation contamination, as the authors correctly mention, can vary depending on the tissue of choice. However, the authors do not suggest a solution. For example, in the case of clonal hematopoiesis, what is the solution for overcoming this issue and calling DNMs? Perhaps the authors could explore parameters such as cell fraction or tissue purity, which can guide the downstream analysis for DNM calling.

    We suggested that sampling different tissues would help to differentiate somatic mutations (present only in one tissue type) from germline mutations (which should be present in all tissues in the offspring). We have now clarified this and added a sentence on the allelic fraction to avoid clonal hematopoiesis (page 9 lines 199-203).
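    A filter on allelic fraction of the kind mentioned here might look like the sketch below. A heterozygous germline mutation is expected to be carried by roughly half of the reads, so a much lower fraction hints at a somatic or mosaic variant. The function name and the thresholds are illustrative assumptions and should be tuned for each dataset.

```python
def passes_allelic_balance(ref_reads, alt_reads, min_ab=0.3, max_ab=0.7, min_depth=10):
    """Return True if the offspring's alternate-allele fraction looks heterozygous.

    ref_reads, alt_reads -- read counts supporting each allele in the offspring
    min_ab, max_ab       -- illustrative bounds on the alternate-allele fraction
    min_depth            -- illustrative minimum total depth for a reliable fraction
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        # Too few reads to estimate the fraction reliably.
        return False
    allelic_balance = alt_reads / depth
    return min_ab <= allelic_balance <= max_ab
```

    For example, a site with 15 reference and 15 alternate reads passes, whereas a site with 27 reference and 3 alternate reads is rejected as a possible somatic variant.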

    Another aspect that may affect DNM calling is the clinical history of the parents and/or child. What would be the strategy for these cases?

    In the autism study, there was no overall difference in rates between individuals with autism and their unaffected siblings (Turner et al., 2017). However, it is true that the clinical history of the parents or offspring could lead to variation in mutation rate in some cases. In non-human primates, we do not normally have any phenotypic information available and are mainly interested in the general rate and spectrum of mutations.

    How about introducing a site-specific error rate? Given the high number of trios now publicly available, it would be extremely useful to compute a site-specific error rate per nucleic acid.

    This is a very good idea. Although this would be possible for human studies, it is difficult to apply to other species (see answer below).

    Overall, as the authors also mention, DNM calling requires study- or species-specific thresholds. Therefore, I am not convinced that their suggested best practice is really applicable to all types of trio studies.

    We agree with the reviewer that we mainly focus on a standardized way of choosing the filters and reporting the methods. These are essential for transparency until a gold-standard method is found.

    Reviewer #3 (Public Review):

    This study is motivated by the variable germline mutation rates that are estimated from numerous genome sequencing studies of primate pedigrees. The authors argue that this variability is the result of methodological differences in both the molecular and computational strategies employed. Therefore, the authors launch the "Mutationathon" as an effort to isolate the effect of computational differences employed by different research groups that form the authorship of this manuscript. Using PCR validation, they are able to assess the specificity and sensitivity of each approach, recommend some high-level guidelines, and conclude that all future studies should provide detailed reporting of all computational details.

    Whole-genome DNA sequencing of pedigrees consisting of at least the mother, father, and one offspring has become the gold standard for estimating the rate of germline mutation in humans and other primates. While the estimated mutation rates are broadly consistent, they can vary by up to a factor of two. The authors make a strong argument that the primary reason for the variance in estimated rates is fundamental differences in the computational methods employed from study to study. Specifically, since germline mutations are rare, most studies make substantial efforts to eliminate false positive predictions. The computational "filtering" approaches differ, yielding variability in the specificity and sensitivity of each study. Furthermore, studies take different approaches to accounting for the specificity and sensitivity of their approach. As a result, the final estimated germline mutation rates vary.

    The authors seek to assess the impact of differing computational filtering approaches on estimated germline mutation rates by launching the Mutationathon. The study design is to provide DNA sequencing data from a single pedigree of rhesus macaques to five different research labs, who each apply their internal "best practice" computational approaches to estimate a germline mutation rate. The authors then use PCR validation of the union of all mutation predictions to directly measure specificity and sensitivity with an orthogonal molecular strategy. Finally, they provide recommendations for future studies to thoroughly document the methodologies for reproducibility and comparability.

    A key strength of this study is the fact that the authors were able to isolate the impact of computational differences on estimated rates by providing identical sequencing data to each group. Another key strength is the fact that PCR validation of all predicted mutations was performed (or at least attempted), providing an independent assessment of errors. However, these strengths are balanced by the fact that the study was conducted on a single macaque pedigree, thus preventing an assessment of the variance in rates estimated across pedigrees for the same computational approach. Similarly, neither multiple tissues nor a multi-generational pedigree was used, thereby preventing the assessment of the degree to which tissue-specific mutations or early post-zygotic mutations masquerade as germline mutations given their observed allele ratios. Lastly, the basic conclusion is that methods are sufficiently variable that providing a gold standard approach was not possible and the paper concludes that each study should simply thoroughly detail the methods so that differences in reported rates can be better understood.

    While the motivation behind the study is clear and the detailed treatment of the variability of computational approaches is fantastic, the basic conclusions of the study largely reflect the understanding of expert researchers conducting such work. However, the efforts to document the largest computational drivers of variability in germline mutation rate estimation are laudable and will likely inform future efforts in this area.

    We thank the reviewer for this summary of our study, highlighting its strengths and limitations.
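    The assessment Reviewer #3 describes, scoring each group's calls against PCR validation of the union of all predictions, can be sketched as below. The function and its inputs are hypothetical. Note that it reports precision and recall rather than true specificity and sensitivity: mutations missed by every group never enter the validated set, so the recall computed this way is only an upper bound on sensitivity.

```python
def validation_metrics(calls, validated_true, validated_false):
    """Score one group's candidate calls against orthogonal validation results.

    calls           -- iterable of candidate sites reported by the group
    validated_true  -- set of sites confirmed as real de novo mutations
    validated_false -- set of sites shown to be false positives
    Sites in `calls` that appear in neither set are counted as untested.
    """
    calls = set(calls)
    true_positives = len(calls & validated_true)
    false_positives = len(calls & validated_false)
    tested = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "untested": len(calls) - tested,
        # Fraction of the group's tested calls that were confirmed.
        "precision": true_positives / tested if tested else None,
        # Fraction of all confirmed mutations that the group recovered.
        "recall": true_positives / len(validated_true) if validated_true else None,
    }
```

    Keeping untested sites separate avoids silently treating failed PCR assays as either true or false calls.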
