Reassessment of weak parent-of-origin expression bias shows it rarely exists outside of known imprinted regions

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This manuscript presents a useful meta-analysis of genes with parent-specific expression from mouse published RNA-seq datasets, focusing on genes with weak allelic bias. A combination of systematic bioinformatic analysis and experimental validation convincingly shows that the number of parentally biased genes has been overestimated and the few novel ones lie at the periphery of known imprinted loci. The work will be of interest to genomicists with an interest in imprinting and its mechanisms.

Abstract

In mouse and human, genes subjected to genomic imprinting have been shown to function in development, behavior, and post-natal adaptations. Failure to correctly imprint genes in human is associated with developmental syndromes, adaptive, and metabolic disorders during life as well as numerous forms of cancer. In recent years researchers have turned to RNA-seq technologies applied to reciprocal hybrid strains of mice to identify novel imprinted genes, causing a threefold increase in genes reported as having a parental origin-specific expression bias. The functional relevance of parental origin-specific expression bias is not fully appreciated especially since many are reported with only minimal parental bias (e.g. 51:49). Here, we present an in-depth meta-analysis of previously generated RNA-seq data and show that the methods used to generate and analyze libraries greatly influence the calling of allele-specific expression. Validation experiments show that most novel genes called with parental-origin-specific allelic bias are artefactual, with the mouse strain contributing a larger effect on expression biases than parental origin. Of the weak novel genes that do validate, most are located at the periphery of known imprinted domains, suggesting they may be affected by local allele- and tissue-specific conformation. Together these findings highlight the need for robust tools, definitions, and validation of putative imprinted genes to provide meaningful information within imprinting databases and to understand the functional and mechanistic implications of the process.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    In mammals, a small subset of genes undergoes canonical genomic imprinting, with expression highly biased according to the parental origin of the allele. Recent studies using polymorphic mouse embryos and tissues have revised the number of genes with allele-specific expression (ASE) to three times more than previously thought, although most of these novel genes show very weak ASE (50%-60% bias toward one parental allele). Here, the authors compare 4 datasets and carry out a complete bioinformatic reanalysis of 3 recent allele-specific RNA-seq datasets to study potential novel imprinted genes, using the recently released ISoLDE pipeline. Very few genes were confirmed to have true ASE across the different studies and/or validated by pyrosequencing analysis. However, the authors show that most of the newly discovered ASE genes lie in close proximity to already known imprinted loci and could be co-regulated by these imprinted clusters. This is important for understanding how and to what extent imprinted control regions control gene expression.

    This manuscript highlights the number of potentially falsely discovered imprinted genes in previous datasets, which could result from a lack of replicates, weak allelic ratios, low gene expression, or insufficient read depth. The lack of overlap in the genes called as ASE (with the exception of the known imprinted genes) between the different datasets is worrying and important to discuss, as the authors did. I would have appreciated more detail on the differences between the datasets that could explain the lack of reproducibility: library preparation protocol, sequencer technology, SNP calling, number of reads per SNP, bioinformatics pipeline.

    We agree and a comparison of all the studies is included in the methods section. In particular, we have now included more information on SNP calling and sequencer technology.

    Studying allele-specific expression of lowly expressed genes is difficult with technologies based on PCR amplification (library preparation, pyrosequencing), which could produce an apparent expression bias due only to the random amplification of a small pool of molecules. Could the authors compare the expression levels of their different classes of genes? Could the more robust ASE genes in their study be the more highly expressed ones? Several genes were identified in only one or two of the previous studies; were they expressed in the other studies where they were not defined as ASE? This would also allow an expression threshold to be defined for studying allelic bias in the future. To conclude, this study is an important resource for the epigenetics field and for better understanding genomic imprinting.

    We thank you for this suggestion. We have now taken all RNA-seq data that we had run through the ISoLDE pipeline and extracted the transcripts per million (TPM) expression levels for each of the genes called in the original studies. We find no over-representation of lowly expressed genes among the novel biased genes compared with known imprinted genes. We also looked specifically at the expression levels of the genes tested by pyrosequencing in these datasets and saw no relationship between validation and expression levels. Expression levels are consistent between studies, especially in the same tissue, indicating that the lack of reproducibility between studies is not due to differing expression. These observations have been added to the manuscript.
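
    For readers unfamiliar with the TPM normalisation mentioned in the response above, the following is a minimal sketch of the standard calculation (reads per kilobase, rescaled so that all genes sum to one million). It is not the authors' code; the function name `tpm` and the example counts and gene lengths are hypothetical and chosen only for illustration.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts      -- reads assigned to each gene (illustrative values)
    lengths_bp  -- gene/transcript lengths in base pairs (illustrative values)
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: length-normalise to reads per kilobase (RPK).
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so that the values sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [r / total * 1e6 for r in rpk]


# Hypothetical example: three genes whose counts are proportional to length,
# so they end up with equal TPM despite very different raw counts.
example_counts = [100, 200, 700]
example_lengths = [1000, 2000, 7000]
example_tpm = tpm(example_counts, example_lengths)
```

    Because TPM is normalised within each library, it allows the kind of between-study comparison of expression levels described in the response, under the assumption that library composition is broadly similar.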

    Reviewer #2 (Public Review):

    This work aims to understand genomic imprinting in the mouse and provide further insight to challenges and patterns identified in previous studies.

    Firstly, genomic imprinting studies have been surrounded by controversy, especially ~10 years ago, when the explosion of sequencing data combined with immature methods to analyze it led to highly exaggerated claims of widespread imprinting. While the methods have improved, clear standards are not set and results still have some inconsistencies between studies. The authors first do a meta-analysis of previous studies, comparing their results and doing a useful reanalysis of the data. This provides some valuable insights into the reasons for inconsistencies and guides towards better study designs. While this work does not exactly set a common standard for the field, or provide a full authoritative catalog of imprinted loci in mouse tissues, it provides a step in that direction. I find these analyses relatively simple and straightforward, but they seem solid.

    Previous studies have described a relatively common pattern of subtle expression bias towards one parental allele, rather than the classical imprinting pattern of fully monoallelic expression. This work digs deeper into this phenomenon, using first the meta-analysis data and then also targeted pyrosequencing analysis of selected loci. The analysis is generally well done, although I did not understand why gDNA amplification bias was not systematically corrected in all cases but only if it was above a given (low) threshold. I doubt this would affect the results much though. To some extent the results confirm previously observed patterns (bimodal distribution of either subtle or full bias, and effect of distance from the core of the imprinted locus). The novel insights mostly concern individual loci, with discovery and validation of some novel genes, typically with a subtle or context-specific parental bias.

    The study also provides some insights into mechanisms, especially by analysis of existing mouse models with a deletion of the ICR of specific loci. The change in the parental bias pattern was then used to infer potential methylation and chromatin-related mechanisms in these imprinted loci, including how the subtle bias further away is achieved. There are interesting novel findings here, as well as hypotheses for further research. However, this is an area where the conclusions rely quite heavily on published research especially as this study doesn't include single-cell resolution, and it's not entirely clear how much of e.g. the Figure 7 mechanisms part is based on discoveries of this study.

    We agree that Figure 7 does not illustrate models based exclusively on data generated in this study; instead, it presents hypotheses to be tested in the coming years.

    Imprinting is a fascinating phenomenon that can be informative of mechanisms of genome regulation and parental effects in general. It is a bit of a niche area though, and the target audience of this study is likely going to be limited to specialists doing research on this specific topic. As the authors point out, the functional importance of the findings is unknown.

  2. eLife assessment

    This manuscript presents a useful meta-analysis of genes with parent-specific expression from mouse published RNA-seq datasets, focusing on genes with weak allelic bias. A combination of systematic bioinformatic analysis and experimental validation convincingly shows that the number of parentally biased genes has been overestimated and the few novel ones lie at the periphery of known imprinted loci. The work will be of interest to genomicists with an interest in imprinting and its mechanisms.

  3. Reviewer #1 (Public Review):

    In mammals, a small subset of genes undergoes canonical genomic imprinting, with expression highly biased according to the parental origin of the allele. Recent studies using polymorphic mouse embryos and tissues have revised the number of genes with allele-specific expression (ASE) to three times more than previously thought, although most of these novel genes show very weak ASE (50%-60% bias toward one parental allele). Here, the authors compare 4 datasets and carry out a complete bioinformatic reanalysis of 3 recent allele-specific RNA-seq datasets to study potential novel imprinted genes, using the recently released ISoLDE pipeline. Very few genes were confirmed to have true ASE across the different studies and/or validated by pyrosequencing analysis. However, the authors show that most of the newly discovered ASE genes lie in close proximity to already known imprinted loci and could be co-regulated by these imprinted clusters. This is important for understanding how and to what extent imprinted control regions control gene expression.

    This manuscript highlights the number of potentially falsely discovered imprinted genes in previous datasets, which could result from a lack of replicates, weak allelic ratios, low gene expression, or insufficient read depth. The lack of overlap in the genes called as ASE (with the exception of the known imprinted genes) between the different datasets is worrying and important to discuss, as the authors did. I would have appreciated more detail on the differences between the datasets that could explain the lack of reproducibility: library preparation protocol, sequencer technology, SNP calling, number of reads per SNP, bioinformatics pipeline.

    Studying allele-specific expression of lowly expressed genes is difficult with technologies based on PCR amplification (library preparation, pyrosequencing), which could produce an apparent expression bias due only to the random amplification of a small pool of molecules. Could the authors compare the expression levels of their different classes of genes? Could the more robust ASE genes in their study be the more highly expressed ones? Several genes were identified in only one or two of the previous studies; were they expressed in the other studies where they were not defined as ASE? This would also allow an expression threshold to be defined for studying allelic bias in the future. To conclude, this study is an important resource for the epigenetics field and for better understanding genomic imprinting.
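
    The concern about sampling a small pool of molecules can be made concrete with a simple binomial model. The sketch below is an illustration only, not part of the study or the review: it assumes each read is an independent draw from the two alleles (ignoring PCR duplicates and overdispersion, which would make real data noisier), and the function names and the z = 1.96 cutoff are assumptions chosen for the example.

```python
import math


def allelic_fraction_sd(n_reads, true_fraction=0.5):
    """Binomial standard deviation of the observed allelic fraction.

    Assumes n_reads independent draws; real libraries are typically noisier.
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return math.sqrt(true_fraction * (1.0 - true_fraction) / n_reads)


def min_reads_to_resolve(deviation, z=1.96):
    """Approximate read count at which a shift of `deviation` from 0.5
    exceeds z standard deviations under the no-bias (0.5) null."""
    if deviation <= 0:
        raise ValueError("deviation must be positive")
    return math.ceil((z * 0.5 / deviation) ** 2)


# With 100 informative reads the allelic fraction has an SD of 0.05,
# so a 55:45 split is within ordinary sampling noise.
sd_100 = allelic_fraction_sd(100)

# Reads needed before a 51:49 split (deviation 0.01) or a 60:40 split
# (deviation 0.10) stands out from a balanced 50:50 expectation.
reads_for_51_49 = min_reads_to_resolve(0.01)
reads_for_60_40 = min_reads_to_resolve(0.10)
```

    Under these assumptions, a 51:49 ratio needs on the order of ten thousand informative reads per gene to be distinguished from no bias, whereas a 60:40 ratio needs roughly a hundred, which is one way to think about an expression threshold of the kind the reviewer suggests.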

  4. Reviewer #2 (Public Review):

    This work aims to understand genomic imprinting in the mouse and provide further insight to challenges and patterns identified in previous studies.

    Firstly, genomic imprinting studies have been surrounded by controversy, especially ~10 years ago, when the explosion of sequencing data combined with immature methods to analyze it led to highly exaggerated claims of widespread imprinting. While the methods have improved, clear standards are not set and results still have some inconsistencies between studies. The authors first do a meta-analysis of previous studies, comparing their results and doing a useful reanalysis of the data. This provides some valuable insights into the reasons for inconsistencies and guides towards better study designs. While this work does not exactly set a common standard for the field, or provide a full authoritative catalog of imprinted loci in mouse tissues, it provides a step in that direction. I find these analyses relatively simple and straightforward, but they seem solid.

    Previous studies have described a relatively common pattern of subtle expression bias towards one parental allele, rather than the classical imprinting pattern of fully monoallelic expression. This work digs deeper into this phenomenon, using first the meta-analysis data and then also targeted pyrosequencing analysis of selected loci. The analysis is generally well done, although I did not understand why gDNA amplification bias was not systematically corrected in all cases but only if it was above a given (low) threshold. I doubt this would affect the results much though. To some extent the results confirm previously observed patterns (bimodal distribution of either subtle or full bias, and effect of distance from the core of the imprinted locus). The novel insights mostly concern individual loci, with discovery and validation of some novel genes, typically with a subtle or context-specific parental bias.

    The study also provides some insights into mechanisms, especially by analysis of existing mouse models with a deletion of the ICR of specific loci. The change in the parental bias pattern was then used to infer potential methylation and chromatin-related mechanisms in these imprinted loci, including how the subtle bias further away is achieved. There are interesting novel findings here, as well as hypotheses for further research. However, this is an area where the conclusions rely quite heavily on published research especially as this study doesn't include single-cell resolution, and it's not entirely clear how much of e.g. the Figure 7 mechanisms part is based on discoveries of this study.

    Imprinting is a fascinating phenomenon that can be informative of mechanisms of genome regulation and parental effects in general. It is a bit of a niche area though, and the target audience of this study is likely going to be limited to specialists doing research on this specific topic. As the authors point out, the functional importance of the findings is unknown.

  5. Reviewer #3 (Public Review):

    Genomic imprinting is a striking example of epigenetic inheritance in mammals with profound influence on growth and development. A powerful experimental approach to the study of imprinting involves reciprocal mouse F1 crosses; it allows direct assessment of the parent-of-origin effects in a genetically uniform setting that is also an order of magnitude richer in polymorphism than human samples. Use of RNA sequencing is a natural fit to systematic quantitative analysis of allele-specific expression; however, multiple RNA-seq studies of imprinting in F1 mouse tissues wildly disagree in the estimated numbers of novel imprinted genes and in the extent of allelic bias in these genes. In their study, Edwards et al. start with an observation that existing studies varied in their experimental design and data analysis procedures. To assess to what extent disagreements between findings are due to different data processing, they re-analyzed several published datasets using a single pipeline. Furthermore, they performed experimental validation of a number of the novel candidate imprinted genes using primer extension on RT-PCR products (pyrosequencing), to estimate the number of false positives.
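
    The logic of the reciprocal cross design can be sketched as follows. This is an illustrative simplification, not the method used by Edwards et al. or by ISoLDE: the strain labels "A" and "B", the function names, and the 0.1 threshold are all hypothetical. The input is the fraction of reads carrying the strain-A allele in each of the two reciprocal crosses.

```python
def decompose_bias(a_frac_cross1, a_frac_cross2):
    """Split allelic bias into parent-of-origin and strain components.

    a_frac_cross1 -- strain-A allele fraction in (A mother x B father) F1
    a_frac_cross2 -- strain-A allele fraction in (B mother x A father) F1
    Returns (maternal_effect, strain_effect), each a deviation from 0.5.
    """
    # The maternal allele is A in cross 1 and B in cross 2.
    maternal_mean = (a_frac_cross1 + (1.0 - a_frac_cross2)) / 2.0
    # The strain-A allele fraction averaged over both crosses.
    strain_mean = (a_frac_cross1 + a_frac_cross2) / 2.0
    return maternal_mean - 0.5, strain_mean - 0.5


def classify(a_frac_cross1, a_frac_cross2, threshold=0.1):
    """Label a gene by whichever component exceeds a hypothetical threshold."""
    maternal, strain = decompose_bias(a_frac_cross1, a_frac_cross2)
    parental_hit = abs(maternal) >= threshold
    strain_hit = abs(strain) >= threshold
    if parental_hit and strain_hit:
        return "parent-of-origin and strain bias"
    if parental_hit:
        return "maternal bias" if maternal > 0 else "paternal bias"
    if strain_hit:
        return "strain bias"
    return "no bias"


# Strain-A allele dominates regardless of which parent carries it.
label_strain = classify(0.7, 0.7)
# Maternal allele dominates in both crosses.
label_maternal = classify(0.9, 0.1)
```

    A gene whose bias follows the strain in both crosses is a genetic (cis-regulatory) effect, whereas a bias that flips with the direction of the cross is a parent-of-origin effect; only the latter is a candidate for imprinting.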

    Between re-analysis of RNA-seq datasets and the validation experiments, this study presents convincing evidence that most candidate novel imprinted genes are artefactual. The discordant predictions between studies remain even after processing all the data following the ISoLDE protocol. Importantly, validated candidate genes tended to be on the periphery of known imprinted domains, suggesting that their boundaries are yet to be finalized.

    This work brings into focus an important issue of reproducible analysis and interpretation of RNA sequencing data, especially the analysis of allele-specific expression, including in the specific case of imprinted genes. With novel molecular mechanisms described recently (such as H3K27me3-related parent-of-origin gene regulation) and greater accuracy of measuring subtle allelic bias afforded by deep sequencing, the authors' suggested classification (canonical, weak canonical, non-canonical, and weakly biased) is a useful pragmatic step in dealing with the confusing terminology in different studies.

    The authors make a strong case that the data analysis methods used in the analyzed studies are prone to false positives. However, the approaches they use are more of an invitation to further dialogue than a definitive recipe to follow. For example, the authors mention that combining the results of several analytical approaches should increase accuracy. However, if those approaches are erroneous, this could lead to two types of error: (1) tools might be erroneous in a similar way, in which case consistency of results might be taken as confirmation of correctness; (2) averaging results from tools with opposite biases would lead to loss of signal. In the long run, there is no substitute for developing statistically accurate tools and validating that they correctly deal with noise in the data. On the experimental side, pyrosequencing also involves PCR. This does not change the main conclusions of this study, but going forward it is worth focusing on methods less affected by amplification (such as allele-specific FISH, ddPCR, or direct RNA sequencing).