Allele-specific gene expression can underlie altered transcript abundance in zebrafish mutants

Curation statements for this article:
  • Curated by eLife

This article has been Reviewed by the following groups

Abstract

In model organisms, RNA-sequencing (RNA-seq) is frequently used to assess the effect of genetic mutations on cellular and developmental processes. Typically, animals heterozygous for a mutation are crossed to produce offspring with different genotypes. Resultant embryos are grouped by genotype to compare homozygous mutant embryos to heterozygous and wild-type siblings. Genes that are differentially expressed between the groups are assumed to reveal insights into the pathways affected by the mutation. Here we show that in zebrafish, differentially expressed genes are often over-represented on the same chromosome as the mutation due to different levels of expression of alleles from different genetic backgrounds. Using an incross of haplotype-resolved wild-type fish, we found evidence of widespread allele-specific expression, which appears as differential expression when comparing embryos homozygous for a region of the genome to their siblings. When analysing mutant transcriptomes, this means that the differential expression of genes on the same chromosome as a mutation of interest may not be caused by that mutation. Typically, the genomic location of a differentially expressed gene is not considered when interpreting its importance with respect to the phenotype. This could lead to pathways being erroneously implicated or overlooked due to the noise of spurious differentially expressed genes on the same chromosome as the mutation. These observations have implications for the interpretation of RNA-seq experiments involving outbred animals and non-inbred model organisms.
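
The over-representation described in the abstract can be checked for a single experiment with a one-sided hypergeometric test. The sketch below is an illustration only and is not the authors' analysis pipeline; the function name and every count in the example are hypothetical placeholders.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: total number of genes tested
    K: genes located on the chromosome carrying the mutation
    n: number of differentially expressed (DE) genes
    k: DE genes observed on the chromosome carrying the mutation
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probabilities of every outcome at least as extreme as k.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 25,000 genes tested, 1,000 on the mutant chromosome,
# 100 DE genes overall, 20 of them on the mutant chromosome (4 expected by chance).
p_value = hypergeom_upper_tail(k=20, K=1000, n=100, N=25000)
print(f"P(X >= 20) = {p_value:.3g}")
```

A small p-value indicates only that DE genes cluster on the mutant chromosome; it does not by itself distinguish allele-specific expression from genuine effects of the mutation.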

Article activity feed

  1. Author Response

    Reviewer #2 (Public Review):

    In an extensive analysis of zebrafish wild-type vs. mutant RNA-seq datasets, the authors find that differentially expressed genes are often enriched in the chromosomal region of the mutated gene.

    An older paper by Miller et al. (2013, Genome Research 23: 679) also analyzed RNA-seq data on wild-type and mutant zebrafish (in their case with the main goal of identifying the mutated gene), and they noted only a small number of differentially expressed genes near the mutated locus. White et al. could mention this paper and consider methodological differences that may explain these seemingly different conclusions.

    We have added some sentences discussing the Miller paper (Lines 253-257).

    By genotyping and performing RNA-seq on individual animals from a large cross, the authors obtain convincing evidence that genomic polymorphisms cause many genes to be differentially expressed in different wild-type zebrafish strains. They show that most of the differentially expressed genes near mutant loci are likely to be caused by allele-specific expression differences in linkage disequilibrium with the mutation rather than by any action of the mutated gene. The authors illustrate how one can determine which nearby genes may actually be regulated by the mutated gene, using polymorphisms among wild-type chromosomes and different mutant alleles of the mutated gene. Another possible, complementary approach might be to rescue the mutants with a wild-type transgene, which should rescue the mutant phenotype and genes that are regulated by the mutated gene but not affect differentially expressed genes caused simply by allelic differences. In some cases, one could also use transgenic overexpression of the gene of interest to compare loss of function and gain of function of the gene to assess possible inverse effects on target genes (e.g. a putative target gene may be reduced in the mutant and increased in transgenic overexpression animals). As the authors note, these approaches would represent significant extra effort, and they reasonably suggest that a simpler alternative is for investigators to consider the chromosomal position of differentially expressed genes when interpreting their RNA-seq data from outbred strains.

    We have added a section about complementary approaches, including overexpression, that could be used to determine if differential expression is downstream of a mutation or not (Lines 303-310).

    Reviewer #3 (Public Review):

    The authors used transcriptome analyses by RNA-seq to identify differentially expressed (DE) genes in a series of previously identified forward genetic mutants emerging from outbred crosses, as well as in clusters of wild-type zebrafish embryos emerging from a newly generated cross of well-defined genotypes. The authors present experiments that convincingly demonstrate physical linkage of DE genes to the mutated locus and their predominant localisation on the mutation-carrying chromosome. Next, the authors demonstrate that allelic variation of expression is common in a wild-type hybrid cross of SAT double haploid strains and that this variation is haplotype-dependent. Finally, White et al. offer an example approach for distinguishing gene expression changes caused by a mutation from those caused by allelic variation in the expression of genes in linkage disequilibrium with the mutation, by analysing segregation of alleles and their expression dynamics.

    The data from a series of mutants and well-defined wild-type crosses convincingly demonstrate the impact of strain polymorphism and linkage disequilibrium on differential gene expression. The provided evidence suggests the generality of differential gene expression readouts arising independently from generated mutations in outcross experiments in zebrafish.

    These observations are potentially important for the design of transcriptomic analyses of forward genetic screens and other experiments involving RNA-seq from outcrosses, such as inter- and transgenerational epigenetic inheritance studies.

    Evidence on the actual impact of misinterpretation of gene expression differences on biological conclusions drawn from mutants generated in outbred crosses would strengthen the study.

    The conclusions of this manuscript are well supported by the experimental data; some aspects would benefit from further clarification.

    1.) Figure 4 demonstrates separation of differential expression due to sox10 mutations from that arising from allele-specific variation in LD with sox10 by providing an individual example for both. In this section, a global demonstration of the distinct segregation-associated expression dynamics would strengthen the claim. It is recommended that the expression variation for the full set of genes quoted in the text (10 and 15 genes respectively) be shown.

    We have included boxplots of all the genes in Figure 5C showing the variation in expression by genotype and split by those most likely downstream of sox10 and those consistent with ASE.

    2.) Demonstration of the importance of the problem of appropriately drawing conclusions from RNA-seq data may be achieved by comparing the features of mutation-dependent and mutation-independent differentially expressed genes in relation to the biological or biochemical functions of the mutated gene.

    We have looked at GO enrichments across all the experiments and the expression patterns of the two sets of genes on chromosome 3 in the sox10 experiment (Lines 188-216).

  2. Evaluation Summary:

    Zebrafish strains are typically considerably polymorphic. White and colleagues tested the hypothesis that genes in linkage with a mutant allele might show allele-specific expression differences and thus potentially confound the interpretation of mutant effects. Using a variety of mutant and wild-type alleles with sophisticated analysis of RNA-seq data in zebrafish embryos, they demonstrate over-representation of gene expression changes from genes that are in linkage with the mutant allele on the same chromosome. The data are extensive, carefully analyzed and of sufficient depth and quality to support their main claim of frequent occurrence of allele-specific gene expression in outcross experiments. These allele-specific expression differences may affect the interpretation of differential gene expression caused by a specific mutation. The findings of this study will be of interest to geneticists working not only with zebrafish, but potentially also with other polymorphic species.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and #3 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    White and colleagues have generated data that addresses the hypothesis that genes in linkage disequilibrium with a mutant allele in zebrafish could show allele-specific expression effects caused by high polymorphism rates in zebrafish (eQTLs). The experiments use a variety of mutant alleles with sophisticated analysis of RNA-seq data that is both allele-aware and chromosomal location-aware. They show a significant over-representation of gene expression changes from genes that are located close to the mutant allele on the same chromosome. They also performed global allele-specific expression analysis on the SAT line (a hybrid of Tü and AB strains) and showed significant correlations with gene level alterations when regions were one haplotype or the other.

    The data are extensive, carefully analyzed and of sufficient depth and quality to support their claims convincingly. I do not have any significant critiques of the data or the conclusions drawn from them.

  4. Reviewer #2 (Public Review):

    In an extensive analysis of zebrafish wild-type vs. mutant RNA-seq datasets, the authors find that differentially expressed genes are often enriched in the chromosomal region of the mutated gene.

    An older paper by Miller et al. (2013, Genome Research 23: 679) also analyzed RNA-seq data on wild-type and mutant zebrafish (in their case with the main goal of identifying the mutated gene), and they noted only a small number of differentially expressed genes near the mutated locus. White et al. could mention this paper and consider methodological differences that may explain these seemingly different conclusions.

    By genotyping and performing RNA-seq on individual animals from a large cross, the authors obtain convincing evidence that genomic polymorphisms cause many genes to be differentially expressed in different wild-type zebrafish strains. They show that most of the differentially expressed genes near mutant loci are likely to be caused by allele-specific expression differences in linkage disequilibrium with the mutation rather than by any action of the mutated gene. The authors illustrate how one can determine which nearby genes may actually be regulated by the mutated gene, using polymorphisms among wild-type chromosomes and different mutant alleles of the mutated gene. Another possible, complementary approach might be to rescue the mutants with a wild-type transgene, which should rescue the mutant phenotype and genes that are regulated by the mutated gene but not affect differentially expressed genes caused simply by allelic differences. In some cases, one could also use transgenic overexpression of the gene of interest to compare loss of function and gain of function of the gene to assess possible inverse effects on target genes (e.g. a putative target gene may be reduced in the mutant and increased in transgenic overexpression animals). As the authors note, these approaches would represent significant extra effort, and they reasonably suggest that a simpler alternative is for investigators to consider the chromosomal position of differentially expressed genes when interpreting their RNA-seq data from outbred strains.

    The authors are likely correct that many investigators analyzing zebrafish RNA-seq data may have overlooked this clustering of differentially expressed genes and its causes, but the paper does not contain examples where this has significantly affected pathway analysis or other follow-up experiments. Are there enough differentially expressed genes misattributed to an effect of the mutated gene instead of allele-specific expression to lead to major errors in interpretation? For example, the authors show that lama1 mutants have 12 differentially expressed genes, of which 3 are near lama1. Are there major differences in the conclusions drawn from the sets of 9 and 12 differentially expressed genes? Evidence of this type might increase the impact of the paper. As it stands, the demonstration of allele-specific expression is convincing but not surprising in light of the outbred structure of zebrafish strains and much prior work in other species.
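
One way to approach the 9-versus-12 comparison raised above is to split a DE gene list by whether each gene lies on the chromosome carrying the mutation, then run any downstream analysis on both sets. The following is a minimal sketch under stated assumptions: the gene names, chromosome labels and p-values are hypothetical placeholders rather than the lama1 results, and "same chromosome" is used as a crude stand-in for "near the mutation".

```python
from dataclasses import dataclass


@dataclass
class DEGene:
    name: str     # hypothetical gene identifier
    chrom: str    # chromosome the gene is located on
    adj_p: float  # adjusted p-value from the DE analysis


def split_by_linkage(de_genes, mutation_chrom):
    """Split DE genes into those on the mutation's chromosome and the rest."""
    linked = [g for g in de_genes if g.chrom == mutation_chrom]
    unlinked = [g for g in de_genes if g.chrom != mutation_chrom]
    return linked, unlinked


# Toy list of 12 DE genes; 3 sit on the (hypothetical) mutant chromosome "chrA".
de_genes = [DEGene(f"gene_{i}", "chrA" if i < 3 else f"chr{i}", 0.01)
            for i in range(12)]

linked, unlinked = split_by_linkage(de_genes, mutation_chrom="chrA")
print(len(linked), "possibly allele-specific;", len(unlinked), "unlinked")
```

Linked genes are not necessarily artefacts, so this split only marks candidates for closer inspection rather than genes to discard.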

  5. Reviewer #3 (Public Review):

    The authors used transcriptome analyses by RNA-seq to identify differentially expressed (DE) genes in a series of previously identified forward genetic mutants emerging from outbred crosses, as well as in clusters of wild-type zebrafish embryos emerging from a newly generated cross of well-defined genotypes. The authors present experiments that convincingly demonstrate physical linkage of DE genes to the mutated locus and their predominant localisation on the mutation-carrying chromosome. Next, the authors demonstrate that allelic variation of expression is common in a wild-type hybrid cross of SAT double haploid strains and that this variation is haplotype-dependent. Finally, White et al. offer an example approach for distinguishing gene expression changes caused by a mutation from those caused by allelic variation in the expression of genes in linkage disequilibrium with the mutation, by analysing segregation of alleles and their expression dynamics.

    The data from a series of mutants and well-defined wild-type crosses convincingly demonstrate the impact of strain polymorphism and linkage disequilibrium on differential gene expression. The provided evidence suggests the generality of differential gene expression readouts arising independently from generated mutations in outcross experiments in zebrafish.

    These observations are potentially important for the design of transcriptomic analyses of forward genetic screens and other experiments involving RNA-seq from outcrosses, such as inter- and transgenerational epigenetic inheritance studies.

    Evidence on the actual impact of misinterpretation of gene expression differences on biological conclusions drawn from mutants generated in outbred crosses would strengthen the study.

    The conclusions of this manuscript are well supported by the experimental data; some aspects would benefit from further clarification.

    1.) Figure 4 demonstrates separation of differential expression due to sox10 mutations from that arising from allele-specific variation in LD with sox10 by providing an individual example for both. In this section, a global demonstration of the distinct segregation-associated expression dynamics would strengthen the claim. It is recommended that the expression variation for the full set of genes quoted in the text (10 and 15 genes respectively) be shown.

    2.) Demonstration of the importance of the problem of appropriately drawing conclusions from RNA-seq data may be achieved by comparing the features of mutation-dependent and mutation-independent differentially expressed genes in relation to the biological or biochemical functions of the mutated gene.