Gene expression variability in human and chimpanzee populations share common determinants

This article has been Reviewed by the following groups

Read the full article

Abstract

Inter-individual variation in gene expression has been shown to be heritable and is often associated with differences in disease susceptibility between individuals. Many studies focused on mapping associations between genetic and gene regulatory variation, yet much less attention has been paid to the evolutionary processes that shape the observed differences in gene regulation between individuals in humans or any other primate. To begin addressing this gap, we performed a comparative analysis of gene expression variability and expression quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. We found that expression variability in both species is often determined by non-genetic sources, such as cell-type heterogeneity. However, we also provide evidence that inter-individual variation in gene regulation can be genetically controlled, and that the degree of such variability is generally conserved in humans and chimpanzees. In particular, we found a significant overlap of orthologous genes associated with eQTLs in both species. We conclude that gene expression variability in humans and chimpanzees often evolves under similar evolutionary pressures.

Article activity feed

  1. ###Reviewer #3:

    The paper by Fair (Gilad) and colleagues examined the determinants of gene expression variation within human and chimpanzee populations. Studies focused on an analysis of left ventricle in 39 chimpanzees and 39 human samples. The authors first developed a strategy to measure "dispersion", or gene expression variance after regressing out the effects of mean expression. This metric of dispersion was correlated between human and chimpanzee in most genes, but there were substantial differences between species that could not be explained by changes in mean expression level. Highly dispersed genes were enriched for genes with a higher amino acid divergence, TATA boxes, and cellular composition. In fact, the authors found that changes in cellular composition between samples were highly correlated with expression dispersion, wherein genes that were markers of specific cell populations were highly dispersed. Analysis of eQTLs discovered that genes which are variable based on eQTLs in one species were enriched for eQTLs in the other.

    Overall, there are many good things about this paper. The data will be of broad utility to the comparative genomics community: the authors added RNA-seq data from the left ventricle of 21 chimpanzees and high coverage complete genomes from 39. The calculation of power for discovering differentially expressed genes as a function of sample size at the beginning of the paper is a thoughtful analysis that is useful to many in the community. As I have come to expect from these authors, all of the analyses are extremely thorough and well-executed. The statistical tests are appropriate and rigorous. Results are interpreted in a conservative fashion.

    The main issue is that the authors are not able to conclusively disambiguate between different causes of dispersion. Genetics, cell type, and technical variation may all contribute to dispersion. The authors state this very clearly throughout the manuscript. In part, this may reflect the authors' underselling their results somewhat. But in part, this really does reflect reality: Cell type is a major confounder that may provide false signals in other analyses.

    Major comments/ suggestions:

    1. Did the authors test directly whether eQTLs were enriched in genes with a high dispersion? I could not find this going back through the paper. This seems almost trivially likely to be true. I may have missed this result? Or did the authors worry this is too likely to be confounded with cell type? Either way, this seems like a result that may be useful to show even if the authors did acknowledge that it was likely to be confounded.

    2. Did the authors consider looking for cell-type QTLs? They state several times in the paper the possibility that genetic factors may influence cell types. They have enough data - at least in humans - to obtain QTLs for specific cell types, as others have done (Marderstein et. al. Nat Comms 2020; Donovan et. al. Nat Comms 2020). If these cell type QTLs were enriched near genes with a high dispersion, this may bolster the author's argument that genetic factors underlie dispersion by affecting cell type composition.

    3. The scRNA-seq reference used for estimating cell types in heart tissue was derived from mice. Could this lead the authors to underestimate the degree to which cell types drive dispersion in genes that are variable between human and chimp? Genes that are variable between human/ chimp may also be more likely to be variable between either species and mouse, and perhaps this variability has led to them becoming more/ less of a marker of a specific cell population (and hence their dispersion in primates does not correlate with cell type specificity in mouse).

    4. Have the authors tried estimating dispersion on top of what is expected based on differences in cell type? There are several strategies that might work for this: There are new strategies for estimating a posterior of cell type specific expression from a bulk sample, conditional on scRNA-seq data as prior information (Chu and Danko, bioRxiv, 2020). These cell type specific expression estimates could then be analyzed for dispersion. Alternatively, it may also work to regress the estimated proportion of each cell type out of the dispersion estimates. While there are certainly a lot of pitfalls with using these strategies, especially in the setting shown here (all of this would work better if there were species matched reference data), they might provide an avenue for depleting the contribution of cell type differences from dispersion estimates.

    5. Can the authors add a dotted line to show the shape of the distribution for genes with low dispersion, or where dispersion is shared in both human and chimpanzee, in figure 4b? Is this different from genes that are dispersed in either chimp or human?

    6. Type. pp. 20. "... in only in ..."

  2. ###Reviewer #2:

    In this study, Fair et al. focused on assessing inter-individual variability in gene expression, which has been shown to be heritable and associated with disease susceptibility. More specifically, unlike many studies focused on mapping associations between genetic and gene regulatory variation, authors paid attention to the group dispersion/variance of gene expression among samples as well as the evolutionary processes that shape the differences in gene regulation between individuals in humans or any other primate. Using computational deconvolution, they found that cell-type heterogeneity determines expression variability in both species. They also found a significant overlap of orthologous genes associated with eQTLs in both species. They concluded that gene expression variability in humans and chimpanzees often evolves under similar evolutionary pressures. The manuscript, in general, is well prepared. For example, authors put supplementary figures within the main text whenever they are supposed to be, which is convenient. The authors collected data from 39 human vs. 39 chimp primary heart tissue samples. The sources of human samples include 11 (old study)+28 (GTEx) and chimp samples 18 (old)+21 (new). Twenty-one new specimens are generated specifically for this study. This study involves a large number of tests, but the main problem is the lack of a coherent central hypothesis.

    Major comments:

    1. The first test authors conducted is to identify differentially variable (DV) genes. A total of 2658 DV genes were identified. The problem of the result is that almost equal number of up- and down-regulated DV genes symmetrically distributed around DV=0. Often, this is an indication of a lack of biological signals in data analysis. This might be due to the pooling of gene groups with diverse functionality together. Therefore, this reviewer suggests that authors should break down genes into subgroups to detail the up and down-regulatory patterns with the hope that some of the gene groups give interpretable results

    2. The second test is to correlate the higher coding sequence conservation with lower dispersion. Again, the positive result is not unexpected. There are many indirect and/or confounding factors that may explain the effect. This reviewer, however, understands it is impossible to control them all (also authors have attempted to address some of them in the next few tests). However, here it is better to add exploratory analyses for genes in different functional groups and also give examples of outlier genes that do not follow the rule.

    3. The third test is to examine the correlation between gene expression variability with single-cell type heterogeneity of samples. Authors first used Tabula Muris dataset to show dispersion is strongly correlated with cell-type specificity/diversity. If this is true, then the point that authors really wanted to demonstrate is, in fact, hampered. Authors might really want to show the "true" single-cell variability (see, for example, PMID: 31861624) is correlated with the level of group variance of gene expression.

    4. The fourth test authors conducted is to show that dn/ds and pn/ps ratios of genes are correlated with gene expression variability (variance). However, because of the existence of heterogeneity of cell-type composition in samples, any correlation observed may be utterly biased by this single uncontrollable confounding factor. Furthermore, heart tissues contain an over-abundant expression of genes encoded in the mitochondrial genome. The expression level of these mt-genes may vary substantially between samples and reflect the health status of primary sample donors. PEER normalization may have to take this into account as a covariant.

    5. Several other tests authors performed are around eQTLs (eGene overlap and eSNP overlap) between the two species. These are typical tests evolutionary biologists usually try to do whenever data is available. However, the issues with these types of tests are the low power in general. More importantly, in order to be consistent with previous tests which are all around the explanation of gene expression variance, this part should address the overlap between expression vQTLs in humans and chimps.

  3. ###Reviewer #1:

    This is a solid study, with a large sample size, identifying quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. The authors complemented the analysis of gene expression with a comparative eQTL mapping, as opposed to relying on mean expression levels, as most studies like this one do.

    1. I would like to see more discussion about the inter-relatedness of the chimpanzees in the analysis of gene expression. Is that contributing to the power of the DE analysis, which has really high numbers of DE genes. That may certainly be due to the large samples size, but should be addressed. Related to that, the support that the gene-wise dispersion estimates are well correlated in humans and chimpanzees overall (Fig1C, and S4) seems qualitative. It looks like the chimpanzees might have less dispersion overall?

    2. What do the authors think these findings mean for study systems outside of humans and captive chimpanzees? Both on the technical level (e.g. sample size), and for how their approach could be helpful outside of these species. Generalizing this approach would broaden the impact and audience of the paper.

    3. Just a comment that I appreciated the thoughtfulness of the possible technical confounds in the results and discussion.

  4. ##Preprint Review

    This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    ###Summary:

    This is a solid study, with a large sample size, identifying quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. The authors complemented the analysis of gene expression with a comparative eQTL mapping, as opposed to relying on mean expression levels, as most comparative studies like this one do. Also unlike many studies focused on mapping associations between genetic and gene regulatory variation, the authors paid attention to the group dispersion/variance of gene expression among samples as well as the evolutionary processes that shape the differences in gene regulation between individuals. The calculation of power for discovering differentially expressed genes as a function of sample size at the beginning of the paper is a thoughtful analysis that is useful to many in the community. All of the analyses are extremely thorough and well-executed. The statistical tests are appropriate and rigorous. Results are interpreted in a conservative fashion.

    The main limitation is that the authors are not able to conclusively disambiguate between different causes of dispersion. Genetics, cell type, and technical variation may all contribute to dispersion. The authors state this very clearly throughout the manuscript. In part, this may reflect the authors' underselling their results somewhat. But in part, this really does reflect reality: Cell type is a major confounder that may provide false signals in other analyses.