Characterization of caffeine response regulatory variants in vascular endothelial cells

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This important study combines disease-associated genetic variation with a massively parallel reporter assay and different cellular perturbations to identify context-specific genetic regulatory effects. The methods and analyses are solid and the proposed functional variants will be helpful for experimental and quantitative geneticists studying a wide range of complex traits.

Abstract

Genetic variants in gene regulatory sequences can modify gene expression and mediate the molecular response to environmental stimuli. In addition, genotype–environment interactions (GxE) contribute to complex traits such as cardiovascular disease. Caffeine is the most widely consumed stimulant and is known to produce a vascular response. To investigate GxE for caffeine, we treated vascular endothelial cells with caffeine and used a massively parallel reporter assay to measure allelic effects on gene regulation for over 43,000 genetic variants. We identified 665 variants with allelic effects on gene regulation and 6 variants that regulate the gene expression response to caffeine (GxE, false discovery rate [FDR] < 5%). When overlapping our GxE results with expression quantitative trait loci colocalized with coronary artery disease and hypertension, we dissected their regulatory mechanisms and showed a modulatory role for caffeine. Our results demonstrate that massively parallel reporter assay is a powerful approach to identify and molecularly characterize GxE in the specific context of caffeine consumption.

Article activity feed

  1. eLife assessment

    This important study combines disease-associated genetic variation with a massively parallel reporter assay and different cellular perturbations to identify context-specific genetic regulatory effects. The methods and analyses are solid and the proposed functional variants will be helpful for experimental and quantitative geneticists studying a wide range of complex traits.

  2. Reviewer #1 (Public Review):

    Characterizing gene-by-environment interactions has been of great interest for quite some time, as these effects are believed (based on plausible hypotheses and some data) to have importance for the interpretation of complex disease risk. Here, a major class of variants of interest is genetic regulatory variants where e.g. binding of context-specific regulators (TFs, etc) provides a plausible mechanism. However, these variants have been difficult to identify in eQTL and other studies.

    This study leverages the MPRA approach to screen many thousands of constructs carrying putative regulatory variants for their effects in vascular endothelial cells with and without caffeine. They identify thousands of sequences that are differentially regulated between the conditions and, with motif enrichment approaches and comparisons to prior studies, identify some TFs (including novel ones) that may have a role in how these cells respond to caffeine. Next, by allele-specific expression analysis, they identify thousands of variants that are not only regulatory (having a different activity from the two allelic versions of the construct) but also a major subset that has different regulatory activity following the caffeine treatment. Again, motif analysis indicates potential mechanisms, and the eQTL comparison nicely demonstrates the value of these discoveries. The MPRA approach is clearly fruitful, and the identification of many context-specific regulatory variants is informative for people working on genetic regulatory variation.

    The part of the paper that felt underwhelming and not so well-founded was the link to complex disease. I was somewhat surprised to see caffeine experiments in vascular endothelial cells being so strongly framed in terms of CAD. This cell type (and potentially also caffeine) is relevant in many biological processes and diseases. More importantly, given the strongly disease-focused framing, I was surprised to find few results that would actually link the regulatory variant data here to CAD via GWAS overlap or other analyses. Maybe the results were slim here with little overlap, but the results provided do not really justify the implications that disease-relevant findings are being made.

    Specifically, the evidence that PIP4K2B, let alone the studied cASE variant, has a causal role in CAD is weak. This is based on a previously published pTWAS paper, but the variant itself is not a significant GWAS variant. TWAS is known to easily suffer from non-causal hits due to LD and other complications, and hence this link should be taken with a heavy grain of salt. I would be more convinced if the variant was a significant GWAS hit, and even more so if it was a fine-mapped variant, but it is neither of them. As such, the following language in the Discussion is not justified by the data and has to be softened: "By studying different environmental contexts, we can identify that, in this instance, caffeine can reduce the risk of poor cardiovascular health outcomes. If the environmental context was not considered and this work was conducted solely in the control condition, the decreased risk induced by caffeine would not have been observed." Furthermore, Figure 5 is an illustration, but very little data (cASE p-values, etc.) is provided here and in the text.

    Furthermore, I find some of the suggested links to CAD via lipid biology and related TFs quite speculative; are these processes really taking place in vascular endothelial cells? The paper that is being referred to seems to focus on the liver.

    Finally, the analyses seem carefully done, but in Figure 2A the systematic inflation of p-values seems concerning. This could be the real biology of a broadly distributed response to caffeine, but it is also consistent with an unaccounted-for bias that inflates p-values across the board. And do we really expect all these elements to respond to caffeine (more or less)? It is difficult to say what exactly this bias might be, but the caffeine libraries seem to have a higher sequencing coverage (SFig 5). How does this affect the results? Can it bias the DE results via different overdispersion, or the ASE or cASE estimation when the caffeine condition has higher power (which ASE analysis is typically sensitive to)?
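    One common way to quantify the kind of across-the-board inflation described above is a genomic-control style inflation factor. The sketch below is illustrative only and is not the authors' analysis: the function name `inflation_factor` is hypothetical, and it assumes two-sided p-values that would be uniform under the null. A value near 1 suggests good calibration, while values well above 1 indicate either widespread true signal or an unmodeled bias.

```python
from statistics import NormalDist, median

def inflation_factor(p_values):
    """Ratio of the observed median 1-df chi-square statistic to its null median."""
    nd = NormalDist()
    eps = 1e-300  # guard against p == 0, where the quantile is undefined
    # Convert each two-sided p-value to a 1-df chi-square statistic (squared z-score).
    chi2 = [nd.inv_cdf(1 - max(p, eps) / 2) ** 2 for p in p_values]
    # Null median of a 1-df chi-square, i.e. the squared 75th percentile of N(0, 1).
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median

# Hypothetical inputs: evenly spaced "null" p-values, and a deflated copy of them.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
inflated_p = [p ** 2 for p in null_p]
print(inflation_factor(null_p), inflation_factor(inflated_p))
```

    Computing this statistic separately for the control and caffeine libraries, or after downsampling the caffeine reads to matched coverage, would be one way to probe whether sequencing depth contributes to the inflation.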

    The library included negative controls of variants that are not believed to be regulatory variants, but I don't see a systematic presentation of the null obtained from these presented in the paper.
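    As a rough illustration of what such a presentation could look like, the sketch below computes the fraction of negative-control variants called significant at a nominal threshold. This is an assumption-laden example, not the authors' method: the function name `control_false_positive_rate` and the input p-values are hypothetical. For a well-calibrated test, the fraction should be close to the threshold itself.

```python
def control_false_positive_rate(control_p_values, alpha=0.05):
    """Fraction of negative-control p-values falling below the nominal threshold."""
    if not control_p_values:
        raise ValueError("need at least one negative-control p-value")
    hits = sum(1 for p in control_p_values if p < alpha)
    return hits / len(control_p_values)

# Hypothetical negative-control p-values, evenly spread over (0, 1).
controls = [(i + 0.5) / 200 for i in range(200)]
print(control_false_positive_rate(controls))
```

    A rate well above the nominal threshold among negative controls would point to miscalibration rather than biology.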

  3. Reviewer #2 (Public Review):

    In "Characterization of caffeine response regulatory variants in vascular endothelial cells", Boye et al. employ a massively parallel reporter assay, bi-allelic targeted STARR-seq (BiT-STARR-seq), to characterize how non-coding variants affect gene expression in HUVECs after treatment with caffeine. After measuring the differential activity of the individual MPRA constructs in their cells, they test for allele-specific effects (ASE) in each condition and for conditional allele-specific effects (cASE) between conditions. The authors identify an enrichment of cASE variants with stronger allelic effects in caffeine vs control conditions and use a combination of transcription factor motif identification, open chromatin enrichment, caffeine response factor binding site identification, and eQTL fine-mapping to identify 25 SNPs that meet their selection criteria. The authors finally highlight one example SNP from this set, rs228271, as a potential candidate for further analysis.

  4. Reviewer #3 (Public Review):

    Though it is speculated that gene-environment interactions (GxE) contribute to disease heritability, they remain challenging to detect. Here, the authors use a massively parallel reporter assay in vascular endothelial cells treated with or without caffeine to explore context-specific gene regulation. They use a library of 200-bp candidate regions selected from a variety of genetic studies (eQTL, GWAS) and demonstrate allelic bias in activity across a large proportion, including variants with caffeine-specific allelic effects. The described assays represent a useful approach for examining GxE in complex traits, thus these results are of broad appeal. I have great enthusiasm for the experimental design, including the large library and sample size, testing the MPRA in an appropriate cell type with a relevant stimulus, and interesting functional analyses including transcription factor motif enrichment and comparison to GTEx data. My main critique is that the description of analyses and results lacks the clarity that would aid the reader in interpretation.

    1. The abstract states that >43k variants are tested in the library while the methods section states that >43k constructs were tested. Because you tested allele pairs, my expectation is that you would have used ~86k constructs, and at various points, you mention denominators that are higher than 43k. Please address this discrepancy.
    2. Previously, you reported allele-specific expression analysis across many conditions, including caffeine treatment. In that study, you observed high levels of differential expression induced by caffeine treatment (on the order of thousands of genes) with only a modest number of SNPs with allele-specific expression after caffeine treatment. In the current study, you report that only ~800 constructs are differentially active after caffeine treatment which you state as evidence that "caffeine overall increases the activity of the regulatory elements," but this is quite a small number given that you tested tens of thousands of constructs. Later you describe >8k constructs with conditional allele-specific expression. Do you mean that the former subset only displays caffeine effects without allele-specific expression? And taking both studies into account, what do you think accounts for the seeming discrepancies between the relative amount of conditional allele-specific expression measured by RNA-seq vs BiT-STARR-seq?
    3. Your transcription factor motif enrichment analyses are interesting, and would benefit from a further grounding in the biology of the cells you're working with. To this end, what proportion of the transcription factor sets that you use for enrichment are expressed in your cell model? For those that are enriched, are they highly expressed, and does that expression vary with caffeine treatment? You provide some of this information for a specific example (rs228271), but a broader discussion is warranted.
    4. I suggest elaborating on the choice of treatment conditions to provide valuable context. Acute responses to caffeine exposure may vary from chronic exposure. In this study, I think a single acute exposure is more than appropriate for reasons of feasibility and many of the regulatory pathways will be shared between acute and long-term; however, given that CAD is a chronic disease that develops over many years, it would be worthwhile to speculate on longer term effects of caffeine exposure in your model system.