Integrative analysis of metabolite GWAS illuminates the molecular basis of pleiotropy and genetic correlation

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The paper by Smith and colleagues provides a framework for understanding a seemingly paradoxical observation in human genetics: two phenotypes may be closely correlated to each other, and the patterns of genetic variation that influence both phenotypes may be widely shared at the genome-wide level, but there are often specific genetic variants that show discordant patterns. Though the observations in this paper are derived from analysis of metabolic phenotypes, this may have broader relevance to interpreting the results from disease-related genetic association studies, and shed light on the processes that connect different disease phenotypes.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

Abstract

Pleiotropy and genetic correlation are widespread features in genome-wide association studies (GWAS), but they are often difficult to interpret at the molecular level. Here, we perform GWAS of 16 metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism in a subset of UK Biobank. We utilize the well-documented biochemistry jointly impacting these metabolites to analyze pleiotropic effects in the context of their pathways. Among the 213 lead GWAS hits, we find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. We demonstrate that the effect directions of variants acting on biology between metabolite pairs often contrast with those of upstream or downstream variants as well as the polygenic background. Thus, we find that these outlier variants often reflect biology local to the traits. Finally, we explore the implications for interpreting disease GWAS, underscoring the potential of unifying biochemistry with dense metabolomics data to understand the molecular basis of pleiotropy in complex traits and diseases.

Article activity feed

  2. Reviewer #1 (Public Review):

    The authors use both genome-wide correlations between genetic effects on metabolite pairs ('genetic correlation') and the pleiotropic effects of individual genetic variants to build an understanding of how biochemical pathways relate to global ('genetic correlation') and local (individual variant or pathway) pleiotropy. The authors look at metabolites, which are themselves interesting and predictive of metabolic health, but also serve as a useful 'model system' for understanding genetic correlation.

    The authors demonstrate that genetic variants with 'discordant' effects on a pair of metabolites, i.e. effects whose product of signs is opposite to the sign of the genome-wide genetic correlation, tend to be variants that (likely) affect pathway-relevant enzyme or transporter genes and/or act on the biochemical pathways 'between' the two metabolites.
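
    As an illustration of the definition above, a minimal sketch of the discordance check follows. It is not the authors' code: the function names and the numeric effect sizes are hypothetical, and treating a zero effect or a zero genetic correlation as "not discordant" is an assumption made here for simplicity.

```python
def sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_discordant(beta_1: float, beta_2: float, r_g: float) -> bool:
    """Flag a variant whose effects on two metabolites oppose the genetic correlation.

    beta_1, beta_2: per-variant effect estimates on metabolite 1 and 2 (hypothetical inputs).
    r_g: genome-wide genetic correlation between the two metabolites.
    A variant is called discordant when sign(beta_1) * sign(beta_2) is
    opposite to sign(r_g). Zero values are treated as not discordant (assumption).
    """
    product_sign = sign(beta_1) * sign(beta_2)
    if product_sign == 0 or sign(r_g) == 0:
        return False
    return product_sign != sign(r_g)


# Hypothetical example: positively correlated metabolites, opposite-sign variant effects.
print(is_discordant(0.20, -0.10, 0.60))
```

    In practice one would apply such a check only to variants with well-estimated effects on both traits, since the sign of a noisy estimate near zero is unreliable.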

    The authors attempt to extend this further to a variant associated with coronary artery disease (CAD), which they hypothesize to act by decreasing the activity of the gene PCCB. While an interesting hypothesis, establishing such a mechanism in the etiology of CAD would require further validation.

    This paper represents an advance in linking statistical genetics constructs such as 'genetic correlation' to a biochemical mechanism for an important case: metabolites. While I expect their approach to be influential in showing how to dissect genetic correlation in a way that can point to the biological mechanism, extending their method to more complex phenotypes with less well-characterized biochemical pathways may be challenging.

  3. Reviewer #2 (Public Review):

    In this interesting paper, Smith and colleagues conduct a detailed analysis of metabolomics data generated (so far) on about 100K participants in the UK Biobank resource. They focus on a set of 16 metabolites that are easily connected to core molecular processes such as amino acid metabolism, glycolysis, and ketone body production. There is (not surprisingly) substantial overlap in the genetic variation that affects circulating levels of many of these metabolites, evidenced both by overlapping GWAS hits and by a rich pattern of mutual genetic correlation genome-wide.

    The main observation the authors make is that, even when a pair of metabolites shows strong positive (or negative) genetic correlation genome-wide, there are variants that result in genome-wide association signals for these traits that go against the grain (i.e. the direction of the variant association for the correlated metabolites is discordant with the genome-wide correlation). The main inference is that these 'discordant' variants are most likely to influence processes that sit between the metabolites concerned. As the authors point out, this means that variants that influence metabolic processes that lie immediately upstream or downstream of a given metabolite (and which are therefore likely to have the largest effects) tend to behave in paradoxical ways when it comes to their pleiotropic manifestations.

    The authors make a compelling case for these observations, based on both an analysis of all metabolites, as well as several more specific vignettes. The use of the UKBB data is a real positive in this respect given the scale and uniformity of the data. In some ways, this is just a genetic reformulation of the well-understood effects of chemical inhibition across a causal metabolic pathway. (Given a pathway A->B->C->D, chemical perturbation of A->B and C->D will generally lead to correlated effects on B and C, whereas perturbation of B->C is likely to lead to discordant effects [more B, less C]). However, this is the first attempt I have seen to extend this kind of framework to the genetic space.
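
    The A->B->C->D intuition above can be sketched with a toy steady-state model. This is an illustration only, not an analysis from the paper or the review: the linear kinetics, the reversible steps (reverse rate `r`), the first-order clearance of every metabolite (`d`), the constant influx into A, and all rate values are assumptions chosen so that perturbing any step shifts both B and C.

```python
def steady_state(k1=1.0, k2=1.0, k3=1.0, r=0.5, d=0.1, influx=1.0):
    """Steady-state levels (A, B, C, D) for a toy pathway A <-> B <-> C <-> D.

    Assumed model (hypothetical): forward rates k1, k2, k3; a common reverse
    rate r for each step; first-order clearance d of every metabolite; and a
    constant influx into A. The balance equations are linear, so they are
    solved exactly by back-substitution from D, then rescaled to match influx.
    """
    D = 1.0                                # provisional scale
    C = (r + d) * D / k3                   # balance on D
    B = ((r + k3 + d) * C - r * D) / k2    # balance on C
    A = ((r + k2 + d) * B - r * C) / k1    # balance on B
    scale = influx / ((k1 + d) * A - r * B)  # balance on A fixes the scale
    return tuple(scale * x for x in (A, B, C, D))


def bc_shift(**inhibited):
    """Change in B and C relative to baseline when the given rates are altered."""
    base = steady_state()
    pert = steady_state(**inhibited)
    return pert[1] - base[1], pert[2] - base[2]


# Halve each forward rate in turn and report the direction of change in B and C.
for step in ("k1", "k2", "k3"):
    dB, dC = bc_shift(**{step: 0.5})
    kind = "concordant" if dB * dC > 0 else "discordant"
    print(step, kind)
```

    With these illustrative rate constants, inhibiting A->B (`k1`) or C->D (`k3`) moves B and C in the same direction, whereas inhibiting B->C (`k2`) raises B and lowers C. The reversibility and clearance terms matter: in a strictly irreversible chain with fixed flux, an upstream or downstream perturbation would leave B or C unchanged rather than shifting them together.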

    The broader relevance of the finding is that it provides a framework for thinking about similar observations that arise in the analysis of disease GWAS data. The authors provide one example (related to PCCB variants and CAD risk) that starts to pick up this theme, though more work needs to be done to see how this framework can be applied in the "messier" world of disease phenotypes. There are several examples of this kind of discordancy in GWAS data sets: for example, there are some T2D risk alleles that are associated with REDUCED rather than increased BMI, and some that are associated with REDUCED rather than increased risk of NAFLD, both of which are discordant with the epidemiological and overall genetic correlations between T2D and BMI / NAFLD.

    Overall, this paper describes a well-conducted set of analyses that provides interesting insights into the effects of genetic variants on complex pathways and networks, and a new twist on the issue of pleiotropy. I have no major concerns about the analyses or the inference, and I believe that the manuscript will be influential in explaining some of these seemingly paradoxical observations that pop up from time to time.

  4. Reviewer #3 (Public Review):

    The authors have used both overall and local genetic correlations to understand how genes associated with two traits relate to those same traits. Their work focuses on understanding why, in some cases, local genetic correlations may disagree with the overall correlation in direction of effect, and it exploits known biology to understand why and when this arises.

    Overall, the work is methodologically solid, as it relies on well-established statistical methods and known biology. Within the scope of the presented examples, I don't see particular weaknesses in this work. It remains unclear how these observations will generalise to traits or biology that are less well characterised, but this is a matter for future work.

    The work is, in my opinion, highly impactful, as it creates a framework that can be used to investigate the pleiotropic effects of genes and could help in understanding their biological role.