Discovering non-additive heritability using additive GWAS summary statistics

Curation statements for this article:
  • Curated by eLife

    eLife logo

    eLife assessment

    This study provides a valuable investigation into whether phenotypic variance due to interactions between genetic variants can be measured using genome-wide association summary statistics. The authors present a method, i-LDSC, that uses statistics on the correlations between genotypes at different loci (linkage disequilibrium) to estimate the phenotypic variance explained by both additive genetic effects and pairwise interactions. While the authors present extensive simulations on the performance of their method and empirical results indicating the presence of epistasis (as they define epistasis) it is unclear how their method and results relate to the traditional definitions of additive and non-additive genetic effects, which are different from the authors' definitions.

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

LD score regression (LDSC) is a method to estimate narrow-sense heritability from genome-wide association study (GWAS) summary statistics alone, making it a fast and popular approach. In this work, we present interaction-LD score (i-LDSC) regression: an extension of the original LDSC framework that accounts for interactions between genetic variants. By studying a wide range of generative models in simulations, and by re-analyzing 25 well-studied quantitative phenotypes from 349,468 individuals in the UK Biobank and up to 159,095 individuals in BioBank Japan, we show that the inclusion of a cis -interaction score (i.e. interactions between a focal variant and proximal variants) recovers genetic variance that is not captured by LDSC. For each of the 25 traits analyzed in the UK Biobank and BioBank Japan, i-LDSC detects additional variation contributed by genetic interactions. The i-LDSC software and its application to these biobanks represent a step towards resolving further genetic contributions of sources of non-additive genetic effects to complex trait variation.

Article activity feed

  1. Author Response

    LD Score regression (LDSC) is a software tool widely used in the field of genome-wide association studies (GWAS) for estimating heritabilities, genetic correlations, the extent of confounding, and biological enrichment. LDSC is for the most part not regarded as an accurate estimator of \emph{absolute} heritability (although useful for relative comparisons). It is relied on primarily for its other uses (e.g., estimating genetic correlations). The authors propose a new method called \texttt{i-LDSC}, extending the original LDSC in order to estimate a component of genetic variance in addition to the narrow-sense heritability---epistatic genetic variance, although not necessarily all of it. Epistasis in quantitative genetics refers to the component of genetic variance that cannot be captured by a linear model regressing total genetic values on single-SNP genotypes. \texttt{i-LDSC} seems aimed at estimating that part of the epistatic variance residing in statistical interactions between pairs of SNPs. To simplify, the basic model of \texttt{i-LDSC} for two SNPs $X_1$ and $X_2$ is

    \begin{equation}\label{eq:twoX} Y = X_1 \beta_1 + X_2 \beta_2 + X_1 X_2 \theta + E, \end{equation}

    and estimation of the epistatic variance associated with the product term proceeds through a variant of the original LD Score that measures the extent to which a SNP tags products of genotypes (rather than genotypes themselves). The authors conducted simulations to test their method and then applied it to a number of traits in the UK Biobank and Biobank Japan. They found that for all traits the additive genetic variance was larger than the epistatic, but for height the absolute size of the epistatic component was estimated to be non-negligible. An interpretation of the authors' results that perhaps cannot be ruled out, however, is that pairwise epistasis overall does not make a detectable contribution to the variance of quantitative traits.

    We thank the reviewer for carefully reading of our manuscript and we appreciate the constructive comments. Our responses and edits to the specific major comments and minor issues are given below.

    Major Comments

    This paper has a lot of strong points, and I commend the authors for the effort and ingenuity expended in tackling the difficult problem of estimating epistatic (non-additive) genetic variance from GWAS summary statistics. The mere possibility of the estimated univariate regression coefficient containing a contribution from epistasis, as represented in the manuscript's Equation~3 and elsewhere, is intriguing in and of itself.

    Is \texttt{i-LDSC} Estimating Epistasis?

    Perhaps the issue that has given me the most pause is uncertainty over whether the paper's method is really estimating the non-additive genetic variance, as this has been traditionally defined in quantitative genetics with great consequences for the correlations between relatives and evolutionary theory (Fisher, 1930, 1941; Lynch & Walsh, 1998; Burger, 2000; Ewens, 2004).

    Let us call the expected phenotypic value of a given multiple-SNP genotype the \emph{total genetic value}. If we apply least-squares regression to obtain the coefficients of the SNPs in a simple linear model predicting the total genetic values, then the partial regression coefficients are the \emph{average effects of gene substitution} and the variance in the predicted values resulting from the model is called the \emph{additive genetic variance}. (This is all theoretical and definitional, not empirical. We do not actually perform this regression.) The variance in the residuals---the differences between the total genetic values and the additive predicted values---is the \emph{non-additive genetic variance}. Notice that this is an orthogonal decomposition of the variance in total genetic values. Thus, in order for the variance in $\mathbf{W}\bm{\theta}$ to qualify as the non-additive genetic variance, it must be orthogonal to $\mathbf{X} \bm{\beta}$.

    At first, I very much doubted whether this is generally true. And I was not reassured by the authors' reply to Reviewer~1 on this point, which did not seem to show any grasp of the issue at all. But to my surprise I discovered in elementary simulations of Equation~\ref{eq:twoX} above that for mean-centered $X_1$ and $X_2$, $(X_1 \beta_1 + X_2 \beta_2)$ is uncorrelated with $X_1 X_2 \theta$ for seemingly arbitrary correlation between $X_1$ and $X_2$. A partition of the outcome's variance between these two components is thus an orthogonal decomposition after all. Furthermore, the result seems general for any number of independent variables and their pairwise products. I am also encouraged by the report that standard and interaction LD Scores are ``lowly correlated' (line~179), meaning that the standard LDSC slope is scarcely affected by the inclusion of interaction LD Scores in the regression; this behavior is what we should expect from an orthogonal decomposition.

    I have therefore come to the view that the additional variance component estimated by \texttt{i-LDSC} has a close correspondence with the epistatic (non-additive) genetic variance after all.

    In order to make this point transparent to all readers, however, I think that the authors should put much more effort into placing their work into the traditional framework of the field. It was certainly not intuitive to multiple reviewers that $\mathbf{X}\bm{\beta}$ is orthogonal to $\mathbf{W}\bm{\theta}$. There are even contrary suggestions. For if $(\mathbf{X}\bm{\beta})^\intercal \mathbf{W} \bm{\theta} = \bm{\beta}^\intercal \mathbf{X}^\intercal \mathbf{W} \bm{\theta} $ is to equal zero, we know that we can't get there by $\mathbf{X}^\intercal \mathbf{W}$ equaling zero because then the method has nothing to go on (e.g., line~139). We thus have a quadratic form---each term being the weighted product of an average (additive) effect and an interaction coefficient---needing to cancel out to equal zero. I wonder if the authors can put forth a rigorous argument or compelling intuition for why this should be the case.

    In the case of two polymorphic sites, quantitative genetics has traditionally partitioned the total genetic variance into the following orthogonal components:

    \begin{itemize}

    \item additive genetic variance, $\sigma^2_A$, the numerator of the narrow-sense heritability;

    \item dominance genetic variance, $\sigma^2_D$;

    \item additive-by-additive genetic variance, $\sigma^2_{AA}$;

    \item additive-by-dominance genetic variance, $\sigma^2_{AD}$; and

    \item dominance-by-dominance genetic variance, $\sigma^2_{DD}$.

    \end{itemize}

    See Lynch and Walsh (1998, pp. 88-92) for a thorough numerical example. This decomposition is not arbitrary or trivial, since each component has a distinct coefficient in the correlations between relatives. Is it possible for the authors to relate the variance associated with their $\mathbf{W}\bm{\theta}$ to this traditional decomposition? Besides justifying the work in this paper, the establishment of a relationship can have the possible practical benefit of allowing \texttt{i-LDSC} estimates of non-additive genetic variance to be checked against empirical correlations between relatives. For example, if we know from other methods that $\sigma^2_D$ is negligible but that \texttt{i-LDSC} returns a sizable $\sigma^2_{AA}$, we might predict that the parent-offspring correlation should be equal to the sibling correlation; a sizable $\sigma^2_D$ would make the sibling correlation higher. Admittedly, however, such an exercise can get rather complicated for the variance contributed by pairs of SNPs that are close together (Lynch & Walsh, 1998, pp. 146-152).

    I would also like the authors to clarify whether LDSC consistently overestimates the narrow-sense heritability in the case that pairwise epistasis is present. The figures seem to show this. I have conflicting intuitions here. On the one hand, if GWAS summary statistics can be inflated by the tagging of epistasis, then it seems that LDSC should overestimate heritability (or at least this should be an upwardly biasing factor; other factors may lead the net bias to be different). On the other hand, if standard and interaction LD Scores are lowly correlated, then I feel that the inclusion of interaction LD Score in the regression should not strongly affect the coefficient of the standard LD Score. Relatedly, I find it rather curious that \texttt{i-LDSC} seems increasingly biased as the proportion of genetic variance that is non-additive goes up---but perhaps this is not too important, since such a high ratio of narrow-sense to broad-sense heritability is not realistic.

    We thank the reviewer for taking the time to thoughtfully offer more context on how we might situate the i-LDSC framework within the greater context of traditional quantitative genetics. We now formalize the interaction component used in the i-LDSC model as an estimate of the phenotypic variance explained by additive-by-additive interactions between genetic variants (which we denote by 𝜎" to follow the conventional notation). In the newly revised Material and Methods, we also show how the i-LDSC model can be formulated to include dominance effects in a more general framework. Our updated derivations provide two key takeaways.

    First, we assume that the additive and interaction effect sizes in the general model (𝜷,𝜽) are each normally distributed with variances proportional to their individual contributions to trait heritability: 𝛽& ∼ 𝒩(0, 𝜎"), πœƒ' ∼ 𝒩(0, 𝜎" ). This independence assumption implies that the additive and non- $ $$ additive components π‘Ώπœ· and π‘Ύπœ½ are orthogonal where 𝔼[πœ·βŠΊπ‘ΏβŠΊπ‘Ύπœ½] = 𝔼[𝜷⊺]π‘ΏβŠΊπ‘Ύπ”Ό[𝜽] = 𝟎. This is important because, as the reviewer points out, it means that there is a unique partitioning of genetic variance when studying a trait of interest. In the revised version of the manuscript, we show this derivation in the main text (see lines 129-143). We also extend this derivation in the Materials and Methods where we show the same result even after we include the presence of dominance effects in the generative model (see lines 415-417 and 438-457).

    Second, we show that the genotype matrix 𝑿 and the matrix of genetic interactions 𝑾 are not linearly dependent because the additive-by-additive effects between two SNPs are encoded as the Hadamard product of two genotypic vectors in the form π’˜! = 𝒙" ∘ 𝒙# (which is a nonlinear function of the genotypes). Linear dependence would have implied that one could find a transformation between a SNP and an interaction term in the form π’˜! = 𝑐 Γ— 𝒙" for some constant 𝑐. However, despite their linear independence, 𝑿 and 𝑾 are themselves not orthogonal and still have a nonzero correlation. This implies that the inner product between genotypes and their interactions is nonzero π‘ΏβŠΊπ‘Ύ β‰  𝟎. To see this, we focus on a focal SNP 𝒙& and consider three different types of interactions:

    • Scenario I: Interaction between a focal SNP with itself (𝒙" ∘ 𝒙").
    • Scenario II: Interaction between a focal SNP with a different SNP (𝒙" ∘ 𝒙#).
    • Scenario III: Interaction between a focal SNP with a pair of different SNPs (𝒙# ∘ 𝒙$).

    In the Materials and Methods of the revised manuscript, we now provide derivations showing when would expect nonzero correlation between 𝑿 and 𝑾 which rely on the fact that: (1) we assume that genotypes have been mean-centered and scaled to have unit variance, and (2) under Hardy-Weinberg equilibrium, SNPs marginally follow a binomial distribution 𝒙& ∼ 𝐡𝑖𝑛(2, 𝑝) where 𝑝 represents the minor allele frequency (MAF) (Wray et al. 2007, Genome Res; Lippert et al. 2013, Sci Rep). These new additions are given in new lines 460-485).

    Lastly, we agree with the reviewer that our results indicate that LDSC inflates estimates of SNP- based narrow-sense heritability. Our intuition for why this happens is largely consistent with the reviewer’s first point: since GWAS summary statistics can be inflated by the tagging of non- additive genetic variance, then it makes sense that LDSC should overestimate heritability. LDSC uses a univariate regression without the inclusion of cis-interaction scores. A simple consequence from β€œomitted variable bias” is likely happening where, since LDSC does not explicitly account for contributions from the tagged non-additive components which also contribute to the variance in the GWAS summary statistics, the estimate for the coefficient 𝜎" becomes slightly inflated.

    How Much Epistasis Is \texttt{i-LDSC} Detecting?

    I think the proper conclusion to be drawn from the authors' analyses is that statistically significant epistatic (non-additive) genetic variance was not detected. Specifically, I think that the analysis presented in Supplementary Table~S6 should be treated as a main analysis rather than a supplementary one, and the results here show no statistically significant epistasis. Let me explain.

    Most serious researchers, I think, treat LDSC as an unreliable estimator of narrow-sense heritability; it typically returns estimates that are too low. Not even the original LDSC paper pressed strongly to use the method for estimating $h^2$ (Bulik-Sullivan et al., 2015). As a practical matter, when researchers are focused on estimating absolute heritability with high accuracy, they usually turn to GCTA/GREML (Evans et al., 2018; Wainschtein et al., 2022).

    One reason for low estimates with LDSC is that if SNPs with higher LD Scores are less likely to be causal or to have large effect sizes, then the slope of univariate LDSC will not rise as much as it ``should' with increasing LD Score. This was a scenario actually simulated by the authors and displayed in their Supplementary Figure~S15. [Incidentally, the authors might have acknowledged earlier work in this vein. A simulation inducing a negative correlation between LD Scores and $\chi^2$ statistics was presented by Bulik-Sullivan et al. (2015, Supplementary Figure 7), and the potentially biasing effect of a correlation over SNPs between LD Scores and contributed genetic variance was a major theme of Lee et al. (2018).] A negative correlation between LD Score and contributed variance does seem to hold for a number of reasons, including the fact that regions of the genome with higher recombination rates tend to be more functional. In short, the authors did very well to carry out this simulation and to show in their Supplementary Figure~S15 that this flaw of LDSC in estimating narrow-sense heritability is also a flaw of \texttt{i-LDSC} in estimating broad-sense heritability. But they should have carried the investigation at least one step further, as I will explain below.

    Another reason for LDSC being a downwardly biased estimator of heritability is that it is often applied to meta-analyses of different cohorts, where heterogeneity (and possibly major but undetected errors by individual cohorts) lead to attenuation of the overall heritability (de Vlaming et al., 2017).

    The optimal case for using LDSC to estimate heritability, then, is incorporating the LD-related annotation introduced by Gazal et al. (2017) into a stratified-LDSC (s-LDSC) analysis of a single large cohort. This is analogous to the calculation of multiple GRMs defined by MAF and LD in the GCTA/GREML papers cited above. When this was done by Gazal et al. (2017, Supplementary Table 8b), the joint impact of the improvements was to increase the estimated narrow-sense heritability of height from 0.216 to 0.534.

    All of this has at least a few ramifications for \texttt{i-LDSC}. First, the authors do not consider whether a relationship between their interaction LD Scores and interaction effect sizes might bias their estimates. (This would be on top of any biasing relationship between standard LD Scores and linear effect sizes, as displayed in Supplementary Figure~S15.) I find some kind of statistical relationship over the whole genome, induced perhaps by evolutionary forces, between \emph{cis}-acting epistasis and interaction LD Scores to be plausible, albeit without intuition regarding the sign of any resulting bias. The authors should investigate this issue or at least mention it as a matter for future study. Second, it might be that the authors are comparing the estimates of broad-sense heritability in Table~1 to the wrong estimates of narrow-sense heritability. Although the estimates did come from single large cohorts, they seem to have been obtained with simple univariate LDSC rather than s-LDSC. When the estimate of $h^2$ obtained with LDSC is too low, some will suspect that the additional variance detected by \texttt{i-LDSC} is simply additive genetic variance missed by the downward bias of LDSC. Consider that the authors' own Supplementary Table~S6 gives s-LDSC heritability estimates that are consistently higher than the LDSC estimates in Table~1. E.g., the estimated $h^2$ of height goes from 0.37 to 0.43. The latter figure cuts quite a bit into the estimated broad-sense heritability of 0.48 obtained with \texttt{i-LDSC}.

    Here we come to a critical point. Lines 282--286 are not entirely clear, but I interpret them to mean that the manuscript's Equation~5 was expanded by stratifying $\ell$ into the components of s-LDSC and this was how the estimates in Supplementary Table~S6 were obtained. If that interpretation is correct, then the scenario of \texttt{i-LDSC} picking up missed additive genetic variance seems rather plausible. At the very least, the increases in broad-sense heritability reported in Supplementary Table~S6 are smaller in magnitude and \emph{not statistically significant}. Perhaps what this means is that the headline should be a \emph{negligible} contribution of pairwise epistasis revealed by this novel and ingenious method, analogous to what has been discovered with respect to dominance (Hivert et al., 2021; Pazokitoroudi et al., 2021; Okbay et al., 2022; Palmer et al., 2023).

    This is an excellent question raised by the reviewer and, again, we really appreciate such a thoughtful and thorough response. First, we completely agree with the reviewer that the s-LDSC estimates previously included in the Supplementary Material should instead be discussed in the main text of the manuscript. In the revision, we have now moved the old Supplemental Table S6 to be the new Table 2. Second, we also agree that the conclusions about the magnitude of additive-by-additive effects should be based upon variance explained when using the cis- interaction score in addition to scores specific to different biological annotations when available, per s-LDSC.

    However, we want to respectfully disagree that the results indicate a negligible contribution of additive-by-additive genetic variance to all the traits we analyzed (see Figure 4D). Although the additive-by-additive genetic variance component is not significant in any trait in the UK Biobank, there is little reason to expect that they would be given the inclusion of 97 other biological annotations from the s-LDSC model. Indeed, in the s-LDSC paper itself the authors look only for enrichment of heritability for a given annotation not a statistically significant test statistic. It also worth noting that jackknife approaches tend to be conservative and yield slightly larger standard errors for hypothesis testing. Taking all the great points that the reviewer mentioned into account, we believe that a moderate stance to the interpretation of our results is one that: (i) emphasizes the importance of using s-LDSC with the cis-interaction score to better assess the variance explained by additive-by-additive interaction effects and (ii) allows for the significance of the additive-by-additive component to not be the only factor when determining the importance of the role of non-additive effects in shaping trait architecture.

    In the revision, we now write the following in lines 331-343:

    Lastly, we performed an additional analysis in the UK Biobank where the cis-interaction scores are included as an annotation alongside 97 other functional categories in the stratified-LD score regression framework and its software s-LDSC (Materials and Methods). Here, s-LDSC heritability estimates still showed an increase with the interaction scores versus when the publicly available functional categories were analyzed alone, but albeit at a much smaller magnitude (Table 2). The contributions from the additive-by-additive component to the overall estimate of genetic variance ranged from 0.005 for MCHC (P = 0.373) to 0.055 for HDL (P = 0.575) (Figures 4C and 4D). Furthermore, in this analysis, the estimates of the additive-by-additive components were no longer statistically significant for any of the traits in the UK Biobank (Table 2). Despite this, these results highlight the ability of the i-LDSC framework to identify sources of β€œmissing” phenotypic variance explained in heritability estimation. Importantly, moving forward, we suggest using the cis- interaction scores with additional annotations whenever they are available as it provides more conservative estimates of the role of additive-by-additive effects on trait architecture.

    Lastly, in the Discussion, we now mention an area of future work would be to explore how the relationship between cis-interaction LD scores and interaction effect sizes might bias heritability estimates from i-LDSC (e.g., similar to the relationship explored standard LD scores and linear effect sizes in Figure 3 – figure supplement 8). See new lines 364-367.

  2. eLife assessment

    This study provides a valuable investigation into whether phenotypic variance due to interactions between genetic variants can be measured using genome-wide association summary statistics. The authors present a method, i-LDSC, that uses statistics on the correlations between genotypes at different loci (linkage disequilibrium) to estimate the phenotypic variance explained by both additive genetic effects and pairwise interactions. While the authors present extensive simulations on the performance of their method and empirical results indicating the presence of epistasis (as they define epistasis) it is unclear how their method and results relate to the traditional definitions of additive and non-additive genetic effects, which are different from the authors' definitions.

  3. Joint Public Review:

    LD Score regression (LDSC) is a software tool widely used in the field of genome-wide association studies (GWAS) for estimating heritabilities, genetic correlations, the extent of confounding, and biological enrichment. LDSC is for the most part not regarded as an accurate estimator of \emph{absolute} heritability (although useful for relative comparisons). It is relied on primarily for its other uses (e.g., estimating genetic correlations). The authors propose a new method called \texttt{i-LDSC}, extending the original LDSC in order to estimate a component of genetic variance in addition to the narrow-sense heritability---epistatic genetic variance, although not necessarily all of it. Epistasis in quantitative genetics refers to the component of genetic variance that cannot be captured by a linear model regressing total genetic values on single-SNP genotypes. \texttt{i-LDSC} seems aimed at estimating that part of the epistatic variance residing in statistical interactions between pairs of SNPs. To simplify, the basic model of \texttt{i-LDSC} for two SNPs $X_1$ and $X_2$ is
    \begin{equation}\label{eq:twoX}
    Y = X_1 \beta_1 + X_2 \beta_2 + X_1 X_2 \theta + E,
    \end{equation}
    and estimation of the epistatic variance associated with the product term proceeds through a variant of the original LD Score that measures the extent to which a SNP tags products of genotypes (rather than genotypes themselves). The authors conducted simulations to test their method and then applied it to a number of traits in the UK Biobank and Biobank Japan. They found that for all traits the additive genetic variance was larger than the epistatic, but for height the absolute size of the epistatic component was estimated to be non-negligible. An interpretation of the authors' results that perhaps cannot be ruled out, however, is that pairwise epistasis overall does not make a detectable contribution to the variance of quantitative traits.

    Major Comments

    This paper has a lot of strong points, and I commend the authors for the effort and ingenuity expended in tackling the difficult problem of estimating epistatic (non-additive) genetic variance from GWAS summary statistics. The mere possibility of the estimated univariate regression coefficient containing a contribution from epistasis, as represented in the manuscript's Equation~3 and elsewhere, is intriguing in and of itself.

    Is \texttt{i-LDSC} Estimating Epistasis?

    Perhaps the issue that has given me the most pause is uncertainty over whether the paper's method is really estimating the non-additive genetic variance, as this has been traditionally defined in quantitative genetics with great consequences for the correlations between relatives and evolutionary theory (Fisher, 1930, 1941; Lynch & Walsh, 1998; Burger, 2000; Ewens, 2004).

    Let us call the expected phenotypic value of a given multiple-SNP genotype the \emph{total genetic value}. If we apply least-squares regression to obtain the coefficients of the SNPs in a simple linear model predicting the total genetic values, then the partial regression coefficients are the \emph{average effects of gene substitution} and the variance in the predicted values resulting from the model is called the \emph{additive genetic variance}. (This is all theoretical and definitional, not empirical. We do not actually perform this regression.) The variance in the residuals---the differences between the total genetic values and the additive predicted values---is the \emph{non-additive genetic variance}. Notice that this is an orthogonal decomposition of the variance in total genetic values. Thus, in order for the variance in $\mathbf{W}\bm{\theta}$ to qualify as the non-additive genetic variance, it must be orthogonal to $\mathbf{X} \bm{\beta}$.

    At first, I very much doubted whether this is generally true. And I was not reassured by the authors' reply to Reviewer~1 on this point, which did not seem to show any grasp of the issue at all. But to my surprise I discovered in elementary simulations of Equation~\ref{eq:twoX} above that for mean-centered $X_1$ and $X_2$, $(X_1 \beta_1 + X_2 \beta_2)$ is uncorrelated with $X_1 X_2 \theta$ for seemingly arbitrary correlation between $X_1$ and $X_2$. A partition of the outcome's variance between these two components is thus an orthogonal decomposition after all. Furthermore, the result seems general for any number of independent variables and their pairwise products. I am also encouraged by the report that standard and interaction LD Scores are ``lowly correlated' (line~179), meaning that the standard LDSC slope is scarcely affected by the inclusion of interaction LD Scores in the regression; this behavior is what we should expect from an orthogonal decomposition.

    I have therefore come to the view that the additional variance component estimated by \texttt{i-LDSC} has a close correspondence with the epistatic (non-additive) genetic variance after all.

    In order to make this point transparent to all readers, however, I think that the authors should put much more effort into placing their work into the traditional framework of the field. It was certainly not intuitive to multiple reviewers that $\mathbf{X}\bm{\beta}$ is orthogonal to $\mathbf{W}\bm{\theta}$. There are even contrary suggestions. For if $(\mathbf{X}\bm{\beta})^\intercal \mathbf{W} \bm{\theta} = \bm{\beta}^\intercal \mathbf{X}^\intercal \mathbf{W} \bm{\theta} $ is to equal zero, we know that we can't get there by $\mathbf{X}^\intercal \mathbf{W}$ equaling zero because then the method has nothing to go on (e.g., line~139). We thus have a quadratic form---each term being the weighted product of an average (additive) effect and an interaction coefficient---needing to cancel out to equal zero. I wonder if the authors can put forth a rigorous argument or compelling intuition for why this should be the case.

    In the case of two polymorphic sites, quantitative genetics has traditionally partitioned the total genetic variance into the following orthogonal components:
    \begin{itemize}
    \item additive genetic variance, $\sigma^2_A$, the numerator of the narrow-sense heritability;
    \item dominance genetic variance, $\sigma^2_D$;
    \item additive-by-additive genetic variance, $\sigma^2_{AA}$;
    \item additive-by-dominance genetic variance, $\sigma^2_{AD}$; and
    \item dominance-by-dominance genetic variance, $\sigma^2_{DD}$.
    \end{itemize}
    See Lynch and Walsh (1998, pp. 88-92) for a thorough numerical example. This decomposition is not arbitrary or trivial, since each component has a distinct coefficient in the correlations between relatives. Is it possible for the authors to relate the variance associated with their $\mathbf{W}\bm{\theta}$ to this traditional decomposition? Besides justifying the work in this paper, the establishment of a relationship can have the possible practical benefit of allowing \texttt{i-LDSC} estimates of non-additive genetic variance to be checked against empirical correlations between relatives. For example, if we know from other methods that $\sigma^2_D$ is negligible but that \texttt{i-LDSC} returns a sizable $\sigma^2_{AA}$, we might predict that the parent-offspring correlation should be equal to the sibling correlation; a sizable $\sigma^2_D$ would make the sibling correlation higher. Admittedly, however, such an exercise can get rather complicated for the variance contributed by pairs of SNPs that are close together (Lynch & Walsh, 1998, pp. 146-152).

    I would also like the authors to clarify whether LDSC consistently overestimates the narrow-sense heritability in the case that pairwise epistasis is present. The figures seem to show this. I have conflicting intuitions here. On the one hand, if GWAS summary statistics can be inflated by the tagging of epistasis, then it seems that LDSC should overestimate heritability (or at least this should be an upwardly biasing factor; other factors may lead the net bias to be different). On the other hand, if standard and interaction LD Scores are lowly correlated, then I feel that the inclusion of interaction LD Score in the regression should not strongly affect the coefficient of the standard LD Score. Relatedly, I find it rather curious that \texttt{i-LDSC} seems increasingly biased as the proportion of genetic variance that is non-additive goes up---but perhaps this is not too important, since such a high ratio of narrow-sense to broad-sense heritability is not realistic.

    How Much Epistasis Is \texttt{i-LDSC} Detecting?

    I think the proper conclusion to be drawn from the authors' analyses is that statistically significant epistatic (non-additive) genetic variance was not detected. Specifically, I think that the analysis presented in Supplementary Table~S6 should be treated as a main analysis rather than a supplementary one, and the results here show no statistically significant epistasis. Let me explain.

    Most serious researchers, I think, treat LDSC as an unreliable estimator of narrow-sense heritability; it typically returns estimates that are too low. Not even the original LDSC paper pressed strongly to use the method for estimating $h^2$ (Bulik-Sullivan et al., 2015). As a practical matter, when researchers are focused on estimating absolute heritability with high accuracy, they usually turn to GCTA/GREML (Evans et al., 2018; Wainschtein et al., 2022).

    One reason for low estimates with LDSC is that if SNPs with higher LD Scores are less likely to be causal or to have large effect sizes, then the slope of univariate LDSC will not rise as much as it ``should' with increasing LD Score. This was a scenario actually simulated by the authors and displayed in their Supplementary Figure~S15. [Incidentally, the authors might have acknowledged earlier work in this vein. A simulation inducing a negative correlation between LD Scores and $\chi^2$ statistics was presented by Bulik-Sullivan et al. (2015, Supplementary Figure 7), and the potentially biasing effect of a correlation over SNPs between LD Scores and contributed genetic variance was a major theme of Lee et al. (2018).] A negative correlation between LD Score and contributed variance does seem to hold for a number of reasons, including the fact that regions of the genome with higher recombination rates tend to be more functional. In short, the authors did very well to carry out this simulation and to show in their Supplementary Figure~S15 that this flaw of LDSC in estimating narrow-sense heritability is also a flaw of \texttt{i-LDSC} in estimating broad-sense heritability. But they should have carried the investigation at least one step further, as I will explain below.

    Another reason for LDSC being a downwardly biased estimator of heritability is that it is often applied to meta-analyses of different cohorts, where heterogeneity (and possibly major but undetected errors by individual cohorts) lead to attenuation of the overall heritability (de Vlaming et al., 2017).

    The optimal case for using LDSC to estimate heritability, then, is incorporating the LD-related annotation introduced by Gazal et al. (2017) into a stratified-LDSC (s-LDSC) analysis of a single large cohort. This is analogous to the calculation of multiple GRMs defined by MAF and LD in the GCTA/GREML papers cited above. When this was done by Gazal et al. (2017, Supplementary Table 8b), the joint impact of the improvements was to increase the estimated narrow-sense heritability of height from 0.216 to 0.534.

    All of this has at least a few ramifications for \texttt{i-LDSC}. First, the authors do not consider whether a relationship between their interaction LD Scores and interaction effect sizes might bias their estimates. (This would be on top of any biasing relationship between standard LD Scores and linear effect sizes, as displayed in Supplementary Figure~S15.) I find some kind of statistical relationship over the whole genome, induced perhaps by evolutionary forces, between \emph{cis}-acting epistasis and interaction LD Scores to be plausible, albeit without intuition regarding the sign of any resulting bias. The authors should investigate this issue or at least mention it as a matter for future study. Second, it might be that the authors are comparing the estimates of broad-sense heritability in Table~1 to the wrong estimates of narrow-sense heritability. Although the estimates did come from single large cohorts, they seem to have been obtained with simple univariate LDSC rather than s-LDSC. When the estimate of $h^2$ obtained with LDSC is too low, some will suspect that the additional variance detected by \texttt{i-LDSC} is simply additive genetic variance missed by the downward bias of LDSC. Consider that the authors' own Supplementary Table~S6 gives s-LDSC heritability estimates that are consistently higher than the LDSC estimates in Table~1. E.g., the estimated $h^2$ of height goes from 0.37 to 0.43. The latter figure cuts quite a bit into the estimated broad-sense heritability of 0.48 obtained with \texttt{i-LDSC}.

    Here we come to a critical point. Lines 282--286 are not entirely clear, but I interpret them to mean that the manuscript's Equation~5 was expanded by stratifying $\ell$ into the components of s-LDSC and this was how the estimates in Supplementary Table~S6 were obtained. If that interpretation is correct, then the scenario of \texttt{i-LDSC} picking up missed additive genetic variance seems rather plausible. At the very least, the increases in broad-sense heritability reported in Supplementary Table~S6 are smaller in magnitude and \emph{not statistically significant}. Perhaps what this means is that the headline should be a \emph{negligible} contribution of pairwise epistasis revealed by this novel and ingenious method, analogous to what has been discovered with respect to dominance (Hivert et al., 2021; Pazokitoroudi et al., 2021; Okbay et al., 2022; Palmer et al., 2023).

    REFERENCES

    Bulik-Sullivan, B., Loh, P.-R., Finucane, H. K., Ripke, S., Yang, J., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson, N., Daly, M. J., Price, A. L., & Neale, B. M. (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 47, 291-295.

    Burger, R. (2000). The mathematical theory of selection, recombination, and mutation. Wiley.

    de Vlaming, R., Okbay, A., Rietveld, C. A., Johannesson, M., Magnusson, P. K. E., Uitterlinden, A. G., van Rooij, F. J. A., Hofman, A., Groe- nen, P. J. F., Thurik, A. R., & Koellinger, P. D. (2017). Meta-GWAS Accuracy and Power (MetaGAP) calculator shows that hiding heritability is partially due to imperfect genetic correlations across studies. PLoS Genetics, 13, e1006495.

    Evans, L. M., Tahmasbi, R., Vrieze, S. I., Abecasis, G. R., Das, S., Gazal, S., Bjelland, D. W., de Candia, T. R., Haplotype Reference Consortium, Goddard, M. E., Neale, B. M., Yang, J., Visscher, P. M., & Keller, M. C. (2018). Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nature Genetics, 50, 737-745.

    Ewens, W. J. (2004). Mathematical population genetics I. Theoretical introduction (2nd ed.). Springer.

    Fisher, R. A. (1930). The genetical theory of natural selection. Oxford University Press.

    Fisher, R. A. (1941). Average excess and average effect of a gene substitution. Annals of Eugenics, 11, 53-63.

    Gazal, S., Finucane, H. K., Furlotte, N. A., Loh, P.-R., Palamara, P. F., Liu, X., Schoech, A., Bulik-Sullivan, B., Neale, B. M., Gusev, A., & Price, A. L. (2017). Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nature Genetics, 49, 1421-1427.

    Hivert, V., Sidorenko, J., Rohart, F., Goddard, M. E., Yang, J., Wray, N. R., Yengo, L., & Visscher, P. M. (2021). Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. American Journal of Human Genetics, 108, 786- 798.

    Lee, J. J., McGue, M., Iacono, W. G., & Chow, C. C. (2018). The accuracy of LD Score regression as an estimator of confounding and genetic correlations in genome-wide association studies. Genetic Epidemiology, 42, 783-795.

    Lynch, M., & Walsh, B. (1998). Genetics and the analysis of quantitative traits. Sinauer.

    Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., Sidorenko, J., Kweon, H., Goldman, G., Gjorgjieva, T., Jiang, Y., Hicks, B., Tian, C., Hinds, D. A., Ahlskog, R., Magnusson, P. K. E., Oskarsson, S., Hayward, C., Campbell, A., ... Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individu- als. Nature Genetics, 54, 437-449.

    Palmer, D. S., Zhou, W., Abbott, L., Wigdor, E. M., Baya, N., Churchhouse, C., Seed, C., Poterba, T., King, D., Kanai, M., Bloemendal, A., & Neale, B. M. (2023). Analysis of genetic dominance in the UK Biobank. Science, 379, 1341-1348.

    Pazokitoroudi, A., Chiu, A. M., Burch, K. S., Pasaniuc, B., & Sankararaman, S. (2021). Quantifying the contribution of dominance deviation effects to complex trait variation in biobank-scale data. American Journal of Human Genetics, 108, 799-808.

    Wainschtein, P., Jain, D., Zheng, Z., TOPMed Anthropometry Working Group, NHLBI Trans-Omics for Precision Medicine Consoritum, Cupples, L. A., Shadyab, A. H., McKnight, B., Shoemaker, B. M., Mitchell, B. D., Psaty, B. M., Kooperberg, C., Liu, C.-T., Albert, C. M., Roden, D., Chasman, D. I., Darbar, D., Lloyd-Jones, D. M., Arnett, D. K., . . . Visscher, P. M. (2022). Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nature Genetics, 54, 263-273.