Exploiting fluctuations in gene expression to detect causal interactions between genes

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    By taking advantage of noise in gene expression, this important study introduces a new approach for detecting directed causal interactions between two genes without perturbing either. The main theoretical result is supported by a proof, although clearer statements are needed to ensure that there are no edge cases that can violate the theorem. Preliminary simulations and experiments on small circuits are presented, but the evidence remains incomplete because further investigations are needed to demonstrate the broad applicability and scalability of the method.

This article has been Reviewed by the following groups

Abstract

Characterizing and manipulating cellular behaviour requires a mechanistic understanding of the causal interactions between cellular components. We present an approach that can detect causal interactions between genes without the need to perturb the physiological state of cells. This approach exploits naturally occurring cell-to-cell variability which is experimentally accessible from static population snapshots of genetically identical cells without the need to follow cells over time. Our main contribution is a simple mathematical relation that constrains the propagation of gene expression noise through biochemical reaction networks. This relation allows us to rigorously interpret fluctuation data even when only a small part of a complex gene regulatory process can be observed. This relation can be exploited to detect causal interactions by synthetically engineering a passive reporter of gene expression, akin to the established “dual reporter assay”. While the focus of our contribution is theoretical, we also present an experimental proof-of-principle to illustrate the approach. Our data from synthetic gene regulatory networks in E. coli are not unequivocal but suggest that the method could prove useful in practice to identify causal interactions between genes from non-genetic cell-to-cell variability.

Article activity feed

  2. Reviewer #1 (Public Review):

    Summary:
    This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) from non-genetic fluctuations among cells. The method uses a version of the dual-reporter assay as a causal control, in which one half of the dual-reporter pair is inactive and therefore causally decoupled. The authors propose a statistical invariant identity to formalize this idea.

    Strengths:
    The paper outlines a theoretical formalism that, if applied experimentally, could be useful for causal network inference, a pressing need in the study of biological systems.

    Weaknesses:
    The practical utility of this method may not be straightforward, and the method may be quite difficult to execute. Additionally, further investigations are needed to provide evidence of the broad applicability of the method to naturally occurring systems and its scalability beyond the simple circuit in which it is experimentally demonstrated.

  3. Reviewer #2 (Public Review):

    Summary:
    This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality.
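    The test described above can be sketched in a few lines. The exact form of the equation is not given in this review, so this sketch assumes, as a labeled assumption, that the null relation is an equality of mean-normalized covariances, Cov(x,z)/(<x><z>) = Cov(y,z)/(<y><z>). The function names and the toy data are hypothetical and serve only as illustration.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def normalized_covariance(a, b):
    """Population covariance of a and b divided by the product of their means.

    ASSUMPTION: this normalization is a guess at the form of the paper's
    relation; the review text only says the relation links Cov(X,Z) and Cov(Y,Z).
    """
    mean_a, mean_b = mean(a), mean(b)
    cov = mean([ai * bi for ai, bi in zip(a, b)]) - mean_a * mean_b
    return cov / (mean_a * mean_b)


def null_violation(x, y, z):
    """Difference between the X-Z and Y-Z normalized covariances.

    Under the assumed null relation (X does not affect Z) this difference
    should be zero up to sampling error; a large value hints at causality.
    """
    return normalized_covariance(x, z) - normalized_covariance(y, z)


# Toy snapshot data (hypothetical): the reporter Y tracks X exactly but is
# twice as bright, so the normalization removes the difference in scale.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 * xi for xi in x]
z = [2.0, 1.0, 4.0, 3.0]
violation = null_violation(x, y, z)
```

    In real snapshot data the difference would never be exactly zero, so a practical test would also need an estimate of the sampling error, for example from bootstrap resampling of cells.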

    The authors benchmark their approach experimentally in several synthetic circuits. In four positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in two or three of the positive control circuits. The authors constructed sixteen negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter or simply the cellular growth rate. The proposed method detected a causal effect in two of the sixteen negative controls, which the authors argue are not false positives but the result of an unexpected causal effect. Overall, these pilot studies, albeit in simplified scenarios, provide encouraging results.

    Strengths:
    The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could potentially see use in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

    By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations.

    Caveats:
    The term "causally" is used in the main-text statement of the central theorem (Eq 2) without a definition of this term. This makes it difficult to fully understand the statement of the paper's central theorem without diving into the supplement.

    The basic argument of Theorem 1 appears to rely on establishing that x(t) and y(t) are independent of their initial conditions. Yet, there appear to be some scenarios where this property breaks down:

    (1) Theorem 1 does not seem to hold in the edge case where R=beta=W=0, meaning that the components of interest do not vary with time, or perhaps vary in time only due to measurement noise. In this case x(t), y(t), and z(t) depend on x(0), y(0), and z(0). Since the distributions of x(0), y(0), and z(0) are unspecified, a counterexample to the theorem may be readily constructed by manipulating the covariance matrix of x(0), y(0), and z(0).

    (2) A similar problem may occur when transition probabilities decay with time. For example, suppose that again R=0 and X is degraded by a protease (B), but this protease is subject to its own first-order degradation. The deterministic version of this situation can be written, for example, as dx/dt=-bx and db/dt=-b. In this system, b(t)=b(0)exp(-t), so x(t) approaches x(0)exp(-b(0)) for large t. Thus, as above, x(t) depends on x(0). If similar dynamics apply to the Y and Z genes, we can make all genes depend on their initial conditions, thus producing a pathology analogous to the above example.
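    The limit quoted in example (2) can be checked numerically. The sketch below is illustrative only: it integrates dx/dt=-bx and db/dt=-b with a forward-Euler scheme (a choice made here, not one taken from the manuscript) and compares the result with the closed form x(t)=x(0)exp(-b(0)(1-exp(-t))). All function names and parameter values are hypothetical.

```python
import math


def x_closed_form(x0, b0, t):
    """Exact solution: b(t) = b0*exp(-t), so x(t) = x0*exp(-b0*(1 - exp(-t)))."""
    return x0 * math.exp(-b0 * (1.0 - math.exp(-t)))


def x_limit(x0, b0):
    """Long-time limit of x(t); it still depends on the initial value x0."""
    return x0 * math.exp(-b0)


def x_euler(x0, b0, t_end, dt=1e-3):
    """Forward-Euler integration of dx/dt = -b*x, db/dt = -b (approximate)."""
    x, b = x0, b0
    for _ in range(int(round(t_end / dt))):
        # Update both variables from the values at the start of the step.
        x, b = x - dt * b * x, b - dt * b
    return x


# Two runs that differ only in x(0) settle at different values, so the
# memory of the initial condition never decays in this system.
limit_a = x_limit(1.0, 1.0)
limit_b = x_limit(2.0, 1.0)
numeric_b = x_euler(2.0, 1.0, t_end=20.0)
```

    Because the protease itself disappears, degradation of X stops before X is exhausted, which is why the final level of X retains a factor of x(0).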

    The reviewer does not know when such examples may occur in (bio)physical systems. Nevertheless, since one of the advantages of mathematics is the ability to correctly identify the domain of validity for a claim, the present work would be strengthened by "building a fence" around these edge cases, either by identifying the comprehensive set of such edge cases and explicitly prohibiting them in a stated assumption set, or by pointing out how the existing assumptions already exclude them.