Charting cellular differentiation trajectories with Ricci flow

Abstract

Complex biological processes, such as cellular differentiation, require an intricate rewiring of intra-cellular signalling networks. Previous characterisations of these networks revealed that promiscuity in signalling, quantified by a raised network entropy, underlies a less differentiated and malignant cell state. A theoretical connection between entropy and Ricci curvature has led to applications of discrete curvatures to characterise biological signalling networks at distinct time points during differentiation and malignancy. However, understanding and predicting the dynamics of biological network rewiring remains an open problem. Here we construct a framework to apply discrete Ricci curvature and Ricci flow to the problem of biological network rewiring. By investigating the relationship between network entropy and Forman-Ricci curvature, both theoretically and empirically on single-cell RNA-sequencing data, we demonstrate that the two measures do not always positively correlate, as has been previously suggested, and provide complementary rather than interchangeable information. We next employ discrete normalised Ricci flow, to derive network rewiring trajectories from transcriptomes of stem cells to differentiated cells, which accurately predict true intermediate time points of gene expression time courses. In summary, we present a differential geometry toolkit for investigation of dynamic network rewiring during cellular differentiation and cancer.

Article activity feed

  1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

    Learn more at Review Commons


    Reply to the reviewers

    We thank the reviewers for their positive comments on our manuscript. In their cross-commenting, the reviewers note:

    ‘It is a scholarly work that makes a significant contribution detailing how network entropy and curvature relate in the context of biological networks.’

    While reviewer #2 further comments:

    ‘This is an important paper…it will be of interest to a wide range of scientists that are involved in stem cell differentiation and biological networks.’

    We also thank both reviewers for their constructive and insightful comments. In what follows we present a point-by-point response to the reviewer comments.

    Reviewer #1 (Evidence, reproducibility and clarity (Required)):

    Contrary to earlier studies, which proposed a positive correlation between network entropy and the total discrete curvature of biological networks, the paper proves that the network entropy and their defined total Forman-Ricci curvature are not guaranteed to be positively correlated. In the case of cell differentiation networks, a positive correlation is likely across highly promiscuous signalling regimes such as stem cells. While a negative correlation is likely for deterministic signalling (differentiated cells). Additionally, they found that cancer cells have a higher network entropy, but lower total Forman-Ricci curvature compared to healthy differentiated cells. This contrasts with stem cells, where both network entropy and total Forman-Ricci curvature are higher than healthy differentiated cells. They demonstrated this on the k-star network as a toy example, as well as different real-world biological datasets (embryonic stem cell differentiation, melanoma patients, and colorectal cancer patients) where the Forman-Ricci curvature and the network entropy follow two distinct regimes.

    Finally, they apply their defined normalized discrete Ricci flow to predict the intermediate time-points of cellular differentiation by deriving the biological network rewiring trajectories based only on the first and last time points of time courses of cellular differentiation. Their approach accurately predicted intermediate time-points compared to the null Euclidean model, where a straight-line trajectory is considered. The proposed Ricci flow correctly orders the differentiation time courses during both the embryonic stem cell (ESC) differentiation and myoblast differentiation.

    • The paper is well-written and easy to follow.
    • The paper presents that the network entropy and the FRC do not always result in a positive correlation. This has been shown in a network toy example and several real-world cellular differentiation datasets.

    We thank the reviewer for their positive and accurate summary of our manuscript’s results.

    • [p.8-9, §2.1.2] "We note that these studies also employ slightly different constructions of Forman-Ricci curvature than our own." The Forman-Ricci curvature is formulated by defining the edge weights and node weights as in §2.1.2/§4.2.
    • The node weights are defined as the reciprocal of the node degree. It is explained that this is "to normalize the sums in (4), preventing connected high-degree vertices from dominating the total FRC." Does this pay less importance to hub nodes? Is this consistent with the cellular differentiation biological application?

    The reviewer raises the important point that computation of edge-wise Forman-Ricci Curvature (FRC) defined in eq. 4 requires stipulation of node and edge weights, which must be carefully defined to match the context of our problem of cellular differentiation.

    We specify the edge weights to match the computation of network entropy (Banerji et al., Sci. Rep., 2013), which is based on a mass-action principle assumption. This assumes that the rate of reaction between two interacting proteins is proportional to the product of the concentrations of the reactants. We assume that gene expression is a proxy for protein concentration and define interaction strength between nodes i and j by x_i*x_j, where x_i is the expression of gene i. In the context of edge-wise FRC, edge weights equate to a distance, and so nodes which have a strong interaction should have a low edge weight, implying close proximity. Hence we define edge weights for FRC as a_ij/(x_i*x_j), where a_ij is the (i,j)^{th} element of the protein interaction network adjacency matrix.

    For node weights we selected the reciprocal of the node degree, as the reviewer recognises. The FRC computed on an edge (i,j) is essentially a pair of sums over edges connected to nodes i and j, with each sum normalised by the corresponding node weight W_i and W_j. The number of elements in each sum is deg(i) and deg(j) respectively. Thus choosing W_i as anything other than 1/deg(i) introduces a node degree bias to the edge-wise curvature measure.

    Rather than paying less importance to hub nodes as the reviewer suggests, by selecting W_i=1/deg(i) we are ensuring that node degree does not alter the edge-wise Forman-Ricci curvature, and equal importance is paid to all nodes. This property is essential in assessing the correlation between network entropy and network average Forman-Ricci curvature, as it prevents trivial correlation.
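
    The weight choices described above can be sketched in code. This is a minimal illustration only: it assumes the weighted Forman-Ricci form commonly used for networks (an edge term minus sums over the other edges at each endpoint), which may differ in detail from eq. 4 of the manuscript. The names `forman_ricci`, `adj` and `x` are hypothetical, and a_ij is taken to be 1 for every edge present.

```python
import math

def forman_ricci(adj, x):
    """Edge-wise Forman-Ricci curvature on an undirected graph (illustrative sketch).

    adj: dict mapping node -> set of neighbouring nodes (symmetric).
    x:   dict mapping node -> positive expression value.
    """
    # Edge weight as a distance: omega_ij = a_ij / (x_i * x_j), with a_ij = 1 on edges.
    def omega(i, j):
        return 1.0 / (x[i] * x[j])

    # Node weight W_i = 1 / deg(i), as chosen in the response.
    W = {i: 1.0 / len(adj[i]) for i in adj}

    curv = {}
    for i in adj:
        for j in adj[i]:
            if (j, i) in curv:
                continue  # each undirected edge is computed once
            w_e = omega(i, j)
            total = (W[i] + W[j]) / w_e
            # Sum over the other edges attached to endpoint i.
            for k in adj[i]:
                if k != j:
                    total -= W[i] / math.sqrt(w_e * omega(i, k))
            # Sum over the other edges attached to endpoint j.
            for k in adj[j]:
                if k != i:
                    total -= W[j] / math.sqrt(w_e * omega(j, k))
            curv[(i, j)] = w_e * total
    return curv

# Toy 3-star: hub 0 joined to leaves 1, 2, 3, with uniform expression.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
expr = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
for edge, value in sorted(forman_ricci(star, expr).items()):
    print(edge, round(value, 4))
```

    In this toy case each sum at the hub has deg(i) - 1 terms, and the factor 1/deg(i) keeps the hub's contribution from growing with its degree, which is the normalisation argument made above.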

    To elaborate, to compute network average FRC we introduce a node-wise FRC (eq. 24), defined as the mean edge-wise FRC over edges connected to a given vertex. Network average FRC (eq. 25) is then defined as a weighted sum over node-wise FRCs, where the weights are the elements of the stationary distribution of P=(p_ij)_{i,j \in V}. This construction mirrors the calculation of network entropy, where node-wise entropies are first computed (eq. 7) and an identical weighted sum over node-wise entropies (using the stationary distribution) defines network entropy.

    If we consider eq. 7 for computing the node wise entropy of node i:

    S_i = - \sum_{j \in V} p_{ij} log(p_{ij})

    we see that S_i is bounded between [0,deg(i)]. Hence the node-wise elements of the sum defining network entropy have a degree dependence. If we employ a different node weight in our definition of edge-wise FRC, the node-wise elements of network average FRC also acquire a degree dependence. Such a dependency could lead to a trivial correlation between network entropy and network average Forman-Ricci curvature, confounding our results.
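
    The node-wise entropy and its stationary-distribution weighting can be sketched as follows. This is a hedged illustration, not the manuscript's implementation: it assumes the mass-action transition probabilities p_ij = a_ij*x_j / sum_k a_ik*x_k, omits any normalisation of the final sum, and uses the hypothetical names `network_entropy`, `adj` and `x`. For a random walk of this form the stationary probability of node i is proportional to x_i * sum_k a_ik*x_k, which is used directly rather than solved for numerically.

```python
import math

def network_entropy(adj, x):
    """Node-wise entropies S_i, stationary distribution pi, and their weighted sum.

    adj: dict mapping node -> set of neighbouring nodes (symmetric).
    x:   dict mapping node -> positive expression value.
    """
    # Row normaliser: sum_k a_ik * x_k over the neighbours of i.
    row_sum = {i: sum(x[k] for k in adj[i]) for i in adj}

    # S_i = -sum_j p_ij log p_ij with p_ij = x_j / row_sum[i] on edges.
    S = {}
    for i in adj:
        s = 0.0
        for j in adj[i]:
            p = x[j] / row_sum[i]
            s -= p * math.log(p)
        S[i] = s

    # Stationary distribution: pi_i proportional to x_i * row_sum[i] (detailed balance).
    mass = {i: x[i] * row_sum[i] for i in adj}
    total = sum(mass.values())
    pi = {i: m / total for i, m in mass.items()}

    weighted = sum(pi[i] * S[i] for i in adj)
    return S, pi, weighted

# Toy 3-star: the hub carries all of the entropy and half of the stationary mass.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
expr = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
S, pi, weighted = network_entropy(star, expr)
print(S[0], pi[0], weighted)
```

    With uniform expression S_i attains its maximum of log(deg(i)), which makes the degree dependence of the node-wise terms explicit, and the hub's larger stationary probability shows how high-degree nodes are up-weighted in the network average.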

    The reviewer makes the important observation that node degree often correlates with importance in biological networks and should be considered in a network average measure. We emphasise that, although we remove degree dependency in the computation of edge-wise FRC, by using the stationary distribution for our network average FRC we re-introduce the importance of hub nodes (as high degree/strength nodes will have higher stationary distribution probabilities).

    Thus, to answer the two questions raised by the reviewer: (1) The choice of node weights in eq. 4 ensures that edge-wise FRC is independent of node degree and does not pay more or less importance to hub nodes; this prevents trivial correlation between network average FRC and network entropy. (2) The fact that higher degree nodes are more biologically relevant is respected by the use of the stationary distribution in the calculation of network average FRC, which gives higher weighting to hub nodes without introducing the possibility of trivial correlation with network entropy.

    We thank the reviewer for these important questions and will clarify this in an updated manuscript.

    • How is the Forman-Ricci curvature/flow definition in earlier works applied to biological networks different? How does it impact the current conclusions if such FRC definitions were used?

    Previous works investigating Forman-Ricci curvature in biological networks have defined edge-wise FRC using different node and edge weights, with various justifications. All methods investigating gene expression data have used the same method to convert edge-wise FRC to network average FRC.

    In our manuscript, for reasons explained above, we defined edge weights as \omega_{ij}=a_{ij}/(x_i*x_j) and node weights as W_i=1/deg(i). As with all prior studies we then compute node average FRCs and define network average FRC via a stationary distribution weighted sum of node-wise FRCs.

    Murgas et al., Complex Networks & Their Applications X, 2021 defined edge weights via \omega_{ij}=x_i*x_j and node weights via W_i=x_i. Network average FRCs were computed via the same method we employ. The authors demonstrate a positive correlation between network average FRC and network entropy on stem cells. We note, however, that as the node weights do not depend on the node degree, the edge-wise FRCs have a degree dependence, leading to a possibly trivial correlation with network entropy as explained above. We further note that in this construction edge weights cannot be considered as “distances” between nodes, as they are large when interaction propensity is high and vice versa. This leads to a conceptual difficulty in interpreting the curvature. Finally, as the node weight depends on a temporal variable (gene expression), this construct is incompatible with a computable Ricci flow. To elaborate, at each time step of the Ricci flow equation we must compute all updated edge-weights and all updated edge-wise FRCs. However, the Ricci flow equation is defined on edges and thus produces a system of |E| equations at each time point, allowing us to solve for a maximum of |E| parameters. If the node weights also change over the time step, to compute the updated edge-wise FRCs, we will need to solve for |E|+|V| variables, which the system is insufficient to achieve. Thus the FRC defined by Murgas et al., 2021 is not suitable for our investigation as it may introduce trivial correlation with network entropy, is not well defined as a curvature in the biological context and is not compatible with a computable Ricci flow.

    Murgas et al., Sci. Rep., 2022 employed a different construction with edge weights \omega_{ij}=a_{ij}/(p_{ij}*deg(i)) and node weights again W_i=x_i. Network average FRCs were computed via the same method we employ. The authors demonstrate a negative correlation between network average FRC and network entropy on stem cells. The fact that deg(i) is incorporated in the edge weight definition makes the degree dependence of edge-wise FRC non-trivial; however, the choice of node weights independent of degree still introduces the possibility of a correlation with node degree, which could confound any correlation with network entropy. The explanation given for the negative rather than positive correlation observed in this study was the selection of edge weights, which the authors chose to better represent a 'distance' between nodes. While this permits a more well-defined, intuitive curvature, we have shown here that both positive and negative correlations are possible with edge weights defined to represent distances. We also note that as the node weight is again a temporal variable, computation of a Ricci flow using these parameters is intractable.

    Other studies investigating FRC in biological networks (e.g., Weber et al., Journal of Complex Networks, 2016) do so in a different context and do not consider single sample gene expression weighting, and so cannot be directly compared to our approach.

    To our knowledge no other study has employed a Forman-Ricci flow on a biological network, but Forman-Ricci flow has been employed in the context of the Gnutella peer-to-peer file sharing network (Weber et al., Journal of Complex Networks, 2016). The major difference between the flow implemented by the authors and the one we present is that it was not normalised, but rather evolved the network towards a flat zero curvature. This would be suboptimal for our goal, which is to infer rewiring from one biological state to another. We note that the concept of a normalised Ricci flow was introduced by Weber et al., 2016, but was not implemented in their example.

    Other studies have employed Ollivier-Ricci curvature in the investigation of gene expression weighted biological networks, including using phenotype average curvatures (Sandhu et al., Sci. Rep., 2015) and single sample curvatures (Elkin et al., NPJ Genom Med, 2021). We note that Ollivier-Ricci curvature is significantly more computationally expensive than Forman-Ricci curvature, and for our Ricci flow we must compute ~150,000 curvatures per time-step, making Ollivier-Ricci curvature impractical. While Ollivier-Ricci curvature and Forman-Ricci curvature have been compared on biological networks, with some overlap in conclusions, the correlation between the two measures is not strong and our results cannot be generalised to cover this measure (Pouryahya et al., 2021, https://arxiv.org/abs/1712.02943).

    We will update the discussion of our manuscript with details of these prior studies.

    • Along these lines and especially considering the importance of curvature plays in network science and biology, the authors have to better discuss the existing work in developing various differential geometry approaches for molecular and system biology such as: "Ollivier-ricci curvature-based method to community detection in complex networks." Scientific reports 9, no. 1 (2019): 9800. "Inferring functional communities from partially observed biological networks exploiting geometric topology and side information." Scientific Reports 12, no. 1 (2022): 10883.

    We thank the reviewer for this suggestion and will include a discussion of these studies in the updated manuscript.

    • [§2.3, Fig. 3cd] For the cancer datasets shown in Fig. 3, I believe Pearson's r is shown for all the samples (malignant and healthy cells). It is mentioned that for cancerous cells, the network entropy is higher while the total FRC is lower. Is there a significant difference in correlation values if the healthy and cancerous cells are considered separately?

    We thank the reviewer for this suggestion; indeed, there is a difference in this correlation when healthy and cancerous samples are considered separately. Healthy samples have a stronger negative correlation between network entropy and total FRC than cancerous samples. Our theoretical results show that as intracellular signalling becomes less deterministic the correlation between our two measures becomes less negative. The findings of the analysis suggested by the reviewer therefore support the conclusion that intracellular signalling becomes less deterministic during oncogenesis.

    For Figure 3C across control (healthy) samples there is a significant negative correlation between network entropy and total FRC (n=3256, Pearson’s r=-0.83, p-16), while there is no correlation between the two measures across melanoma (cancerous) samples (n=1257, Pearson’s r=-0.009, p=0.76). Using Fisher’s z-transformation to compare these correlations we find a highly significant difference (p-16).

    For Figure 3D, the difference is more subtle. Across control (healthy) samples there is a significant negative correlation between network entropy and total FRC (n=160, Pearson’s r=-0.90, p-16), while across colorectal cancer samples the negative correlation is slightly weaker (n=272, Pearson’s r=-0.83, p-16). Using Fisher’s z-transformation to compare these correlations we again find a significant difference (p-3).
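
    The comparison of two independent Pearson correlations via Fisher's z-transformation can be sketched as follows. This is a minimal illustration of the standard two-sample test, assuming independent samples and a two-sided alternative; the function name `compare_correlations` is hypothetical, and the r and n values are those quoted above for Figures 3C and 3D.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent Pearson correlations."""
    # Fisher's z-transformation of each correlation coefficient.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference between the transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Figure 3C: control versus melanoma samples.
print(compare_correlations(-0.83, 3256, -0.009, 1257))
# Figure 3D: control versus colorectal cancer samples.
print(compare_correlations(-0.90, 160, -0.83, 272))
```

    Because the standard error shrinks with sample size, the large melanoma dataset yields a far more extreme test statistic than the smaller colorectal dataset, even though both comparisons involve a control correlation of similar strength.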

    We will include these results in an updated manuscript.

    • [Methods, p.23] "For each gene expression time course we implemented one time step of the Ricci flow from the first time point $\mathbf{x^0}$ using each $\Delta t$ value and selected the optimal $\Delta t$ as the largest which does not admit negative values of $d_{0+\Delta t}$..." It seems that choosing the optimal $\Delta t$ is a heuristic; I wonder how sensitive is the proposed framework based on the choice of $\Delta t$ in accurately identifying the trajectory. i.e., assuming that the choice of $\Delta t$ does not admit to negative d values, are there choices of $\Delta t$ that are still too large that it does not correctly identify the intermediate time points?

    The reviewer raises an important point on the selection of the parameter \Delta t in our Ricci flow equation. Selection of the optimal \Delta t is a trade-off. Too small a \Delta t means many time-steps before convergence, and the computational cost of implementing our flow (requiring the calculation of 150,000 Forman-Ricci curvatures per time step) becomes unmanageable. Too large a \Delta t may give too coarse-grained an approximation, inhibiting inference of the true underlying continuous network rewiring trajectory.

    As the reviewer notes we are also subject to the constraint that edge weights (which in the context of Ricci flow correspond to distances) must be non-negative. If \Delta t is sufficiently small this guarantees non-negative edge weights are output from the Ricci flow. The precise \Delta t which prevents negative edge weights depends on the differences between the edge wise FRCs of the network at each time point and those of the normaliser. As these differences get smaller as we progress the flow and converge to the normaliser, we can determine the optimal \Delta t from the first time step as:

    1/min_{(i,j)\in E}(Ric_{(i,j)}(x^0) - \bar{Ric}_{(i,j)})

    This is the largest possible value of \Delta t we can use in our framework without introducing negative values, and thus the \Delta t most likely to inhibit our approach from approximating the true intermediate time points.

    In our paper, rather than using this formula, we empirically searched for the largest \Delta t which admits only non-negative values. This was done by considering a range of possible values of \Delta t, calculating one step of the Ricci flow equation for each value, and identifying the largest \Delta t for which the smallest updated edge weight was non-negative. For both our time courses this identified \Delta t=0.06 as optimal, while the above formula gives maximal values for \Delta t of between 0.06 and 0.065.
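
    The empirical search described above can be sketched as follows. This is an illustration under stated assumptions, not the manuscript's implementation: the update rule in `flow_step` (each edge weight scaled by 1 - \Delta t times its curvature difference from the normaliser) is an assumed stand-in for the manuscript's Ricci flow equation, and all names and toy values are hypothetical.

```python
def flow_step(d, ric, ric_bar, dt):
    """One explicit step of an ASSUMED normalised flow update on every edge.

    d, ric, ric_bar: dicts keyed by edge giving the current edge weight, the
    current edge-wise curvature, and the normaliser curvature respectively.
    """
    return {e: d[e] - dt * (ric[e] - ric_bar[e]) * d[e] for e in d}

def largest_admissible_dt(d, ric, ric_bar, candidates):
    """Largest candidate dt whose single step leaves all edge weights non-negative."""
    best = None
    for dt in sorted(candidates):
        updated = flow_step(d, ric, ric_bar, dt)
        if min(updated.values()) >= 0.0:
            best = dt
    return best

# Toy two-edge network with made-up curvatures (not data from the manuscript).
d0 = {("a", "b"): 1.0, ("b", "c"): 2.0}
ric0 = {("a", "b"): 10.0, ("b", "c"): -3.0}
target = {("a", "b"): 0.0, ("b", "c"): 0.0}
print(largest_admissible_dt(d0, ric0, target, [0.01, 0.05, 0.08, 0.12, 0.2]))
```

    Under this assumed rule the edge with the largest positive curvature difference is the first to reach zero, so only the first time step needs to be checked when the differences shrink as the flow converges to the normaliser.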

    The results we present are therefore calculated using very close to the maximum value of \Delta t possible in our framework and as the reviewer comments we

    ‘accurately predicted intermediate time-points compared to the null Euclidean model’

    using this \Delta t.

    Thus to answer the reviewer’s question, subject to the constraint of non-negative d values, there does not appear to be a choice of \Delta t which is too large to correctly identify the intermediate time points.

    We will clarify this point in the updated manuscript.

    Reviewer #1 (Significance (Required)):

    Contrary to earlier studies, which proposed a positive correlation between network entropy and the total discrete curvature of biological networks, the paper proves that the network entropy and their defined total Forman-Ricci curvature are not guaranteed to be positively correlated.

    We thank the reviewer for their kind and positive comments on our work.

    Reviewer #2 (Evidence, reproducibility and clarity (Required)):

    The paper brings in new insights, and corrects earlier ones, on how certain measures quantify structure and functionality of biological networks. It starts with the hypothesis that network entropy is a proxy for 'height' in Waddington's Landscape, with higher values on stem cells and malignant cells compared to healthy ones. In this context, it explores different notions of network curvature (one in particular, Forman-Ricci, in depth) in how it correlates with entropy for various types of networks. It draws a potentially transformative conclusion that promiscuity in signalling, that characterizes less differentiated and malignant cell, is also a key factor in how curvature and entropy relate to each other. Specifically, it observes for cases, and explains with an academic example that correlation between network curvature and entropy may change sign as e.g., stem cells differentiate.

    It supports the conclusion that Forman-Ricci curvature has indeed a great biological relevance, and that curvature and entropy are complementary, and not interchangeable measures of cell potency, as suggested in earlier studies. It also hypothesizes that a version of Ricci flow (a dynamical model for how networks change in the strength of interactions and correlations; or, the metric in continuous spaces) correctly captures likely trajectories in cell differentiation. This provides an additional argument that highlights the importance of curvature as some sort of driving potential that effects, and is defined by, structural changes in biological networks.

    The paper is well written, with a fairly accessible account of pertinent mathematics, in the body of the paper and the materials and methods. The component on the analysis of data is also well explained and supported. Minor suggestion, please point to the formula for S_R in the materials and methods when it is first referred to in the paper.

    We thank the reviewer for their kind comments on our paper and will point to the formula for S_R as suggested in the updated manuscript.

    Referees cross-commenting

    It is a scholarly work that makes a significant contribution detailing how network entropy and curvature relate in the context of biological networks

    We thank both referees for their kind and positive assessment of our work.

    Reviewer #2 (Significance (Required)):

    This is an important paper that explores in a systematic way certain mathematical concepts, as indicators of biological network functionality, to distinguish stages in cell differentiation that may be significant in understanding e.g., malignant versus healthy cells and the flow in stem cell differentiation. It supports its conclusions with an academic example as well as analysis of biological data sets. It draws a balanced view on how entropy and curvature relate and gives an accessible account to the relevant mathematics.

    It will be of interest to a wide range of scientists that are involved in stem cell differentiation and biological networks.

    We thank the referee for their kind words on our work and for emphasising the broad range of interest.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #2

    Evidence, reproducibility and clarity

    The paper brings in new insights, and corrects earlier ones, on how certain measures quantify structure and functionality of biological networks.

    It starts with the hypothesis that network entropy is a proxy for 'height' in Waddington's Landscape, with higher values on stem cells and malignant cells compared to healthy ones. In this context, it explores different notions of network curvature (one in particular, Forman-Ricci, in depth) in how it correlates with entropy for various types of networks. It draws a potentially transformative conclusion that promiscuity in signaling, that characterizes less differentiated and malignant cell, is also a key factor in how curvature and entropy relate to each other. Specifically, it observes for cases, and explains with an academic example that correlation between network curvature and entropy may change sign as e.g., stem cells differentiate.

    It supports the conclusion that Forman-Ricci curvature has indeed a great biological relevance, and that curvature and entropy are complementary, and not interchangeable measures of cell potency, as suggested in earlier studies. It also hypothesizes that a version of Ricci flow (a dynamical model for how networks change in the strength of interactions and correlations; or, the metric in continuous spaces) correctly captures likely trajectories in cell differentiation. This provides an additional argument that highlights the importance of curvature as some sort of driving potential that effects, and is defined by, structural changes in biological networks.

    The paper is well written, with a fairly accessible account of pertinent mathematics, in the body of the paper and the materials and methods. The component on the analysis of data is also well explained and supported. Minor suggestion, please point to the formula for S_R in the materials and methods when it is first referred to in the paper.

    Referees cross-commenting

    It is a scholarly work that makes a significant contribution detailing how network entropy and curvature relate in the context of biological networks

    Significance

    This is an important paper that explores in a systematic way certain mathematical concepts, as indicators of biological network functionality, to distinguish stages in cell differentiation that may be significant in understanding e.g., malignant versus healthy cells and the flow in stem cell differentiation. It supports its conclusions with an academic example as well as analysis of biological data sets. It draws a balanced view on how entropy and curvature relate, and gives an accessible account to the relevant mathematics.

    It will be of interest to a wide range of scientists that are involved in stem cell differentiation and biological networks.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #1

    Evidence, reproducibility and clarity

    Contrary to earlier studies, which proposed a positive correlation between network entropy and the total discrete curvature of biological networks, the paper proves that the network entropy and their defined total Forman-Ricci curvature are not guaranteed to be positively correlated. In the case of cell differentiation networks, a positive correlation is likely across highly promiscuous signaling regimes such as stem cells. While a negative correlation is likely for deterministic signaling (differentiated cells). Additionally, they found that cancer cells have a higher network entropy but lower total Forman-Ricci curvature compared to healthy differentiated cells. This is in contrast to stem cells, where both network entropy and total Forman-Ricci curvature are higher than healthy differentiated cells. They demonstrated this on the k-star network as a toy example, as well as different real-world biological datasets (embryonic stem cell differentiation, melanoma patients, and colorectal cancer patients) where the Forman-Ricci curvature and the network entropy follow two distinct regimes.

    Finally, they apply their defined normalized discrete Ricci flow to predict the intermediate time-points of cellular differentiation by deriving the biological network rewiring trajectories based only on the first and last time points of time courses of cellular differentiation. Their approach accurately predicted intermediate time-points compared to the null Euclidean model, where a straight-line trajectory is considered. The proposed Ricci flow correctly orders the differentiation time courses during both the embryonic stem cell (ESC) differentiation and myoblast differentiation.

    • The paper is well-written and easy to follow.
    • The paper presents that the network entropy and the FRC do not always result in a positive correlation. This has been shown in a network toy example and several real-world cellular differentiation datasets.
    • [p.8-9, §2.1.2] "We note that these studies also employ slightly different constructions of Forman-Ricci curvature than our own." The Forman-Ricci curvature is formulated by defining the edge weights and node weights as in §2.1.2/§4.2.
    • The node weights are defined as the reciprocal of the node degree. It is explained that this is "to normalize the sums in (4), preventing connected high-degree vertices from dominating the total FRC." Does this pay less importance to hub nodes? Is this consistent with the cellular differentiation biological application?
    • How is the Forman-Ricci curvature/flow definition in earlier works applied to biological networks different? How does it impact the current conclusions if such FRC definitions were used?
    • Along these lines and especially considering the importance of curvature plays in network science and biology, the authors have to better discuss the existing work in developing various differential geometry approaches for molecular and system biology such as: "Ollivier-ricci curvature-based method to community detection in complex networks." Scientific reports 9, no. 1 (2019): 9800. "Inferring functional communities from partially observed biological networks exploiting geometric topology and side information." Scientific Reports 12, no. 1 (2022): 10883.
    • [§2.3, Fig. 3cd] For the cancer datasets shown in Fig. 3, I believe Pearson's r is shown for all the samples (malignant and healthy cells). It is mentioned that for cancerous cells, the network entropy is higher while the total FRC is lower. Is there a significant difference in correlation values if the healthy and cancerous cells are considered separately?
    • [Methods, p.23] "For each gene expression time course we implemented one time step of the Ricci flow from the first time point $\mathbf{x^0}$ using each $\Delta t$ value and selected the optimal $\Delta t$ as the largest which does not admit negative values of $d_{0+\Delta t}$..." It seems that choosing the optimal $\Delta t$ is a heuristic; I wonder how sensitive is the proposed framework based on the choice of $\Delta t$ in accurately identifying the trajectory. i.e., assuming that the choice of $\Delta t$ does not admit to negative d values, are there choices of $\Delta t$ that are still too large that it does not correctly identify the intermediate time points?

    Significance

    Contrary to earlier studies, which proposed a positive correlation between network entropy and the total discrete curvature of biological networks, the paper proves that the network entropy and their defined total Forman-Ricci curvature are not guaranteed to be positively correlated.