Identification of phenotype-specific networks from paired gene expression–cell shape imaging data

Abstract

The morphology of breast cancer cells is often used as an indicator of tumor severity and prognosis. Additionally, morphology can be used to identify more fine-grained, molecular developments within a cancer cell, such as transcriptomic changes and signaling pathway activity. Delineating the interface between morphology and signaling is important to understand the mechanical cues that a cell processes in order to undergo epithelial-to-mesenchymal transition and consequently metastasize. However, the exact regulatory systems that define these changes remain poorly characterized. In this study, we used a network-systems approach to integrate imaging data and RNA-seq expression data. Our workflow allowed the discovery of unbiased and context-specific gene expression signatures and cell signaling subnetworks relevant to the regulation of cell shape, rather than focusing on the identification of previously known, but not always representative, pathways. By constructing a cell-shape signaling network from shape-correlated gene expression modules and their upstream regulators, we found central roles for developmental pathways such as WNT and Notch, as well as evidence for the fine control of NF-kB signaling by numerous kinase and transcriptional regulators. Further analysis of our network implicates a gene expression module enriched in the RAP1 signaling pathway as a mediator between the sensing of mechanical stimuli and regulation of NF-kB activity, with specific relevance to cell shape in breast cancer.

Article activity feed

  1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


    Reply to the reviewers

    We would like to thank the editor for their consideration and the reviewers for their time and thoughtful comments. Below we have written a point-by-point response to their comments and concerns. The original comments are displayed in italic fonts, whereas our responses are in regular fonts for clarity.

    Reviewer #1 (Evidence, reproducibility and clarity (Required)):

    The submitted manuscript 'Identification of phenotype-specific networks from paired gene expression-cell shape imaging data' of Barker et al. uses a convolute of different bioinformatics tools (see Key Resource Table) to analyze reported RNA sequencing data and to correlate derived pathways with imaging features of breast cancer cell lines based on specific pathway constructions. The thin red line of the data presentation in the manuscript is not obvious.

    **Major concerns:**

    *1.1 *The main biological 'finding' of the study RAP1 'as a potential mediator between the sensing of mechanical stimuli and regulation of NFkB activity' is reported and therefore the assumption 'how exactly extra-cellular mechanical cues are sensed by the cell and passed on to NFkB in breast cancer is not understood' is misleading. Please review: https://www.nature.com/articles/ncb2080* (human breast cancers with NF-κB hyperactivity show elevated levels of cytoplasmic Rap1. Similar to inhibiting NF-κB, knockdown of Rap1 sensitizes breast cancer cells to apoptosis)* https://pubmed.ncbi.nlm.nih.gov/17510404/ (RAP1 is a crucial element in organizing acinar structure and inducing lumen formation), and https://pubmed.ncbi.nlm.nih.gov/21429211/.

    R1.1 We thank the reviewer for pointing out these references. Teo *et al.* (which is cited in our manuscript) provides evidence that Rap1 regulates IKK and therefore NFkB in breast cancer, while Itoh *et al.* and McSherry *et al.* focus on Rap1’s ability to modulate migration and morphogenesis, as do other similar papers cited in our manuscript. None of the papers shows the significance of the Rap1-NFkB interaction in the explicit context of cell shape, with Teo *et al.* only speculatively mentioning a potential relevance of Rap1/NFkB in migration (“Given that NF-κB is critical for [. . . ] stimulating invasion, our results document a clinical setting wherein Rap1-mediated regulation of NF-κB could be critical.”). We appreciate that the specific sentence the reviewer has drawn attention to is slightly misleading given Teo *et al.*, and we will amend it in the revised manuscript. However, here we use a novel methodology on an existing dataset to link the concepts introduced by these three papers within the specific context of cellular morphology in breast cancer cells.

    Specifically, in this manuscript, we identify a Rap1 expression module correlated with cell shape and find that it is at the network confluence of transcription factors activated by cell shape. This, along with our findings of modulation of NFkB co-activators and previous work showing that NFkB is a key mechano-transductive transcription factor, leads us to hypothesize that Rap1 mediates the regulation and mechano-sensing of cell shape via its interaction with NFkB.

    It is also important to note that, while we build on the findings of Teo *et al.*, McSherry *et al.* and Itoh *et al.* relating NFkB, Rap1 and cell shape, we also use our method to focus on other proteins of interest. These are highlighted in the discussion, with the ARNT KO/TNFalpha module being the gene expression module most highly correlated with the morphological features. The importance of transcriptional co-regulators of NFkB is also illustrated in the network propagation, with both NR0B2 and PPARGC1A mentioned in the discussion. However, our analysis naturally concentrates on the node with the most apparent literature support, which, as your reading suggests, is Rap1. The significance of this manuscript is that it presents an unbiased systems-based methodology used to link cell shape with signaling, via transcription, in a context-specific manner (i.e. in the context of breast cancer). This produces a phenotype-specific network that has allowed us to connect diverse mechanisms and hypotheses put forward by other authors and to further our understanding of how signaling manages the sensing and regulation of cell shape in breast cancer. The methodology is also applicable to any paired transcriptomics/phenotype dataset.

    1.2 Besides, Fig. 2 and 3 are unrelated to this main statements.

    R1.2 Figure 2 shows the results of our morphological cluster identification and subsequent differential expression analysis. Since these were included as parts of the network, we included them to give the reader an idea of the components included by this step. Figure 3A shows the network that was generated by our pipeline which forms the basis of all subsequent biological exploration, and the discovery of Rap1 and nodes important for the regulation of cell shape. Figure 3B shows that no bias was introduced by using the specific algorithm for our network generation. As such Figures 2 and 3 are related to the generation of the cell shape-specific network that forms the basis of our study. We will amend the text and figure legends to clarify this point more carefully, and we will consider moving some of the panels to the supplementary materials to simplify the message.

    1.3. The spotted RAP1 (by TFs JARD2 and RUNX2) finding is not obvious without Fig. 4 results, a network propagation of functional TFs in differentially activated processes (basal vs. luminal) in the cell shape regulatory network. Please show that RAP1 could be not identified without the network based on TF and DEG only.

    R1.3 The Rap1 hypothesis is supported by both Figure S2E and Figure 4. Figure S2E shows that the Rap1 pathway-enriched gene expression module is the most differentially expressed module among those incorporated in our cell shape regulatory network. This suggests that this module is correlated to cell shape on a transcriptomic level, but does not necessarily mean anything within the context of intracellular signaling. Figure 4 shows that this gene expression module is at the confluence of activated transcription factors as specified by our constructed signaling network. This is an interesting finding as it implies (unlike Figure S2E) that the Rap1 gene expression module is relevant to intra-cellular signaling.

    While the Rap1 module is indeed differentially expressed and could in theory have been found just by the DE analysis as being important, the network approach enables us to integrate these modules of co-expressed genes within known signaling networks. This allows us to go further than just making comments about expression and transcription factor activity, to discussing how signaling networks interact with our identified gene expression modules. This in turn allows us to construct more sophisticated hypotheses about cell shape regulation.

    Particularly, we use this analysis to reinforce the association with Rap1 by illustrating that the Rap1 network node lies at the confluence of transcription factors activated in luminal-like and basal-like cell shapes. We also use the network to identify highly central nodes (such as PPARGC1A, CTNNB1 and ESR1) and other proteins identified in the network propagation (YAP1, IKBKB and ARNT). Furthermore, the network is used as a means of integrating gene expression modules in their signaling network environment. The method by which this embedding was done (in-going edges being transcription factors regulating the module and out-going edges being signaling proteins contained within the module) adds context specificity to a network that is otherwise generalised to many cell-types and contexts.
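    The idea of a node lying "at the confluence" of activated transcription factors can be illustrated with a minimal sketch of network propagation. The response does not state which propagation algorithm was used, so random walk with restart is assumed here as one common choice. The toy graph, its generic node names (`TF_A`, `TF_B`, `module_X`, `kinase_K`, `peripheral`) and the restart value are hypothetical and are not taken from the manuscript.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Propagate score from seed nodes over an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    seeds: nodes that receive the restart probability (e.g. activated TFs).
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: jump back to the seeds with probability `restart`.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node shares its score equally among its neighbours.
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: two seed TFs both regulate one module node,
# which connects onward to a kinase and then to a peripheral node.
toy_graph = {
    "TF_A": ["module_X"],
    "TF_B": ["module_X"],
    "module_X": ["TF_A", "TF_B", "kinase_K"],
    "kinase_K": ["module_X", "peripheral"],
    "peripheral": ["kinase_K"],
}
scores = random_walk_with_restart(toy_graph, seeds={"TF_A", "TF_B"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node:12s} {score:.4f}")
```

    In this toy example the module node accumulates more propagated score than the nodes further from the seeds, which is the sense in which a node shared by several activated regulators stands out.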

    The lack of clarity on how we arrived at Rap1 as a key tenet of our discussion, as well as the added value to the methodology of the network analysis is something that we will certainly work on in our revision and we thank the reviewer for their valuable feedback. We will also move Figure S2E from the supplementary figures to the main Figure panels, as it is an important part of how we arrived upon Rap1 as a module of particular interest.

    *1.4 More complex fluorescence phenotypes are available and do not match the complexity of the RNASeq data, data input and pathway construction with only 10 simple cell shape features. Conversely, relative 'monoclonal' breast cancer cell lines may are the only application for this workflow.*

    R1.4 We thank the reviewer for their comment and respectfully disagree. These cell shape features were sufficient for the original authors Sero et al. to predict TF activities (PMID: 26148352) and Sailem et al. to identify clinically predictive metagenes (PMID: 27864353). Although these features seem simplistic, they concisely summarise a highly complex phenotype and are proven to encode metastatic potential (PMCID: PMC6976289) and to be prognostic markers for breast cancer progression (PMID: 28977854). Accordingly, these features are re-analysed using our workflow to better understand the signaling that drives them. Understanding other features that might match the complexity of our expression data is possible using the presented method, but is outside of the scope of the research within this manuscript as our research question focuses on the regulation of cell shape in breast cancer.

    As evidence that this cell shape network is applicable beyond cell lines, we can perform an analysis on breast cancer patient data from the TCGA, demonstrating the relationship of our network’s components with metastasis, which is highly related to cell shape.

    *1.5 Image features Fig. 1 and 5 do not match*

    R1.5 It is true that the features in Fig. 1 and Fig. 5 do not exactly match. This is because these two datasets came from different studies. While they are slightly different features, the essential phenotypes that they quantify are the same. For example, in Fig. 5, cytoplasm area, cytoplasm perimeter, nucleus area, nucleus length, nucleus width and nucleus perimeter are clearly basic morphological features that are analogous to features in Fig. 1, such as cell area, cell width to length, nucleus area and nucleus width to length. It is a solid assumption that the biological processes that drive these features are common, and our results illustrate this. Meanwhile, the features measured in the dataset in Fig. 5 that are not analogous to those used in the dataset used for model construction provide convenient negative controls with which to determine that our network describes regulatory processes specific to the features used in network construction.

    *1.6 with Fig. 5 being a rather indirect 'proof' of usability.*

    R1.6 The purpose of Figure 5 is not to demonstrate usability but to show that the network that we have identified does indeed represent cell signalling that controls cell shape. It shows that perturbations inside our network have a significantly stronger effect on phenotypes used in the creation of the network than perturbations outside our network. Moreover, it shows that this is not the case for phenotype features not used in the creation of our network, underlining that our network is both accurate and specific to the phenotypes used as input. We will add further clarifications in the text and figure legend to explain this better.
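    The comparison described in R1.6 (effects of perturbing targets inside versus outside the network) can be sketched as a one-sided permutation test. This is only an illustration: the response does not state which statistical test was used, the function name is hypothetical, and the effect sizes below are made-up numbers, not values from the LINCS data.

```python
import random
from statistics import mean


def permutation_test(inside, outside, n_perm=5000, seed=0):
    """One-sided permutation test for mean(inside) > mean(outside).

    inside/outside: absolute changes in a shape feature after perturbing
    targets inside or outside the network.
    Returns (observed difference in means, permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean(inside) - mean(outside)
    pooled = list(inside) + list(outside)
    n_in = len(inside)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_in]) - mean(pooled[n_in:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up illustrative effect sizes (absolute change in one shape feature).
inside_network = [0.9, 1.1, 0.8, 1.3, 1.0]
outside_network = [0.2, 0.4, 0.3, 0.1, 0.5, 0.35]
observed_diff, p_value = permutation_test(inside_network, outside_network)
print(f"difference in means = {observed_diff:.3f}, p = {p_value:.4f}")
```

    Running the same test on a feature that was not used to build the network would serve as the negative control described in R1.5: there, no significant difference between the two groups would be expected.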

    *1.7 Fig. 1a has not achieved a visual descriptive state and asking a lot.*

    R1.7 We apologize for the lack of clarity here and will revise the figure to better present the method and reduce confusion.

    Reviewer #2 (Evidence, reproducibility and clarity (Required)):

    *The authors combine single cell morphology and gene expression data to identify signaling activities implicated in the control of cellular morphogenesis. They describe a reasonable bioinformatics pipeline from gene expression shifts between two morphological phenotypes to pathways, then to common transcription factors to signaling. As far as I can assess the situation (I am not familiar with all the tools they use) the proposed pipeline works convincingly. *

    We would like to thank the reviewer for their thoughtful comments and to clarify that the analysis has not been done on single cell gene expression data, but rather on bulk RNAseq data. This is further explained in the point-by-point responses below.

    2.1* However, I am concerned that the logic underlying this analysis is only partially valid. The link between signaling and morphology may be more direct than via TF-based gene expression regulation. Many signals (and many of the kinases the authors test for validation) are implicated in morphology control as direct upstream regulators of cytoskeleton dynamics and adhesion. This also applies to the GTPase Rap1, which the authors fish out as the most differentially expressed signal between two types of morphologies. In addition to the indirect effect of Rap1 on morphology via NFKB regulation suggested by the authors, Rap1 will affect morphology probably very directly through activation of Rac -> F-actin and RIAM -> nascent adhesions. At minimum, the authors should discuss this complexity as a caveat of their approach. And dependent on the impact the authors hope to have with this story, I believe they should experimentally resolve the ambiguity of direct vs indirect signaling for some of their key interpretations.*

    R2.1 We thank the reviewer for making this fair point about the limitations of our data, i.e. that we are not able to directly observe and delineate exactly how upstream signaling has modulated or is modulated by cell shape. As a first major clarification, which we will make sure to include in the text, the Rap1 node doesn’t necessarily represent the Rap1 GTPase itself. It is a co-expression module that is enriched in activators and downstream effectors of Rap1 signalling. As such, when we talk about the Rap1 module we mean the subnetwork of Rap1 signalling rather than the specific small GTPase itself. Thus, we can’t assume that Rap1 itself is the key node in this subnetwork, and it is therefore complicated to design specific experiments to test the direct or indirect modulation of cell shape by Rap1. We agree, however, that additional information regarding the role of this module in regulating cell shape would be interesting and valuable.

    For this study, we have access to paired transcriptomic and cell shape data from 14 cell lines. Using the expression data, we will quantify Rap1 expression module activity and its relationship to NFkB transcriptional activity across these cell lines. By comparing, in these cell lines, the effect on morphology when NFkB and the Rap1 module are combinatorially activated or deactivated, we can discriminate between competing hypotheses for direct and indirect activities of these two factors on cell shape. For example, if the overwhelming source of the Rap1 module’s function was via direct interaction with F-actin, then Rap1 module activity would be predictive of cell shape, regardless of NFkB activation. A caveat to this is our limited access to only 14 cell lines. Additionally, if necessary, we have access to drugs that can induce cytoskeletal defects and perturb morphology directly. These can be used to disrupt the relationship between Rap1 and F-actin that the reviewer has identified and gauge the effect on cell shape. If such an intervention disrupts the relationship between Rap1 signaling module/NFkB transcriptional activity and cell shape, then we can hypothesise that the activity of Rap1 signaling is greater than just its direct activity on F-actin. Finally, we can perform a knock-down of Rap1 (or selected components of its module) and NFkB itself and gauge the effect of such a perturbation on the Rap1 gene expression module and cell shape.

    That being said, these signaling modulations (whether indirect or direct) are reflected accordingly in differentially activated transcription factors, and therefore can be observed and recorded from expression data. This is an interesting finding, as it implies that signaling processes not explicitly making use of transcription factors (such as those that directly affect adhesion complexes, regulating cytoskeletal proteins etc) can still have their activity gauged through their indirect downstream expression signatures. In any case, our findings illustrate that there is a cell shape-specific modulation of the Rap1 module in breast cancer, reflected in the expression data. Rap1 almost certainly has some direct contributions to cytoskeletal dynamics in breast cancer (PMID: 10805781, PMID: 30156466 and PMID: 22644079), but here we observe clearly how it also is modulating transcription factors, that we hypothesise may contribute to the development of a morphological and transcriptomic ‘niche’ in a more robust and long-term fashion. Nonetheless, the points discussed by the reviewer are valuable and in our revision we will discuss this as a potential caveat.

    In defense to the presented premise, the authors start out by looking for correlation between gene expression and morphology, and they find some signal. Correlation analysis, especially in large data sets, tends to be pretty robust and specific, even on presence of strong confounders. Thus, even though the correlation expression-morphology, which points indirectly at morphology-regulating signaling modules, is likely to be super-imposed by direct morphology-regulating signaling pathways the proposed approach will not be able to detect, the presented analysis is valuable, in principle.

    We thank the reviewer for their positive comments.

    That said, I have a number of substantial concerns also with the implementation and presentation of the approach.

    *2.2 First, on the presentation side, for a paper that talks about cell morphology it is strange to have not a single figure panel showing an image of cells, or at least cell outlines. As a reader I would like to get visual impression of how different a high vs low Rap1 gene expresser is, for example.*

    R2.2 - We agree that this would greatly help the clarity and message of the manuscript. As copyright restrictions associated with journal publications prevent us from reusing the public, previously published images that underlie the data used in our paper, we will generate relevant images representative of the respective cell shapes and include them in the manuscript.

    *2.3 Along the same lines, it is not quite clear to me when the authors collate entire cell lines into a single phenotype, do they switch then to population-based analysis? That is, for example the volcano plots in 2B,C are they representing an average gene expression shift?*

    R2.3 - We apologise for the lack of clarity. The morphological clustering and differential expression illustrated in Figure 2 are intended to find expression signatures responsible for distinct breast cancer cell shapes. We link our expression data and our imaging data via the breast cancer cell lines and so are limited to studying bulk expression per cell line. The volcano plots in 2B,C are of the differentially expressed genes (as calculated by DESeq2) in morphologically distinct clusters of cell lines as specified by Figure 2A. The cell lines were collated in this way so that we could observe transcriptomic differences accounting for cellular morphology, rather than differences between cell lines (these being already well characterised and not in the scope of this manuscript).

    We will add additional clarifications in our revised manuscript to further explain this.

    *2.4 How heterogeneous are the morphological signals?*

    R2.4 We provide values of standard deviation for the morphological features of the derived clusters (page 4 - “Clustering based on morphology reveals distinctive cell-line shapes”). The heterogeneity of the morphological clusters is minimised as per the elbow plot shown in Figure S2A. This plot illustrates the decreasing total within-cluster variation of the cell shape groups as the number of clusters (k) is increased. The elbow of the curve, beyond which adding clusters yields little further reduction, represents the optimum number of clusters (in our case k=3). Aside from this, we note that one of those 3 clusters is significantly more heterogeneous, both morphologically and biologically (illustrated in Figure 2A), and so we used the other two, which showed more informative gene expression profiles and could be annotated roughly with breast cancer subtypes (basal and luminal, although this alignment was not perfect since the grouping was based only on morphology). We took the more heterogeneous cluster (cluster A, Figure 2A) to be the least relevant cluster in terms of morphology and biological significance, also because it contained the non-tumorigenic cell line MCF10A.
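    The elbow criterion described in R2.4 can be sketched as follows. This is a minimal illustration using plain k-means on made-up 2D points. The clustering algorithm, the farthest-point initialisation and the `pick_elbow` threshold rule are assumptions made for the example, not the procedure used for Figure S2A.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans_wss(points, k, n_iter=100):
    """Total within-cluster sum of squares after k-means (Lloyd's algorithm).

    Uses a deterministic farthest-point initialisation so that the example
    gives the same result on every run.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def pick_elbow(wss_by_k, min_gain=0.5):
    """Hypothetical rule: smallest k whose next step cuts WSS by < min_gain."""
    ks = sorted(wss_by_k)
    for k, k_next in zip(ks, ks[1:]):
        if (wss_by_k[k] - wss_by_k[k_next]) / wss_by_k[k] < min_gain:
            return k
    return ks[-1]


# Made-up data: three well-separated groups of four points each.
toy_points = [(x + dx, y + dy)
              for x, y in [(0, 0), (10, 10), (30, 0)]
              for dx in (0, 1) for dy in (0, 1)]
wss = {k: kmeans_wss(toy_points, k) for k in range(1, 6)}
best_k = pick_elbow(wss)
for k in sorted(wss):
    print(k, round(wss[k], 2))
print("elbow at k =", best_k)
```

    The within-cluster variation always decreases as k grows, so the curve is read for the point where the decrease levels off rather than for its minimum.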

    *2.5 Are the correlations between gene expression and morphology computed with single cell data as the basis?*

    R2.5 Because only bulk RNA sequencing data were available for the cell lines for which we also had high content imaging data, all analyses are done using bulk RNA sequencing data at the cell line level. We will clarify this in the text and methods section.

    *2.6 Could the volcano plots be sharpened by accounting for the single cell variation in morphology instead of lumping the cells into two morphological classes?*

    R2.6 As per the limitations mentioned in R2.3 and R2.5, we cannot study gene expression at the single cell level. Furthermore, the purpose of using morphological clusters was to observe transcriptomic traits associated with morphology rather than those specific to individual cell lines.

    *2.7 *On the back end of the paper, when the authors apply kinase inhibitors to validate some of the claimed pathways, it would be nice for the reader to see the morphological effects of these inhibitors. And to relate the kinase induced shifts to the morphological heterogeneity that is the basis for the study driving, initial correlation analysis? At the end of the day, the proof is in the pudding.

    R2.7 - We thank the reviewer for this very good point; addressing it would certainly improve the manuscript. We will attempt to source a visual illustration of the effect of kinase inhibitors on the breast cancer cells. However, the dataset from which we source these data is publicly available but unpublished, and so our use is constrained by the terms of use of LINCS. We will contact the LINCS consortium to acquire permission and, if they allow it, we will certainly include the images in our revised manuscript.

    *2.8 *Finally, cell morphology regulation is a pretty foundational process of life. One therefore wonders whether the pathways the authors pulled out of their analysis work also in other cell types, beyond breast cancer cells? What if they pooled data from different cell types that cover the morphological state space more broadly?

    R2.8 We thank the reviewer for this interesting point. We have observed that many if not most of the general processes identified, i.e. developmental pathways, extracellular matrix regulatory pathways and adhesion pathways, are already known to be associated with cell shape regulation and mechanotransduction in many different cell types. Thus, at the ‘big picture’ level, our findings hold across multiple cell types. However, the precise wiring of our network seems to be breast cancer specific. This is evidenced by the fact that when we use the LINCS data from other cell types to see if our network still holds in these contexts, we do not observe a significantly greater change in morphology when perturbing central nodes within our network compared to nodes outside of our network. This is not unexpected: depending on the individual molecular background of each tissue and tumour type, signalling networks are known to be wired differently. Indeed, our method, which uses the context- and cell feature-specific gene expression modules and the transcription factors that regulate them as a basis for extracting our cell shape signalling network, allows for identification of exactly its specific wiring in the context used for training, i.e. the breast cancer cells. It would be very interesting to repeat our analysis on similar data from a different tissue type to identify parts of the network that seem to be identical versus those that differ. We have not yet been able to identify public systematic high content imaging data for a different tissue across multiple cell lines, but we will continue to look for such a dataset in the literature and through our network of collaborators.

    We will also explore the possibility of extracting such information from images from the TCGA to perform this analysis across patients of a specific cancer type, although admittedly we are not sure how feasible it would be to extract, from the images available, features analogous to the ones used for the breast cancer network. We will also add this point to the discussion.

    Reviewer #2 (Significance):

    The premise of this manuscript is very exciting and interesting: Is it possible to identify from a correlation of cell morphology and single cell gene expression the underlying cell signaling states that control morphology? Answers to this will begin to shed some light on the black box relation of morphology as an informant of cell states, which has been exploited by pathologists, physiologists, and cell biologists for more than a century, and which has seen a sharp revival recently thanks to deep learning, which is exceedingly good at finding correlations between data patterns.

    This paper would have a broad audience in quantitative cell biology, systems biology, and perhaps also life data science and cancer (although the cancer aspect is marginal).

    Reviewer #3 (Evidence, reproducibility and clarity (Required)):

    **Summary**

    In this work Barker et al. used computational approaches to analyze several existing data sets (including morphology and expression) in a common context of signaling-regulatory network that correlates with cell morphological features. They identified several pathways and associated transcription factors that their expression levels correlate with specific cell morphological features. The work thus has two main contributions. First, it provides a network of signaling pathways and regulons that may affect the morphological features of breast cancer cells. Second, the computational procedure can be general to study other systems.

    **Main comments**

    *3.1 *I assume that in all analyses using the packages listed in the manuscript, some parameters need to be selected. The authors need to provide these details, and discuss whether the results are robust against parameter choice (at least to certain degree).

    R3.1 We thank the reviewer for this comment. Parameters need to be selected in WGCNA and PCSF. In WGCNA they are selected based on the guidance given by the authors of the package; little significant variation in the results was observed when these were changed (i.e. the make-up of the gene expression modules naturally changed, but the processes enriched in the modules correlated to cell shape were the same).

    PCSF is a more sensitive step because the best solutions to the sub-network identification problem are observed when the network edge weights are permuted over a number of iterations. Following this, the union of the produced array of networks is taken to be the solution. Biology is an inherently noisy system, and so this formulation of the PCSF algorithm can capture latent network architecture that the deterministic version cannot. This introduces extra parameters based on the requirement to introduce random noise to the network, along with the standard PCSF parameters (seen here: https://rdrr.io/github/IOR-Bioinformatics/PCSF/man/PCSF_rand.html) that are used to take account of user variations in network degree distribution, edge-weight distribution, etc. It is normal for some tuning of parameters to be required for users to tailor their PCSF to their supplied network. We used degree distribution to gauge whether our network appeared to be of a biological ‘scale-free’ distribution and selected parameters based on that. This provides an affirmation that our resulting network is consistent with how we understand the topology of biological networks, and as a result the parameters selected are not arbitrary.
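    The "perturb edge weights, solve, take the union" scheme described above can be sketched on a toy graph. The sketch below is not PCSF and does not call the PCSF package: a shortest path between two terminals stands in for the prize-collecting Steiner forest solver purely to show how randomisation can recover near-optimal alternative routes that a single deterministic run would miss. The graph, edge costs, noise level and function names are all hypothetical.

```python
import heapq
import random


def shortest_path_edges(costs, source, target):
    """Edges on a cheapest source-target path (Dijkstra).

    costs: dict mapping sorted node pairs (u, v) to a positive edge cost.
    Assumes the target is reachable from the source.
    """
    adj = {}
    for (u, v), c in costs.items():
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))
    dist, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            break
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    edges, node = set(), target
    while node != source:
        edges.add(tuple(sorted((node, prev[node]))))
        node = prev[node]
    return edges


def randomized_union(costs, terminals, n_runs=200, noise=0.1, seed=0):
    """Union of solutions over runs with multiplicative noise on edge costs."""
    rng = random.Random(seed)
    root, others = terminals[0], terminals[1:]
    union = set()
    for _ in range(n_runs):
        noisy = {e: c * rng.uniform(1 - noise, 1 + noise) for e, c in costs.items()}
        for t in others:
            union |= shortest_path_edges(noisy, root, t)
    return union


# Hypothetical graph: two almost equally cheap routes between terminals T1
# and T2 (via A or via B), plus an expensive dead-end edge to C.
toy_costs = {("A", "T1"): 1.0, ("A", "T2"): 1.0,
             ("B", "T1"): 1.0, ("B", "T2"): 1.05,
             ("C", "T2"): 5.0}
deterministic = randomized_union(toy_costs, ["T1", "T2"], n_runs=1, noise=0.0)
randomized = randomized_union(toy_costs, ["T1", "T2"])
print(sorted(deterministic))
print(sorted(randomized))
```

    With no noise only the route through A is returned, whereas the union over noisy runs also retains the marginally more expensive route through B. The dead-end edge to C is never selected in either case.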

    Nonetheless, we also tested variations in these parameters and found that although levels of significance in our validation would vary, the trends apparent from our validation did not (i.e., that targeting kinases within our network produced a larger effect on cell shape than targeting those outside). From this we were assured that our conclusions mentioned in the discussion were robust to parameter selection. All details of parameters used are currently on our GitLab page, but we will additionally include them in the methods section.

    *3.2 *Cells show some degree of heterogeneity both in cell expression and morphological features, which can be affected by many factors. Wu et al. (Sci Adv 2020, 6 (4): eaaw6938) identified several subgroups of MD-MBA-231 cells with persistent (over generations) distinct morphology, expression profiles, and metastasis potential. Another possible main factor to cell morphology heterogeneity is cell cycle stage. I understand that the analyses in this work are limited by the types of data available, for example, the expression data are largely bulk. One exception might be the data shown in Fig 5. Besides giving the fold changes after kinase inhibitor treatment, the authors may also analyze the variance of cells before and after treatment to estimate the relative extent of cell-cell heterogeneity relative to the effect due to treatment.

    R3.2 We thank the reviewer for the comment and suggestion. We will perform such an analysis and include this point in the discussion. However, to reassure the reviewer, we are not concerned about this affecting our analysis in a major way as, in data from high content microscopy experiments such as the ones we used, hundreds of cells are sampled and the resulting quantified phenotypes are represented by the average from single cells, after removing outliers. Similarly in the bulk RNAseq experiments the dominant cell phenotype/expression profile would be mainly represented in the data. We are therefore reasonably confident that both sets of data used from each cell line indeed represented the most common phenotype for that cell line.

    *3.3 As related to point 2, in Fig. 1B, I am surprised that cell cycle correlates significantly only with cell area, although cells are known to undergo dramatic changes during the cell cycle. For example, cells become rounded for mitosis. How would the authors explain the results? Is it possible that there is sampling bias towards interphase cells?*

    R3.3 We thank the reviewer for this comment and apologise for the confusion. The y axis of Fig. 1B relates to gene expression modules identified in the expression data. These were named based on any informative term that could be associated with the genes within the modules, as identified by gene set enrichment. The goal was to provide more informative names than the default module names, which are based on colours. ‘Cell cycle’ as a term in Reactome is a particularly general gene set and was applied to the gene expression module in question because it was the only informative term identified for it. This single gene expression module does not represent all transcriptomic activity associated with the cell cycle process. Indeed, the term ‘Cell cycle’ was also enriched in the ‘Hedgehog off-state’ gene expression module (Supplementary table 5). As the enrichment is based only on the genes HAUS8, MCM8, NCAPH2, MIS12, BIRC5, CENPM, SPDL1, FBXO5, TYMS and TUBB4A, which are not necessarily the major cell cycle-relevant genes, we agree that the name of this module is not ideal and can cause confusion, so we will rename it. We will also go over the naming of all the modules to ensure that the names are representative of the module functions.

    **Minor comments**

    *3.4 In Fig 1B, I have trouble understanding the biological relevance of some module names, like "Green", "indianred4"?*

    R3.4 Our pipeline uses WGCNA, which constructs gene expression modules entirely from gene expression data. We named modules based on terms we could find associated with the genes within a module. Some modules did not have any informative terms associated with them, so we opted to keep the default names that WGCNA supplies for those modules (based on colours). We will make this clearer in our revised manuscript by adding a better explanation and by renaming these modules to something that makes it clear that we could not assign a function, such as non-annotated (NA) module 1, 2, 3, etc.
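The renaming scheme described above could look roughly like the following Python sketch. The function name `label_modules`, the input layout (colour name mapped to a list of enriched terms, best first) and the example entries are hypothetical; they are not taken from the manuscript's data or code.

```python
def label_modules(module_terms):
    """Map default colour names of modules to readable labels.

    module_terms: dict of colour name -> list of enriched terms (best first);
    an empty list means no informative term was found for that module.
    """
    labels = {}
    unannotated = 0
    for colour in sorted(module_terms):  # sorted for a reproducible numbering
        terms = module_terms[colour]
        if terms:
            labels[colour] = terms[0]
        else:
            unannotated += 1
            labels[colour] = f"NA module {unannotated}"
    return labels


# Hypothetical input: two modules without any enriched term, one with a term.
example = {"green": [], "indianred4": [], "blue": ["example pathway"]}
print(label_modules(example))
```

Numbering the non-annotated modules in sorted order keeps the labels stable between runs, which matters if the labels are reused across figures.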

    *3.5 Fig 3B: I can't find a detailed explanation of how the combined score was calculated.*

    R3.5 Thank you for pointing this out. This is described in Chen et al. 2013 as part of the enrichR package for gene set enrichment analysis. We will add this detail in the methods section under “Quantification and Statistical Analysis”.
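For orientation, the Enrichr publication describes the combined score as the logarithm of the Fisher exact test p-value multiplied by the z-score of the deviation from the expected rank, c = ln(p) · z. The Python sketch below illustrates that arithmetic under stated assumptions: the function names are hypothetical, the p-value is computed as a one-sided hypergeometric tail, and the z-score is passed in as an input because its computation is not reproduced here. It is not the enrichR implementation.

```python
import math


def fisher_right_tail(overlap, module_size, set_size, universe):
    """One-sided Fisher exact (hypergeometric) p-value, P(X >= overlap).

    overlap: genes shared by the module and the gene set
    module_size: genes in the module; set_size: genes in the gene set
    universe: total number of background genes
    """
    total = math.comb(universe, module_size)
    upper = min(module_size, set_size)
    tail = sum(
        math.comb(set_size, k) * math.comb(universe - set_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def combined_score(p_value, z_score):
    """Combined score c = ln(p) * z; z is supplied by the caller (assumption)."""
    return math.log(p_value) * z_score
```

Because ln(p) is negative for p < 1, a gene set that ranks better than expected (negative z) yields a positive combined score.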

    *3.6 Some of the cell features in Fig 5A are not in Fig 1. Are they from the same analysis? Any explanation?*

    R3.6 As the two datasets were acquired in two completely different studies, there is no one-to-one correspondence between the phenotype features; however, several of them essentially represent the same phenotype. For example, in Fig. 5, cytoplasm area, cytoplasm perimeter, nucleus area, nucleus length, nucleus width and nucleus perimeter are analogous to features in Fig. 1 such as cell area, cell width to length, nucleus area and nucleus width to length. The overlap between the features in these two datasets is not exact, however, and we use the features in Fig. 5 that were not used in the network construction as a negative control. This allows us to show that our network is specific to the morphology features it was trained on. We will clarify this in the manuscript.

    *3.7 It is interesting that the authors have identified a number of pathways known to be related to mechanosensing. Does the Hippo-YAP/TAZ pathway appear in their analysis?*

    R3.7 Yes, YAP1 is also significantly highly ranked in our network propagation of activated transcription factors in Fig. 4 in both luminal- and basal-shaped cell lines. Furthermore, since submitting, we have been experimenting with identifying subnetworks of our regulatory network using maximum flow. Here we assess the interaction between the Rap1 module (given its centrality to our discussion) and NFkB, asking what the most efficient ‘flow’ of information between these two nodes is given our network. Interestingly, we identify LATS2, DVL1 (genes within the Rap1 module), YAP1 and TAZ as key mediating factors between these nodes. This implicates the Hippo-YAP/TAZ pathway as being of particular importance at the interface of our identified gene expression module and our derived signalling network. As an illustration, we include here one of the preliminary networks derived from this analysis.

    Figure - Network describing maximum flow (top 20 edges) between Rap1 signaling module as the source node and NFKB1 as the target node. The super-node representing the gene expression module is coloured red (with the interface nodes used to embed it in the signaling network coloured blue) and NFKB1 coloured azure. Edge thickness indicates weight, which was used as the maximum capacity used in the max flow calculation.
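The maximum-flow idea described in the caption above (edge weights used as capacities, a module super-node as the source and a single target node) can be illustrated with a small Python sketch. The Edmonds-Karp algorithm used here is a standard choice made for this example; the rebuttal does not state which algorithm or library was used, and the toy graph, node names and weights are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(edges, source, sink):
    """Edmonds-Karp maximum flow on a directed, weighted graph.

    edges: iterable of (u, v, capacity); capacity plays the role of edge weight.
    Returns (total flow, {(u, v): flow} for original edges carrying flow).
    """
    cap = defaultdict(float)
    adj = defaultdict(set)
    for u, v, c in edges:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction is needed for the residual graph
    flow = defaultdict(float)
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] - flow[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left, so the flow is maximal
        # Find the bottleneck residual capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, cap[(u, v)] - flow[(u, v)])
            v = u
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            flow[(u, v)] += bottleneck
            flow[(v, u)] -= bottleneck
            v = u
        total += bottleneck
    used = {e: f for e, f in flow.items() if f > 1e-12 and cap[e] > 0}
    return total, used


# Hypothetical toy network: "module" is the source super-node, "target" the sink.
toy_edges = [
    ("module", "a", 3.0), ("module", "b", 2.0),
    ("a", "c", 2.0), ("b", "c", 2.0),
    ("a", "target", 1.0), ("c", "target", 3.0),
]
value, used = max_flow(toy_edges, "module", "target")
# Keep the edges that carry the most flow (the caption keeps the top 20).
top_edges = sorted(used.items(), key=lambda kv: -kv[1])[:20]
```

Ranking edges by the flow they carry is one simple way to extract a compact subnetwork for display; other rankings are possible.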

    We will go over the figures and move barplots and more technical information to the supplement to make space for more figures of this nature that better illustrate the processes involved in the regulation of cell shape as derived from our analysis.

    Reviewer #3 (Significance):

    Extensive studies link cell morphological features and cell expression states, with some important work from one of the authors (Chris Bakal). This topic has gained further interest recently. For example, Wu et al. (Sci Adv 2020, 6 (4): eaaw6938) demonstrated that cell shape encodes metastasis potential. Wang et al. (Sci. Adv. 2020 6 (36): eaba9319) traced cell dynamics in cell feature space. Given the context, the work of Barker et al. is a timely study to establish a cell shape-signaling network from integrated analysis of several different types of data. While some previous studies have related the NFkB network and cell morphology, this work further provides unbiased analysis on the relation between cell morphological features and multiple pathways/transcription factors. It is interesting, for example, that the study identifies the correlation between the Rap1 pathway and cell morphology, which was not well studied previously. As the authors acknowledged, there are some limitations of their approach. For example, the relations identified are correlative instead of causal without further verification. The resultant network may have sampling bias. Despite these limitations, I suggest that this work will be a nice contribution to the field and can provide a basis for further studies.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #3

    Evidence, reproducibility and clarity

    Summary

    In this work, Barker et al. used computational approaches to analyze several existing data sets (including morphology and expression) in the common context of a signaling-regulatory network that correlates with cell morphological features. They identified several pathways and associated transcription factors whose expression levels correlate with specific cell morphological features. The work thus has two main contributions. First, it provides a network of signaling pathways and regulons that may affect the morphological features of breast cancer cells. Second, the computational procedure can be generalized to study other systems.

    Main comments

    1. I assume that in all analyses using the packages listed in the manuscript, some parameters need to be selected. The authors need to provide these details, and discuss whether the results are robust against parameter choice (at least to a certain degree).

    2. Cells show some degree of heterogeneity both in cell expression and morphological features, which can be affected by many factors. Wu et al. (Sci Adv 2020, 6 (4): eaaw6938) identified several subgroups of MDA-MB-231 cells with persistent (over generations) distinct morphology, expression profiles, and metastasis potential. Another possible main factor to cell morphology heterogeneity is cell cycle stage. I understand that the analyses in this work are limited by the types of data available, for example, the expression data are largely bulk. One exception might be the data shown in Fig 5. Besides giving the fold changes after kinase inhibitor treatment, the authors may also analyze the variance of cells before and after treatment to estimate the relative extent of cell-cell heterogeneity relative to the effect due to treatment.

    3. As related to point 2, in Fig. 1B, I am surprised that cell cycle correlates significantly only with cell area, although cells are known to undergo dramatic changes during the cell cycle. For example, cells become rounded for mitosis. How would the authors explain the results? Is it possible that there is sampling bias towards interphase cells?

    Minor comments

    1. In Fig 1B, I have trouble understanding the biological relevance of some module names, like "Green", "indianred4"?

    2. Fig 3B: I can't find a detailed explanation of how the combined score was calculated.

    3. Some of the cell features in Fig 5A are not in Fig 1. Are they from the same analysis? Any explanation?

    4. It is interesting that the authors have identified a number of pathways known to be related to mechanosensing. Does the Hippo-YAP/TAZ pathway appear in their analysis?

    Significance

    Extensive studies link cell morphological features and cell expression states, with some important work from one of the authors (Chris Bakal). This topic has gained further interest recently. For example, Wu et al. (Sci Adv 2020, 6 (4): eaaw6938) demonstrated that cell shape encodes metastasis potential. Wang et al. (Sci. Adv. 2020 6 (36): eaba9319) traced cell dynamics in cell feature space. Given the context, the work of Barker et al. is a timely study to establish a cell shape-signaling network from integrated analysis of several different types of data. While some previous studies have related the NFkB network and cell morphology, this work further provides unbiased analysis on the relation between cell morphological features and multiple pathways/transcription factors. It is interesting, for example, that the study identifies the correlation between the Rap1 pathway and cell morphology, which was not well studied previously. As the authors acknowledged, there are some limitations of their approach. For example, the relations identified are correlative instead of causal without further verification. The resultant network may have sampling bias. Despite these limitations, I suggest that this work will be a nice contribution to the field and can provide a basis for further studies.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #2

    Evidence, reproducibility and clarity

    The authors combine single cell morphology and gene expression data to identify signaling activities implicated in the control of cellular morphogenesis. They describe a reasonable bioinformatics pipeline from gene expression shifts between two morphological phenotypes to pathways, then to common transcription factors to signaling. As far as I can assess the situation (I am not familiar with all the tools they use) the proposed pipeline works convincingly. However, I am concerned that the logic underlying this analysis is only partially valid. The link between signaling and morphology may be more direct than via TF-based gene expression regulation. Many signals (and many of the kinases the authors test for validation) are implicated in morphology control as direct upstream regulators of cytoskeleton dynamics and adhesion. This also applies to the GTPase Rap1, which the authors fish out as the most differentially expressed signal between two types of morphologies. In addition to the indirect effect of Rap1 on morphology via NFKB regulation suggested by the authors, Rap1 will affect morphology probably very directly through activation of Rac -> F-actin and RIAM -> nascent adhesions. At minimum, the authors should discuss this complexity as a caveat of their approach. And dependent on the impact the authors hope to have with this story, I believe they should experimentally resolve the ambiguity of direct vs indirect signaling for some of their key interpretations.

    In defense of the presented premise, the authors start out by looking for correlation between gene expression and morphology, and they find some signal. Correlation analysis, especially in large data sets, tends to be pretty robust and specific, even in the presence of strong confounders. Thus, even though the expression-morphology correlation, which points indirectly at morphology-regulating signaling modules, is likely to be superimposed by direct morphology-regulating signaling pathways that the proposed approach will not be able to detect, the presented analysis is valuable, in principle.

    That said, I have a number of substantial concerns also with the implementation and presentation of the approach. First, on the presentation side, for a paper that talks about cell morphology it is strange to have not a single figure panel showing an image of cells, or at least cell outlines. As a reader I would like to get a visual impression of how different a high vs low Rap1 gene expresser is, for example. Along the same lines, it is not quite clear to me whether, when the authors collate entire cell lines into a single phenotype, they then switch to population-based analysis. That is, do the volcano plots in 2B,C, for example, represent an average gene expression shift? How heterogeneous are the morphological signals? Are the correlations between gene expression and morphology computed with single cell data as the basis? Could the volcano plots be sharpened by accounting for the single cell variation in morphology instead of lumping the cells into two morphological classes? On the back end of the paper, when the authors apply kinase inhibitors to validate some of the claimed pathways, it would be nice for the reader to see the morphological effects of these inhibitors. It would also help to relate the kinase-induced shifts to the morphological heterogeneity that underlies the initial correlation analysis driving the study. At the end of the day, the proof is in the pudding.

    Finally, cell morphology regulation is a pretty foundational process of life. One therefore wonders whether the pathways the authors pulled out of their analysis work also in other cell types, beyond breast cancer cells? What if they pooled data from different cell types that cover the morphological state space more broadly?

    Significance

    The premise of this manuscript is very exciting and interesting: Is it possible to identify from a correlation of cell morphology and single cell gene expression the underlying cell signaling states that control morphology? Answers to this will begin to shed some light on the black box relation of morphology as an informant of cell states, which has been exploited by pathologists, physiologists, and cell biologists for more than a century, and which has seen a sharp revival recently thanks to deep learning, which is exceedingly good at finding correlations between data patterns.

    This paper would have a broad audience in quantitative cell biology, systems biology, and perhaps also life data science and cancer (although the cancer aspect is marginal).

  4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #1

    Evidence, reproducibility and clarity

    The submitted manuscript 'Identification of phenotype-specific networks from paired gene expression-cell shape imaging data' of Barker et al. uses an assortment of different bioinformatics tools (see Key Resource Table) to analyze reported RNA sequencing data and to correlate derived pathways with imaging features of breast cancer cell lines based on specific pathway constructions. The common thread of the data presentation in the manuscript is not obvious.

    Major concerns:

    1. The main biological 'finding' of the study, RAP1 'as a potential mediator between the sensing of mechanical stimuli and regulation of NFkB activity', has already been reported, and therefore the assumption that 'how exactly extra-cellular mechanical cues are sensed by the cell and passed on to NFkB in breast cancer is not understood' is misleading. Please review: https://www.nature.com/articles/ncb2080 (human breast cancers with NF-κB hyperactivity show elevated levels of cytoplasmic Rap1. Similar to inhibiting NF-κB, knockdown of Rap1 sensitizes breast cancer cells to apoptosis), https://pubmed.ncbi.nlm.nih.gov/17510404/ (RAP1 is a crucial element in organizing acinar structure and inducing lumen formation), and https://pubmed.ncbi.nlm.nih.gov/21429211/. Besides, Fig. 2 and 3 are unrelated to this main statement.
    2. The RAP1 finding (spotted via the TFs JARD2 and RUNX2) is not obvious without the Fig. 4 results, a network propagation of functional TFs in differentially activated processes (basal vs. luminal) in the cell shape regulatory network. Please show that RAP1 could not be identified from TFs and DEGs alone, without the network.
    3. More complex fluorescence phenotypes are available; the 10 simple cell shape features used do not match the complexity of the RNA-seq data, data input and pathway construction. Conversely, relatively 'monoclonal' breast cancer cell lines may be the only application for this workflow. The image features in Fig. 1 and 5 do not match, with Fig. 5 being a rather indirect 'proof' of usability.
    4. Fig. 1a is not yet visually self-explanatory and asks a lot of the reader.

    Significance

    'Review Commons' efficiently facilitates the reviewing process for the corresponding journals due to the broader 'audience'. On the other hand, authors face fewer restrictions and less pressure for the same reason. Although I really like the idea of pulling the reviews upstream into the preprint process, I would like to answer here also with a kind of pre-review, to discourage the submission of partly immature manuscripts to 'Review Commons'. Might Review Commons install an automated 'sanity check' of manuscripts in the future to keep the quality of submissions higher?