Proteogenomic analysis of cancer aneuploidy and normal tissues reveals divergent modes of gene regulation across cellular pathways

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This work by Cheng et al. evaluates the contributions of RNA-level and protein-level regulation to gene expression by leveraging copy number variations in a large cohort of cancer samples. Importantly, the authors find that compensatory regulation rarely occurs at the RNA and protein levels together; instead, depending on the gene, expression is compensated at one level or the other. The paper is very intriguing and the findings are of interest to a broad readership.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

This article has been Reviewed by the following groups

Abstract

How cells control gene expression is a fundamental question. The relative contribution of protein-level and RNA-level regulation to this process remains unclear. Here, we perform a proteogenomic analysis of tumors and untransformed cells containing somatic copy number alterations (SCNAs). By revealing how cells regulate RNA and protein abundances of genes with SCNAs, we provide insights into the rules of gene regulation. Protein complex genes have a strong protein-level regulation while non-complex genes have a strong RNA-level regulation. Notable exceptions are plasma membrane protein complex genes, which show a weak protein-level regulation and a stronger RNA-level regulation. Strikingly, we find a strong negative association between the degree of RNA-level and protein-level regulation across genes and cellular pathways. Moreover, genes participating in the same pathway show a similar degree of RNA- and protein-level regulation. Pathways including translation, splicing, RNA processing, and mitochondrial function show a stronger protein-level regulation while cell adhesion and migration pathways show a stronger RNA-level regulation. These results suggest that the evolution of gene regulation is shaped by functional constraints and that many cellular pathways tend to evolve one predominant mechanism of gene regulation at the protein level or at the RNA level.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    Cheng et al. address one of the fundamental questions of gene expression regulation: what are the relative contributions of RNA-level and protein-level regulation to the final gene expression levels? To do so, they mainly take advantage of published datasets, especially tumor datasets for which matching somatic copy number alteration (SCNA), RNA expression and protein expression data are available. Performing proteogenomic analysis (taking DNA, RNA and protein into account), they address several open questions, such as: Is gene compensation happening mainly at the RNA level, the protein level or both for each gene? Is this the same across different tissue types and cellular pathways? Taking advantage of the SCNAs in the DNA, the authors use correlation analysis of DNA to RNA and RNA to protein to determine whether the expression of a gene is regulated mainly at the level of RNA or protein in the respective samples.
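
    The correlation logic described in this paragraph can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names and the toy values are hypothetical, and a plain Pearson correlation is assumed. For one gene measured across samples, a high DNA-to-RNA correlation combined with a low RNA-to-protein correlation would point towards buffering at the protein level, and the reverse pattern towards buffering at the RNA level.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def regulation_profile(dna, rna, protein):
    """Return the DNA-RNA and RNA-protein correlations for a single gene."""
    return {
        "dna_rna": pearson(dna, rna),
        "rna_protein": pearson(rna, protein),
    }


if __name__ == "__main__":
    # Hypothetical log2 ratios for one gene across six samples (toy values).
    dna = [-1.0, -0.5, 0.0, 0.0, 0.5, 1.0]
    rna = [-0.9, -0.4, 0.1, -0.1, 0.4, 0.8]       # RNA follows copy number
    protein = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0]    # protein stays nearly flat
    print(regulation_profile(dna, rna, protein))
```

    In this toy case the RNA tracks the copy number while the protein does not, which is the pattern one would expect for a gene compensated mainly at the protein level.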

    Although it is mainly a very descriptive study, the meta-analysis of existing datasets (and one smaller dataset that was newly generated) yields very interesting observations, which will be of interest to the cancer and gene expression community. However, there is limited mechanistic insight into how the observations can be explained. This is not a problem in my view as the observations are interesting enough in themselves.

    The main findings of the study are:

    • In general genes are either regulated at the RNA-level or at the protein level, but rarely at both.
    • This is the first study (at least as far as I know) to look at tissue-specific RNA-level and protein-level compensation across several different tumor types. Interestingly, the authors show tissue specificity of RNA- and protein-level compensation; for example, lung adenocarcinoma shows almost no compensation.
    • Protein complex genes show stronger protein-level regulation than non-complex genes, and the opposite trend with regard to RNA-level regulation.
    • There seems to be an agreement for genes within the same pathway that they show a similar regulatory mode (either RNA level or protein level).
    • Genes involved in RNA processing, mRNA translation and mitochondrial regulation are generally upregulated at the protein-level in highly aneuploid primary tumor samples.

    However, I do think that two points need to be addressed by additional analyses to strengthen the findings.

    • The authors show that SCNAs are often significantly compensated at the protein level in most tumor types. This compensation is also normally stronger than RNA-level compensation. A technical issue with this finding that needs to be addressed is that it is mainly based on proteomics data that used TMT for quantification. TMT-based quantifications, although quite precise, are not always the most accurate measurements in the sense of capturing the true amplitude of changes. This is due to the so-called ratio compression of TMT mass spec data. The authors need to account for this in order to exclude that this technical limitation of TMT-based proteomics measurements is a main contributor to the protein-level compensation seen. Do the authors also have some proteomics data where label-free quantification or SILAC quantification was used? Do the same conclusions hold true when such datasets are used?

    We thank the reviewer for this comment, which we have now addressed through the following literature search and additional analysis:

    • First, previous studies have observed similar protein-level compensation in yeast and human cells using different detection methods. Dephoure et al. compared two methods, stable isotope labeling by amino acids in cell culture (SILAC) and tandem mass tag (TMT)-based proteomics, and both revealed protein-level compensation of gained genes in yeast (Figure 2 and Figure 2 – figure supplement 1 of Dephoure et al., 2014). Similarly, Stingele et al. identified protein-level compensation in pairs of isogenic diploid and aneuploid human cell lines by SILAC (Figure 2B of Stingele et al., 2012). Another group also found protein-level compensation in primary human fibroblasts from individuals with Patau (trisomy 13), Edwards (trisomy 18) or Down (trisomy 21) syndromes using an MS3-based approach (Hwang et al., 2021), which should eliminate the interference of ratio distortion (Ting et al., 2011). Taken together, these studies suggest that protein-level compensation is not merely an artifact of the technical limitations of TMT-based proteomics.

    • To further validate the protein-level compensation, we performed the same analysis on TCGA (The Cancer Genome Atlas Program) (Research Network et al., 2013) COAD samples for which label-free proteomics data are available (Zhang et al., Nature, 2014). Consistent with the TMT-based proteomics, we found significant compensation at the protein level, which was stronger for complex genes than for non-complex genes (Figure 1 – figure supplement 1C, Supplementary File 1G). As we observed before for COAD (Figure 1C), RNA-level compensation was present in all groups of DNA change and was stronger for non-complex genes (deep loss and high gain, FDR<0.005, Figure 1 – figure supplement 1C, Supplementary File 1G). These results suggest that the limitations imposed by TMT quantification do not alter the conclusions of our analysis on gene compensation. We have now added these data as Figure 1 – figure supplement 1C and Supplementary File 1G, with corresponding text on page 5.
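
    To make the ratio-compression concern concrete, the following is a toy model and not a description of any dataset in the study. It assumes that a fixed fraction of the reporter-ion signal comes from co-isolated background peptides that are present at a 1:1 ratio in both channels; the function names and the 30% background value are hypothetical. Under that assumption, a true fold change is pulled towards 1, which could be mistaken for protein-level compensation.

```python
import math


def observed_ratio(true_ratio, background_fraction):
    """Ratio reported when part of the signal is unchanging 1:1 background.

    Assumed toy model: observed = (1 - f) * true_ratio + f, where f is the
    fraction of the reference-channel signal contributed by background.
    """
    if not 0.0 <= background_fraction < 1.0:
        raise ValueError("background_fraction must be in [0, 1)")
    if true_ratio <= 0:
        raise ValueError("true_ratio must be positive")
    return (1.0 - background_fraction) * true_ratio + background_fraction


def apparent_compensation(true_ratio, measured_ratio):
    """Fraction of the true log2 fold change that is missing in the measurement."""
    if true_ratio == 1.0:
        raise ValueError("compensation is undefined when there is no true change")
    return 1.0 - math.log2(measured_ratio) / math.log2(true_ratio)


if __name__ == "__main__":
    # A single-copy gain in a diploid background corresponds to a 3:2 ratio.
    true_ratio = 1.5
    measured = observed_ratio(true_ratio, background_fraction=0.3)
    print(measured, apparent_compensation(true_ratio, measured))
```

    Quantification methods that do not share this interference, such as the label-free and SILAC datasets discussed above, would not be expected to show the same shrinkage, which is why agreement across methods is informative.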

    • Many of the statistically significant differences seen (e.g. complexed proteins versus non-complexed proteins, highly conserved proteins versus less conserved proteins) actually have a relatively small effect size. It is not 100% clear to me that the authors always apply the most stringent and appropriate statistical evaluation. For example, when two density plots are compared and it is evaluated whether the distributions differ significantly from each other (e.g. in the median), the authors consistently use a bootstrapping strategy (most plots in Fig 2 and Fig S2). Due to the high number of iterations, bootstrapping is very sensitive to picking up statistical differences, even when the effect size differences are very small (as is the case for many of the comparisons). Would a KS test not be more appropriate to compare two density distributions? If a KS test is applied, do the authors still recapitulate the same statistical significance tendencies as seen with their bootstrapping strategy?

    We thank the reviewer for this comment, which we have addressed in detail. We repeated the analyses using the Mann-Whitney U test and the Kolmogorov-Smirnov (KS) test (Supplementary File 2K). Compared with bootstrapping, the p-values calculated by the Mann-Whitney U test or the KS test were much smaller, close to zero. Therefore, the same tendencies in statistical significance were observed regardless of the statistical method used (bootstrapping, Mann-Whitney U test or KS test). While the Mann-Whitney U test and the KS test carry the risk of p-value inflation due to the high sample number, the bootstrapping method avoids this problem because it is independent of the sample number. We had initially used the Mann-Whitney U test for all our analyses and were advised to include the bootstrapping method after consultation with the NYU Biostatistics Resource.
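
    The distinction between statistical significance and effect size raised in this exchange can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function names are hypothetical, a percentile bootstrap of the difference in medians is assumed, and only the KS statistic (not its p-value) is computed. Only the Python standard library is used.

```python
import random
import statistics
from bisect import bisect_right


def bootstrap_median_diff(a, b, n_boot=1000, seed=0):
    """Percentile bootstrap for median(a) - median(b).

    Returns (observed difference, lower 95% bound, upper 95% bound).
    """
    rng = random.Random(seed)
    observed = statistics.median(a) - statistics.median(b)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.median(resampled_a) - statistics.median(resampled_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, lower, upper


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sorted_a, sorted_b = sorted(a), sorted(b)
    largest_gap = 0.0
    for x in sorted_a + sorted_b:
        cdf_a = bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, x) / len(sorted_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


if __name__ == "__main__":
    # Synthetic scores for two gene groups that differ by a small shift.
    rng = random.Random(1)
    group_one = [rng.gauss(0.10, 1.0) for _ in range(1000)]
    group_two = [rng.gauss(0.00, 1.0) for _ in range(1000)]
    print(bootstrap_median_diff(group_one, group_two, n_boot=500))
    print(ks_statistic(group_one, group_two))
```

    Reporting the difference in medians with its interval, or the KS statistic itself, next to any p-value lets a reader judge how large a difference is, independently of how many genes were compared.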

  2. Reviewer #2 (Public Review):

    This paper addresses the question of how changes in the copy number of genes affect the levels of the corresponding mRNAs and proteins. To this end, the authors investigate a number of published and newly generated datasets from both cancer and normal tissues. They observe that buffering of gene copy number changes mostly occurs at the protein level but can, for some genes, also be seen at the mRNA level. Interestingly, buffering at the two levels is inversely correlated, so that it occurs either at the protein level or at the mRNA level but not at both. Also, the type of buffering tends to be similar for genes involved in the same cellular pathway.

    This paper addresses an important question in a thorough and thoughtful way. Its strength is that it integrates results from a broad range of datasets (including newly generated data) to arrive at consistent conclusions. While similar analyses have also been reported by others (such as Gonçalves et al., 2017), this manuscript extends these analyses and provides a more detailed picture. The data analysis presented is sound, and the fact that observations can be replicated in different independent datasets highlights the general relevance of the findings. The finding that mRNA- and protein-level buffering tend to be inversely correlated and are similar for functionally related genes is interesting. The observation that RNA- and protein-level compensation depends on the tumor type is also interesting, even though no explanation for this finding is presented. Overall, the conclusions are supported by the presented data.