Elucidation of putative key genes involved in the regulation of triple negative breast cancer development and progression

Abstract

The molecular basis of triple-negative breast cancer (TNBC), a highly aggressive and therapy-resistant subtype of breast cancer, is poorly understood. This study aims to identify key genes and pathways involved in TNBC development and progression using a systems biology approach followed by experimental validation. Two transcriptome microarray datasets from the GEO database were analysed using the R package LIMMA to detect differentially expressed genes (DEGs) in TNBC tumors. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed with the DAVID database to identify the biological functions and pathways regulated by the DEGs. Further, a protein–protein interaction (PPI) network was constructed using the STRING online database, and its topological properties were determined using the MCODE and CytoHubba plug-ins. The expression and prognostic value of the hub genes were validated using survival analysis of The Cancer Genome Atlas (TCGA) data. We found 727 DEGs, of which 473 were downregulated and 254 were upregulated in TNBC vs. non-TNBC samples. The GO and KEGG analyses indicated that the DEGs were mainly related to cell adhesion, tumorigenesis, and cellular immunity. The PPI network revealed six hub genes, namely CCND1, CDH1, ESR1, FN1, IL6, and PPARG, as the top key regulators. All these genes were validated by quantitative real-time PCR in a TNBC cell line using a non-TNBC cell line as a calibrator, and the results agreed with the bioinformatics data. This information may contribute to understanding the molecular mechanisms that drive the development and progression of TNBC tumors.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/20048138.

    Short summary of the research and contribution to the field

    This preprint uses an integrated systems-biology and transcriptomic analysis approach to identify putative key genes involved in triple-negative breast cancer (TNBC) development and progression. The authors analyzed two GEO microarray datasets, GSE27447 and GSE39004, comparing TNBC and non-TNBC samples, followed by differential gene-expression analysis, GO/KEGG enrichment, STRING-based protein–protein interaction analysis, hub-gene prioritization, TCGA/UALCAN-based validation, survival analysis, and qRT-PCR validation in MDA-MB-231 and MCF7 cell lines. The study identifies six candidate hub genes—CCND1, CDH1, ESR1, FN1, IL6, and PPARG—as putative regulators or biomarkers associated with TNBC biology.

    The work contributes to the field by combining public transcriptomic datasets, network topology, subtype-level expression analysis, survival association, and experimental qRT-PCR validation to prioritize biologically relevant genes in TNBC. The study is clinically relevant because TNBC remains a highly aggressive breast cancer subtype lacking ER, PR, and HER2 expression, which limits endocrine and HER2-targeted treatment options.

    Positive feedback / strengths

    1. Clinically important topic. TNBC is aggressive, heterogeneous, and difficult to treat, so identifying transcriptomic signatures and biologically relevant hub genes remains an important research goal.

    2. Integrated analysis strategy. The manuscript combines multiple layers of analysis: GEO-based differential expression, GO/KEGG enrichment, STRING/Cytoscape PPI analysis, TCGA/GEPIA/UALCAN validation, survival analysis, and qRT-PCR. This multi-step design is stronger than relying on a single bioinformatics method.

    3. Use of two independent GEO datasets. The authors use two publicly available datasets, GSE27447 and GSE39004, with TNBC and non-TNBC samples. The study reports that GSE27447 included 5 TNBC and 7 non-TNBC samples, while GSE39004 included 30 TNBC and 47 non-TNBC samples.

    4. Clear hub-gene prioritization. The PPI analysis uses degree, bottleneck, and betweenness centrality, and the overlap of top-ranked genes across these measures led to the six final hub genes: CDH1, PPARG, FN1, CCND1, IL6, and ESR1.

    5. Experimental validation adds value. The qRT-PCR validation in MDA-MB-231 and MCF7 cells provides experimental support beyond in silico analysis. The qPCR data on page 22 show higher IL6, FN1, CCND1, and PPARG expression and lower CDH1 and ESR1 expression in MDA-MB-231 relative to MCF7, broadly supporting several TNBC-associated expression patterns.

    6. Figures support the workflow. Figure 1 on page 15 shows volcano plots and heatmap-based DEG visualization; Figure 4 on page 18 summarizes the PPI network and overlapping hub-gene centrality results; Figure 7 on page 21 shows subtype-specific expression patterns; and Figure 8 on page 22 provides qRT-PCR validation. These visuals help readers follow the analytical pipeline.
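    The overlap-based prioritization noted in strength 4 can be illustrated with a minimal sketch: take the top-ranked nodes under each centrality measure and keep only those present in every list. The node names, scores, and the cutoff `k` below are toy values chosen for illustration, and the helper names `top_k` and `consensus_hubs` are hypothetical; this is not the CytoHubba implementation or the manuscript's data.

```python
def top_k(scores, k):
    """Return the k highest-scoring node names from a {node: score} table."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def consensus_hubs(score_tables, k):
    """Keep only nodes that rank in the top k under every centrality measure."""
    top_sets = [top_k(scores, k) for scores in score_tables.values()]
    return set.intersection(*top_sets)


# Toy centrality scores for five placeholder nodes (illustrative only).
score_tables = {
    "degree": {"A": 10, "B": 8, "C": 7, "D": 2, "E": 1},
    "betweenness": {"A": 0.5, "C": 0.4, "D": 0.3, "B": 0.1, "E": 0.0},
    "bottleneck": {"C": 9, "A": 8, "E": 5, "B": 2, "D": 1},
}

hubs = consensus_hubs(score_tables, k=3)
print(sorted(hubs))
```

    Reporting the cutoff and the per-measure rankings alongside the final list would let readers see how sensitive the six hub genes are to the choice of `k`.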

    Major issues

    1. The DEG counts are inconsistent and need clarification

    The abstract states that the study found 727 DEGs, including 473 downregulated and 254 upregulated genes. However, the Results section reports 343 upregulated and 368 downregulated genes in GSE27447, 254 upregulated and 473 downregulated genes in GSE39004, and then later states that 816 unique upregulated and 992 unique downregulated genes were obtained after assessment. It also states that 1810 DEGs were submitted to STRING for PPI analysis.

    Suggested improvement: The authors should provide a clear DEG workflow table showing:

    • DEGs identified in each dataset separately

    • genes overlapping between datasets

    • genes unique to either dataset

    • final DEG list used for enrichment

    • final DEG list used for STRING/PPI analysis

    • whether "727 DEGs" refers only to GSE39004 or the final candidate list

    This is important because the downstream PPI and enrichment results depend directly on the final DEG set.
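    As a minimal sketch of the tally such a workflow table would contain, the snippet below counts shared, dataset-specific, and combined DEGs from two per-dataset lists. The gene lists are toy placeholders rather than the manuscript's DEG lists, and the function name `summarize_deg_sets` is hypothetical.

```python
def summarize_deg_sets(degs_a, degs_b):
    """Tally how two per-dataset DEG lists relate to each other."""
    a, b = set(degs_a), set(degs_b)
    return {
        "dataset_a_total": len(a),
        "dataset_b_total": len(b),
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "union": len(a | b),
    }


# Toy gene symbols for illustration only; not the actual DEG lists.
gse27447_degs = ["ESR1", "CDH1", "FN1"]
gse39004_degs = ["ESR1", "FN1", "IL6", "CCND1", "PPARG"]

summary = summarize_deg_sets(gse27447_degs, gse39004_degs)
for label, count in summary.items():
    print(f"{label}: {count}")
```

    Running the same tally separately for upregulated and downregulated genes would show directly whether the reported totals are overlaps, unions, or single-dataset counts.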

    2. The study needs stronger control for dataset heterogeneity and batch/platform effects

    The study uses two GEO microarray datasets, but the manuscript does not clearly describe platform differences, normalization strategy, batch correction, probe-to-gene mapping, duplicate probe handling, or whether raw or processed expression data were analyzed. GEO2R can be useful for exploratory analysis, but cross-dataset integration requires careful harmonization.

    Suggested improvement: The authors should clarify:

    • microarray platforms used for GSE27447 and GSE39004

    • whether raw CEL files or processed data were used

    • normalization method

    • batch-effect correction strategy

    • probe-to-gene annotation method

    • handling of multiple probes per gene

    • whether the combined DEG list was generated by meta-analysis or simple overlap/union

    Without this detail, it is difficult to assess the robustness of the DEGs and hub-gene selection.
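    To illustrate why the handling of multiple probes per gene needs to be stated, here is a sketch of one common convention: keeping, for each gene, the probe with the highest mean expression. This is an assumption chosen for illustration, not the method used in the manuscript; the probe IDs, intensities, and the function name `collapse_probes` are hypothetical.

```python
from statistics import mean


def collapse_probes(probe_expr, probe_to_gene):
    """Keep, for each gene, the probe with the highest mean expression."""
    best = {}  # gene -> (mean expression, probe id)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            # Probe has no gene annotation, so it is dropped.
            continue
        score = mean(values)
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, probe)
    return {gene: probe_expr[probe] for gene, (_, probe) in best.items()}


# Hypothetical probe IDs and log2 intensities across three samples.
probe_expr = {
    "probe_1": [7.0, 7.5, 8.0],
    "probe_2": [9.0, 9.5, 10.0],
    "probe_3": [5.0, 5.2, 5.1],
    "probe_4": [6.0, 6.1, 6.2],
}
probe_to_gene = {"probe_1": "ESR1", "probe_2": "ESR1", "probe_3": "IL6"}

gene_expr = collapse_probes(probe_expr, probe_to_gene)
print(gene_expr)
```

    Other conventions, such as averaging probes or keeping the most variable probe, can yield different gene-level values, which is why the choice affects the final DEG list.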

    3. Several identified hub genes are not novel TNBC-specific regulators

    Genes such as ESR1, CDH1, CCND1, FN1, IL6, and PPARG are biologically relevant, but many are already well studied in breast cancer. ESR1 and CDH1 expression differences may reflect known subtype biology rather than newly discovered TNBC-specific drivers. For example, the manuscript itself notes that ESR1 is diminished in TNBC compared with luminal breast cancer, which is expected because TNBC is defined in part by the absence of estrogen receptor expression.

    Suggested improvement: The authors should soften claims of novelty and clarify whether these genes are:

    • known breast cancer markers rediscovered by the pipeline

    • TNBC subtype-discriminating genes

    • functional drivers of TNBC progression

    • prognostic markers

    • therapeutic targets

    • network-central genes without proven causal role

    A useful revision would be to frame the study as identifying candidate network-prioritized genes rather than "novel key regulators," unless functional data are added.

    4. TNBC-specific validation needs improvement

    GEPIA and TCGA breast cancer analyses appear to compare general breast cancer tissue versus normal tissue in some sections, while UALCAN compares breast cancer molecular subtypes. These are different biological comparisons. A gene that differs between breast cancer and normal tissue may not be TNBC-specific.

    Suggested improvement: The authors should separate validation into distinct comparisons:

    • TNBC vs normal breast tissue

    • TNBC vs luminal breast cancer

    • TNBC vs HER2-positive breast cancer

    • TNBC vs all non-TNBC tumors

    • TNBC-specific survival analysis, not all breast cancer survival

    This would help determine whether the identified genes are truly relevant to TNBC specifically.

    5. qRT-PCR validation is limited to one TNBC and one non-TNBC cell line

    The experimental validation uses MDA-MB-231 as a TNBC model and MCF7 as a non-TNBC/luminal model. While this is a useful first step, one cell line per subtype cannot capture TNBC heterogeneity or generalize to patient tumors.

    Suggested improvement: The authors should validate hub genes in additional cell lines, such as:

    • TNBC: MDA-MB-468, BT-549, HCC1806, Hs578T

    • Non-TNBC: T47D and ZR-75-1 (luminal), BT-474 (luminal, HER2-positive), or SKBR3 (HER2-positive), depending on the comparison group

    Even better, validation in independent patient samples or public RNA-seq datasets would strengthen the conclusions.

    6. qRT-PCR methods need more experimental detail

    The manuscript states that samples were run in technical triplicates and that expression was quantified by the 2^-ΔΔCt method with GAPDH as the reference gene. However, it is not clear how many biological replicates were performed, whether primer efficiency was validated, or whether statistical tests were applied to the qPCR results.

    Suggested improvement: The authors should report:

    • number of biological replicates

    • number of technical replicates

    • primer efficiency

    • melt-curve specificity

    • no-template controls

    • RNA integrity/quality

    • reverse-transcription controls

    • statistical test used

    • p-values/error bars for qPCR results

    This is particularly important because Figure 8 shows large expression differences, but the statistical strength and reproducibility are not clear from the figure alone.
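    For reference, the 2^-ΔΔCt calculation can be sketched as below. The Ct values are made up for illustration and the function name `fold_change_ddct` is hypothetical. The method assumes roughly 100% amplification efficiency for both the target and the reference gene, which is one reason primer efficiency should be reported.

```python
from statistics import mean


def fold_change_ddct(target_sample, ref_sample, target_calibrator, ref_calibrator):
    """Relative expression by 2^-ddCt from technical-replicate Ct values."""
    # dCt: target Ct minus reference-gene Ct within each condition.
    dct_sample = mean(target_sample) - mean(ref_sample)
    dct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    # ddCt: sample dCt minus calibrator dCt.
    ddct = dct_sample - dct_calibrator
    return 2 ** (-ddct)


# Made-up Ct values (technical triplicates) for one target gene.
fold = fold_change_ddct(
    target_sample=[22.1, 21.9, 22.0],      # target gene, test cell line
    ref_sample=[18.0, 18.0, 18.0],         # reference gene, test cell line
    target_calibrator=[25.0, 25.0, 25.0],  # target gene, calibrator cell line
    ref_calibrator=[18.0, 18.0, 18.0],     # reference gene, calibrator cell line
)
print(f"fold change relative to calibrator: {fold:.2f}")
```

    A single fold change computed this way carries no information about biological variability, so error bars and tests need to come from independent biological replicates rather than technical triplicates.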

    7. PPARG results are internally inconsistent and need deeper interpretation

    The manuscript reports that PPARG was lower in TNBC samples compared with luminal tumors in UALCAN, but qRT-PCR showed PPARG was elevated in MDA-MB-231 compared with MCF7. The authors acknowledge that this discrepancy may reflect differences between patient tumor samples and in vitro cell line models.

    Suggested improvement: The authors should expand this discussion and consider whether PPARG should be treated differently from the other hub genes. Additional validation using multiple TNBC cell lines and patient-level data would help determine whether PPARG is consistently TNBC-associated or context-dependent.

    8. Survival analysis should be TNBC-specific and statistically clearer

    The manuscript states that CCND1, ESR1, IL6, and PPARG were associated with overall survival, but it is unclear whether survival was assessed in all breast cancer patients or specifically in TNBC patients. Figure 6 on page 20 shows Kaplan–Meier curves for the six genes, but the patient cohort, censoring details, hazard-ratio interpretation, and multiple-testing correction are not clearly described.

    Suggested improvement: The authors should clarify:

    • survival dataset used

    • whether analysis was TNBC-specific

    • number of patients in high/low expression groups

    • clinical covariates considered

    • p-value adjustment for multiple testing

    • whether multivariable Cox regression was performed

    • whether subtype, stage, age, and treatment were controlled

    Without this, prognostic claims should be presented cautiously.
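    The multiple-testing point can be made concrete with a small sketch of the Benjamini–Hochberg adjustment applied to six log-rank p-values, one per hub gene. The p-values and gene labels are invented placeholders, not results from the manuscript, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for six placeholder genes (illustrative only).
labels = ["gene_1", "gene_2", "gene_3", "gene_4", "gene_5", "gene_6"]
raw_p = [0.01, 0.20, 0.03, 0.50, 0.04, 0.02]

adj_p = benjamini_hochberg(raw_p)
for label, p, q in zip(labels, raw_p, adj_p):
    print(f"{label}: raw p = {p:.2f}, adjusted p = {q:.2f}")
```

    In this toy example, four genes fall below 0.05 before adjustment and none do afterwards, which shows how much the interpretation of a six-gene survival panel can depend on whether correction is applied.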

    9. Functional claims require direct mechanistic validation

    The study suggests that hub genes may regulate TNBC development and progression, but the experimental work only validates expression differences. Expression validation alone does not prove regulatory function.

    Suggested improvement: To support mechanistic claims, the authors should consider functional assays such as:

    • siRNA/shRNA/CRISPR knockdown or overexpression

    • proliferation assay

    • migration/invasion assay

    • apoptosis assay

    • EMT marker analysis

    • IL6/JAK/STAT pathway readout

    • FN1-mediated adhesion/migration assay

    • rescue experiments

    This would help move the work from correlation to functional biology.

    10. The manuscript should update tools and database versions

    The manuscript mentions DAVID version 6.7 accessed in January 2022. DAVID 6.7 is old, and the preprint is from 2026. Similarly, Cytoscape 3.8.2 and STRING should be described with version and confidence thresholds.

    Suggested improvement: The authors should update or clearly justify tool choices and include:

    • current DAVID/Enrichr/clusterProfiler/g:Profiler analysis

    • STRING version and confidence cutoff

    • correction method for enrichment p-values

    • Cytoscape/CytoHubba version

    • full parameter settings

    • code availability

    This would improve reproducibility.

    Minor issues

    1. Clarify TNBC versus non-TNBC terminology. "Non-TNBC" includes biologically diverse subtypes, including luminal and HER2-positive tumors. The manuscript should avoid treating non-TNBC as a single homogeneous comparator without explanation.

    2. Correct typographical and formatting errors. Examples include "Fold2 Change," "PPRAG" instead of PPARG, inconsistent capitalization, missing spaces, and awkward phrasing in several sections.

    3. Improve figure resolution. Figures 1–8 are useful but some labels are difficult to read. Higher-resolution images and larger axis labels would improve interpretability.

    4. Add a full supplementary gene list. The manuscript should provide final upregulated/downregulated DEG lists, log2FC, adjusted p-values, gene symbols, and dataset origin.

    5. Report adjusted p-values consistently. The Methods mention Benjamini–Hochberg correction, but later sections often refer only to p < 0.05. The authors should report FDR-adjusted p-values wherever possible.

    6. Clarify "MCODE" use. The abstract mentions MCODE, but the methods/results emphasize degree, bottleneck, and betweenness. The manuscript should clarify whether MCODE modules were actually used and how they contributed to final hub-gene selection.

    7. Clarify whether hub genes are upregulated or downregulated in TNBC. A final summary table should list each hub gene, DEG direction in GEO, TCGA subtype expression pattern, survival association, and qRT-PCR direction.

    8. Improve pathway interpretation. The GO/KEGG sections are descriptive. The authors should connect enriched pathways more directly to TNBC biology, such as immune signaling, EMT, extracellular matrix remodeling, proliferation, and hormone receptor loss.

    9. Avoid overclaiming diagnostic/prognostic utility. The current data support candidate biomarker prioritization, but not clinical biomarker validation. Claims should be softened unless larger independent validation is added.

    10. Add data/code availability. A GitHub repository or supplementary R scripts would improve reproducibility.

    Overall assessment

    This is a useful exploratory systems-biology study that identifies and validates six candidate hub genes associated with TNBC and breast cancer subtype biology. The study's strengths include its integrated workflow, use of public transcriptomic datasets, network-based prioritization, TCGA/UALCAN validation, and qRT-PCR confirmation in representative breast cancer cell lines.

    The main areas needing improvement are clarification of DEG counts, stronger handling of cross-dataset normalization and batch effects, more careful framing of novelty, TNBC-specific validation, expanded qRT-PCR experimental detail, and functional validation of candidate genes. The current data support these genes as candidate TNBC-associated network markers, but not yet as confirmed causal regulators or clinically validated diagnostic/prognostic biomarkers.

    With these revisions, the manuscript would provide a stronger and more reproducible contribution to TNBC transcriptomic biomarker discovery and could better guide future functional studies of CCND1, CDH1, ESR1, FN1, IL6, and PPARG in TNBC biology.

    Competing interests

    The author declares that they have no competing interests.

    Use of Artificial Intelligence (AI)

    The author declares that they used generative AI to come up with new ideas for their review.