Elucidation of putative key genes involved in the regulation of triple negative breast cancer development and progression

Abstract

The molecular basis of triple-negative breast cancer (TNBC), a highly aggressive and therapy-resistant subtype of breast cancer, is poorly understood. This study aims to identify key genes and pathways involved in TNBC development and progression using a systems biology approach followed by experimental validation. Here, two transcriptome microarray datasets from the GEO database were analysed using the R package LIMMA to detect differentially expressed genes (DEGs) in TNBC tumors. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses using the DAVID database were performed to identify the biological functions and pathways regulated by the DEGs. Further, a protein–protein interaction (PPI) network was constructed using the STRING online database, and the topological properties were determined using the MCODE and CytoHubba plug-ins. The expression and the prognostic value of the hub genes were validated using The Cancer Genome Atlas (TCGA) survival analysis. We found 727 DEGs, of which 473 were downregulated and 254 were upregulated in TNBC vs. non-TNBC samples. The GO and KEGG analyses indicated that the DEGs were mainly related to cell adhesion, tumorigenesis, and cellular immunity. The PPI network identified six hub genes, namely CCND1, CDH1, ESR1, FN1, IL6, and PPARG, as the top key regulators. All these genes were validated by quantitative real-time PCR in a TNBC cell line using a non-TNBC cell line as the calibrator, and the obtained results were in accordance with the bioinformatics data. This information may contribute to understanding the various molecular mechanisms that drive the development and progression of TNBC tumors.

This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/20048138.

Short summary of the research and contribution to the field

This preprint uses an integrated systems-biology and transcriptomic analysis approach to identify putative key genes involved in triple-negative breast cancer (TNBC) development and progression. The authors analyzed two GEO microarray datasets, GSE27447 and GSE39004, comparing TNBC and non-TNBC samples, followed by differential gene-expression analysis, GO/KEGG enrichment, STRING-based protein–protein interaction analysis, hub-gene prioritization, TCGA/UALCAN-based validation, survival analysis, and qRT-PCR validation in MDA-MB-231 and MCF7 cell lines. The study identifies six candidate hub genes—CCND1, CDH1, ESR1, FN1, IL6, and PPARG—as putative regulators or biomarkers associated with TNBC biology.

The work contributes to the field by combining public transcriptomic datasets, network topology, subtype-level expression analysis, survival association, and experimental qRT-PCR validation to prioritize biologically relevant genes in TNBC. The study is clinically relevant because TNBC remains a highly aggressive breast cancer subtype lacking ER, PR, and HER2 expression, which limits endocrine and HER2-targeted treatment options.

Positive feedback / strengths

Clinically important topic. TNBC is aggressive, heterogeneous, and difficult to treat, so identifying transcriptomic signatures and biologically relevant hub genes remains an important research goal.
Integrated analysis strategy. The manuscript combines multiple layers of analysis: GEO-based differential expression, GO/KEGG enrichment, STRING/Cytoscape PPI analysis, TCGA/GEPIA/UALCAN validation, survival analysis, and qRT-PCR. This multi-step design is stronger than relying on a single bioinformatics method.
Use of two independent GEO datasets. The authors use two publicly available datasets, GSE27447 and GSE39004, with TNBC and non-TNBC samples. The study reports that GSE27447 included 5 TNBC and 7 non-TNBC samples, while GSE39004 included 30 TNBC and 47 non-TNBC samples.
Clear hub-gene prioritization. The PPI analysis uses degree, bottleneck, and betweenness centrality, and the overlap of top-ranked genes across these measures led to the six final hub genes: CDH1, PPARG, FN1, CCND1, IL6, and ESR1.
Experimental validation adds value. The qRT-PCR validation in MDA-MB-231 and MCF7 cells provides experimental support beyond in silico analysis. The qPCR data on page 22 show higher IL6, FN1, CCND1, and PPARG expression and lower CDH1 and ESR1 expression in MDA-MB-231 relative to MCF7, broadly supporting several TNBC-associated expression patterns.
Figures support the workflow. Figure 1 on page 15 shows volcano plots and heatmap-based DEG visualization; Figure 4 on page 18 summarizes the PPI network and overlapping hub-gene centrality results; Figure 7 on page 21 shows subtype-specific expression patterns; and Figure 8 on page 22 provides qRT-PCR validation. These visuals help readers follow the analytical pipeline.
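
To illustrate the overlap-based hub-gene prioritization noted above, the following is a minimal sketch on a toy network. It assumes only two of the three measures (degree and betweenness, the latter via Brandes' algorithm); CytoHubba's bottleneck score is not reproduced here. The node names, edges, and the cutoff k are hypothetical and do not come from the manuscript.

```python
from collections import deque

# Hypothetical toy interaction network (not the manuscript's STRING network).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
         ("D", "E"), ("E", "F"), ("E", "G")]

graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def degree(graph):
    """Number of direct interaction partners per node."""
    return {v: len(neigh) for v, neigh in graph.items()}

def betweenness(graph):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, queue = [], deque([s])
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # accumulate dependencies back to s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

def top_k(scores, k):
    """Set of the k highest-scoring nodes."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

k = 2  # hypothetical cutoff
deg, bc = degree(graph), betweenness(graph)
hubs = top_k(deg, k) & top_k(bc, k)
print("top by degree:", sorted(top_k(deg, k)))
print("top by betweenness:", sorted(top_k(bc, k)))
print("overlap (candidate hubs):", sorted(hubs))
```

In this toy graph, node D ranks high on betweenness but not on degree, so only E survives the intersection. This shows how the final list depends on both the measures chosen and the cutoff, which is why those settings should be reported.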

Major issues

1. The DEG counts are inconsistent and need clarification

The abstract states that the study found 727 DEGs, including 473 downregulated and 254 upregulated genes. However, the Results section reports 343 upregulated and 368 downregulated genes in GSE27447, 254 upregulated and 473 downregulated genes in GSE39004, and then later states that 816 unique upregulated and 992 unique downregulated genes were obtained after assessment. It also states that 1810 DEGs were submitted to STRING for PPI analysis.

Suggested improvement: The authors should provide a clear DEG workflow table showing:

DEGs identified in each dataset separately
genes overlapping between datasets
genes unique to either dataset
final DEG list used for enrichment
final DEG list used for STRING/PPI analysis
whether "727 DEGs" refers only to GSE39004 or the final candidate list

This is important because the downstream PPI and enrichment results depend directly on the final DEG set.
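
The inconsistency can be checked with simple arithmetic on the counts quoted above. The sketch below uses only the numbers reported in the manuscript as cited in this review; the variable names and stage labels are illustrative.

```python
# (upregulated, downregulated) counts as quoted in this review.
reported = {
    "GSE27447": (343, 368),
    "GSE39004": (254, 473),
    "unique after assessment": (816, 992),
}

totals = {name: up + down for name, (up, down) in reported.items()}

abstract_total = 727   # DEG total stated in the abstract
string_input = 1810    # DEGs stated as submitted to STRING

for name, total in totals.items():
    print(f"{name}: {total}")

# Which reported stage, if any, matches the abstract's figure?
matches_abstract = [n for n, t in totals.items() if t == abstract_total]
print("Matches abstract total:", matches_abstract)

# Gap between the STRING input and the unique-gene total.
string_gap = string_input - totals["unique after assessment"]
print("STRING input minus unique total:", string_gap)

# Largest possible union of the two per-dataset lists, per direction.
max_union_up = reported["GSE27447"][0] + reported["GSE39004"][0]
max_union_down = reported["GSE27447"][1] + reported["GSE39004"][1]
print("Max union (up, down):", (max_union_up, max_union_down))
```

On these numbers, 727 coincides with the GSE39004 total alone, and the 1808 unique genes fall two short of the 1810 submitted to STRING. The unique counts (816 up, 992 down) also exceed the sums of the per-dataset counts (597 up, 841 down). A simple union at the same thresholds could not produce this, so a different threshold or gene list may have been used at that step.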

2. The study needs stronger control for dataset heterogeneity and batch/platform effects

The study uses two GEO microarray datasets, but the manuscript does not clearly describe platform differences, normalization strategy, batch correction, probe-to-gene mapping, duplicate probe handling, or whether raw or processed expression data were analyzed. GEO2R can be useful for exploratory analysis, but cross-dataset integration requires careful harmonization.

Suggested improvement: The authors should clarify:

microarray platforms used for GSE27447 and GSE39004
whether raw CEL files or processed data were used
normalization method
batch-effect correction strategy
probe-to-gene annotation method
handling of multiple probes per gene
whether the combined DEG list was generated by meta-analysis or simple overlap/union

Without this detail, it is difficult to assess the robustness of the DEGs and hub-gene selection.
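
The difference between a union, an overlap, and a direction-consistent overlap can be made concrete with a small sketch. The gene names and directions below are hypothetical placeholders, not results from either dataset, and the helper `combine` is an illustrative function rather than the authors' method.

```python
# Hypothetical per-dataset DEG calls: gene symbol -> direction in TNBC vs. non-TNBC.
degs_a = {"GENE1": "up", "GENE2": "down", "GENE3": "up"}
degs_b = {"GENE2": "down", "GENE3": "down", "GENE4": "up"}

def combine(a, b):
    """Return the union, the concordant overlap, and the discordant overlap."""
    shared = a.keys() & b.keys()
    concordant = {g for g in shared if a[g] == b[g]}  # same direction in both
    discordant = shared - concordant                  # opposite directions
    union = a.keys() | b.keys()
    return union, concordant, discordant

union, concordant, discordant = combine(degs_a, degs_b)
print("union:", sorted(union))
print("concordant overlap:", sorted(concordant))
print("discordant overlap:", sorted(discordant))
```

A union keeps genes supported by only one dataset and can silently include genes with opposite directions, whereas a concordant overlap is smaller but cross-validated. Stating which rule was used would let readers judge how strongly each hub gene is supported.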

3. Several identified hub genes are not novel TNBC-specific regulators

Genes such as ESR1, CDH1, CCND1, FN1, IL6, and PPARG are biologically relevant, but many are already well-studied in breast cancer. ESR1 and CDH1 expression differences may reflect known subtype biology rather than newly discovered TNBC-specific drivers. For example, the manuscript itself notes that ESR1 is diminished in TNBC compared with luminal breast cancer, which is expected because TNBC is defined by lack of estrogen receptor signaling.

Suggested improvement: The authors should soften claims of novelty and clarify whether these genes are:

known breast cancer markers rediscovered by the pipeline
TNBC subtype-discriminating genes
functional drivers of TNBC progression
prognostic markers
therapeutic targets
network-central genes without proven causal role

A useful revision would be to frame the study as identifying candidate network-prioritized genes rather than "novel key regulators," unless functional data are added.

4. TNBC-specific validation needs improvement

GEPIA and TCGA breast cancer analyses appear to compare general breast cancer tissue versus normal tissue in some sections, while UALCAN compares breast cancer molecular subtypes. These are different biological comparisons. A gene that differs between breast cancer and normal tissue may not be TNBC-specific.

Suggested improvement: The authors should separate validation into distinct comparisons:

TNBC vs normal breast tissue
TNBC vs luminal breast cancer
TNBC vs HER2-positive breast cancer
TNBC vs all non-TNBC tumors
TNBC-specific survival analysis, not all breast cancer survival

This would help determine whether the identified genes are truly relevant to TNBC specifically.

5. qRT-PCR validation is limited to one TNBC and one non-TNBC cell line

The experimental validation uses MDA-MB-231 as a TNBC model and MCF7 as a non-TNBC/luminal model. While this is a useful first step, one cell line per subtype cannot capture TNBC heterogeneity or generalize to patient tumors.

Suggested improvement: The authors should validate hub genes in additional cell lines, such as:

TNBC: MDA-MB-468, BT-549, HCC1806, Hs578T
Luminal/non-TNBC: T47D, ZR-75-1, BT-474, or SKBR3, depending on the comparison group

Even better, validation in independent patient samples or public RNA-seq datasets would strengthen the conclusions.

6. qRT-PCR methods need more experimental detail

The manuscript states that samples were run in technical triplicates and expression was quantified by the 2^-ΔΔCt method using GAPDH. However, it is not clear how many biological replicates were performed, whether primer efficiency was validated, or whether statistical tests were applied to qPCR results.

Suggested improvement: The authors should report:

number of biological replicates
number of technical replicates
primer efficiency
melt-curve specificity
no-template controls
RNA integrity/quality
reverse-transcription controls
statistical test used
p-values/error bars for qPCR results

This is particularly important because Figure 8 shows large expression differences, but the statistical strength and reproducibility are not clear from the figure alone.
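
For reference, the 2^-ΔΔCt calculation can be sketched as follows. The Ct values are hypothetical and chosen for easy arithmetic, and the sketch assumes roughly 100% primer efficiency for both target and reference genes, which is the assumption the requested efficiency data would need to support.

```python
from statistics import mean

def fold_change(ct_target_sample, ct_ref_sample, ct_target_calib, ct_ref_calib):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values from one
    biological replicate.
    """
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)  # dCt, test sample
    d_ct_calib = mean(ct_target_calib) - mean(ct_ref_calib)     # dCt, calibrator
    dd_ct = d_ct_sample - d_ct_calib
    return 2 ** (-dd_ct)

# Hypothetical Ct values (not from the manuscript).
fc = fold_change(
    ct_target_sample=[22.0, 22.0, 22.0],  # target gene, test cell line
    ct_ref_sample=[18.0, 18.0, 18.0],     # reference gene, test cell line
    ct_target_calib=[25.0, 25.0, 25.0],   # target gene, calibrator cell line
    ct_ref_calib=[18.0, 18.0, 18.0],      # reference gene, calibrator cell line
)
print(fc)  # dCt 4 vs. 7 gives ddCt = -3, so 2^3 = 8.0
```

Technical replicates are averaged within a single biological replicate here. Error bars and statistical tests are only meaningful when computed across independent biological replicates, which is why their number needs to be reported.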

7. PPARG results are internally inconsistent and need deeper interpretation

The manuscript reports that PPARG was lower in TNBC samples compared with luminal tumors in UALCAN, but qRT-PCR showed PPARG was elevated in MDA-MB-231 compared with MCF7. The authors acknowledge that this discrepancy may reflect differences between patient tumor samples and in vitro cell line models.

Suggested improvement: The authors should expand this discussion and consider whether PPARG should be treated differently from the other hub genes. Additional validation using multiple TNBC cell lines and patient-level data would help determine whether PPARG is consistently TNBC-associated or context-dependent.

8. Survival analysis should be TNBC-specific and statistically clearer

The manuscript states that CCND1, ESR1, IL6, and PPARG were associated with overall survival, but it is unclear whether survival was assessed in all breast cancer patients or specifically in TNBC patients. Figure 6 on page 20 shows Kaplan–Meier curves for the six genes, but the patient cohort, censoring details, hazard-ratio interpretation, and multiple-testing correction are not clearly described.

Suggested improvement: The authors should clarify:

survival dataset used
whether analysis was TNBC-specific
number of patients in high/low expression groups
clinical covariates considered
p-value adjustment for multiple testing
whether multivariable Cox regression was performed
whether subtype, stage, age, and treatment were controlled

Without this, prognostic claims should be presented cautiously.
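
The effect of multiple-testing adjustment on a six-gene panel can be shown with a short Benjamini–Hochberg sketch. The gene labels and p-values are hypothetical placeholders, not values from the manuscript, and Benjamini–Hochberg is used only because the Methods mention it.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical log-rank p-values for six genes.
raw = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03,
       "GENE_D": 0.20, "GENE_E": 0.50, "GENE_F": 0.02}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

for gene in raw:
    print(f"{gene}: raw={raw[gene]:.2f} adjusted={adjusted[gene]:.2f}")

n_raw = sum(p < 0.05 for p in raw.values())
n_adj = sum(p < 0.05 for p in adjusted.values())
print("significant before/after adjustment:", n_raw, n_adj)
```

In this toy input, four of the six raw p-values fall below 0.05, but none of the adjusted values do. Real data may behave differently, but the example shows why unadjusted survival p-values for a panel of genes should be interpreted with care.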

9. Functional claims require direct mechanistic validation

The study suggests that hub genes may regulate TNBC development and progression, but the experimental work only validates expression differences. Expression validation alone does not prove regulatory function.

Suggested improvement: To support mechanistic claims, the authors should consider functional assays such as:

siRNA/shRNA/CRISPR knockdown or overexpression
proliferation assay
migration/invasion assay
apoptosis assay
EMT marker analysis
IL6/JAK/STAT pathway readout
FN1-mediated adhesion/migration assay
rescue experiments

This would help move the work from correlation to functional biology.

10. The manuscript should update tools and database versions

The manuscript mentions DAVID version 6.7 accessed in January 2022. DAVID 6.7 is an outdated release, and the preprint is from 2026. Similarly, the manuscript names Cytoscape 3.8.2, but the STRING version and confidence threshold should also be stated.

Suggested improvement: The authors should update or clearly justify tool choices and include:

current DAVID/Enrichr/clusterProfiler/g:Profiler analysis
STRING version and confidence cutoff
correction method for enrichment p-values
Cytoscape/CytoHubba version
full parameter settings
code availability

This would improve reproducibility.

Minor issues

Clarify TNBC versus non-TNBC terminology. "Non-TNBC" includes biologically diverse subtypes, including luminal and HER2-positive tumors. The manuscript should avoid treating non-TNBC as a single homogeneous comparator without explanation.
Correct typographical and formatting errors. Examples include "Fold2 Change," "PPRAG" instead of PPARG, inconsistent capitalization, missing spaces, and awkward phrasing in several sections.
Improve figure resolution. Figures 1–8 are useful but some labels are difficult to read. Higher-resolution images and larger axis labels would improve interpretability.
Add a full supplementary gene list. The manuscript should provide final upregulated/downregulated DEG lists, log2FC, adjusted p-values, gene symbols, and dataset origin.
Report adjusted p-values consistently. The Methods mention Benjamini–Hochberg correction, but later sections often refer only to p < 0.05. The authors should report FDR-adjusted p-values wherever possible.
Clarify "MCODE" use. The abstract mentions MCODE, but the methods/results emphasize degree, bottleneck, and betweenness. The manuscript should clarify whether MCODE modules were actually used and how they contributed to final hub-gene selection.
Clarify whether hub genes are upregulated or downregulated in TNBC. A final summary table should list each hub gene, DEG direction in GEO, TCGA subtype expression pattern, survival association, and qRT-PCR direction.
Improve pathway interpretation. The GO/KEGG sections are descriptive. The authors should connect enriched pathways more directly to TNBC biology, such as immune signaling, EMT, extracellular matrix remodeling, proliferation, and hormone receptor loss.
Avoid overclaiming diagnostic/prognostic utility. The current data support candidate biomarker prioritization, but not clinical biomarker validation. Claims should be softened unless larger independent validation is added.
Add data/code availability. A GitHub repository or supplementary R scripts would improve reproducibility.

Overall assessment

This is a useful exploratory systems-biology study that identifies and validates six candidate hub genes associated with TNBC and breast cancer subtype biology. The study's strengths include its integrated workflow, use of public transcriptomic datasets, network-based prioritization, TCGA/UALCAN validation, and qRT-PCR confirmation in representative breast cancer cell lines.

The main areas needing improvement are clarification of DEG counts, stronger handling of cross-dataset normalization and batch effects, more careful framing of novelty, TNBC-specific validation, expanded qRT-PCR experimental detail, and functional validation of candidate genes. The current data support these genes as candidate TNBC-associated network markers, but not yet as confirmed causal regulators or clinically validated diagnostic/prognostic biomarkers.

With these revisions, the manuscript would provide a stronger and more reproducible contribution to TNBC transcriptomic biomarker discovery and could better guide future functional studies of CCND1, CDH1, ESR1, FN1, IL6, and PPARG in TNBC biology.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they used generative AI to come up with new ideas for their review.
