Integrated Multi-Omics Analysis for the Identification of Disease-Associated Variations and Prognostic Biomarkers in Triple-Negative Breast Cancer (TNBC)

Abstract

Background

Triple-negative breast cancer (TNBC) exhibits substantial molecular heterogeneity and lacks receptor-targeted therapies. Single-omic approaches do not adequately capture its regulatory complexity, so integrated multi-omic frameworks are needed to identify stable prognostic signatures.

Methods

Matched transcriptomic and DNA methylation data from the TCGA-BRCA cohort were normalised and mathematically integrated to isolate disease-associated variations. A calibrated machine learning voting ensemble (comprising LightGBM, Random Forest, and Logistic Regression) was trained to predict patient survival. Model generalisability was tested on an external microarray cohort (GSE58812) that was quantile-normalised independently. SHAP (SHapley Additive exPlanations) values provided biological interpretability.
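As an illustration of the quantile normalisation step, here is a minimal sketch in plain Python. It is not the authors' code: the function name `quantile_normalise`, the genes-by-samples list layout, and the toy values are assumptions made for this example, and ties are broken by input order rather than averaged. Applying it to the external cohort alone, without borrowing a reference distribution from the training data, is one reading of "independent" normalisation.

```python
def quantile_normalise(matrix):
    """Quantile-normalise a genes x samples matrix (list of rows).

    Hypothetical helper for illustration only. Every sample (column) is
    forced onto the same reference distribution: the mean of the k-th
    smallest values across all samples. Ties are broken by input order.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])

    # Split the matrix into per-sample columns and sort each one.
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]

    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(col[k] for col in sorted_cols) / n_samples
        for k in range(n_genes)
    ]

    # Replace each value with the reference value at its within-sample rank.
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_idx in enumerate(order):
            result[gene_idx][j] = reference[rank]
    return result


# Toy expression matrix: 4 genes (rows) x 3 samples (columns), made-up values.
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 5.0],
    [4.0, 2.0, 8.0],
]
normalised = quantile_normalise(toy)
```

After normalisation every sample holds the same set of values, so only the within-sample ranking of genes carries information. This is what lets a model trained on one platform be applied to data measured on another scale.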

Results

Differential and integrative analyses identified a 47-gene master prognostic signature. The ensemble classifier achieved an external validation accuracy of 74.77% (AUC 0.590) on previously unseen patients. SHAP analysis indicated the direction in which each biomarker contributed to predicted mortality. Hypergeometric pathway enrichment highlighted targetable metabolic and signalling networks.
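The hypergeometric enrichment test mentioned above can be sketched as follows. This is an illustrative implementation using only the standard library, not the authors' pipeline: the function name `hypergeom_enrichment_p` and all numbers in the usage example other than the signature size of 47 are hypothetical. It computes the probability of seeing at least `k` pathway genes in a signature of `n` genes drawn from a background of `N` genes, of which `K` belong to the pathway.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: number of genes in the background
    K: number of background genes annotated to the pathway
    n: number of genes in the signature
    k: number of signature genes annotated to the pathway
    """
    upper = min(K, n)  # the overlap cannot exceed either set size
    total = comb(N, n)  # all ways to draw a signature of size n
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical numbers: a 47-gene signature, a 20,000-gene background,
# a 150-gene pathway, and 5 genes shared between signature and pathway.
p_value = hypergeom_enrichment_p(N=20000, K=150, n=47, k=5)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice one p-value is computed per pathway, so the results would normally be corrected for multiple testing before any pathway is reported as enriched.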

Conclusions

This multi-omic machine learning pipeline identifies a 47-gene prognostic signature for TNBC. The model was validated across platforms on an independent microarray cohort (74.77% accuracy, AUC 0.590) and offers interpretable output for stratifying patient risk and guiding future therapeutic target development.
