Integrating Gene Expression and Proteomics for Breast Cancer Biomarker Prediction through a Deep Learning Framework with SHAP-Based Explainability

Naim Ajlouni
Abdelrahman Almassri

Abstract

Breast cancer treatment hinges on accurately identifying three key biomarkers: Estrogen Receptor (ER), Progesterone Receptor (PR), and Human Epidermal Growth Factor Receptor 2 (HER2). This study evaluates two deep learning strategies for predicting them. The first is a Convolutional Neural Network (CNN) designed specifically for each biomarker; the second is a holistic multi-input neural model that combines gene expression data with simulated proteomic features. The dataset contains 705 patient samples with 1,941 gene expression features. Both methods were tested and compared. The CNN models performed best for ER and PR, reaching accuracies of 89% and 86%, respectively, which indicates strong, learnable patterns in gene expression. HER2 reached a much lower accuracy (72%) and a higher loss (0.6), which suggests that the CNN has difficulty modeling this marker from gene expression alone. The multi-input model showed promising robustness by integrating multiple data types: it performed on par with or better than the CNNs for ER and PR and held its ground in the challenging task of HER2 prediction. SHAP explainability tools were used to uncover what drives each prediction. The ER and PR models revealed clear gene signatures contributing to accurate classification. Although the HER2 signal was weaker, SHAP still revealed subtle patterns, adding transparency and biological insight. The tests yielded two main findings. First, deep learning models, whether CNNs or integrative architectures, are potent tools for biomarker prediction. Second, explainability is not just an add-on but an essential component: it builds trust, guides feature refinement, and supports clinical application. The study thus combines predictive power with interpretability, demonstrating not just how we can predict, but why. We conclude that this kind of AI is needed for future precision oncology.
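To make the explainability idea concrete, the following is a minimal sketch of the quantity that SHAP approximates: exact Shapley values, computed by brute force for a tiny classifier. It is not the authors' model or pipeline. The toy two-branch classifier, its weights, the three input features, and the all-zero baseline are hypothetical and chosen only for illustration; a real analysis on 1,941 features would rely on the approximations in the SHAP library rather than enumerating every feature subset.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical toy weights: two "gene expression" inputs, one "proteomic" input.
GENE_WEIGHTS = [1.5, -0.8]
PROT_WEIGHTS = [0.6]
BIAS = -0.2


def predict(x):
    """Toy two-branch classifier: each branch yields a linear score,
    the scores are summed and squashed to a probability."""
    gene_score = sum(w * v for w, v in zip(GENE_WEIGHTS, x[:2]))
    prot_score = sum(w * v for w, v in zip(PROT_WEIGHTS, x[2:]))
    return 1.0 / (1.0 + exp(-(gene_score + prot_score + BIAS)))


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2^n, so this is only feasible for tiny n.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


sample = [1.0, 0.5, 2.0]      # hypothetical patient profile
baseline = [0.0, 0.0, 0.0]    # hypothetical reference profile
phi = shapley_values(predict, sample, baseline)
for name, value in zip(["gene_1", "gene_2", "protein_1"], phi):
    print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets per-feature contributions be read as a decomposition of a single prediction.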

Version published to 10.21203/rs.3.rs-7151673/v1 on Research Square
Aug 27, 2025
