A Hybrid Ensemble Machine Learning Framework with Membership-Function Feature Engineering for Non-Invasive Prediction of HER2 Status in Breast Cancer

Abstract

Accurate determination of human epidermal growth factor receptor 2 (HER2) status is a critical component of breast cancer prognosis and treatment planning. Conventional diagnostic techniques, such as immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH), are clinically established but remain invasive, time-consuming, costly, and sensitive to pre-analytical and interpretative variability. Motivated by the need for scalable, data-driven decision-support tools, this study proposes a hybrid ensemble machine learning framework for non-invasive HER2 status prediction using routinely available clinical and immunohistochemical features. A retrospective dataset comprising 624 breast cancer patients from Mahdieh Clinic (Kermanshah, Iran) was analyzed using a structured preprocessing pipeline that included normalization and class balancing. The proposed framework integrates multiple tree-based classifiers (Random Forest, XGBoost, and LightGBM) through ensemble strategies and enhances predictive robustness using membership-function feature engineering to capture gradual transitions in clinically relevant biomarkers. Decision threshold optimization was further applied to improve classification balance in borderline cases. On a held-out test set, the proposed ensemble framework achieved an accuracy of 0.816, an F1-score of 0.814, and an area under the receiver operating characteristic curve (AUC) of 0.862, a performance comparable to that of the best-performing individual classifier. These results indicate that ensemble learning combined with smooth membership-based feature representations can provide a reliable decision-support framework for HER2 status prediction, although further external validation is required before clinical use.
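
The abstract names three techniques without giving their details: membership-function features, ensemble combination of the classifiers, and decision threshold optimization. The following is a minimal pure-Python sketch of how such steps might look, not the authors' implementation. The membership shapes (sigmoid and triangular), the use of soft voting, the choice of F1 as the threshold criterion, and all function names, parameters, and toy numbers are assumptions made for illustration.

```python
import math


def sigmoid_membership(x, center, slope):
    """Smooth degree in (0, 1) to which x counts as 'high' around center.

    Assumed shape; the abstract does not specify the membership functions.
    """
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


def triangular_membership(x, left, peak, right):
    """Degree in [0, 1]: rises linearly from left to peak, falls to right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def soft_vote(prob_lists):
    """Average the positive-class probabilities of several models per sample."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 with no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, candidates):
    """Return (threshold, F1) for the candidate cut-off with the highest F1.

    Ties keep the earliest candidate. F1 is an assumed criterion.
    """
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy usage with made-up numbers (hypothetical marker values and model outputs).
marker_values = [5.0, 18.0, 22.0, 40.0]
high_degree = [sigmoid_membership(v, center=20.0, slope=0.5) for v in marker_values]

rf_probs = [0.2, 0.6, 0.4, 0.9]
xgb_probs = [0.1, 0.5, 0.45, 0.8]
lgbm_probs = [0.3, 0.7, 0.35, 0.7]
ensemble_probs = soft_vote([rf_probs, xgb_probs, lgbm_probs])

labels = [0, 1, 1, 1]
threshold, f1 = best_threshold(labels, ensemble_probs, [0.25, 0.35, 0.5, 0.65])
```

In a sketch like this, the membership degrees would be appended to the feature matrix before training, and the threshold would be selected on a validation split rather than on the held-out test set, so that the reported test metrics stay unbiased.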
