Risk Stratification of Breast Cancer Metastasis: A Predictive Modelling Framework Using Clinical and Hormonal Receptor Data in a Ghanaian Cohort

Abstract

Breast cancer remains the most commonly diagnosed cancer among women globally and a leading cause of cancer-related death, with disproportionately high mortality in sub-Saharan Africa due to late-stage presentation and limited access to diagnostic and treatment services. In Ghana, nearly 70% of breast cancer cases are diagnosed at advanced or metastatic stages, underscoring the urgent need for accurate, context-specific prognostic tools. Here, we assembled clinical, molecular and demographic data from 558 breast cancer patients at Korle-Bu Teaching Hospital in Ghana to develop and compare five supervised machine-learning models (logistic regression, random forest, support vector machine, XGBoost and naïve Bayes) for metastasis prediction. Using standardized preprocessing and repeated 10-fold stratified cross-validation, logistic regression achieved the most balanced performance (96% cross-validated accuracy; 93% test-set accuracy) and facilitated interpretability via SHAP analysis, which identified lymph node involvement, tumour size and stage as the top predictors. We further refined prognostic utility by stratifying patients into low, intermediate and high metastatic risk groups, revealing distinct clinical and biological profiles: high-risk individuals exhibited aggressive subtypes (triple-negative, HER2-positive) and advanced staging, whereas low-risk patients showed favourable receptor status and earlier-stage disease. These findings demonstrate that tailored, interpretable machine-learning tools can support accurate, resource-appropriate metastasis risk assessment in low-resource settings.
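
The stratification step described in the abstract can be illustrated with a minimal sketch. It assumes that each patient already has a predicted metastasis probability from a fitted classifier, and that the three risk groups are defined by two probability cut-offs. The abstract does not state how the groups were actually defined, so the cut-offs (0.33 and 0.66), the function names and the example probabilities below are hypothetical.

```python
from collections import Counter

# Hypothetical cut-offs: the abstract does not report the thresholds used.
LOW_CUTOFF = 0.33
HIGH_CUTOFF = 0.66


def risk_group(probability, low=LOW_CUTOFF, high=HIGH_CUTOFF):
    """Map a predicted metastasis probability to a risk group label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < low:
        return "low"
    if probability < high:
        return "intermediate"
    return "high"


def stratify(probabilities):
    """Assign a risk group to every predicted probability."""
    return [risk_group(p) for p in probabilities]


# Made-up probabilities for illustration only (not study data).
example_probs = [0.05, 0.20, 0.40, 0.55, 0.70, 0.95]
groups = stratify(example_probs)
counts = Counter(groups)

print(groups)
print(dict(counts))
```

Fixed cut-offs keep the grouping easy to explain at the point of care; quantile-based cut-offs computed on the training cohort are a common alternative when balanced group sizes are preferred.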
