Identifying Optimal Algorithms for Breast Cancer Prediction in Ethiopia

Abstract

Breast cancer is the most prevalent and lethal type of cancer in Ethiopia, and the number of deaths from it rises sharply every year. It is the most common cancer overall and the leading cause of death among women in Ethiopia. With this in mind, this study aims to identify optimal machine learning algorithms for predicting the stage of breast cancer. Unlike traditional methods, machine learning approaches have proven powerful for the early detection and prediction of breast cancer. We used a breast cancer dataset collected at Hiwot Fana Specialized University Hospital and Tikur Anbesa Specialized Hospital between September 2019 and April 2024. On the preprocessed dataset we applied random forest (RF), logistic regression, decision tree (DT), and a hybrid model built from RF, DT, gradient boosting classifier (GBC), and support vector machine (SVM). Thirteen features were selected as inputs to help identify the optimal algorithm and improve model accuracy. We compared the classifiers using precision, recall, F1 score, and accuracy, evaluating each with both an 80:20 train-test split and 10-fold cross-validation. All analyses were carried out in the Python programming language with the required libraries. In this comparison, the random forest model outperformed the other algorithms under both evaluation schemes: its precision, recall, F1 score, and accuracy were all 99% with the train-test split and 97% with 10-fold cross-validation.
Thus, random forest is the optimal machine learning algorithm for determining the stage of breast cancer in patients in Ethiopia.
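
The 80:20 comparison described in the abstract can be sketched with scikit-learn. This is a minimal, hypothetical illustration, not the study's code. It uses synthetic data from `make_classification` in place of the hospital dataset and assumes four stage labels. It also assumes the hybrid model is a hard-voting `VotingClassifier` over RF, DT, GBC, and SVM, because the abstract does not say how the hybrid is formed. Scores are macro-averaged so that each stage counts equally, which is another assumption.

```python
# Hypothetical sketch of an 80:20 train-test comparison of the classifiers
# named in the abstract. Data are SYNTHETIC (make_classification), not the
# hospital dataset; n_classes=4 is an assumed stand-in for the cancer stages.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 13 input features, as in the abstract; the other settings are assumptions.
X, y = make_classification(
    n_samples=1000, n_features=13, n_informative=8,
    n_classes=4, random_state=0,
)

# 80:20 split, stratified so each stage keeps its proportion in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# ASSUMPTION: the "hybrid" is hard majority voting over RF, DT, GBC and SVM.
hybrid = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("gbc", GradientBoostingClassifier(random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ],
    voting="hard",
)

models = {
    "random_forest": RandomForestClassifier(random_state=0),
    # Scaling helps logistic regression converge; not stated in the abstract.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "hybrid": hybrid,
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Macro averaging weights every stage equally (an assumption).
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    results[name] = {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

Because the data are synthetic, the printed scores have no bearing on the 99% and 97% figures reported above; the sketch only shows the shape of the evaluation.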

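The 10-fold cross-validation of the random forest can be sketched in the same hedged way, again on synthetic stand-in data with 13 features and four hypothetical stage labels. Using `StratifiedKFold` with shuffling is an assumption; the abstract does not say whether its folds were stratified.

```python
# Hypothetical sketch of 10-fold cross-validation for the random forest.
# Data are SYNTHETIC; the four classes stand in for the cancer stages.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(
    n_samples=1000, n_features=13, n_informative=8,
    n_classes=4, random_state=0,
)

# ASSUMPTION: stratified, shuffled folds keep stage proportions similar.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Built-in scikit-learn scorer names for the four reported metrics.
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]

scores = cross_validate(
    RandomForestClassifier(random_state=0), X, y, cv=cv, scoring=scoring
)

# cross_validate stores one score per fold under "test_<metric>";
# the mean over the 10 folds is the usual summary figure.
mean_scores = {m: float(scores[f"test_{m}"].mean()) for m in scoring}
print({m: round(v, 3) for m, v in mean_scores.items()})
```

Cross-validation averages over ten different held-out folds, so its estimate is typically less optimistic than a single 80:20 split, which is consistent with the lower cross-validated figure in the abstract.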