Quantitative Analysis of Breast Nuclei Morphology for Cancer Diagnosis Using Supervised Machine Learning

Abstract

Background: Breast cancer is the most frequently diagnosed malignancy among women worldwide and a major cause of mortality. Early and accurate detection is vital for improving outcomes, yet conventional diagnostic approaches such as mammography, histopathology, and fine-needle aspirate (FNA) cytology can be limited by observer variability and overlapping morphological features. Machine learning (ML) offers a means to improve diagnostic accuracy by capturing subtle patterns in complex datasets.

Methods: This study employed the Wisconsin Breast Cancer Diagnostic (WBCD) dataset, comprising 569 FNA cytology cases with 30 quantitative nuclear morphology features. After correlation analysis, 11 predictors were selected to reduce redundancy while retaining diagnostic power. The dataset was split into training and testing sets using an 85:15 stratified approach. Four supervised classifiers were implemented in Python's scikit-learn library: Random Forest (RF), Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), and Support Vector Classifier (SVC). Models were tuned using GridSearchCV and evaluated using accuracy, precision, recall, and confusion matrices.

Results: Exploratory analysis showed malignant tumors exhibited larger nuclear size and higher concavity features than benign tumors. The MLP achieved the best performance (accuracy 0.95, recall 0.91, precision 0.96), misclassifying only two malignant cases. RF and KNN both reached 0.93 accuracy and 0.97 precision but had lower recall (0.85). SVC achieved perfect precision (1.00) but the lowest recall (0.76), misclassifying eight malignant cases.

Conclusion: ML models demonstrated reliable classification of breast tumors from cytomorphological features, with the MLP offering the most favorable balance of sensitivity and specificity. These findings highlight the clinical potential of neural network-based models to support early and accurate breast cancer detection.
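
The workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn's bundled load_breast_cancer data (569 cases, 30 features) as a stand-in for the study dataset. The 0.9 correlation threshold, the parameter grids, the random seed, and the helper names drop_correlated and run_comparison are hypothetical. The greedy correlation filter is a generic one and will not necessarily keep the same 11 predictors, so the printed metrics should not be expected to match the reported figures.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

MALIGNANT = 0  # scikit-learn's copy encodes malignant as 0 and benign as 1


def drop_correlated(X, threshold=0.9):
    """Greedily keep columns whose |Pearson r| with every kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept


def run_comparison(test_size=0.15, seed=0):
    X, y = load_breast_cancer(return_X_y=True)
    # 85:15 stratified split, as in the abstract.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)

    # Assumption: the filter is fit on the training rows only, to avoid leakage.
    cols = drop_correlated(X_train)
    X_train, X_test = X_train[:, cols], X_test[:, cols]

    # Hypothetical, deliberately small grids; the study's grids are not given.
    candidates = {
        "RF": (RandomForestClassifier(random_state=seed),
               {"clf__n_estimators": [100, 200]}),
        "MLP": (MLPClassifier(max_iter=1000, random_state=seed),
                {"clf__hidden_layer_sizes": [(16,), (32,)]}),
        "KNN": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
        "SVC": (SVC(), {"clf__C": [0.1, 1.0, 10.0]}),
    }

    results = {}
    for name, (model, grid) in candidates.items():
        pipe = Pipeline([("scale", StandardScaler()), ("clf", model)])
        search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
        search.fit(X_train, y_train)
        pred = search.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            # Precision and recall are reported for the malignant class.
            "precision": precision_score(y_test, pred, pos_label=MALIGNANT),
            "recall": recall_score(y_test, pred, pos_label=MALIGNANT),
            # Rows are true classes, columns predicted: malignant first.
            "confusion": confusion_matrix(y_test, pred, labels=[MALIGNANT, 1]),
        }
    return results, len(cols)


if __name__ == "__main__":
    results, n_kept = run_comparison()
    print(f"features kept after correlation filter: {n_kept}")
    for name, m in results.items():
        print(f"{name}: acc={m['accuracy']:.2f} "
              f"prec={m['precision']:.2f} rec={m['recall']:.2f}")
```

In each confusion matrix the first row holds the malignant cases, so its second entry counts malignant tumors predicted as benign. These are the misclassifications that lower recall, which is why a model can reach perfect precision while still missing malignant cases.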
