Quantitative Analysis of Breast Nuclei Morphology for Cancer Diagnosis Using Supervised Machine Learning

Zarlish Attique
Sajjid Khan

Abstract

Background

Breast cancer is the most frequently diagnosed malignancy among women worldwide and a major cause of mortality. Early and accurate detection is vital for improving outcomes, yet conventional diagnostic approaches such as mammography, histopathology, and fine-needle aspirate (FNA) cytology can be limited by observer variability and overlapping morphological features. Machine learning (ML) offers a means to improve diagnostic accuracy by capturing subtle patterns in complex datasets.

Methods

This study employed the Wisconsin Breast Cancer Diagnostic (WBCD) dataset, comprising 569 FNA cytology cases with 30 quantitative nuclear morphology features. After correlation analysis, 11 predictors were selected to reduce redundancy while retaining diagnostic power. The dataset was split into training and testing sets using an 85:15 stratified approach. Four supervised classifiers were implemented in Python’s scikit-learn library: Random Forest (RF), Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), and Support Vector Classifier (SVC). Models were tuned using GridSearchCV and evaluated using accuracy, precision, recall, and confusion matrices.
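The abstract does not state which correlation threshold or selection rule produced the 11 predictors, nor the random seed used for the split. The following is a minimal, dependency-free Python sketch of one common approach. It assumes a greedy filter that discards any feature whose absolute Pearson correlation with an already-retained feature exceeds a hypothetical threshold of 0.9, followed by a per-class 85:15 split. The function names (`pearson`, `drop_redundant`, `stratified_split`), the threshold, the seed, and the toy feature values are illustrative assumptions, not the authors' code. In scikit-learn the split step would usually be `train_test_split(X, y, test_size=0.15, stratify=y)`, whose per-class allocation may differ by a sample from this sketch.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_redundant(columns, threshold=0.9):
    """Greedy correlation filter (hypothetical rule, not the study's exact procedure).

    `columns` maps feature name -> list of values. Features are visited in
    insertion order; one is kept only if its |r| with every feature kept so
    far is at most `threshold`.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def stratified_split(labels, test_frac=0.15, seed=0):
    """Split row indices so each class keeps roughly the same test fraction."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class test quota
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy values: perimeter is roughly 2*pi*radius, so it is redundant with radius.
columns = {
    "radius_mean": [10.0, 12.0, 14.0, 20.0, 22.0],
    "perimeter_mean": [62.8, 75.4, 88.0, 125.7, 138.2],
    "texture_mean": [15.0, 21.0, 18.0, 17.0, 25.0],
}
print(drop_redundant(columns))  # ['radius_mean', 'texture_mean']

# Class counts of the 569-case dataset: 212 malignant (M), 357 benign (B).
labels = ["M"] * 212 + ["B"] * 357
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 483 86
```

With these class counts, the sketch places 32 malignant and 54 benign cases in an 86-case test set, so under this assumed split each misclassified malignant case would shift test recall by roughly three percentage points.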

Results

Exploratory analysis showed that malignant tumors exhibited larger nuclear size and higher concavity values than benign tumors. The MLP achieved the best performance (accuracy 0.95, recall 0.91, precision 0.96), misclassifying only two malignant cases. RF and KNN both reached 0.93 accuracy and 0.97 precision but had lower recall (0.85). SVC achieved perfect precision (1.00) but the lowest recall (0.76), misclassifying eight malignant cases.
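The results report accuracy, precision, and recall, while the conclusion speaks of sensitivity and specificity. All of these follow from the same binary confusion matrix. The sketch below assumes malignant is the positive class. The `Confusion` class and the counts in the example are hypothetical and do not correspond to the confusion matrix of any model in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts, treating malignant as the positive class."""

    tp: int  # malignant predicted malignant
    fn: int  # malignant predicted benign (missed cancer)
    fp: int  # benign predicted malignant (false alarm)
    tn: int  # benign predicted benign

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        """Also called sensitivity."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration; not taken from any model in the study.
cm = Confusion(tp=28, fn=4, fp=2, tn=52)
print(f"accuracy={cm.accuracy():.3f} precision={cm.precision():.3f} "
      f"recall={cm.recall():.3f} specificity={cm.specificity():.3f}")
# accuracy=0.930 precision=0.933 recall=0.875 specificity=0.963
```

Recall (sensitivity) penalizes only missed malignancies, whereas precision penalizes only false alarms, and specificity is a different quantity from precision. Under this reading, a precision of 1.00 on a test set of this size corresponds to zero false positives, so all of the SVC's errors would be missed malignant cases.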

Conclusion

ML models demonstrated reliable classification of breast tumors from cytomorphological features, with the MLP offering the most favorable balance of sensitivity and specificity. These findings highlight the clinical potential of neural network–based models to support early and accurate breast cancer detection.

Version published to 10.1101/2025.08.24.25334307 on medRxiv
Aug 26, 2025