Using Machine Learning to Improve Cancer Diagnosis Accuracy Through Genetic Data Analysis
Abstract
Accurate and robust molecular diagnostics are critical for improving breast cancer outcomes. Gene expression profiling offers high-dimensional signatures for distinguishing tumor from normal tissue, but traditional classifiers can be sensitive to technical noise and batch effects. We propose a unified framework that benchmarks classical machine learning and deep unsupervised representation learning for breast cancer diagnosis on the GSE10810 microarray cohort. After variance filtering to retain the top 75% most variable probes (13,787 genes), we trained logistic regression and random forest classifiers on the filtered data, achieving perfect discrimination (AUC = 1.00) under 5-fold cross-validation and on a held-out test set. We then evaluated three autoencoder architectures: a standard autoencoder (AE), a denoising autoencoder (DAE), and a variational autoencoder (VAE), each compressed to 50 and 10 latent dimensions. Downstream classifiers trained on these embeddings also achieved AUC = 1.00, indicating that the unsupervised latent spaces preserve the diagnostic information needed for tumor/normal classification in this cohort. Qualitative visualizations via PCA and t-SNE confirmed clear tumor/normal separation in both raw and denoised latent spaces. The DAE's denoising capacity suggests resilience to technical variability, while the VAE's probabilistic embedding facilitates uncertainty estimation and data augmentation. Although GSE10810 is a high-signal, single-institution cohort, our results support the potential of deep latent models for noise-robust, low-dimensional feature extraction. Future work will extend these methods to heterogeneous RNA-seq and multi-omics datasets, integrate interpretability modules (e.g., SHAP), and explore cross-platform generalization. This study lays the groundwork for next-generation AI-driven diagnostics that combine high accuracy with enhanced robustness and clinical readiness.
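The variance-filtering step and the AUC metric described above can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not specify: the function names `top_variance_features` and `roc_auc` are hypothetical, the toy expression matrix is made up, and the sketch uses only the Python standard library so that it stays self-contained.

```python
from statistics import pvariance


def top_variance_features(X, keep_fraction=0.75):
    """Return sorted column indices of the most variable features.

    X is a list of samples, each a list of per-probe expression values.
    keep_fraction=0.75 mirrors the 75% threshold quoted in the abstract.
    """
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    n_keep = max(1, int(n_features * keep_fraction))
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_keep])


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 4 samples x 4 probes.
# Probe 1 is constant, so the variance filter should drop it.
X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 5.0, 0.1, 9.0],
    [0.9, 5.0, 0.3, 2.5],
    [1.1, 5.0, 0.2, 8.5],
]
y = [0, 1, 0, 1]  # 0 = normal, 1 = tumor (hypothetical labels)

kept = top_variance_features(X, keep_fraction=0.75)
auc = roc_auc(y, [row[3] for row in X])  # score each sample by probe 3 alone
print(kept, auc)
```

In this toy matrix a single probe separates the two classes completely, so the AUC is 1.0 without any model fitting. In practice, a numerical library and an established metric implementation such as scikit-learn's `roc_auc_score` would likely replace these hand-written helpers.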