Optimizing Seminal Quality Prediction Using Machine Learning with Data Preprocessing and Feature Selection


Abstract

As the prevalence of disease increases, accurate diagnosis has become a significant challenge. Medical data is often raw and unstructured and must be normalized into a format suitable for disease prediction. Even once the data is appropriately formatted, challenges remain: handling imbalanced datasets, selecting effective features, and choosing machine learning algorithms that achieve reliable predictive accuracy. This research addresses these challenges in the context of predicting seminal quality in men. The study uses the Fertility Dataset and applies preprocessing to convert categorical values into normalized domain values based on WHO 2010 criteria. Class imbalance is handled with the SMOTE algorithm. Feature selection uses the CFS-Subset Evaluator with Best-First Search to identify the most relevant features. Several machine learning models are evaluated: the non-ensemble methods Naïve Bayes and Multi-layer Perceptron, and the ensemble methods Bagging, Random Forest, and XGBoost. Models are validated with both a percentage split and 10-fold cross-validation. The highest accuracy achieved in this study is 96.2%.
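The class-balancing step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The abstract does not describe the study's implementation, so everything here is an assumption made for illustration: the function name `smote`, its parameters, the toy data, and the simplification of choosing base samples at random rather than cycling through every minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (Euclidean distance).

    Simplified, hypothetical sketch: base samples are drawn at random.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical values, not taken from the Fertility Dataset).
majority = [[0.1 * i, 0.05 * i] for i in range(12)]
minority = [[0.9, 0.2], [0.8, 0.3], [1.0, 0.25]]

new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "12 12"
```

Oversampling of this kind is normally applied only to the training portion of each split, so that synthetic points do not leak into the data used for validation. The abstract does not say how the study ordered these steps.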
