Optimizing Seminal Quality Prediction Using Machine Learning with Data Preprocessing and Feature Selection

Aamir Farooq
Zhengrong Xiang
Musaed Alhussein
Muhammad Shahzad
Muhammad Farhan
Khursheed Aurangzeb

Abstract

Due to the increasing prevalence of disease, accurately diagnosing patients has become a significant challenge. Medical data are often raw and unstructured and must be normalized into a format suitable for disease prediction. Even after the data are appropriately formatted, other challenges remain: handling imbalanced datasets, selecting effective features, and choosing machine learning algorithms that deliver reliable predictive accuracy. This research focuses on predicting seminal quality in men and addresses these challenges through a series of methods. The study uses the Fertility Dataset and applies preprocessing that converts categorical values into normalized domain values based on the WHO 2010 criteria. Class imbalance is handled with the SMOTE (Synthetic Minority Over-sampling Technique) algorithm. Feature selection uses the CFS (correlation-based feature selection) Subset Evaluator with Best-First search to identify the most relevant features. The evaluated models include two non-ensemble classifiers, Naïve Bayes and Multi-layer Perceptron, and three ensemble methods, Bagging, Random Forest, and XGBoost. Models are validated with both a percentage split and 10-fold cross-validation. The highest accuracy achieved in this study is 96.2%.
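The SMOTE step named in the abstract oversamples the minority class by synthesizing new points rather than duplicating existing ones. The following is a minimal, dependency-free sketch, assuming numeric feature vectors (such as those produced by the normalization step) and Euclidean distance. The function name `smote`, its parameters, and the default of k = 5 neighbours are illustrative assumptions, not the study's configuration. Each synthetic point lies on the line segment between a minority sample and one of its k nearest minority-class neighbours. Base samples are drawn at random here; the original SMOTE formulation instead cycles through every minority sample a fixed number of times.

```python
import math
import random

# Illustrative SMOTE sketch; the name, parameters and defaults are assumptions,
# not the study's code.


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class points by interpolating between
    a minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample
    # (Euclidean distance, brute force; adequate for small datasets).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))          # base sample, drawn at random
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # one of its k neighbours
        gap = rng.random()                        # uniform in [0, 1)
        # The new point lies on the segment between x and its neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

For example, `smote(minority_rows, n_synthetic=len(majority_rows) - len(minority_rows))` would bring two classes to equal size, where `minority_rows` and `majority_rows` are hypothetical lists of feature vectors. Because new points are interpolated rather than copied, the minority region is filled in instead of being reweighted at a few fixed locations.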
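CFS, as named in the abstract, scores a subset S of k features by its merit, k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), where mean(r_cf) is the average feature-class correlation and mean(r_ff) the average feature-feature intercorrelation. Subsets whose features predict the class but are not redundant with one another therefore score highest. Best-First search explores subsets starting from the empty set and repeatedly expands the most promising subset found so far. The sketch below is a simplified version under stated assumptions: it uses absolute Pearson correlation as the correlation measure (a simplification; Weka's CfsSubsetEval may use a different measure), searches forward only, and stops after `max_stale` consecutive expansions that fail to improve the best merit. The functions `pearson`, `cfs_merit`, and `best_first_cfs` are hypothetical names.

```python
import heapq
import math
from itertools import count

# Illustrative CFS + forward Best-First sketch; names, the Pearson-based
# correlation measure and max_stale=5 are assumptions, not the study's code.


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(subset, features, target):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def best_first_cfs(features, target, max_stale=5):
    """Forward best-first search over feature subsets, scored by cfs_merit.

    Stops after max_stale consecutive expansions that do not improve the best merit.
    """
    names = sorted(features)
    tie = count()  # tie-breaker so the heap never compares subsets directly
    open_heap = [(0.0, next(tie), ())]  # (negated merit, tie, subset); start empty
    seen = {()}
    best, best_merit, stale = (), 0.0, 0
    while open_heap and stale < max_stale:
        _, _, subset = heapq.heappop(open_heap)  # most promising subset so far
        improved = False
        for f in names:
            if f in subset:
                continue
            child = tuple(sorted(subset + (f,)))  # canonical order for dedup
            if child in seen:
                continue
            seen.add(child)
            merit = cfs_merit(child, features, target)
            heapq.heappush(open_heap, (-merit, next(tie), child))
            if merit > best_merit:
                best, best_merit, improved = child, merit, True
        stale = 0 if improved else stale + 1
    return list(best), best_merit
```

Because the denominator grows with feature-feature correlation, adding a feature that merely duplicates an already selected one (even with inverted sign) does not raise the merit, so the search keeps the smaller subset.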
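The two validation schemes in the abstract differ in how often each instance is tested: a percentage split holds out a single fixed test set, while 10-fold cross-validation tests every instance exactly once across ten train/test rounds. The sketch below generates index splits for both. Stratifying the folds by class, so that each fold keeps roughly the original class ratio, is an assumption chosen because the dataset is imbalanced. The 0.66 training fraction is only an illustrative value, since the abstract does not state the split ratio. The names `percentage_split` and `stratified_k_fold` are hypothetical.

```python
import random
from collections import defaultdict

# Illustrative validation-split sketch; names, stratification and the 0.66
# default are assumptions, not the study's configuration.


def percentage_split(n, train_fraction=0.66, seed=0):
    """Shuffle indices 0..n-1 and cut them into one train set and one test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train, test) index lists; each class is dealt round-robin across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # continues across classes so fold sizes stay balanced
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

A classifier would then be trained on `train` and scored on `test` in each of the ten rounds, with the results aggregated across folds; the percentage split yields a single such round.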

Version published on Research Square (DOI: 10.21203/rs.3.rs-5930473/v1)
Apr 9, 2025
