A Data-Driven Approach to Polycystic Ovary Syndrome Diagnosis: Evaluating Machine Learning Models
Abstract
Background
Polycystic ovary syndrome (PCOS) is a major health concern affecting women worldwide. Early detection and treatment significantly reduce the risk of future complications. Conventional diagnostic methods are resource-intensive and may be prone to inaccuracies, so earlier and more efficient diagnostic techniques are needed to reduce the severity and overall impact of the condition. Machine learning offers a promising approach to improving PCOS detection by analyzing clinical and demographic data efficiently.
Methods
This study used a dataset of 539 women, including 176 PCOS-positive cases, sourced from the Kaggle repository. Thirty-eight features, grouped into anthropometric, symptom-based, test-result, and demographic variables, were analyzed. Feature importance was assessed using the Mean Squared Error metric. Six machine learning models were employed to classify PCOS cases.
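The abstract does not say how the Mean Squared Error metric was turned into an importance score. One common reading is permutation importance: shuffle one feature at a time and record how much the model's MSE rises. The sketch below illustrates that idea only, and it is an assumption, not the authors' procedure. The data are synthetic, and `toy_predict` and `permutation_importance_mse` are hypothetical names introduced for illustration.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance_mse(predict, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when each feature column is shuffled in turn.

    predict: callable mapping one feature row to a score in [0, 1]
             (a stand-in for any fitted model; hypothetical).
    X: list of rows (lists of floats); y: list of 0/1 labels.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Synthetic data, NOT the study's: feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def toy_predict(row):
    # Stand-in for a fitted classifier; it only looks at feature 0.
    return 1.0 if row[0] > 0.5 else 0.0

importances = permutation_importance_mse(toy_predict, X, y)
print(importances)
```

In this toy setup, shuffling the informative feature raises the error, while shuffling the noise feature leaves it unchanged. That is the kind of ranking a feature-importance analysis reports.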
Results
Significant differences were observed in multiple clinical and anthropometric variables between PCOS-positive and PCOS-negative cases, including BMI, waist-to-hip ratio, antral follicle count, AMH levels, and menstrual cycle length. The most predictive features were antral follicle count, hair growth, skin pigmentation, weight gain, and fast-food consumption. Among all models, Random Forest performed best, with 93% accuracy and 86% sensitivity, demonstrating the efficacy of machine learning in PCOS prediction.
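Accuracy and sensitivity answer different questions, which is why both are reported. The short sketch below uses only the cohort sizes given in the Methods (539 women, 176 PCOS-positive) to show what a trivial classifier that labels every case negative would score. That baseline is hypothetical and is not one of the study's six models. The reported 93% and 86% may also have been measured on a held-out split rather than the full cohort, which the abstract does not specify.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def sensitivity(tp, fn):
    """Fraction of truly positive cases that the classifier detects (recall)."""
    return tp / (tp + fn)

# Cohort sizes stated in the Methods: 539 women, 176 of them PCOS-positive.
n_total, n_positive = 539, 176
n_negative = n_total - n_positive

# Hypothetical baseline that labels every case PCOS-negative:
# no true positives, no false positives.
baseline_accuracy = accuracy(tp=0, fp=0, tn=n_negative, fn=n_positive)
baseline_sensitivity = sensitivity(tp=0, fn=n_positive)

print(f"baseline accuracy:    {baseline_accuracy:.3f}")
print(f"baseline sensitivity: {baseline_sensitivity:.3f}")
```

Because roughly two thirds of the cohort is PCOS-negative, this baseline reaches about 67% accuracy while detecting no positive cases at all. A model is clinically useful only if it improves on both numbers, so sensitivity has to be read alongside accuracy.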
Conclusions
Machine learning can improve early and accurate PCOS detection, providing a cost-effective and efficient substitute for traditional methods of diagnosis. The integration of predictive models into clinical practice could facilitate timely interventions, improving patient outcomes and reducing the healthcare burden associated with PCOS.