Predicting Polycystic Ovary Syndrome among Reproductive-Aged Women in Bangladesh Using Machine Learning Algorithms: Development of a Hospital-Based Predictive Model

Abstract

Background

Polycystic ovary syndrome (PCOS) is one of the most common endocrine disorders among reproductive-aged women and is characterized by hormonal imbalance, metabolic disorders, and reproductive complications. Despite rising prevalence in South Asia, notably Bangladesh, diagnostic constraints and fragmented data make early detection difficult. Machine learning (ML) may provide a solution by detecting subtle patterns in clinical, demographic, and psychological data to improve diagnostic accuracy and enable timely intervention.

Objective

The study aimed to predict PCOS among reproductive-aged women in Bangladesh using machine-learning approaches and to quantify the relative predictive contribution of psychological and clinical predictors to diagnosis.

Methods

A cross-sectional survey of 212 reproductive-aged women was conducted to evaluate machine-learning models for predicting PCOS. Data were collected using the DASS-21 and ISI-7 scales, and feature selection was performed using LASSO regression. Several machine-learning models were developed, and their predictive performance was assessed using multiple evaluation metrics, with bootstrapping and nested cross-validation used to ensure robustness.
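
The nested cross-validation mentioned above can be illustrated with a minimal sketch. The abstract does not report fold counts, seeds, or the software used, so the values below (5 outer folds, 3 inner folds) and the helper names `k_fold_indices` and `nested_cv_splits` are assumptions for illustration only, not the authors' procedure. The point of the sketch is structural: feature selection and tuning would use only the inner splits, so the outer test fold is never seen during model development.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle the given indices and deal them into k nearly equal folds."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    outer_k and inner_k are illustrative assumptions; the abstract
    does not state the fold counts that were used.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the held-out outer fold is available for training.
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        # The inner folds partition the outer training set only.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


if __name__ == "__main__":
    # 212 matches the sample size reported in the abstract.
    for train, test, inner in nested_cv_splits(212):
        print(len(train), len(test), [len(val) for _, val in inner])
```

In a full pipeline, a step such as LASSO-based feature selection would be fitted inside each inner training split rather than on the whole data set, which is what keeps the outer-fold estimate free of selection leakage.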

Results

Nearly half of the participants (49.5%) were diagnosed with PCOS. Of 21 candidate features, eight were selected by LASSO. Among all models, Extreme Gradient Boosting (XGBoost) demonstrated the best predictive performance, with an accuracy of 99.63%, sensitivity of 99.45%, specificity of 99.81%, Cohen’s Kappa of 99.26%, and ROC-AUC of 99.99%. Random Forest and Support Vector Machine also showed strong results, supporting the effectiveness of ensemble and kernel-based approaches. Psychological factors contributed far less to prediction than clinical features.
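
For readers less familiar with the metrics reported above, the following sketch shows how each one is conventionally computed for a binary classifier. It uses a small made-up set of labels and scores, not the study data, and the function names are hypothetical. Cohen's Kappa is normally expressed on a scale from -1 to 1; the abstract reports it as a percentage, which presumably corresponds to the same quantity multiplied by 100.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and Cohen's Kappa for 0/1 labels.

    Assumes both classes occur in y_true and that predictions are not
    perfectly explained by chance agreement (no zero denominators).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Agreement expected by chance, from the marginal class frequencies.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((tn + fp) / n) * ((tn + fn) / n)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores for illustration only (not study data).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Note that accuracy, sensitivity, specificity, and Kappa depend on thresholded predictions, whereas ROC-AUC is computed from the continuous scores and is threshold-independent.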

Conclusions

Using ML frameworks to integrate clinical, demographic, and psychological data can substantially improve PCOS prediction in resource-limited settings. The XGBoost model demonstrated high reliability and accuracy, underscoring its potential as a clinical decision-support tool. Future studies should incorporate biochemical indicators to broaden applicability in the management of women’s reproductive health and should validate these findings in other populations worldwide.
