Development and Validation of a Machine Learning Model Integrating Multiparametric Clinical Indicators for Predicting Prostate Cancer in the PI-RADS 3 Cohort
Abstract
Objective: To develop and validate a machine learning (ML) model integrating multiparametric clinical indicators to predict clinically significant prostate cancer (CSPCa) in patients with PI-RADS 3 lesions, optimizing biopsy decision-making.

Methods: We retrospectively analyzed 193 patients with PI-RADS 3 lesions (training n = 135; validation n = 58). Least Absolute Shrinkage and Selection Operator (LASSO) regression selected features from age, PSA derivatives, prostate volume, Prostate Health Index (PHI), and PHI density (PHID). Four ML algorithms (Logistic Regression, Random Forest (RF), Extreme Gradient Boosting, and Support Vector Machine) were compared using Area Under the Curve (AUC), calibration, and Decision Curve Analysis.

Results: LASSO identified age, PSA density, PHI, and PHID as independent predictors. The RF model demonstrated superior performance, achieving an AUC of 0.958 in training and 0.864 in validation, with 97.8% specificity. A three-tier risk stratification system was established: low-risk (< 0.2, 5.0% CSPCa rate), intermediate-risk (0.2–0.7, 37.5%), and high-risk (≥ 0.7, 70.0%).

Conclusion: The developed RF-based model integrating age, PSA density, PHI, and PHID provides a robust tool for predicting CSPCa in PI-RADS 3 lesions. It effectively stratifies patients into distinct risk categories, potentially reducing unnecessary biopsies in low-risk individuals while ensuring timely detection in high-risk patients.
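The three-tier stratification reported above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `density` and `risk_tier` are hypothetical, and the predicted probability is assumed to come from a fitted classifier that is not reproduced here. The abstract does not define the density variables, so the sketch assumes the conventional definition (marker value divided by prostate volume). It also assumes the boundary value 0.2 belongs to the intermediate tier, which follows from the low-risk tier being stated as strictly below 0.2.

```python
# Cut-offs quoted in the abstract: low (< 0.2), intermediate (0.2-0.7), high (>= 0.7).
LOW_CUTOFF = 0.2
HIGH_CUTOFF = 0.7


def density(marker_value: float, prostate_volume_ml: float) -> float:
    """Return a marker value divided by prostate volume.

    Assumption: this is the conventional definition of PSA density
    (PSA / volume) and PHI density (PHI / volume); the abstract does
    not spell it out.
    """
    if prostate_volume_ml <= 0:
        raise ValueError("prostate volume must be positive")
    return marker_value / prostate_volume_ml


def risk_tier(probability: float) -> str:
    """Map a predicted CSPCa probability to one of the three reported tiers."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < LOW_CUTOFF:
        return "low"
    if probability < HIGH_CUTOFF:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # Hypothetical probabilities, for illustration only.
    for p in (0.05, 0.2, 0.45, 0.7, 0.93):
        print(f"p = {p:.2f} -> {risk_tier(p)}-risk")
```

Keeping the cut-offs as named constants makes it straightforward to re-evaluate the tiers if the thresholds are revised after external validation.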