Prostate Cancer Prediction Model Based on Machine Learning

Abstract

Background: Prostate cancer is one of the most prevalent cancers among men worldwide, and early screening and diagnosis play a crucial role in improving patient health management. The most commonly used detection method, Prostate-Specific Antigen (PSA) testing, has a high false-positive rate. Machine learning, a subfield of artificial intelligence that is now widely applied in medicine, offers significant advantages in processing large-scale, high-dimensional, and complex data, and can be leveraged to predict prostate cancer more accurately.

Objective: This study aims to develop a more precise prostate cancer risk prediction model using machine learning algorithms based on multidimensional data.

Methods: This research used data from the National Health and Nutrition Examination Survey (NHANES) spanning 2003 to 2010. Through an extensive literature review and database integration, four key dimensions related to prostate cancer were identified: demographic details, dietary patterns, laboratory test results, and lifestyle factors, encompassing a total of 22 risk variables. Following data preprocessing, including cleaning and normalization, machine learning models were developed using Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest (RF). Model performance was assessed using accuracy, ROC curves, AUC scores, and Decision Curve Analysis (DCA). The best-performing model was then analyzed for interpretability using SHAP (Shapley Additive Explanations) to gain deeper insight into the contribution of each risk factor.

Results: Significant differences (P<0.05) were observed between the prostate cancer and normal groups in age, race, marital status, milk consumption, daily intake of calcium, selenium, β-carotene, monounsaturated fatty acids, saturated fatty acids, and total cholesterol levels. Among all models, the RF model performed best, achieving an accuracy of 95%, a specificity of 92% (the recall rate for the normal group), and an AUC of 0.982. The other models had accuracies of 72% to 73% and AUCs of approximately 0.80, significantly lower than those of the RF model. SHAP feature importance analysis indicated that PSA had the greatest influence on the RF model's predictions, followed by daily selenium intake.

Conclusion: Multidimensional data combined with machine learning algorithms can yield more accurate prostate cancer risk prediction models. In this study, the RF model demonstrated the best performance, providing a new approach for early screening of prostate cancer.
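
The evaluation metrics named in the abstract (accuracy, specificity, AUC, and the net benefit used in Decision Curve Analysis) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the 0.5 default threshold, and the toy labels and scores are all hypothetical and are not drawn from the study or from NHANES. Net benefit follows the standard DCA definition, TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), calling a case positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of all cases classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    return (tp + tn) / (tp + fp + tn + fn)


def specificity(y_true, y_score, threshold=0.5):
    """Recall for the negative (normal) group: TN / (TN + FP)."""
    _, fp, tn, _ = confusion_counts(y_true, y_score, threshold)
    return tn / (tn + fp)


def auc(y_true, y_score):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """DCA net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_score, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data for illustration only (1 = cancer, 0 = normal); not study data.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_labels, toy_scores))
    print("specificity:", specificity(toy_labels, toy_scores))
    print("AUC:", auc(toy_labels, toy_scores))
    for pt in (0.2, 0.5):
        print("net benefit at", pt, ":", net_benefit(toy_labels, toy_scores, pt))
```

A decision curve plots net benefit over a range of threshold probabilities and compares the model against the "treat all" and "treat none" strategies; a model is useful at a given threshold only if its net benefit exceeds both.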
