A Minimal Plasma Proteome-Based Biomarker Panel for Accurate Prostate Cancer Diagnosis
Abstract
Early and accurate diagnosis of prostate cancer (PRC) remains a major clinical challenge, particularly because existing biomarker approaches rely on invasive sampling or on large panels with limited interpretability. Here, we present a machine learning framework for discovering compact and biologically grounded plasma protein signatures for PRC classification using publicly available pan-cancer proteomic data. We coupled a genetic algorithm-based protein identification method with LASSO-regularized logistic regression to identify minimal protein subsets optimized for diagnostic performance. A 14-protein panel, recurrent across 1,000 genetic algorithm iterations, achieved a mean accuracy of 98.0%, an F1 score of 0.98, and an ROC AUC of 0.997 on a held-out test dataset. This performance exceeded that of models trained on high-dimensional data (>1,400 proteins) and surpassed published transcriptomic, methylomic, and cfDNA classifiers, many of which reported AUCs below 0.91. Functional analysis revealed enrichment in protease binding and DNA repair pathways, with known markers such as beta-microseminoprotein (MSMB) and poly(ADP-ribose) polymerase 1 (PARP1) appearing alongside under-characterized proteins such as IGSF3 and XG. Models trained only on previously reported PRC-associated proteins performed worse, highlighting the added diagnostic value of including novel, data-driven candidates. This study outlines a scalable proteomic workflow and demonstrates that high diagnostic performance can be achieved using small, interpretable panels derived from blood-based proteomics. The findings lay the foundation for the development of interpretable, clinically deployable assays for PRC detection and risk stratification.
Overview of the study workflow: blood-derived proteomic profiles from cancer and normal cohorts were analyzed using machine learning to identify a compact diagnostic panel, enabling clinical decision support for new patients.
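The coupling of genetic-algorithm protein selection with LASSO-regularized logistic regression described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not detail. The data are synthetic (8 hypothetical "proteins", two of them informative), the L1-penalized model is fitted with a simple proximal-gradient loop standing in for a library LASSO solver, and the fitness function, panel-size penalty, and GA settings (tournament selection, uniform crossover, bit-flip mutation) are all assumptions chosen for illustration.

```python
import math
import random
from functools import lru_cache

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for a plasma proteomics matrix (hypothetical data):
# only columns 0 and 3 carry class signal, the rest are noise.
N_FEATURES, INFORMATIVE = 8, (0, 3)

def make_sample():
    label = rng.randint(0, 1)
    shift = 1.5 if label else -1.5
    row = [rng.gauss(shift if j in INFORMATIVE else 0.0, 1.0)
           for j in range(N_FEATURES)]
    return row, label

data = [make_sample() for _ in range(200)]
train, val, held_out = data[:80], data[80:120], data[120:]

def fit_l1_logistic(rows, labels, lam=0.02, lr=0.3, epochs=50):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(rows), len(rows[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += err
            for j in range(p):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        for j in range(p):
            v = w[j] - lr * grad_w[j] / n
            # Soft-thresholding is the proximal step for the L1 (LASSO) penalty.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b

def accuracy(w, b, rows, labels):
    hits = sum((b + sum(wj * xj for wj, xj in zip(w, x)) > 0) == (y == 1)
               for x, y in zip(rows, labels))
    return hits / len(labels)

def evaluate(mask, fit_split, score_split):
    """Fit on the masked columns of fit_split, score on score_split."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0

    def subset(split):
        return [[x[j] for j in cols] for x, _ in split], [y for _, y in split]

    w, b = fit_l1_logistic(*subset(fit_split))
    return accuracy(w, b, *subset(score_split))

@lru_cache(maxsize=None)
def fitness(mask):
    # Validation accuracy minus a small per-protein penalty (assumed value)
    # so that compact panels are preferred.
    return evaluate(mask, train, val) - 0.01 * sum(mask)

def genetic_search(pop_size=10, generations=8, mutation_rate=0.1):
    pop = [tuple(rng.randint(0, 1) for _ in range(N_FEATURES))
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents, then uniform crossover.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Bit-flip mutation.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_search()
panel = [j for j, bit in enumerate(best) if bit]
print("selected columns:", panel)
print("held-out accuracy:", evaluate(best, train, held_out))
```

Repeating `genetic_search` under many different seeds and counting how often each column is selected would give a recurrence measure in the spirit of the "recurrent across 1,000 genetic algorithm iterations" criterion, although the exact recurrence rule used in the study is not stated in the abstract.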