Improving Early Prostate Cancer Detection Using Fragmentomics and Ensemble Machine Learning Models
Abstract
Prostate cancer (PCa) detection remains challenging when prostate-specific antigen (PSA) levels fall within the ambiguous 4–20 ng/mL range. This study aimed to develop a non-invasive, cost-effective method for early PCa detection using circulating tumor DNA (ctDNA) fragmentomic features. Whole-genome sequencing of cell-free DNA (cfDNA) from plasma samples was performed in a training cohort (78 PCa patients, 99 non-cancer controls) and an independent test cohort (57 non-metastatic PCa cases, 20 metastatic PCa cases, 55 controls). Three fragmentomic features (fragment size distribution, FSD; fragment size coverage, FSC; and copy number variation, CNV) were analyzed using ensemble machine learning models (GLM, GBM, XGBoost). The FSD model performed best of the three (AUROC: 0.898 in training, 0.95 in testing), particularly in high-risk patients (PSA 4–20 ng/mL; AUROC: 0.907). A composite PCa score integrating FSD, FSC, and CNV achieved AUROCs of 0.901 (training) and 0.89 (testing), outperforming PSA (AUROCs: 0.875 and 0.855, respectively). In the high-risk subgroup (PSA 4–20 ng/mL), the PCa score maintained high sensitivity (79.31%) and specificity (84%). Despite limitations in cohort size and external validation, this study highlights the clinical potential of cfDNA fragmentomics for early PCa detection, especially in diagnostically ambiguous PSA ranges.
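To make the reported metrics concrete, the following is a minimal sketch of how an AUROC, a sensitivity, and a specificity can be computed from per-sample scores. It is not the authors' pipeline. The function names, the unweighted averaging used for the composite score, the threshold, and all numeric values are illustrative assumptions; the abstract does not state how FSD, FSC, and CNV were combined into the PCa score.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the fraction of (case, control) pairs in which the case
    scores higher than the control; ties count as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(labels, scores, threshold):
    """Call a sample positive when its score is >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def composite_score(fsd, fsc, cnv):
    """Hypothetical composite: unweighted mean of the three per-feature
    model probabilities for each sample (an assumption, not the paper's rule)."""
    return [mean(triple) for triple in zip(fsd, fsc, cnv)]


# Toy values, invented for illustration only (1 = PCa, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
fsd = [0.9, 0.8, 0.4, 0.3, 0.2, 0.9]
fsc = [0.7, 0.6, 0.5, 0.4, 0.1, 0.5]
cnv = [0.8, 0.4, 0.6, 0.2, 0.3, 0.25]

pca_score = composite_score(fsd, fsc, cnv)
area = auroc(labels, pca_score)
sens, spec = sensitivity_specificity(labels, pca_score, threshold=0.525)
print(f"AUROC={area:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

The pairwise definition of AUROC is quadratic in the number of samples, which is fine for cohorts of this size. The threshold is a free parameter: raising it trades sensitivity for specificity, which is why a single sensitivity/specificity pair is reported alongside the threshold-independent AUROC.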