Fecal volatile organic compound–based machine learning model for noninvasive detection of colorectal cancer
Abstract
BACKGROUND Colorectal cancer (CRC) remains a major global health concern, ranking among the top causes of cancer incidence and mortality. Current noninvasive screening tools such as fecal occult blood tests and serum carcinoembryonic antigen (CEA) assays suffer from limited sensitivity and specificity, while colonoscopy, the diagnostic gold standard, is invasive and costly. Volatile organic compounds (VOCs), metabolic end-products influenced by gut microbiota and tumor metabolism, offer a promising avenue for noninvasive CRC detection when coupled with advanced computational modeling.
AIM To develop and validate a fecal VOC-based machine learning model for noninvasive CRC detection.
METHODS Fecal samples from 78 CRC patients and 57 healthy controls were analyzed using gas chromatography–ion mobility spectrometry (GC–IMS). Recursive feature elimination with cross-validation (RFECV) integrating LASSO, random forest, and support vector machine identified key VOCs. Five machine learning algorithms were constructed and optimized, and their diagnostic performance, calibration, and clinical utility were evaluated. SHapley Additive exPlanations (SHAP) analysis was applied to interpret model predictions.
RESULTS Among 85 identified VOCs, 11 were consistently selected as discriminative biomarkers, including 3-methylbutanoic acid-M, indole, and 1-butanol. The XGBoost model achieved the best performance with an area under the receiver operating characteristic curve (AUROC) of 0.8866, sensitivity of 0.83, and specificity of 0.78. SHAP analysis revealed 3-methylbutanoic acid-M as the most influential metabolite in model predictions. Several individual VOCs, such as 2-phenylacetaldehyde and propanoic acid-D, outperformed CEA in discriminating CRC from healthy controls. Decision curve analysis demonstrated superior clinical net benefit for the VOC-based model compared with traditional screening markers.
CONCLUSION Integration of fecal VOC profiling with a machine learning model provides a promising noninvasive strategy for accurate CRC detection, potentially improving early diagnosis and screening compliance.
Trial Registration Chinese Clinical Trial Registry (ChiCTR), ChiCTR2300073117. Registered on July 1, 2023; expected completion on June 30, 2025. Available at https://www.chictr.org.cn/bin/project/edit?pid=200842
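
The abstract reports sensitivity, specificity, AUROC, and decision-curve net benefit without defining them. The sketch below shows one common way these quantities are computed from predicted scores. It is an illustration only, not the authors' pipeline: the function names are hypothetical, the labels and scores are made-up toy values rather than study data, and the net-benefit formula is the standard decision-curve definition, TP/n - (FP/n) * pt/(1 - pt), which the abstract does not spell out.

```python
# Toy illustration of the evaluation metrics named in the abstract.
# All data below are invented for demonstration; they are NOT study data.

def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_score, threshold):
    """Return (sensitivity, specificity) at the given score threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, y_score):
    """AUROC as the fraction of (case, control) pairs in which the case
    scores higher than the control; ties count as half a win."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def net_benefit(y_true, y_score, pt):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    tp, fp, _, _ = confusion_counts(y_true, y_score, pt)
    n = len(y_true)
    return tp / n - fp / n * pt / (1 - pt)


# Hypothetical labels (1 = CRC, 0 = healthy control) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

if __name__ == "__main__":
    print("sensitivity, specificity:", sensitivity_specificity(y_true, y_score, 0.5))
    print("AUROC:", auroc(y_true, y_score))
    print("net benefit at pt=0.5:", net_benefit(y_true, y_score, 0.5))
```

In a decision curve, `net_benefit` would be evaluated over a range of `pt` values and compared against the "treat all" and "treat none" strategies. The pairwise AUROC loop is quadratic in sample size, which is acceptable for a cohort of this scale but would normally be replaced by a rank-based computation for larger datasets.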