Optimizing Metagenome Analysis for Early Detection of Colorectal Cancer: Benchmarking Bioinformatics Approaches and Advancing Cross-Cohort Prediction

Abstract

Colorectal cancer (CRC) continues to be a major global public health challenge. Extensive research has underscored the critical role of the gut microbiome in predictive diagnostics of CRC. However, variability in analytical methods and microbial features across previous studies has significantly affected detection performance, hindering further research and application. In this study, we conducted a systematic analysis of over 2,000 gut metagenomic samples from 15 globally sampled public and in-house cohorts. By benchmarking 1,080 analytical combinations across multiple analytical steps, we established an optimal bioinformatics workflow for metagenome-based CRC detection. Despite the substantial heterogeneity of gut microbiomes across regions and cohorts, our workflow demonstrated robust performance, achieving an AUROC of 0.83 by identifying consistent microbial dynamic patterns associated with CRC. However, early-stage prediction of CRC, particularly at the precancerous adenoma (ADA) stage, remains challenging due to the instability of microbial signatures across cohorts. To address this, we developed an instance-based transfer learning strategy, Meta-iTL, which improved ADA detection in a newly recruited cohort, increasing the AUROC from 0.58 to 0.77 using models trained on existing cohorts. This study not only provides a comprehensive bioinformatics guide for metagenomic data processing and modeling, but also advances the development and application of non-invasive approaches for the early screening and prevention of CRC.
