CMS4-focused multi-omic integration enhances antigen target identification in colorectal cancer

Abstract

Colorectal cancer (CRC) remains a major cause of cancer mortality, with limited options for poor-prognosis subtypes such as CMS4. Antigen-targeted therapies show promise but often fail because of inadequate target selection and insufficient patient stratification. Effective prioritization requires large, harmonized datasets that capture CRC heterogeneity, a resource that is currently lacking. To address this need, we built a harmonized multi-omic CRC knowledge base and applied a scalable discovery pipeline to identify antigen targets that are specifically associated with CMS4 biology and have strong translational potential.

We constructed a harmonized CRC atlas by integrating 79 transcriptomics datasets (5,033 tumors, 161 normal samples) using proprietary AI-powered data scouting, integration, and curation technologies. Consensus Molecular Subtypes (CMS) were inferred to capture CMS4-specific expression patterns. The atlas was then combined with 3 bulk RNA-seq reference datasets, 2 single-cell atlases, and 8 protein annotation databases to form a unified multi-omic CRC knowledge base of unmatched scale. From this integrated system, we identified genes that are differentially expressed in CMS4 patients and encode druggable cell-surface proteins, and we prioritized them using a weighted efficacy- and safety-based scoring model.
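
The abstract does not specify the features or weights used in the scoring model, so the following is only a minimal sketch of how a weighted efficacy- and safety-based score could be computed. The feature names, weights, the `alpha` balance parameter, and the gene labels `GENE_A` and `GENE_B` are all hypothetical, and features are assumed to be pre-scaled to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    efficacy: dict  # hypothetical efficacy features, each scaled to [0, 1]
    safety: dict    # hypothetical safety features, each scaled to [0, 1]


def weighted_mean(features, weights):
    """Weighted average of the features named in `weights` (missing -> 0)."""
    total = sum(weights.values())
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / total


def priority_score(c, eff_w, saf_w, alpha=0.5):
    """Blend efficacy and safety sub-scores; alpha is an assumed balance."""
    return alpha * weighted_mean(c.efficacy, eff_w) + (1 - alpha) * weighted_mean(c.safety, saf_w)


# Hypothetical weights and candidates, for illustration only.
EFF_W = {"cms4_enrichment": 2.0, "surface_evidence": 1.0}
SAF_W = {"low_normal_tissue_expression": 1.0}

candidates = [
    Candidate("GENE_A", {"cms4_enrichment": 0.9, "surface_evidence": 0.6},
              {"low_normal_tissue_expression": 0.4}),
    Candidate("GENE_B", {"cms4_enrichment": 0.5, "surface_evidence": 0.8},
              {"low_normal_tissue_expression": 0.9}),
]

# Rank candidates from highest to lowest combined score.
ranked = sorted(candidates, key=lambda c: priority_score(c, EFF_W, SAF_W), reverse=True)
ranked_genes = [c.gene for c in ranked]
```

In this toy example the candidate with weaker CMS4 enrichment but a much better safety feature ranks first, which illustrates why the balance between the two sub-scores matters for the final shortlist.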

We identified 236 CMS4-enriched candidates, including 124 not detectable at the CRC-wide level, demonstrating the added resolution gained through subtype stratification. Recovery of known investigational CRC targets (LGR5, MET, TACSTD2) and CMS4-associated targets of emerging clinical interest (PDGFRB, ALK5/TGFBR1, FAP) supports the biological and methodological validity of our approach.

Benchmarking against thresholds from FDA-approved pan-cancer targets and terminated trials identified 32 candidates with comparable or superior therapeutic profiles. Among these, 11 were enriched for CMS4-defining pathways, including epithelial–mesenchymal transition, angiogenesis, and stromal invasion, and 5 showed strong profile similarity to established CRC and CMS4 benchmarks. After extensive data exploration, particularly promising candidates were shortlisted for further validation.
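
How the benchmark thresholds were derived is not described in the abstract. As one possible reading, the sketch below takes the minimum value of each metric across a set of approved-target profiles as the threshold and keeps candidates that meet every threshold. The metric names, profile values, and candidate labels are hypothetical, and the handling of terminated-trial targets is left out because the source gives no detail on it.

```python
def benchmark_thresholds(approved_profiles):
    """Per-metric threshold: the lowest value seen among approved targets (assumed rule)."""
    metrics = approved_profiles[0].keys()
    return {m: min(p[m] for p in approved_profiles) for m in metrics}


def passes(profile, thresholds):
    """A candidate passes only if it meets or exceeds every threshold."""
    return all(profile[m] >= t for m, t in thresholds.items())


# Hypothetical benchmark profiles standing in for approved targets.
approved = [
    {"efficacy": 0.7, "safety": 0.6},
    {"efficacy": 0.8, "safety": 0.5},
]
thresholds = benchmark_thresholds(approved)

# Hypothetical candidate profiles.
candidate_profiles = {
    "CAND_1": {"efficacy": 0.75, "safety": 0.55},
    "CAND_2": {"efficacy": 0.90, "safety": 0.40},
}
shortlisted = [name for name, p in candidate_profiles.items() if passes(p, thresholds)]
```

Requiring every metric to clear its threshold is a conservative choice: a candidate with high efficacy but a safety value below any approved benchmark is excluded rather than averaged in.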

This work shows that CMS4-focused molecular stratification, when combined with an unprecedentedly large harmonized multi-omic knowledge base, yields a refined set of antigen candidates with enhanced specificity, safety, and biological relevance. The prioritized targets illustrate the power of subtype-resolved discovery to uncover clinically actionable insights. Our pipeline’s modular design can extend to other tumor contexts, offering a robust foundation for accelerating targeted therapy development.
