A robust classifier for the intrinsic consensus molecular subtypes in colorectal cancer.
Abstract
Background: An analysis of colorectal cancer epithelial cells identified two intrinsic subtypes, iCMS2 and iCMS3, in addition to the bulk consensus molecular subtypes (CMS1-4). The intrinsic subtypes can be prognostically important in some patient subsets and may prove predictive of response to treatment. We present a method for assigning iCMS subtypes that is robust to technical variation and designed for single-sample applications.

Results: We developed a single-sample classifier (SSC) based on non-parametric correlation similarity with gene expression centroids. The centroids were created synthetically by resampling samples of known iCMS class from the public datasets used to derive the iCMS classification. To reduce unrelated, non-epithelial variation, we restricted the classifier to the subset of iCMS genes (N=201) with the strongest epithelial expression in colorectal cancer. The SSC assigns the most likely iCMS class from the class distribution of the nearest centroids, using either an absolute cutoff or K-nearest-neighbors voting. For formal validation we used new data from the Geneva Tumor Registry (GTR) cohort, without prior batch correction or denoising. Ground truth was established by exhaustively estimating the distance of each GTR sample from the reference iCMS samples across more than 12,000 genes, and assigning the most likely iCMS class by proximity to a known reference sample. In the GTR cohort, the SSC achieved an accuracy of 86 to 88% with or without batch correction. The previously published nearest template predictor reached 89.8% with batch correction but only 75.4% without it. The SSC was also more robust to added noise, missing genes, and simulated contamination with the opposite iCMS class. In addition, the SSC was applied to data from the VELOUR trial, for which reference iCMS calls were also available.
SSC iCMS calls were prognostic for overall survival (OS) in iCMS2 vs iCMS3 (P<0.00001) and performed at least as well as the reference iCMS calls, which were also prognostic (P=0.0001).

Conclusions: The SSC reproduces the iCMS classification and appears substantially more robust to batch effects. SSC calls are prognostic in clinical trial data. An R package is available for download (https://github.com/CRCrepository/iCMS.SSC).
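The classification rule summarized in the Results can be illustrated with a short Python sketch. It is not the iCMS.SSC R package or its API: all function names (`make_centroids`, `classify`, `spearman`, `average_ranks`) and parameter values (`n_per_class=50`, `k=15`, `cutoff`) are hypothetical. The sketch assumes that the non-parametric correlation is Spearman's rank correlation, that each synthetic centroid is the per-gene mean of a bootstrap resample of one class's reference samples, and that the "absolute cutoff" mode votes among all centroids whose correlation exceeds a fixed threshold. The toy data are random and only demonstrate the mechanics.

```python
from collections import Counter

import numpy as np


def average_ranks(x):
    """Rank values 1..n, averaging the ranks of tied values (Spearman convention)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks within each group of tied values by their mean.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=ranks)
    return (sums / counts)[inverse]


def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    ra = average_ranks(a)
    rb = average_ranks(b)
    ra -= ra.mean()
    rb -= rb.mean()
    denom = np.sqrt(np.sum(ra ** 2) * np.sum(rb ** 2))
    return float(np.sum(ra * rb) / denom) if denom > 0 else 0.0


def make_centroids(ref_expr, ref_labels, n_per_class=50, seed=0):
    """Hypothetical centroid generator (assumption): each centroid is the per-gene
    mean of a bootstrap resample of the reference samples of a single class.
    ref_expr has shape (samples, genes)."""
    rng = np.random.default_rng(seed)
    ref_expr = np.asarray(ref_expr, dtype=float)
    labels = np.asarray(ref_labels)
    centroids, centroid_labels = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        for _ in range(n_per_class):
            pick = rng.choice(idx, size=len(idx), replace=True)
            centroids.append(ref_expr[pick].mean(axis=0))
            centroid_labels.append(cls)
    return np.vstack(centroids), np.array(centroid_labels)


def classify(sample, centroids, centroid_labels, k=15, cutoff=None):
    """Return (call, class distribution among the nearest centroids).
    cutoff=None: K-nearest-neighbour voting over the k most correlated centroids.
    cutoff=float: vote among all centroids with correlation >= cutoff (assumed
    meaning of the absolute cutoff); returns (None, {}) if none qualify."""
    cors = np.array([spearman(sample, c) for c in centroids])
    if cutoff is None:
        nearest = np.argsort(-cors)[:k]
    else:
        nearest = np.flatnonzero(cors >= cutoff)
        if nearest.size == 0:
            return None, {}
    votes = Counter(centroid_labels[nearest].tolist())
    total = sum(votes.values())
    dist = {cls: n / total for cls, n in votes.items()}
    return max(dist, key=dist.get), dist


# Toy data: 201 genes, two classes with opposite expression patterns.
rng = np.random.default_rng(1)
n_genes = 201
pattern = rng.normal(size=n_genes)
ref = np.vstack([pattern + rng.normal(scale=0.5, size=(30, n_genes)),
                 -pattern + rng.normal(scale=0.5, size=(30, n_genes))])
labels = ["iCMS2"] * 30 + ["iCMS3"] * 30
cents, cent_labels = make_centroids(ref, labels)
query = pattern + rng.normal(scale=0.5, size=n_genes)
print(classify(query, cents, cent_labels))              # K-nearest-neighbour voting
print(classify(query, cents, cent_labels, cutoff=0.5))  # absolute cutoff
```

The returned vote fractions correspond to the class distribution of the nearest centroids, so an ambiguous sample shows up as a split vote rather than a confident call. Because rank correlation depends only on the ordering of genes within a sample, a uniform offset or any other monotone transformation of one sample's expression values leaves its correlations, and therefore its call, unchanged; this is one plausible reason a rank-based single-sample design tolerates batch effects.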