Robust methylome analysis and tumour-normal classification in TCGA-COAD: a reproducible workflow

Abstract

Colorectal adenocarcinoma is driven in part by widespread epigenetic deregulation, yet genome-wide analysis of its DNA methylation is complicated by spatial correlation among CpG sites, multiscale patterns of differential methylation, and confounding cellular heterogeneity in bulk tissue. This study develops a simple yet effective framework that combines rigorous statistical modelling with modern machine-learning-based prediction, applied to Illumina 450K data from The Cancer Genome Atlas (TCGA) colorectal adenocarcinoma cohort. In our framework, differentially methylated regions (DMRs) were first detected using functional smoothing and permutation-based bump hunting, revealing both focal CpG island hypermethylation and broad hypomethylated domains spanning hundreds of kilobases. Next, reference-based cell-type deconvolution and surrogate variable analysis (SVA) were used to control for immune/stromal admixture and hidden confounding effects, yielding well-calibrated single-site and region-level inference. Then, for tumour-status prediction, we empirically compared a classical logistic regression model against random forest, gradient boosting (XGBoost), and a feed-forward neural network; the results show that logistic regression achieves the lowest root-mean-square error and Brier score, reflecting its superior probability calibration. Overall, our integrated framework provides biologically interpretable and highly predictive methylation signatures of colorectal cancer and offers a transferable baseline for future large-scale cancer epigenomics studies. Our code is publicly available at https://github.com/matekum/tcga-coad-methylation-ekum-2025a.
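For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how the Brier score and root-mean-square error can be computed for predicted tumour probabilities. It is not taken from the authors' repository: the function names and the toy labels and probabilities are hypothetical, chosen only for illustration. Note that when predictions are probabilities and labels are 0/1, the RMSE is simply the square root of the Brier score.

```python
import math


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def rmse(y_true, p_pred):
    """Root-mean-square error; for probabilistic predictions, sqrt of the Brier score."""
    return math.sqrt(brier_score(y_true, p_pred))


# Hypothetical toy data: 1 = tumour, 0 = normal (not from the study).
labels = [1, 1, 0, 0]
model_a = [0.9, 0.8, 0.2, 0.1]  # probabilities close to the true labels
model_b = [1.0, 0.4, 0.6, 0.0]  # two confident hits, two samples on the wrong side of 0.5

for name, probs in [("model_a", model_a), ("model_b", model_b)]:
    print(name, round(brier_score(labels, probs), 4), round(rmse(labels, probs), 4))
```

Lower values are better for both metrics. In this toy example the first set of probabilities scores lower, which is the sense in which the abstract ranks the classifiers.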
