Enabling whole-genome DNA methylation-based classification of central nervous system tumors
Abstract
Objectives
DNA methylation profiling using array-based platforms has proven invaluable for classifying central nervous system (CNS) tumors, especially those with challenging or atypical morphologies. However, existing classification frameworks remain restricted to array-based inputs, which interrogate only a subset of CpG sites and thereby limit diagnostic and prognostic resolution. Whole-genome methylation sequencing methods such as whole-genome bisulfite sequencing (WGBS) and enzymatic methyl-seq (EM-seq) offer near-complete methylome coverage, but they have not yet been integrated into established classifiers. This study aimed to develop MethylInsight, a web-based platform that adapts a widely recognized CNS tumor classification framework to whole-genome data.
Methods
MethylInsight converts WGBS and EM-seq signals into array-compatible beta values, so that whole-genome data can be analyzed with established array-based classifiers. A Random Forest-based model with logistic regression calibration was trained on 3,905 CNS tumor and control samples spanning 82 tumor subtypes and nine control tissue classes. Performance was evaluated using five-fold cross-validation and internal validation on 22 matched patient samples. The platform also provides t-distributed stochastic neighbor embedding (t-SNE) visualizations that place newly profiled samples in the context of a reference cohort.
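As a rough illustration of the conversion step, the sketch below makes several assumptions. The inputs are per-CpG methylated and unmethylated read counts, already merged across strands, plus a table that maps each array probe to the coordinate of the CpG it targets. The sequencing beta value is computed as methylated reads divided by total reads. Probes whose CpG is absent or under-covered are left missing so they can be imputed later. The input format, the `beta_value` and `to_array_betas` helpers, the probe names, and the 5-read coverage threshold are all hypothetical and are not taken from MethylInsight.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input: (chrom, position) -> (methylated_reads, unmethylated_reads),
# assumed to be already collapsed across both strands of each CpG.
CpGCounts = Dict[Tuple[str, int], Tuple[int, int]]


def beta_value(methylated: int, unmethylated: int, min_coverage: int = 5) -> Optional[float]:
    """Fraction of reads supporting methylation at one CpG, or None if coverage is too low."""
    total = methylated + unmethylated
    if total < min_coverage:
        return None
    return methylated / total


def to_array_betas(
    counts: CpGCounts,
    probe_coords: Dict[str, Tuple[str, int]],
    min_coverage: int = 5,
) -> Dict[str, Optional[float]]:
    """Project sequencing-derived CpG methylation onto array probe IDs.

    probe_coords (hypothetical) maps a probe ID to the (chrom, position) of its CpG.
    Probes whose CpG is absent or under-covered get None (to be imputed downstream).
    """
    betas: Dict[str, Optional[float]] = {}
    for probe_id, coord in probe_coords.items():
        m, u = counts.get(coord, (0, 0))
        betas[probe_id] = beta_value(m, u, min_coverage)
    return betas


# Toy usage with made-up coordinates and probe names.
counts = {("chr1", 10525): (9, 1), ("chr1", 10542): (1, 2)}
probes = {"probe_A": ("chr1", 10525), "probe_B": ("chr1", 10542), "probe_C": ("chr2", 500)}
print(to_array_betas(counts, probes))
# {'probe_A': 0.9, 'probe_B': None, 'probe_C': None}
```

The missing values are kept explicit rather than set to 0. A classifier trained on array beta values expects a value at every probe, so a real pipeline would need an imputation or feature-selection step for the probes left as None.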
Results
MethylInsight demonstrated robust classification performance across tumor classes, achieving an area under the receiver operating characteristic curve (AUC) of 0.961, comparable to the DKFZ (0.966) and NM (0.964) classifiers. Cross-validation showed uniformly high accuracy, including for glioblastoma (GBM), a challenging subtype, with a sensitivity of 0.924 and a specificity of 0.940. Calibration reduced the estimated error rate from 5.88% to 2.95%. Cross-platform validation showed strong concordance: 21 of the 22 paired datasets had a Pearson correlation coefficient above 0.80, and the top-ranked predictions matched for all but three pairs, whose top two predictions still overlapped.
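The per-class and cross-platform metrics described above follow standard definitions. The sketch below is not MethylInsight's code. It shows three checks:

- a Pearson correlation between paired beta-value vectors;
- one-vs-rest sensitivity and specificity for a single tumor class;
- whether two predictions share a class among their top two.

The helper names, class labels, and toy inputs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length beta-value vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def sensitivity_specificity(
    y_true: List[str], y_pred: List[str], positive: str
) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for one class.

    Assumes y_true contains at least one positive and one negative sample.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def top_k_overlap(probs_a: Dict[str, float], probs_b: Dict[str, float], k: int = 2) -> bool:
    """True if the k highest-probability classes of two predictions share any class."""
    top_a = set(sorted(probs_a, key=probs_a.get, reverse=True)[:k])
    top_b = set(sorted(probs_b, key=probs_b.get, reverse=True)[:k])
    return bool(top_a & top_b)


# Toy usage with made-up values.
array_betas = [0.1, 0.8, 0.5, 0.9]
emseq_betas = [0.15, 0.75, 0.55, 0.85]
print(pearson(array_betas, emseq_betas) > 0.8)  # True

y_true = ["GBM", "GBM", "other", "GBM"]
y_pred = ["GBM", "other", "other", "GBM"]
print(sensitivity_specificity(y_true, y_pred, "GBM"))  # (0.6666666666666666, 1.0)

pred_array = {"GBM": 0.6, "class_X": 0.3, "class_Y": 0.1}
pred_emseq = {"class_X": 0.5, "GBM": 0.4, "class_Y": 0.1}
print(top_k_overlap(pred_array, pred_emseq))  # True: top-1 differs, top-2 overlaps
```

In the last example the top-ranked classes disagree ("GBM" vs "class_X"), but both appear in each prediction's top two. This mirrors the kind of partial agreement the abstract reports for the three discordant pairs.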
Conclusions
MethylInsight enables the integration of whole-genome methylation data into CNS tumor classification, overcoming the limited CpG coverage of array-based methods. By supporting EM-seq inputs and providing calibrated probabilities and intuitive t-SNE visualizations, MethylInsight enhances diagnostic precision and tumor stratification. The platform is freely accessible at https://inocras.methylclassifier.com.