Accurate classification of CNS tumors through DNA methylation data analysis of select genomic regions
Abstract
Background
Current clinical neuropathology practice utilizing DNA methylation information to support diagnosis of central nervous system (CNS) tumors could benefit from increased interpretability and cost reductions.
Methods
We identified and characterized limited sets of genomic regions (i.e., features) that can be used for accurate classification of CNS tumors based on DNA methylation data. The features were selected using a hybrid strategy combining filtering and Elastic Net Logistic Regression (ENLR). A Support Vector Machine (SVM)-based classifier was trained on 1003 selected informative features using an established cohort of 60 diagnostic tumor classes comprising 82 tumor DNA methylation classes and 9 control classes. Validation was performed using external microarray and targeted DNA methylation sequencing cohorts.
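As a rough illustration of how a filter step, ENLR-based selection, and an SVM can be chained, the following minimal sketch uses scikit-learn on synthetic beta values. It is not the authors' pipeline: the variance filter, the thresholds, the hyperparameters (`l1_ratio`, `C`), the linear kernel, and the simulated data are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, n_classes = 30, 3
y = np.repeat(np.arange(n_classes), n_per_class)
n_samples = y.size

# Synthetic beta values (assumption): 15 class-specific features,
# 85 variable but uninformative features, 100 near-constant features.
informative = np.full((n_samples, 15), 0.1)
for k in range(n_classes):
    informative[y == k, 5 * k:5 * k + 5] = 0.9
informative += rng.normal(0.0, 0.03, informative.shape)
noise = rng.beta(2.0, 2.0, size=(n_samples, 85))
flat = 0.5 + rng.normal(0.0, 0.02, size=(n_samples, 100))
X = np.clip(np.hstack([informative, noise, flat]), 0.0, 1.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = make_pipeline(
    # Filter step (assumed here to be a simple variance filter).
    VarianceThreshold(threshold=0.01),
    # Embedded step: keep features with a non-zero elastic-net coefficient.
    SelectFromModel(
        LogisticRegression(
            penalty="elasticnet", solver="saga",
            l1_ratio=0.5, C=1.0, max_iter=5000,
        ),
        threshold=1e-5,
    ),
    # Final classifier trained on the selected features only.
    SVC(kernel="linear"),
)
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
accuracy = float(model.score(X_test, y_test))
print(f"selected features: {n_selected}, held-out accuracy: {accuracy:.2f}")
```

Placing both selection steps inside one pipeline means they are refit on the training split only, so the held-out samples do not influence which features are kept.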
Results
Informative regions were enriched in enhancers and associated with genes involved in neural development and morphogenesis. In the microarray validation cohort of 1993 samples representing 76 DNA methylation classes, the overall accuracy of our SVM classifier with 1003 features was 0.96, after discrepancies with the molecular neuropathology classifier were evaluated against the reported final tumor diagnosis and diagnostic relevance. Its performance remained similar (overall accuracy 0.95-0.96) when the number of features was reduced further, down to 163. An accuracy of 0.94 was observed in the in-house targeted sequencing cohort of 17 cases.
Conclusions
The classification of CNS tumors is feasible and accurate based on a very limited set of genomic regions, which facilitates further method development and the interpretation of classification results, likely benefiting CNS tumor diagnostics worldwide.
Highlights
- Hybrid feature selection identifies 1,003 CpGs strongly linked to CNS tumors
- SVM model achieves 0.96 accuracy with confidence and top-3 predictions
- Robust across sequencing and microarray platforms for clinical use
- Reliable even when reduced to 163 CpG features, lowering cost and complexity
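
One way to picture the top-3 predictions mentioned in the highlights is to rank per-class probabilities for a sample and report the three highest. The sketch below assumes such probabilities are already available. How the classifier derives its confidence values is not described here, and the function name, class names, and numbers are hypothetical.

```python
import numpy as np

def top_k_predictions(probabilities, class_names, k=3):
    """Return the k most probable classes with their probabilities, best first."""
    probabilities = np.asarray(probabilities, dtype=float)
    # Indices sorted by descending probability, truncated to the first k.
    order = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in order]

# Hypothetical per-class probabilities for one sample.
example_probs = [0.04, 0.70, 0.20, 0.06]
example_classes = ["class_A", "class_B", "class_C", "class_D"]
top3 = top_k_predictions(example_probs, example_classes)
print(top3)
```

Reporting the runner-up classes alongside the top call lets a reader see whether a low-confidence prediction has a close alternative.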