DNA Methylation Biomarkers-based Pan-Cancer Classifier: Predictive Modeling for Cancer Classification
Abstract
Background: Machine-learning (ML) driven molecular diagnostics based on omics data have the potential to revolutionize personalized medicine. However, the implementation of ML in diagnostic protocols is hindered by methodological challenges, which often lead to inflated performance estimates during model development followed by poor performance in the implementation phase. Here, we aimed to develop and validate a pan-cancer classification framework based on DNA methylation data that addresses the methodological challenges of ML powered by omics data.
Methods: We curated a primary dataset of DNA methylation profiles for 10 756 samples covering 54 healthy and cancer tissue types, and a validation dataset comprising 2 306 samples from 28 independent studies. The classification framework was built using a custom biomarker selection strategy based on an effect size metric that accounts for variance and class imbalance. The ML models were trained, tuned, and evaluated using a nested cross-validation approach. The Local Outlier Factor algorithm was built into the inference pipelines to identify and filter samples displaying technical or biological anomalies. Additionally, for methodological validation of our framework, we used methylation profiles of 3 905 central nervous system (CNS) tumors.
Results: We found that relatively simple ML models outperformed more complex algorithms such as deep neural networks. A logistic regression classifier achieved a balanced accuracy (BACC) of 0.90 in classifying 54 cancer and healthy tissue types using methylation levels at 1208 CpG sites. Similarly, our CNS tumor classifier, also based on logistic regression, reached a BACC of 0.94 across 59 CNS tumor subtypes. Anomaly filtering improved performance across all categories of samples tested. We deployed our inference pipelines for public access via a secure web platform: https://opp.pum.edu.pl/.
Conclusions: Our study demonstrates that DNA methylation profiling, when combined with carefully controlled ML practices, allows the development of robust solutions that might substantially increase the efficacy of oncological diagnosis.
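
The abstract reports performance as balanced accuracy (BACC) rather than plain accuracy. As a minimal illustration of why that choice matters when tissue types are unevenly represented, the sketch below computes BACC under its standard definition, the unweighted mean of per-class recall. The abstract does not describe how the authors computed the metric, so the function names and the toy labels here are assumptions made purely for illustration and are not data from the study.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no samples given")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 8 samples of a common class, 2 of a rare one.
y_true = ["common"] * 8 + ["rare"] * 2
# A degenerate classifier that always predicts the common class.
y_pred = ["common"] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case the always-majority classifier looks respectable by plain accuracy but scores no better than chance for two classes under BACC, which is why the metric suits a setting with many classes of unequal size.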