Deep Learning for Biomarker Discovery in Cancer Genomes

Abstract

Background

Genomic data is essential for clinical decision-making in precision oncology. Bioinformatic algorithms are widely used to analyze next-generation sequencing (NGS) data, but they face two major challenges. First, these pipelines are highly complex, involving multiple steps and the integration of various tools. Second, the features they generate are human-interpretable but often lose information, because they capture only predefined genetic properties. This limits how fully NGS data can be exploited for biomarker extraction and slows the discovery of new biomarkers in precision oncology.

Methods

We propose an end-to-end deep learning (DL) approach for analyzing NGS data. Specifically, we developed a multiple instance learning DL framework that integrates somatic mutation sequences to predict two compound biomarkers: microsatellite instability (MSI) and homologous recombination deficiency (HRD). To achieve this, we used data from 3,184 cancer patients obtained from two public databases: The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
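
To make the multiple instance learning idea concrete, the sketch below pools a "bag" of per-mutation embeddings into a single patient-level embedding using attention weights. This is only an illustration of one common MIL variant (attention-based pooling); the abstract does not state which pooling the framework uses, and the function name, the parameters `w_attn` and `v_attn`, the embedding sizes, and the random inputs are all assumptions made for the example.

```python
import numpy as np

def attention_mil_pool(instances, w_attn, v_attn):
    """Pool a bag of instance embeddings into one bag-level embedding.

    instances: (n_instances, d) array, e.g. one row per somatic mutation.
    w_attn:    (d, h) projection matrix (hypothetical learned parameter).
    v_attn:    (h,) scoring vector (hypothetical learned parameter).
    """
    # One unnormalized attention score per instance.
    scores = np.tanh(instances @ w_attn) @ v_attn
    # Softmax over the bag, shifted by the max for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Weighted average of the instances: order of mutations does not matter.
    return weights @ instances, weights

# Toy bag: 5 mutations, each with an 8-dimensional embedding (random, not real data).
rng = np.random.default_rng(0)
bag = rng.normal(size=(5, 8))
w = rng.normal(size=(8, 4))
v = rng.normal(size=4)

bag_embedding, attn = attention_mil_pool(bag, w, v)

# Shuffling the mutations leaves the patient-level embedding unchanged.
perm = rng.permutation(5)
shuffled_embedding, _ = attention_mil_pool(bag[perm], w, v)
```

In a full model, a classifier head on `bag_embedding` would produce the patient-level MSI or HRD prediction, and the attention weights give a per-mutation view of what the prediction relied on.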

Results

Our proposed deep learning method demonstrated high accuracy in identifying clinically relevant biomarkers. For predicting MSI status, the model achieved an accuracy of 0.98, a sensitivity of 0.95, and a specificity of 1.00 on an external validation cohort. For predicting HRD status, the model achieved an accuracy of 0.80, a sensitivity of 0.75, and a specificity of 0.86. Furthermore, the deep learning approach significantly outperformed traditional machine learning methods in both tasks (MSI accuracy, p-value = 5.11 × 10⁻¹⁸; HRD accuracy, p-value = 1.07 × 10⁻¹⁰). Using explainability techniques, we demonstrated that the model’s predictions are based on biologically meaningful features, aligning with key DNA damage repair mutation signatures.
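
For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are derived from binary predictions. The helper name `binary_metrics` and the toy labels are hypothetical and are not the study's data; they only illustrate the arithmetic.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives detected
    return accuracy, sensitivity, specificity

# Toy example (made-up labels): 4 biomarker-positive and 6 biomarker-negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

accuracy, sensitivity, specificity = binary_metrics(y_true, y_pred)
```

Here one positive case is missed and one negative case is flagged, so sensitivity is 3/4 and specificity is 5/6, while accuracy is 8/10.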

Conclusion

We demonstrate that deep learning can identify patterns in unfiltered somatic mutations without the need for manual feature extraction. This approach enhances the detection of actionable targets and paves the way for developing NGS-based biomarkers using minimally processed data.
