Discovery of Prognostic Biomarkers in Gastric Cancer Through Machine Learning and Bioinformatics Analysis of Gene Expression Data

Abstract

Gastric cancer (GC) is often diagnosed at an advanced stage, which leads to a poor prognosis. To identify potential biomarkers in GC, we integrated fifteen datasets from NCBI-GEO into a single complete dataset and, from it, extracted a subset containing only known cancer driver genes (the driver dataset). We selected top gene features with three methods: Recursive Feature Elimination (RFE; the top 10, 20, 30, 40, and 50 features), mutual information (MI; 50 features), and a tree-based (TB) method using SelectFromModel (50 features). We then applied machine learning classifiers to the selected gene features to classify cancer and normal samples. For the complete dataset, the SVC classifier performed better with the top 50 gene features selected by RFE and MI, while the AB classifier achieved the highest performance with TB-selected features. For the driver dataset, RF performed well with 40 RFE-selected features, and SVC and ET performed better with the 50 features selected by MI and TB, as evaluated on the test dataset. After combining the genes selected from both datasets by all three feature selection methods, only 115 were differentially expressed. A Lasso-penalized Cox regression model then narrowed the selection to fourteen genes. This study highlights the effectiveness of integrating machine learning and bioinformatics analysis to identify new biomarkers for GC.
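The three feature selection strategies named in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' pipeline: the data are synthetic (generated with make_classification as a stand-in for an expression matrix), and the estimator choices, the feature count k = 10, and the helper name select_features are assumptions made for illustration only.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import (
    RFE,
    SelectFromModel,
    SelectKBest,
    mutual_info_classif,
)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a samples-by-genes matrix (assumption, not real GEO data).
X, y = make_classification(
    n_samples=200, n_features=60, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def select_features(X, y, k, seed=0):
    """Hypothetical helper: return the k column indices chosen by each method."""
    # RFE: repeatedly drop the weakest feature according to linear SVC weights.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=k, step=1).fit(X, y)
    # MI: keep the k features with the highest estimated mutual information.
    mi_score = partial(mutual_info_classif, random_state=seed)
    mi = SelectKBest(score_func=mi_score, k=k).fit(X, y)
    # TB: rank by tree-ensemble importances; threshold=-inf keeps exactly k.
    trees = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    tb = SelectFromModel(trees, max_features=k, threshold=-np.inf).fit(X, y)
    return {
        "RFE": rfe.get_support(indices=True),
        "MI": mi.get_support(indices=True),
        "TB": tb.get_support(indices=True),
    }


# Select on the training split only, so the test split stays unseen.
selected = select_features(X_train, y_train, k=10)

# Train one classifier per feature set and score it on the held-out split.
scores = {}
for method, cols in selected.items():
    clf = SVC(kernel="linear").fit(X_train[:, cols], y_train)
    scores[method] = clf.score(X_test[:, cols], y_test)

for method, acc in scores.items():
    print(f"{method}: {len(selected[method])} features, test accuracy {acc:.2f}")
```

Selecting features on the training split alone avoids leaking test-set information into the comparison. Swapping in other classifiers or other values of k only requires changing the estimator and the k argument.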
