Explainable AI and Multiclassifiers for Staging Biomarker Discovery in Lung Squamous Cell Carcinoma

Abstract

Lung cancer is one of the most common and lethal types of cancer worldwide. Among its subtypes, lung squamous cell carcinoma (LUSC) is one of the most frequent. Identifying biomarkers for LUSC is challenging because of its high molecular heterogeneity, yet the search is worthwhile because it may elucidate biological mechanisms and reveal potential therapeutic targets. In this context, the present study used gene expression data from TCGA-LUSC, combined with feature selection, data balancing, machine learning, and explainable artificial intelligence (XAI), to identify possible biomarkers related to staging. The methods yielded robust classification metrics; random forest stood out, achieving an accuracy of 0.91. Data balancing and feature selection proved crucial to the classification process. In addition, the SHapley Additive Explanations (SHAP) method was used to identify the 16 genes most relevant to the random forest classifier. Among them, three genes (MYOSLID, IMPDH1P8, and COL9A3) were selected by all successful classifiers, making them potential staging biomarkers and possible molecular therapeutic targets for LUSC.
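
To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny toy model by averaging each feature's marginal contribution over all feature orderings, then ranks the features by absolute attribution. It is an illustration only and not the study's pipeline: the model `toy_model`, the feature names `gene_a`, `gene_b`, `gene_c`, the input values, and the all-zero baseline are hypothetical. In practice the SHAP library relies on efficient model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Averages the marginal contribution of each feature over every
    ordering in which features are switched from `baseline` to `x`.
    Cost grows factorially with the number of features, so this is
    only practical for toy examples.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for i in order:
            current[i] = x[i]  # reveal feature i
            new_output = predict(current)
            phi[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in phi]


def toy_model(features):
    """Hypothetical score with one interaction term (gene_a * gene_c)."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * c


# Hypothetical feature names and values, for illustration only.
feature_names = ["gene_a", "gene_b", "gene_c"]
x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, x, baseline)

# Rank features by absolute attribution, largest first.
ranking = sorted(zip(feature_names, phi), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranking:
    print(f"{name}: {value:.2f}")
```

The attributions sum to the difference between the model output for `x` and for the baseline, which is the additivity property that gives SHAP its name. Ranking features by the magnitude of their attributions is the general idea behind selecting a "most relevant" gene list from a trained classifier.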
