Performance Assessment of an Unsupervised Variable Selection Approach for Biomarker Discovery and Glioblastoma Subtyping
Abstract
High-dimensional omics data often contain more variables than observations, which negatively impacts the performance of classical data analysis methods. Dimensionality reduction is typically addressed through variable selection strategies that incorporate a penalty term into the model. While effective for selecting task-specific variables, this approach may not be optimal when the goal is to preserve the dataset structure and the overall biological information for multiple downstream analyses. In such cases, a priori unsupervised variable selection is preferable. In this study, we evaluate several unsupervised variable selection approaches to derive a representative subset of the original dataset. Building on the performance assessment results, we introduce TRIM-IT, a novel tool for unsupervised variable selection, clustering, survival analysis, and differential gene expression analysis. Applied to glioblastoma (GBM) data, TRIM-IT identified three clusters that correlate with tumor histology, exhibit distinct survival curves, and display unique molecular profiles with genes potentially serving as biomarkers. The tool is available for reproduction and adaptation to other studies.