Robust Feature Selection for Cancer Microarray Data Using a Hybrid mRMR and Binary Lion Optimization Algorithm
Abstract
Microarray cancer datasets contain large numbers of irrelevant, redundant, and noisy features, which can severely degrade the accuracy and efficiency of classification algorithms. Feature selection, a key branch of feature engineering, aims to improve classification performance by retaining only the most informative features. However, feature selection is an NP-hard problem, and conventional search strategies are prone to premature convergence and entrapment in local optima, which increases the computational burden. Global metaheuristic algorithms have been widely explored to address these challenges. The recently proposed Lion Optimization (LO) algorithm has shown promising results on continuous optimization problems, but its design is not inherently suited to discrete feature selection tasks. To overcome this limitation, a binary variant of LO, termed Binary Lion Optimization (BLO), is introduced for wrapper-based feature selection in microarray cancer data analysis. In this work, the Minimum Redundancy Maximum Relevance (mRMR) criterion is first applied as a filter to select an initial subset of relevant features, thereby reducing the search space. This reduced subset is then optimized with the BLO algorithm to improve classification outcomes. The proposed mRMR-BLO framework was evaluated on several widely recognized cancer microarray datasets and benchmarked against four state-of-the-art binary optimization algorithms. Experimental results show that mRMR-BLO consistently identifies smaller yet highly discriminative feature subsets while achieving competitive or superior prediction accuracy. These findings highlight the potential of mRMR-BLO as an effective and robust tool for high-dimensional microarray cancer classification.
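
The two-stage idea described in the abstract (an mRMR filter followed by a binary wrapper search) can be illustrated with a minimal sketch. The abstract does not specify the relevance measure, the binarization rule, or the objective function, so everything below is an assumption for illustration: absolute Pearson correlation stands in for mutual information, a sigmoid transfer function stands in for whatever mapping BLO uses, and the weighted objective with `alpha` is a form commonly seen in wrapper feature selection rather than the authors' stated fitness. The function names are hypothetical, and the lion update rules themselves are not sketched because the abstract does not describe them.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_filter(columns, labels, k):
    """Greedy mRMR-style selection: pick the feature with the highest
    relevance minus mean redundancy against the features already chosen.
    Assumption: |Pearson r| is used as a stand-in for mutual information."""
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.
    Assumption: a sigmoid transfer function; moderate magnitudes are expected."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0 for v in position]


def fitness(mask, error_rate, alpha=0.99):
    """Hypothetical wrapper objective (lower is better): weighted sum of the
    classifier error rate and the fraction of features kept."""
    return alpha * error_rate + (1 - alpha) * sum(mask) / len(mask)
```

In a full pipeline, `mrmr_filter` would shrink the gene set first, and each candidate mask produced by `binarize` would be scored by training a classifier on the masked features and passing its error rate to `fitness`.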