Extended Hybrid Resampling Architecture for Addressing Imbalanced Datasets in Multi-Label Classification

Abstract

Class imbalance is a common problem in multi-label classification (MLC) and can reduce the predictive accuracy of classifiers. To address it, recent studies have proposed hybrid resampling approaches that combine data-level balancing techniques for MLC. This research aims to improve the performance of multi-label classifiers on imbalanced datasets by developing and testing an extended hybrid resampling architecture built on two REMEDIAL-Hybrid-with-Resampling (R-HwR) variants, R-HwR-ROS and R-HwR-SMT. The architecture extends R-HwR-ROS and R-HwR-SMT with three additional resampling strategies: Multi-Label Edited Nearest Neighbor (MLeNN), Multi-Label Tomek Link (MLTL), and Multi-Label Random Under Sampling (MLRUS). Each configuration was tested with five multi-label classifiers: Binary Relevance (BR), Classifier Chain (CC), Calibrated Label Ranking (CLR), Label Powerset (LP), and Multi-Label k-Nearest Neighbor (ML-kNN). Classifier performance was evaluated on several benchmark datasets using Micro-F1, Macro-F1, and Hamming Loss, and the Wilcoxon signed-rank and Friedman tests were used to identify significant improvements and optimal setups. Base + MLTL significantly improved both R-HwR-ROS and R-HwR-SMT, whereas Base + MLeNN significantly improved R-HwR-ROS (p < 0.05). CC emerged as the most reliable classifier. In R-HwR-ROS, MLeNN outperformed the other combinations with the BR, CC, and CLR classifiers, whereas MLTL outperformed the other combinations with the LP and ML-kNN classifiers. In R-HwR-SMT, MLTL outperformed the other combinations for all classifiers. Overall, hybrid resampling with MLeNN or MLTL improved classifier robustness and balance across the benchmark datasets.
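
The three evaluation measures named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' evaluation code. The function names, the toy label matrices, and the convention of scoring an undefined F1 as 0 are all assumptions made for illustration. Labels are encoded as 0/1 matrices with one row per instance and one column per label.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = instances, columns = labels, entries 0 or 1


def label_counts(y_true: Matrix, y_pred: Matrix) -> List[Tuple[int, int, int]]:
    """Return (tp, fp, fn) for every label column."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t == 1 and p == 1:
                counts[j][0] += 1  # true positive
            elif t == 0 and p == 1:
                counts[j][1] += 1  # false positive
            elif t == 1 and p == 0:
                counts[j][2] += 1  # false negative
    return [(c[0], c[1], c[2]) for c in counts]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; an undefined score is treated as 0 (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    counts = label_counts(y_true, y_pred)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    counts = label_counts(y_true, y_pred)
    return sum(f1(*c) for c in counts) / len(counts)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of instance-label cells that are predicted incorrectly."""
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / (len(y_true) * len(y_true[0]))


# Toy data (hypothetical): label 0 is frequent, labels 1 and 2 are rare.
toy_true = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 0, 0]]
toy_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 1]]

print(micro_f1(toy_true, toy_pred))
print(macro_f1(toy_true, toy_pred))
print(hamming_loss(toy_true, toy_pred))
```

On this toy data the classifier gets the frequent label right and both rare labels wrong. Micro-F1 pools counts across labels, so the frequent label dominates it. Macro-F1 weights every label equally, so it falls much further. That difference is one reason to report both measures on imbalanced multi-label data.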