Extended Hybrid Resampling Architecture for Addressing Imbalanced Datasets in Multi-Label Classification

Mediana Aryuni
Chastine Fatichah
Anny Yuniarti

Abstract

Class imbalance is a common problem in multi-label classification (MLC) and can reduce the predictive accuracy of classifiers. To address this issue, recent studies have proposed hybrid resampling approaches that combine data-level balancing techniques in MLC. This research aims to improve the performance of multi-label classifiers on imbalanced datasets by developing and testing an extended hybrid resampling architecture based on REMEDIAL-Hybrid-with-Resampling (R-HwR), R-HwR-ROS, and R-HwR-SMT. The architecture extends R-HwR-ROS and R-HwR-SMT with resampling strategies such as Multi-Label Edited Nearest Neighbor (MLeNN), Multi-Label Tomek Link (MLTL), and Multi-Label Random Under Sampling (MLRUS), and was tested with five multi-label classifiers: Binary Relevance (BR), Classifier Chain (CC), Calibrated Label Ranking (CLR), Label Powerset (LP), and Multi-Label k-Nearest Neighbor (ML-kNN). Classifier performance was evaluated across several benchmark datasets using Micro-F1, Macro-F1, and Hamming Loss, with the Wilcoxon signed-rank and Friedman tests used to identify significant improvements and optimal setups. The Base + MLTL hybrid significantly improved both R-HwR-ROS and R-HwR-SMT, whereas Base + MLeNN significantly improved R-HwR-ROS (p < 0.05). CC emerged as the most reliable classifier. In R-HwR-ROS, MLeNN outperformed the other combinations with the BR, CC, and CLR classifiers, whereas MLTL outperformed the other combinations with the LP and ML-kNN classifiers. In R-HwR-SMT, MLTL outperformed the other combinations for all classifiers. Hybrid resampling algorithms that include MLeNN and MLTL substantially improve classifier robustness and balance across varied datasets.

Version published to 10.21203/rs.3.rs-9080792/v1 on Research Square
Mar 27, 2026

Related articles

Partial Multi-Label Learning with Missing Labels via Feature-Label Disentanglement

This article has 2 authors:
1. Yuzhi Tao
2. Anhui Tan
This article has no evaluations. Latest version: Mar 23, 2026

A Multiple Instance Learning framework with Instance Identification and Supervised Contrastive Learning for WSI Classification

This article has 4 authors:
1. Liming Yuan
2. Guangcan Hu
3. Na Qin
4. Lu Zhao
This article has no evaluations. Latest version: Apr 1, 2026

Confidence-Aware Pseudo-Labeling via Unsupervised Ensemble Consensus for Fraud Detection Contribution

This article has 4 authors:
1. Daniel Agyekum Amakye
2. Joseph Dadzie
3. Nana Yaw Duodu
4. Albert Mainu Tawiah
This article has no evaluations. Latest version: Apr 16, 2026
