Self-organizing maps as a way to evaluate optimal strategies for balancing binary class distributions: a methodological approach

Alberto Nogales
Diego Guadalupe
Álvaro J. García-Tejedor



Abstract

Since machine learning algorithms rely on data, the way a dataset is collected significantly impacts their performance. Data must be gathered carefully to minimize problems such as missing values or class imbalance. However, the inherent nature of the data can sometimes lead to such imbalances. An imbalanced dataset can produce biased models whose predictions favor the majority class. To avoid this problem, balancing strategies can be applied to equalize the number of instances in each class. This paper introduces a methodological approach to evaluate which balancing strategies yield the best results for a given dataset. We leverage self-organizing maps, an unsupervised neural network model, to identify which strategy generates the most suitable balanced synthetic data. By considering the topological structure of the data, we propose a metric that uses the trained map to measure the changes between the original dataset and the dataset transformed by each strategy. This metric is based on the idea that synthetic data that more closely resembles the original dataset is preferable.
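The abstract does not define the metric itself. The following is therefore only a minimal sketch of the general idea, under stated assumptions. A small self-organizing map is trained on the original feature matrix. Each candidate balancing strategy then produces a transformed dataset, and the strategy whose data changes how the map is used the least is preferred.

The change measure used here is the total variation distance between the best-matching-unit hit distributions of the two datasets. It is a hypothetical stand-in, not the authors' metric. Random oversampling and random undersampling are likewise just two simple example strategies. All function names (`train_som`, `hit_distribution`, `som_shift`, `random_oversample`, `random_undersample`, `pick_strategy`) and hyperparameters are illustrative.

```python
import numpy as np


def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular SOM with the classic online Kohonen update.

    Learning rate and Gaussian neighbourhood width both decay linearly.
    """
    rng = np.random.default_rng(seed)
    # (row, col) position of every neuron on the map grid.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each neuron's weight vector with a randomly drawn sample.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            # Best-matching unit: the neuron whose weights are closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood measured on the grid, not in feature space.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma**2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def hit_distribution(weights, data):
    """Fraction of samples whose best-matching unit is each neuron."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d.argmin(axis=1), minlength=len(weights))
    return counts / counts.sum()


def som_shift(weights, original, transformed):
    """Hypothetical metric: total variation distance between hit distributions."""
    p = hit_distribution(weights, original)
    q = hit_distribution(weights, transformed)
    return 0.5 * np.abs(p - q).sum()


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until both classes are equal in size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    idx = rng.choice(np.flatnonzero(y == minority), size=counts.max() - counts.min())
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class with the minority-class size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=counts.min(), replace=False)
        for c in classes
    ])
    return X[keep], y[keep]


def pick_strategy(X, y, strategies, **som_kwargs):
    """Train one SOM on the original data and score each strategy against it."""
    weights = train_som(X, **som_kwargs)
    scores = {name: som_shift(weights, X, fn(X, y)[0]) for name, fn in strategies.items()}
    # Lower shift = transformed data resembles the original more closely.
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy binary problem: 200 majority vs 20 minority samples.
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (20, 2))])
    y = np.array([0] * 200 + [1] * 20)
    best, scores = pick_strategy(
        X, y, {"oversample": random_oversample, "undersample": random_undersample}
    )
    print(scores, "->", best)
```

A lower `som_shift` means the balanced data lands on the map in roughly the same regions as the original data. This mirrors the abstract's preference for synthetic data that closely resembles the original dataset. Hit frequencies are only one possible way to read the trained map; quantization error or distances between neurons would be equally plausible choices.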

Version published to 10.21203/rs.3.rs-5559968/v1 on Research Square
Mar 25, 2025

Related articles

Mixed-robROSE: a Novel Balancing Technique Tailored for Mixed-Type Datasets
Authors: Rasool Taban, Cláudia Nunes, M. Rosário Oliveira
Latest version Apr 17, 2025

Evaluating the Efficacy of Bayesian Optimization for Class-Imbalanced Data: Jointly Optimizing Classifier Hyperparameters and Sampling Strategies
Author: Graham Glasheen
Latest version Apr 15, 2025

Optimizing Seminal Quality Prediction Using Machine Learning with Data Preprocessing and Feature Selection
Authors: Aamir Farooq, Zhengrong Xiang, Musaed Alhussein, Muhammad Shahzad, Muhammad Farhan, Khursheed Aurangzeb
Latest version Apr 9, 2025