Self-Organizing Maps as a Way to Evaluate Optimal Strategies for Balancing Binary Class Distributions: A Methodological Approach

Abstract

Because Machine Learning algorithms are data-driven, the way a dataset is collected significantly affects model performance. Data should be gathered methodically to avoid missing values and class imbalance, but in some domains the nature of the data itself produces imbalance. An imbalanced dataset can yield biased models whose predictions are dominated by the majority class. Balancing strategies address this problem by equalizing the number of instances in each class. In this paper, we propose a methodology for evaluating which balancing strategy yields the best results for a given dataset. We leverage Self-Organizing Maps, an unsupervised neural network model, to identify which strategy generates the most suitable balanced synthetic data. Exploiting the topological structure of the trained map, we also propose a metric that measures the changes between the original dataset and the same dataset after each strategy has been applied.
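
The general idea can be illustrated with a minimal sketch: train a map on the original data, project the minority class before and after balancing, and compare the two projections. The abstract does not define the proposed metric, so the total variation distance between hit histograms used here is only a stand-in. The toy data, the two oversampling helpers, and all function names and parameter values are likewise hypothetical.

```python
import numpy as np

def train_som(X, rows=5, cols=5, epochs=10, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM online; returns weights of shape (rows*cols, d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialise unit weights from randomly chosen samples.
    W = X[rng.integers(0, n, rows * cols)].astype(float)
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, t = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = t / total
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # linearly shrinking neighbourhood
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))    # Gaussian neighbourhood weights
            W += lr * h[:, None] * (X[i] - W)
            t += 1
    return W

def hit_histogram(W, X):
    """Fraction of the samples in X whose best-matching unit is each map unit."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(d2.argmin(axis=1), minlength=len(W)) / len(X)

def tv_distance(p, q):
    """Total variation distance between two histograms (stand-in metric)."""
    return 0.5 * np.abs(p - q).sum()

def random_oversample(X_min, n_target, rng):
    """Hypothetical strategy 1: duplicate minority samples at random."""
    return X_min[rng.integers(0, len(X_min), n_target)]

def jitter_oversample(X_min, n_target, rng, scale=3.0):
    """Hypothetical strategy 2: duplicates plus heavy Gaussian noise."""
    base = random_oversample(X_min, n_target, rng)
    return base + rng.normal(0.0, scale, base.shape)

# Toy imbalanced binary dataset: 200 majority vs. 20 minority points (assumed).
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, (200, 2))
X_min = rng.normal(3.0, 0.5, (20, 2))
X = np.vstack([X_maj, X_min])

W = train_som(X)                      # map trained on the original data only
base = hit_histogram(W, X_min)        # where the original minority class lands

candidates = {
    "random_oversampling": random_oversample(X_min, 200, rng),
    "jittered_oversampling": jitter_oversample(X_min, 200, rng),
}
# Lower score = balanced minority data stays closer to the original on the map.
scores = {name: tv_distance(base, hit_histogram(W, Xs))
          for name, Xs in candidates.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

In this sketch the map is trained once on the original data and then held fixed, so every candidate strategy is scored against the same reference topology.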
