Effective Hybrid Sampling Approach for Evaluating Classification Performance
Abstract
To evaluate the classification performance of an algorithm, the original dataset must be partitioned into training and test subsets. A classification model is constructed on the training set, and the test set is then used to estimate its accuracy. However, a reliable estimate typically requires many rounds of train/test sampling, model construction, and accuracy evaluation, with the results averaged, a process that is computationally expensive and time-consuming. To address this issue, we propose an effective sampling approach that selects training and test sets whose evaluation outcome closely approximates the result of this repeated sampling and evaluation process. Our approach ensures that the sampled data closely reflect the classification performance on the original dataset. Specifically, we introduce several techniques for measuring the similarity of data distributions and incorporate feature weighting into the similarity computation, allowing us to select the training and test sets that best preserve the distributional characteristics of the original dataset.
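The abstract does not specify which similarity measures or feature weights are used, so the following is only a minimal sketch of the general idea: draw several candidate splits, score how well each split's feature distributions match the full dataset (here, a weighted per-feature Kolmogorov–Smirnov similarity, an assumption on our part), and keep the best-scoring split. The function names, the KS-based measure, and the `weights` parameter are illustrative, not the authors' actual method.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.model_selection import train_test_split

def split_similarity(X_full, X_subset, weights):
    """Weighted mean of per-feature similarities, where each feature's
    similarity is 1 minus the two-sample KS statistic (assumed measure)."""
    sims = [1.0 - ks_2samp(X_full[:, j], X_subset[:, j]).statistic
            for j in range(X_full.shape[1])]
    return float(np.average(sims, weights=weights))

def select_best_split(X, y, weights, n_candidates=50, test_size=0.3, seed=0):
    """Draw several candidate train/test splits and keep the one whose
    training and test parts jointly best preserve the original feature
    distributions (scored by the weaker of the two similarities)."""
    rng = np.random.RandomState(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y,
            random_state=rng.randint(2**31 - 1))
        score = min(split_similarity(X, X_tr, weights),
                    split_similarity(X, X_te, weights))
        if score > best_score:
            best, best_score = (X_tr, X_te, y_tr, y_te), score
    return best, best_score
```

Evaluating a classifier once on a split selected this way is intended to approximate the average accuracy obtained from many random splits, at a fraction of the cost; the feature weights let more influential features count more heavily in the distribution comparison.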