An Effective Hybrid Sampling Strategy for Single-Split Evaluation of Classifiers


Abstract

Evaluating the classification accuracy of machine learning models typically involves multiple rounds of random training/test splits, model retraining, and performance averaging. However, this conventional approach is computationally expensive and time-consuming, especially for large datasets or complex models. To address this issue, we propose an effective sampling approach that selects a single training/test split that closely approximates the results obtained from repeated random sampling. Our approach ensures that the selected split closely reflects the classification performance observed on the original dataset. Our method integrates advanced distribution distance metrics and feature weighting techniques tailored for numerical, categorical, and mixed-type datasets. The experimental results demonstrate that our method achieves over 95% agreement with multi-run average accuracy while reducing computational overhead by more than 90%. This approach offers a scalable, resource-efficient alternative for reliable model evaluation, particularly valuable in time-critical or resource-constrained applications.
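
A minimal sketch of the general idea, under stated assumptions: candidate splits are scored by a distributional distance between training and test features, and the best-matching split is kept for a single evaluation. The per-feature Wasserstein distance and the optional weight vector below are illustrative stand-ins for the metrics and feature-weighting techniques named in the abstract (which also cover categorical and mixed-type data); this sketch handles numeric features only, and the function names are hypothetical.

```python
# Illustrative sketch, not the authors' exact algorithm: pick, among several
# random candidate splits, the one whose test-set feature distributions match
# the training set most closely, and evaluate the model once on that split.
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.model_selection import train_test_split


def split_divergence(X_train, X_test, weights=None):
    """Weighted average per-feature Wasserstein distance between train and test sets.

    `weights` stands in for the feature-weighting step mentioned in the abstract.
    """
    n_features = X_train.shape[1]
    if weights is None:
        weights = np.ones(n_features)
    dists = np.array([
        wasserstein_distance(X_train[:, j], X_test[:, j])
        for j in range(n_features)
    ])
    return float(np.average(dists, weights=weights))


def select_representative_split(X, y, n_candidates=50, test_size=0.2, seed=0):
    """Return the candidate split with the smallest train/test distribution gap."""
    rng = np.random.RandomState(seed)
    best_split, best_score = None, np.inf
    for _ in range(n_candidates):
        split = train_test_split(
            X, y, test_size=test_size, stratify=y,
            random_state=rng.randint(0, 2**31 - 1),
        )
        score = split_divergence(split[0], split[1])  # X_train vs. X_test
        if score < best_score:
            best_split, best_score = split, score
    return best_split  # (X_train, X_test, y_train, y_test)
```

In this sketch, the model would then be trained and evaluated once on the returned split, with that single accuracy serving as a proxy for the multi-run average.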
