TriBal: An Ensemble Learning Three-Level Data Balancing Framework for Click Fraud Detection in Online Advertising
Abstract
Click fraud in online advertising, in which malicious actors generate fake clicks, is difficult to detect because of the class imbalance between legitimate and fraudulent clicks. This study proposes 'TriBal', an ensemble resampling framework that detects click fraud through a three-level data balancing approach combining SMOTE, Cluster Centroids, and Edited Nearest Neighbors (ENN). At Level 1, SMOTE oversamples the minority class (fraudulent clicks), generating synthetic samples so that the classifier has sufficient information to learn fraud patterns. At Level 2, Cluster Centroids under-samples the majority class (legitimate clicks) by condensing it into representative samples, reducing redundancy and further balancing the dataset. At Level 3, ENN removes noisy and misclassified instances near the decision boundary, refining the balanced dataset by eliminating confusing data points. The effectiveness of TriBal is evaluated on nine benchmark datasets using 10-fold cross-validation, with performance measured by average precision, recall, and F1-score. Experimental results demonstrate that the proposed methodology is a more efficient alternative to existing sampling techniques, offering improved performance in detecting click fraud.