TriBal: An Ensemble Learning Three-Level Data Balancing Framework for Click Fraud Detection in Online Advertising

Abstract

Click fraud in online advertising, where malicious actors generate fake clicks, is difficult to detect because of the severe class imbalance between legitimate and fraudulent clicks. This study proposes 'TriBal', an ensemble resampling framework that detects click fraud using a three-level data balancing approach built on SMOTE, Cluster Centroids, and Edited Nearest Neighbors (ENN). At Level 1, SMOTE oversamples the minority class (fraudulent clicks), generating synthetic data so that the classifier has sufficient information to learn fraud patterns. At Level 2, Cluster Centroids under-samples the majority class (legitimate clicks) by condensing it into representative samples, which reduces redundancy and further balances the dataset. At Level 3, ENN removes noisy and misclassified instances near the decision boundary, so that the balanced dataset is further refined by eliminating ambiguous data points. The effectiveness of TriBal is evaluated on nine benchmark datasets using 10-fold cross-validation, with performance measured by average precision, recall, and F1-score. Experimental results indicate that the proposed methodology is a more efficient alternative to existing sampling techniques and offers improved performance in detecting click fraud.
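To make the three levels concrete, the following is a minimal, standard-library-only sketch of how such a pipeline might be chained. It is not the authors' implementation: the function names (`smote`, `cluster_centroids`, `edited_nearest_neighbours`, `tri_balance`), the choice of balancing target (the midpoint of the two class sizes), the plain k-means used for the centroids, the majority-vote ENN rule applied to both classes, and the toy two-dimensional data are all assumptions made for illustration. The abstract does not state the sampling ratios or neighbor counts used.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Level 1 (simplified SMOTE): create n_new synthetic minority points by
    interpolating between a sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def cluster_centroids(majority, n_clusters, iters=10, rng=None):
    """Level 2 (simplified Cluster Centroids): replace the majority class with
    the centroids of a plain k-means clustering."""
    rng = rng or random.Random(0)
    centroids = rng.sample(majority, n_clusters)
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in majority:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def edited_nearest_neighbours(points, labels, k=3):
    """Level 3 (simplified ENN): keep a sample only if a strict majority of its
    k nearest neighbours share its label. Assumption: applied to both classes."""
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        nearest = sorted((j for j in range(len(points)) if j != i),
                         key=lambda j: math.dist(p, points[j]))[:k]
        agree = sum(1 for j in nearest if labels[j] == y)
        if agree * 2 > k:
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


def tri_balance(legit, fraud, k=3, seed=0):
    """Chain the three levels. The target size is a hypothetical choice."""
    rng = random.Random(seed)
    target = (len(legit) + len(fraud)) // 2
    fraud_bal = fraud + smote(fraud, max(0, target - len(fraud)), k, rng)
    legit_bal = cluster_centroids(legit, min(target, len(legit)), rng=rng)
    points = legit_bal + fraud_bal
    labels = [0] * len(legit_bal) + [1] * len(fraud_bal)  # 0 = legitimate, 1 = fraud
    return edited_nearest_neighbours(points, labels, k)


# Toy data (invented for illustration): 90 legitimate and 10 fraudulent clicks.
data_rng = random.Random(42)
legit = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
fraud = [(data_rng.gauss(4, 1), data_rng.gauss(4, 1)) for _ in range(10)]
X, y = tri_balance(legit, fraud)
print("legitimate:", y.count(0), "fraudulent:", y.count(1))
```

In practice, one would more likely reach for a maintained resampling library than hand-rolled routines like these, and the resampling should be fitted only on the training folds of the cross-validation so that synthetic points never leak into the evaluation data.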
