Enhancing Subscription Fraud Detection Through Ensemble Learning: The Case of Ethio Telecom
Abstract
Subscription fraud is a critical challenge for telecommunication companies worldwide, threatening both financial stability and national security. This research addresses the problem by developing a fraud detection model specifically for Ethio Telecom. The model uses ensemble and adaptive learning techniques, combining multiple classifiers to improve detection accuracy. The study used a dataset of 1,000,000 Call Detail Records (CDRs) collected over two months known for increased fraudulent activity. After irrelevant data were filtered out and the multiple call records of each subscriber were aggregated, the dataset was reduced to 349,164 records. Initially, 16 features were analyzed, and four were excluded for lack of relevance. The remaining 11 features, excluding the target variable, underwent preprocessing, including data cleaning, transformation, and balancing. Feature selection, using a correlation matrix and Random Forest importance analysis, removed four additional features, resulting in a final set of 8 key features, including INT_DIALLED, RATIO_INT_TOTAL, and RATIO_UNIQUE_TOTAL. Three individual models, namely Decision Tree (DT), Logistic Regression (LR), and Artificial Neural Network (ANN), were implemented alongside the ensemble methods Bagging, Boosting, Stacking, and Voting and the adaptive models Hoeffding Tree and Adaptive Random Forest (ARF). The findings support Stacking and ARF as robust tools for subscription fraud detection.
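As an illustration of the stacking approach named above, the following is a minimal sketch and not the study's implementation. It assumes scikit-learn, uses synthetic data in place of the Ethio Telecom CDR features, and picks a logistic-regression meta-learner and all hyperparameters purely for demonstration; the abstract does not specify any of these choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for per-subscriber features (NOT the study's data):
# eight numeric columns, with roughly 10% of rows in the positive ("fraud") class.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5, n_redundant=1,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three individual model families mentioned in the abstract.
# Hyperparameters are assumptions chosen only to keep the example fast.
base_learners = [
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("ann", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0),
    )),
]

# Stacking: out-of-fold predictions of the base learners (5-fold CV)
# become the inputs of a meta-learner (assumed here: logistic regression).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

fraud_scores = stack.predict_proba(X_test)[:, 1]  # estimated P(fraud) per row
test_accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

On imbalanced data such as fraud labels, accuracy alone can be misleading, so a real evaluation would likely also report precision, recall, or F1 on the fraud class.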