Evaluating the Performance of Random Forest and XGBoost with Gaussian Noise Upsampling Technique for Customer Churn Prediction


Abstract

Customer churn is a critical challenge for subscription-based businesses, especially in telecommunications, where retaining customers is essential to maintaining profitability. This study investigates the efficacy of two machine learning models, XGBoost and Random Forest, for predicting customer churn using a publicly available telecommunications dataset. The dataset's imbalanced classes present a key challenge, which is addressed by incorporating the Gaussian Noise Upsampling (GNUS) technique. The study evaluates and compares the two models using essential performance indicators, including precision, recall, accuracy, F1-score, and ROC-AUC, both with and without GNUS sampling. The results indicate that while XGBoost initially outperforms Random Forest across most metrics, both models show improved recall after applying GNUS, particularly in identifying churn cases. However, this improvement in recall comes with a trade-off in precision and overall accuracy. The findings highlight the importance of using appropriate sampling techniques to tackle class imbalance in churn prediction and provide valuable insights for developing proactive customer retention strategies.
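The abstract does not detail the GNUS procedure, but the general idea of Gaussian noise upsampling is to balance the classes by duplicating minority-class (churn) samples and perturbing each copy with small Gaussian noise. The sketch below illustrates this under assumed parameters; the function name `gaussian_noise_upsample`, the `noise_scale` parameter, and the per-feature noise scaling are illustrative choices, not the authors' exact implementation.

```python
import numpy as np

def gaussian_noise_upsample(X, y, minority_label=1, noise_scale=0.05, random_state=0):
    """Balance a binary dataset by duplicating minority samples with Gaussian noise.

    Assumed behavior: copies of randomly chosen minority rows are perturbed with
    zero-mean Gaussian noise whose standard deviation is noise_scale times each
    feature's standard deviation within the minority class.
    """
    rng = np.random.default_rng(random_state)
    X_min = X[y == minority_label]
    # Number of synthetic samples needed to match the majority-class count.
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0:
        return X, y
    idx = rng.integers(0, len(X_min), size=n_needed)
    noise = rng.normal(0.0, noise_scale * X_min.std(axis=0),
                       size=(n_needed, X.shape[1]))
    X_bal = np.vstack([X, X_min[idx] + noise])
    y_bal = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_bal, y_bal
```

In a churn pipeline such upsampling would typically be applied to the training split only, so that evaluation metrics (precision, recall, F1, ROC-AUC) reflect the original class distribution.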
