Predicting Customer Churn in the Telecommunications Sector Using Machine Learning Techniques: A Comparative Modelling Approach

Abstract

Customer churn remains a central problem in the telecommunications market, where strong competition and flexible service contracts make it harder to retain customers than to acquire new ones. This project builds and tests predictive models that identify customers at risk of leaving, using the Telco Customer Churn dataset available on Kaggle. Preprocessing comprised handling missing values, encoding categorical variables, removing duplicates, and preparing the numerical features for modeling. Four supervised machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, and XGBoost) were implemented and compared on interpretability, scalability, and predictive performance. The models were evaluated on a stratified train-test split using ROC-AUC, Precision, Recall, F1-score, and Accuracy. Random Forest achieved the best combination of recall and ROC-AUC among the models, while XGBoost achieved the highest accuracy and precision, making it the most dependable overall performer. The results point to month-to-month contracts, higher monthly charges, and the absence of technical support services as the main predictors of churn. These findings give telecom operators data-driven insights for designing customer retention strategies, personalizing customer interventions, and reducing revenue loss.
Future work includes threshold tuning, more advanced resampling techniques such as SMOTE, and integrating SHAP-based interpretability into customer relationship management systems to enhance real-time decision support.
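
The evaluation metrics named in the abstract, and the effect of the threshold tuning proposed as future work, can be illustrated with a small sketch. It uses only the Python standard library. The labels and scores are invented toy values (1 = churned), not results from the study, and the function names `classification_metrics` and `roc_auc` are hypothetical helpers introduced here for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for the positive (churn) class."""
    # A customer is flagged as a churner when the score reaches the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(y_true)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance that a random churner outscores a random
    non-churner, with ties counted as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only): 3 churners among 10 customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.45, 0.2, 0.2, 0.1, 0.1, 0.05]

default_metrics = classification_metrics(y_true, scores, threshold=0.5)
tuned_metrics = classification_metrics(y_true, scores, threshold=0.35)
auc = roc_auc(y_true, scores)

print("threshold 0.50:", default_metrics)
print("threshold 0.35:", tuned_metrics)
print("ROC-AUC:", round(auc, 3))
```

On this toy data, lowering the threshold from 0.5 to 0.35 raises recall while lowering precision, and accuracy stays the same. ROC-AUC does not change, because it does not depend on the threshold. This is one reason a churn model may be judged on recall and ROC-AUC rather than on accuracy alone.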
