Daniyal Asif, Muhammad Shoaib Arif, Aiman Mukheimer
A data-driven approach with explainable artificial intelligence for customer churn prediction in the telecommunications industry
Results in Engineering, volume 26, Article 104629. Published 2025-03-17.
DOI: 10.1016/j.rineng.2025.104629
URL: https://www.sciencedirect.com/science/article/pii/S2590123025007066
Citation count: 0
Abstract
In the competitive telecommunications industry (TCI), retaining clients is crucial for profitability, and customer churn remains a significant challenge. Traditional machine learning (ML) models often lack the predictive power needed for complex telecom data, while black-box models provide limited transparency, which reduces trust and limits actionable insight. This study introduces XAI-Churn TriBoost, an interpretable and explainable data-driven model developed using a dataset of over 2 million records. The model combines extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and light gradient boosting machine (LightGBM) in a soft voting ensemble to enhance churn prediction. Data preprocessing included handling missing values through iterative imputation with a Bayesian ridge estimator. Sequential data scaling was implemented by applying robust, standard, and min-max scaling in turn to ensure feature consistency. Feature selection was conducted using the Boruta technique with a random forest (RF), and class imbalance in the training data was addressed using the synthetic minority oversampling technique (SMOTE). XAI-Churn TriBoost achieved an accuracy of 96.44%, precision of 92.82%, recall of 87.82%, and F1 score of 90.25%. To enhance model transparency, we incorporated explainable artificial intelligence (AI) techniques, specifically local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP), to interpret individual predictions and identify critical features affecting churn. Key factors impacting churn include the regularity and montant features, offering the TCI valuable insights for targeted retention strategies. XAI-Churn TriBoost thus provides both robust performance and interpretability, highlighting its potential to support customer retention efforts in the TCI.
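The soft voting step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the probabilities below are made up, the helper names `soft_vote` and `f1_score` are hypothetical, and the equal model weights and 0.5 decision threshold are assumptions the abstract does not state. The last line checks that the reported F1 score follows from the reported precision and recall.

```python
from statistics import fmean


def soft_vote(prob_by_model, threshold=0.5):
    """Average each model's churn probability per customer, then threshold the mean.

    Equal weights and a 0.5 threshold are assumptions for this sketch.
    """
    n_customers = len(next(iter(prob_by_model.values())))
    averaged = [
        fmean(probs[i] for probs in prob_by_model.values())
        for i in range(n_customers)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical churn probabilities for four customers (not from the paper).
probs = {
    "xgboost": [0.90, 0.20, 0.95, 0.40],
    "catboost": [0.80, 0.10, 0.45, 0.60],
    "lightgbm": [0.85, 0.30, 0.45, 0.65],
}

averaged, labels = soft_vote(probs)

# Consistency check on the metrics reported in the abstract.
reported_f1 = f1_score(0.9282, 0.8782)

print(labels)
print(round(reported_f1 * 100, 2))
```

The third customer shows why soft voting can differ from majority (hard) voting: two of the three models put the churn probability just below 0.5, but one confident model pulls the averaged probability above the threshold.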