CSMAS
Authors: Youssef Tounsi, H. Anoun, L. Hassouni
Published in: Proceedings of the 3rd International Conference on Networking, Information Systems & Security
Publication date: 2020-03-31
DOI: 10.1145/3386723.3387851 (https://doi.org/10.1145/3386723.3387851)
Citations: 9
Abstract
Credit risk is one of the main risks facing banks and credit institutions. Building on current progress in machine learning, artificial intelligence and big data, recent research has proposed several systems for improving credit rating. This paper introduces "CSMAS", a new scalable credit scoring multi-agent system for prediction problems in data mining for the credit scoring domain. The engine is built on a seven-layer multi-agent architecture in which coordinated intelligent agents carry out the data mining process. CSMAS's performance rests on data preprocessing and forecasting. The first layer retrieves data from core banking systems, payment systems, credit bureaus, external databases and other data sources, and stores it in a big data platform. The second layer handles three subtasks: feature engineering, data pre-processing and the integration of diverse datasets. The third layer deals with missing values and treats outliers. The fourth layer applies dimensionality reduction techniques to reduce the number of features in the original feature set. The fifth layer builds a model using the new generation of gradient boosting algorithms (XGBoost, LightGBM and CatBoost) and makes predictions. The sixth layer evaluates the model. The seventh layer rates new credit applicants. CSMAS is assessed on the large Home Credit Default Risk dataset from a Kaggle challenge (307,511 records), where the task is to evaluate the risk of a loan applicant, a major problem for banks. The results show that CSMAS gives relevant results and indicate that it can be further employed as a reliable tool for predicting more complicated cases in credit scoring.
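The layered flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`impute_missing`, `clip_outliers`, `run_layers`), the median-imputation and IQR-clipping strategies, and the toy `income` column are all assumptions chosen for illustration. It covers only a third-layer-style step (missing values and outliers) and the idea of chaining layers in order; data retrieval, dimensionality reduction, gradient boosting and evaluation are left out.

```python
from statistics import median, quantiles
from typing import Callable, Optional

# A table is a mapping from column name to a list of numeric values.
# None marks a missing value. (Hypothetical representation.)
Table = dict[str, list[Optional[float]]]
Layer = Callable[[Table], Table]


def impute_missing(table: Table) -> Table:
    """Replace missing values with the column median (assumed strategy)."""
    out: Table = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if not present:
            # A column with no observed values is left untouched.
            out[name] = list(values)
            continue
        fill = median(present)
        out[name] = [fill if v is None else v for v in values]
    return out


def clip_outliers(table: Table, k: float = 1.5) -> Table:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (assumed strategy).

    Expects columns without missing values, so it should run after
    impute_missing.
    """
    out: Table = {}
    for name, values in table.items():
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        out[name] = [min(max(v, low), high) for v in values]
    return out


def run_layers(table: Table, layers: list[Layer]) -> Table:
    """Apply each layer in order, passing the result to the next one."""
    for layer in layers:
        table = layer(table)
    return table


# Toy data: one missing value and one extreme value.
raw: Table = {"income": [30.0, None, 32.0, 31.0, 29.0, 33.0, 28.0, 30.0, 500.0]}
clean = run_layers(raw, [impute_missing, clip_outliers])
print(clean["income"])
```

In this sketch each layer is a plain function from table to table, so layers can be reordered or replaced without touching the runner. A real multi-agent system would add communication and coordination between agents, which this example does not attempt to model.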