Machine Learning for Credit Risk Prediction: A Systematic Literature Review

Data (IF 2.2, Q3, Computer Science, Information Systems) | Pub Date: 2023-11-07 | DOI: 10.3390/data8110169

Jomark Pablo Noriega, Luis Antonio Rivera, José Alfredo Herrera



Abstract

In this systematic review of the literature on Machine Learning (ML) for credit risk prediction, we argue that financial institutions need Artificial Intelligence (AI) and ML to assess credit risk when analyzing large volumes of information. We posed research questions about the algorithms, metrics, results, datasets, variables, and related limitations involved in predicting credit risk. To answer them, we searched well-known databases and identified 52 relevant studies within the microfinance credit industry. We identified challenges and approaches in credit risk prediction with ML models, including the black-box nature of the models, the need for explainable artificial intelligence, the importance of selecting relevant features, multicollinearity, and imbalance in the input data. In answering the research questions, we found that the Boosted Category is the most researched family of ML models, and that the most commonly used evaluation metrics are Area Under Curve (AUC), Accuracy (ACC), Recall, the F1 measure (F1), and Precision. Research mainly uses public datasets to compare models and private ones to generate new knowledge when models are applied to the real world. The most significant limitation identified is the representativeness of reality, and the variables most used in the microcredit industry relate to demographics, operations, and payment behavior. This study aims to guide developers of credit risk management tools and software toward the existing capabilities of ML methods, metrics, and techniques for forecasting credit risk, thereby minimizing possible losses due to default and guiding risk appetite.
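As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how Accuracy, Precision, Recall, F1, and AUC can be computed for a binary default classifier. It is not taken from the reviewed studies: the function names, the 0.5 decision threshold, and the ten toy loans (label 1 = default) are all assumptions made for this example.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (default) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, Precision, Recall, and F1 at a fixed (assumed) decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random defaulter is scored above a
    random non-defaulter (ties count as half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 10 loans, 3 of which defaulted (an imbalanced sample).
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.6, 0.1, 0.4, 0.3, 0.05, 0.8, 0.15, 0.35]

metrics = threshold_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

On this toy sample, a model that predicts "no default" for every loan would still reach 0.7 Accuracy with zero Recall, which is one reason the imbalance problem mentioned above matters when choosing metrics.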


