Imbalanced credit risk prediction based on data fillers and modified balanced random forest improved by Bayesian optimization
Authors: Hongyu Zhang, Zhenjun Ye
Journal: Advances in Economic Development and Management Research
Publication date: 2024-04-18
DOI: https://doi.org/10.61935/aedmr.2.1.2024.p115
Credit risk prediction models built on financial big data often face two problems: imbalanced class distributions and laborious data preprocessing. In addition, high-precision models are often computationally inefficient. This paper therefore constructs a complete imbalanced credit risk prediction model, BO-PBRF, and modifies the underlying algorithm to address these common problems in financial data. In the data preprocessing stage, two missing-value fillers are fitted on the original data so that new data can later be processed in the same way. In the modeling stage, we modify the balanced random forest algorithm so that the model not only handles imbalanced data sets but also runs faster, which suits the rapid growth of financial big data. In addition, we apply Bayesian optimization during model construction to further improve accuracy, especially when predicting defaulted loans. To verify the effectiveness of the proposed model, we conduct an empirical study on real-world credit data and compare the proposed model with previous models. The experimental results show that the proposed model achieves the best prediction performance on default data.
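Two of the ideas in the abstract can be illustrated with a small sketch: fillers that are fitted once on the original data and then reused on new data, and the class-balanced bootstrap that a standard balanced random forest draws for each tree. The abstract does not say which two fillers are used or how the balanced random forest is modified, so everything here is an assumption made for illustration: the `ColumnFiller` class, the median/mode strategies, the `balanced_bootstrap` helper, and the toy column names are all hypothetical, and the Bayesian optimization step is left out.

```python
import random
from collections import Counter
from statistics import median


class ColumnFiller:
    """Learns one fill value per column from training rows and reuses it later.

    Hypothetical helper: the paper's actual fillers are not specified.
    """

    def __init__(self, strategy):
        # "median" is assumed for numeric columns, "mode" for categorical ones.
        self.strategy = strategy
        self.fill_values = {}

    def fit(self, rows, columns):
        for col in columns:
            observed = [r[col] for r in rows if r[col] is not None]
            if self.strategy == "median":
                self.fill_values[col] = median(observed)
            else:
                self.fill_values[col] = Counter(observed).most_common(1)[0][0]
        return self

    def transform(self, rows):
        # Build new dicts so the caller's rows are left untouched.
        return [
            {k: (self.fill_values[k] if v is None and k in self.fill_values else v)
             for k, v in r.items()}
            for r in rows
        ]


def balanced_bootstrap(labels, rng):
    """Row indices for one tree: equal-sized draws, with replacement, per class.

    This is the sampling step of a standard balanced random forest,
    not the paper's modified version.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))
    return sample


# Toy training data (made up): four repaid loans and one default.
train = [
    {"income": 50.0, "purpose": "car", "default": 0},
    {"income": None, "purpose": "car", "default": 0},
    {"income": 70.0, "purpose": None, "default": 0},
    {"income": 30.0, "purpose": "home", "default": 0},
    {"income": 20.0, "purpose": "car", "default": 1},
]

# Fit both fillers once on the original data.
numeric_filler = ColumnFiller("median").fit(train, ["income"])
categorical_filler = ColumnFiller("mode").fit(train, ["purpose"])

# Later, new data is filled with the stored values, without refitting.
new_rows = [{"income": None, "purpose": None, "default": None}]
filled = categorical_filler.transform(numeric_filler.transform(new_rows))

# One balanced bootstrap: as many majority draws as minority draws.
labels = [r["default"] for r in train]
sample = balanced_bootstrap(labels, random.Random(0))
```

Fitting the fillers only on the original data means a new loan application is filled with the same stored values every time, and the balanced draw keeps each tree from being dominated by the non-default class.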