A federated learning based approach for loan defaults prediction

2020 International Conference on Data Mining Workshops (ICDMW) Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00057

Geet Shingi


Citations: 20

Abstract

The number of bank loan defaults has been increasing in recent years. However, many banking organizations still sanction loans manually. Dependency on human intervention and delays in results are the biggest obstacles in this system. When machine learning models are implemented for banking applications, the security of sensitive customer data is a crucial concern, and with strong legislative rules in place, sharing data with other organizations is not possible. In addition, loan datasets are highly imbalanced: there are very few samples of defaults compared with repaid loans. Together, these problems make it difficult for a default prediction system to learn the patterns of defaults and therefore to predict them. Previous machine learning-based approaches to automating the process have trained models on a single organization's data, but classifying loan applications using only the data within one organization is no longer a sufficient or feasible solution. In this paper, we propose a federated learning-based approach for predicting loan applications that are less likely to be repaid. It addresses the issues above by sharing only the model weights, which are aggregated at a central server. The federated system is coupled with the Synthetic Minority Over-sampling Technique (SMOTE) to address the imbalance in the training data. Further, the federated system uses a weighted aggregation based on the number of samples held by each worker and the worker's performance on its own dataset to further improve performance. The improved performance of our model on publicly available real-world data supports this approach. Flexible, aggregated models can prove crucial in keeping out defaulters among loan applicants.
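The weighted aggregation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the code below assumes that each worker's coefficient is proportional to its sample count multiplied by a local performance score (for example, a validation F1 in [0, 1]). The names `WorkerUpdate` and `aggregate` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkerUpdate:
    """One worker's contribution to a federated round (hypothetical type)."""
    weights: List[float]  # flattened model parameters after local training
    num_samples: int      # size of the worker's local training set
    score: float          # local performance metric in [0, 1], e.g. F1


def aggregate(updates: List[WorkerUpdate]) -> List[float]:
    """Return the weighted average of the workers' parameters.

    Assumption: the coefficient of each worker is proportional to
    num_samples * score. The paper's actual formula may differ.
    """
    if not updates:
        raise ValueError("need at least one worker update")

    dim = len(updates[0].weights)
    if any(len(u.weights) != dim for u in updates):
        raise ValueError("all workers must send the same number of parameters")

    # Unnormalized importance of each worker.
    raw = [u.num_samples * u.score for u in updates]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one worker needs a positive weight")

    # Normalize so the coefficients sum to 1.
    coeffs = [r / total for r in raw]

    # Coordinate-wise weighted average of the parameter vectors.
    return [
        sum(c * u.weights[i] for c, u in zip(coeffs, updates))
        for i in range(dim)
    ]


if __name__ == "__main__":
    round_updates = [
        WorkerUpdate(weights=[1.0, 0.0], num_samples=100, score=0.5),
        WorkerUpdate(weights=[0.0, 1.0], num_samples=300, score=0.5),
    ]
    print(aggregate(round_updates))
```

With equal scores, this reduces to sample-count weighting as in standard federated averaging. Under this assumed formula, a worker with a higher local score pulls the global model further toward its own parameters. Only the parameter vectors and two scalars leave each worker, so the raw loan records stay local.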

