Will they repay their debt? Identification of borrowers likely to be charged off

R. Caplescu, A. Panaite, D. Pele, V. Strat
Journal: Management & Marketing-Challenges for the Knowledge Society (impact factor 1.9, JCR Q3, Business)
Published: 2020-07-22
DOI: 10.2478/mmcks-2020-0023 (https://doi.org/10.2478/mmcks-2020-0023)
Citations: 3

Abstract

The recent increase in peer-to-peer lending has prompted the development of models that separate good clients from bad ones, in order to mitigate risk for both lenders and platforms. The rapidly growing body of literature provides several comparisons between models. Among the most frequently employed are logistic regression, Support Vector Machines, neural networks and decision tree-based models. Of these, logistic regression has proved to be a strong candidate, both because of its good performance and because of its high explainability. The present paper compares four pairs of models (each fitted on imbalanced and on under-sampled data) meant to predict charged-off clients by optimizing the F1 score. We found that, if the data is balanced, Logistic Regression, both simple and with Stochastic Gradient Descent, outperforms LightGBM and K-Nearest Neighbors in optimizing the F1 score. We chose this metric because it balances the interests of the lenders with those of the platform. Loan term, debt-to-income ratio and number of accounts were found to be important predictors positively related to the risk of charge-off. At the other end of the spectrum, the FICO score has by far the strongest impact on charge-off probability. The final number of features retained by the two Logistic Regression models differs considerably because, although both use Lasso for feature selection, Stochastic Gradient Descent Logistic Regression uses stronger regularization. The analysis was performed using Python (numpy, pandas, sklearn and imblearn).
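The two ideas the abstract leans on, scoring by F1 and balancing the classes by under-sampling, can be illustrated with a small sketch. This is not the authors' code: it uses only the Python standard library, the helper names `f1_score` and `random_undersample` are hypothetical, and the toy labels are invented for illustration. In a workflow like the one described, `sklearn.metrics.f1_score` and `imblearn.under_sampling.RandomUnderSampler` would normally cover these steps.

```python
import random
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            sampled.append((row, label))
    rng.shuffle(sampled)
    new_rows = [row for row, _ in sampled]
    new_labels = [label for _, label in sampled]
    return new_rows, new_labels


# Toy data (assumption): 1 = charged off, 0 = fully paid, 10% positives.
labels = [1] * 10 + [0] * 90
rows = list(range(100))

# A model that predicts "fully paid" for everyone looks accurate
# on imbalanced data but never finds a charged-off client.
all_paid = [0] * 100
accuracy = sum(t == p for t, p in zip(labels, all_paid)) / len(labels)
print("accuracy:", accuracy, "F1:", f1_score(labels, all_paid))

# After under-sampling, both classes have the same number of rows.
balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

The toy example shows why accuracy alone is a poor target when charged-off loans are rare, and why a model fitted on the under-sampled set sees both classes equally often. Under-sampling discards majority-class rows, so it is usually applied to the training split only.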
Source journal metrics: CiteScore 6.20; self-citation rate 2.70%; articles published: 25; review time: 10 weeks