Sample selection bias in non-traditional lending: A copula-based approach for imbalanced data

Socio-economic Planning Sciences, vol. 95, Article 102045 · Published 2024-08-29 · DOI: 10.1016/j.seps.2024.102045 · IF 6.2, Zone 2 (Economics), JCR Q1 (Economics)
Raffaella Calabrese, Silvia Angela Osmetti, Luca Zanin

Abstract

Credit scoring models for non-traditional lending channels, such as peer-to-peer (P2P) lending platforms, are usually estimated only on the sample of accepted applicants. This may lead to biased estimates of the risk drivers. The issue can be addressed with a reject inference technique that incorporates the characteristics of rejected applicants into the model. Because the numbers of accepted applicants and of default records are low, credit scoring models usually face a class imbalance problem. However, previous literature on sample selection models for credit scoring does not address class imbalance. To fill this gap, we extend the Generalised Extreme Value (GEV) regression model for binary data to the sample selection framework. We use the quantile function of the GEV distribution as the link function in both the selection and outcome equations, and, because of its flexibility, a copula function to model the dependence structure between the two equations. The resulting model, called the Sample Selection Generalised Extreme Value (SSGEV) model, is implemented in the R package BivGEV. We apply the model to a comprehensive dataset provided by Lending Club and show that parameter estimates obtained only on accepted P2P applicants are biased, consistent with the literature. The SSGEV model achieves higher predictive accuracy than univariate approaches or a sample selection probit model. It also provides more conservative estimates of the Value-at-Risk and the Expected Shortfall.
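The two ingredients described in the abstract (a GEV link for a binary response, and a copula joining the selection and outcome equations) can be sketched as a log-likelihood. This is a minimal illustration under stated assumptions, not the BivGEV implementation: we assume the response function π = 1 − F_GEV(−η; τ), so the link g(π) = −F_GEV⁻¹(1 − π; τ) is built from the GEV quantile function; we assume P(S = 1, Y = 1) = C(p_s, p_y); and we use a Frank copula only because it has a closed form. BivGEV may offer other copulas and parameterisations. All function names here are hypothetical. S = 1 marks an accepted applicant, and the default indicator Y is observed only when S = 1.

```python
import math

EPS = 1e-12  # keeps probabilities away from 0 and 1 before taking logs


def gev_link_prob(eta: float, tau: float) -> float:
    """Assumed response function pi = 1 - F_GEV(-eta; tau).

    F_GEV(z; tau) = exp(-[1 + tau * z]_+ ** (-1 / tau)) is the GEV CDF with
    shape tau; as tau -> 0 this reduces to the complementary log-log link.
    """
    if abs(tau) < 1e-12:
        return 1.0 - math.exp(-math.exp(min(eta, 700.0)))
    base = 1.0 - tau * eta  # 1 + tau * z evaluated at z = -eta
    if base <= 0.0:
        # -eta lies outside the GEV support: F = 0 if tau > 0, F = 1 if tau < 0.
        return 1.0 if tau > 0 else 0.0
    log_t = -math.log(base) / tau  # log of base ** (-1 / tau), overflow-safe
    return 1.0 - math.exp(-math.exp(min(log_t, 700.0)))


def frank_copula(u: float, v: float, theta: float) -> float:
    """Frank copula C(u, v; theta); theta = 0 is independence (C = u * v)."""
    if abs(theta) < 1e-12:
        return u * v
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def _dot(beta, x) -> float:
    return sum(b * xi for b, xi in zip(beta, x))


def log_likelihood(data, beta_s, beta_y, tau_s, tau_y, theta):
    """Hypothetical sample-selection log-likelihood with GEV margins.

    Each row of `data` is (x_s, x_y, s, y): covariates of the selection and
    outcome equations, s = 1 if the applicant was accepted, and y = 1 if an
    accepted applicant defaulted (y is ignored when s = 0).
    """
    ll = 0.0
    for x_s, x_y, s, y in data:
        p_s = _clip(gev_link_prob(_dot(beta_s, x_s), tau_s))
        if s == 0:
            # Rejected applicant: only P(S = 0) is observed.
            ll += math.log(1.0 - p_s)
            continue
        p_y = _clip(gev_link_prob(_dot(beta_y, x_y), tau_y))
        p11 = _clip(frank_copula(p_s, p_y, theta))  # assumed P(S = 1, Y = 1)
        if y == 1:
            ll += math.log(p11)
        else:
            ll += math.log(max(p_s - p11, EPS))  # P(S = 1, Y = 0)
    return ll
```

In practice this function would be maximised numerically over (β_s, β_y, τ_s, τ_y, θ). With θ = 0 the likelihood separates into a selection part and an outcome part, so fitting the outcome equation on accepted applicants alone would suffice; a non-zero θ is what makes the accepted-only estimates biased.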

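The abstract's risk-measure claim can be made concrete with a small Monte Carlo sketch of how Value-at-Risk and Expected Shortfall of a loan portfolio's credit loss follow from model-implied default probabilities. The assumptions are ours, not the paper's: defaults are independent given the PDs, loss given default is a fixed fraction, and VaR is the empirical α-quantile of simulated losses. The function name `portfolio_var_es` is hypothetical, and the paper's own risk computation may differ.

```python
import math
import random


def portfolio_var_es(pds, exposures, lgd=1.0, alpha=0.99, n_sims=20000, seed=0):
    """Hypothetical Monte Carlo VaR and ES of portfolio credit loss.

    Assumes independent defaults: loan i defaults with probability pds[i]
    and then loses exposures[i] * lgd.
    """
    if len(pds) != len(exposures):
        raise ValueError("pds and exposures must have the same length")
    if not 0.0 < alpha < 1.0 or n_sims < 1:
        raise ValueError("need 0 < alpha < 1 and n_sims >= 1")
    rng = random.Random(seed)  # seeded for reproducibility
    losses = []
    for _ in range(n_sims):
        # One uniform draw per loan decides whether it defaults.
        loss = sum(ead * lgd for pd, ead in zip(pds, exposures) if rng.random() < pd)
        losses.append(loss)
    losses.sort()
    idx = math.ceil(alpha * n_sims) - 1  # empirical alpha-quantile position
    var = losses[idx]
    tail = losses[idx:]  # losses at or beyond the VaR
    es = sum(tail) / len(tail)
    return var, es
```

Because both measures are computed from the PDs, a model that assigns higher default probabilities to risky applicants yields higher, that is more conservative, VaR and ES for the same portfolio.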
Source journal: Socio-economic Planning Sciences (Operations Research & Management Science)
CiteScore: 9.40
Self-citation rate: 13.10%
Articles published: 294
Review time: 58 days
Journal description: Studies directed toward the more effective utilization of existing resources, e.g. mathematical programming models of health care delivery systems with relevance to more effective program design; systems analysis of fire outbreaks and its relevance to the location of fire stations; statistical analysis of the efficiency of a developing country economy or industry. Studies relating to the interaction of various segments of society and technology, e.g. the effects of government health policies on the utilization and design of hospital facilities; the relationship between housing density and the demands on public transportation or other service facilities; patterns and implications of urban development and air or water pollution. Studies devoted to the anticipations of and response to future needs for social, health and other human services, e.g. the relationship between industrial growth and the development of educational resources in affected areas; investigation of future demands for material and child health resources in a developing country; design of effective recycling in an urban setting.