稳定机器学习模型与年龄-时期-队列输入评分和压力测试

IF 1.3 Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Frontiers in Applied Mathematics and Statistics Pub Date : 2023-06-08 DOI:10.3389/fams.2023.1195810
J. Breeden, Ye. A. Leonova
{"title":"稳定机器学习模型与年龄-时期-队列输入评分和压力测试","authors":"J. Breeden, Ye. A. Leonova","doi":"10.3389/fams.2023.1195810","DOIUrl":null,"url":null,"abstract":"Machine learning models have been used extensively for credit scoring, but the architectures employed suffer from a significant loss in accuracy out-of-sample and out-of-time. Further, the most common architectures do not effectively integrate economic scenarios to enable stress testing, cash flow, or yield estimation. The present research demonstrates that providing lifecycle and environment functions from Age-Period-Cohort analysis can significantly improve out-of-sample and out-of-time performance as well as enabling the model's use in both scoring and stress testing applications. This method is demonstrated for behavior scoring where account delinquency is one of the provided inputs, because behavior scoring has historically presented the most difficulties for combining credit scoring and stress testing. Our method works well in both origination and behavior scoring. The results are also compared to multihorizon survival models, which share the same architectural design with Age-Period-Cohort inputs and coefficients that vary with forecast horizon, but using a logistic regression estimation of the model. The analysis was performed on 30-year prime conforming US mortgage data. Nonlinear problems involving large amounts of alternate data are best at highlighting the advantages of machine learning. Data from Fannie Mae and Freddie Mac is not such a test case, but it serves the purpose of comparing these methods with and without Age-Period-Cohort inputs. In order to make a fair comparison, all models are given a panel structure where each account is observed monthly to determine default or non-default.","PeriodicalId":36662,"journal":{"name":"Frontiers in Applied Mathematics and Statistics","volume":" ","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Stabilizing machine learning models with Age-Period-Cohort inputs for scoring and stress testing\",\"authors\":\"J. Breeden, Ye. A. Leonova\",\"doi\":\"10.3389/fams.2023.1195810\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning models have been used extensively for credit scoring, but the architectures employed suffer from a significant loss in accuracy out-of-sample and out-of-time. Further, the most common architectures do not effectively integrate economic scenarios to enable stress testing, cash flow, or yield estimation. The present research demonstrates that providing lifecycle and environment functions from Age-Period-Cohort analysis can significantly improve out-of-sample and out-of-time performance as well as enabling the model's use in both scoring and stress testing applications. This method is demonstrated for behavior scoring where account delinquency is one of the provided inputs, because behavior scoring has historically presented the most difficulties for combining credit scoring and stress testing. Our method works well in both origination and behavior scoring. The results are also compared to multihorizon survival models, which share the same architectural design with Age-Period-Cohort inputs and coefficients that vary with forecast horizon, but using a logistic regression estimation of the model. The analysis was performed on 30-year prime conforming US mortgage data. Nonlinear problems involving large amounts of alternate data are best at highlighting the advantages of machine learning. Data from Fannie Mae and Freddie Mac is not such a test case, but it serves the purpose of comparing these methods with and without Age-Period-Cohort inputs. In order to make a fair comparison, all models are given a panel structure where each account is observed monthly to determine default or non-default.\",\"PeriodicalId\":36662,\"journal\":{\"name\":\"Frontiers in Applied Mathematics and Statistics\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2023-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Applied Mathematics and Statistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fams.2023.1195810\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Applied Mathematics and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fams.2023.1195810","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

摘要

机器学习模型已被广泛用于信用评分,但所采用的架构在样本外和时间外的准确性方面存在重大损失。此外,最常见的体系结构不能有效地集成经济场景来支持压力测试、现金流或收益估计。目前的研究表明,从年龄-时期-队列分析中提供生命周期和环境功能可以显着提高样本外和时间外的性能,并使模型在评分和压力测试应用中使用。该方法用于行为评分,其中帐户拖欠是提供的输入之一,因为行为评分历来是信用评分和压力测试相结合的最大困难。我们的方法在起源和行为评分中都很有效。结果还与多水平生存模型进行了比较,多水平生存模型具有相同的结构设计,具有年龄-时期-队列输入和随预测水平变化的系数,但使用了模型的逻辑回归估计。该分析是对美国30年期优质合格抵押贷款数据进行的。涉及大量交替数据的非线性问题最能突出机器学习的优势。房利美(Fannie Mae)和房地美(Freddie Mac)的数据不是这样的测试案例,但它的目的是比较这些方法是否有年龄-时期-队列输入。为了进行公平的比较,所有模型都采用面板结构,每个帐户每月观察一次,以确定默认或非默认。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Stabilizing machine learning models with Age-Period-Cohort inputs for scoring and stress testing
Machine learning models have been used extensively for credit scoring, but the architectures employed suffer from a significant loss in accuracy out-of-sample and out-of-time. Further, the most common architectures do not effectively integrate economic scenarios to enable stress testing, cash flow, or yield estimation. The present research demonstrates that providing lifecycle and environment functions from Age-Period-Cohort analysis can significantly improve out-of-sample and out-of-time performance as well as enabling the model's use in both scoring and stress testing applications. This method is demonstrated for behavior scoring where account delinquency is one of the provided inputs, because behavior scoring has historically presented the most difficulties for combining credit scoring and stress testing. Our method works well in both origination and behavior scoring. The results are also compared to multihorizon survival models, which share the same architectural design with Age-Period-Cohort inputs and coefficients that vary with forecast horizon, but using a logistic regression estimation of the model. The analysis was performed on 30-year prime conforming US mortgage data. Nonlinear problems involving large amounts of alternate data are best at highlighting the advantages of machine learning. Data from Fannie Mae and Freddie Mac is not such a test case, but it serves the purpose of comparing these methods with and without Age-Period-Cohort inputs. In order to make a fair comparison, all models are given a panel structure where each account is observed monthly to determine default or non-default.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Frontiers in Applied Mathematics and Statistics
Frontiers in Applied Mathematics and Statistics Mathematics-Statistics and Probability
CiteScore
1.90
自引率
7.10%
发文量
117
审稿时长
14 weeks
期刊最新文献
Third-degree B-spline collocation method for singularly perturbed time delay parabolic problem with two parameters Item response theory to discriminate COVID-19 knowledge and attitudes among university students Editorial: Justified modeling frameworks and novel interpretations of ecological and epidemiological systems Pneumonia and COVID-19 co-infection modeling with optimal control analysis Enhanced corn seed disease classification: leveraging MobileNetV2 with feature augmentation and transfer learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1