Using out-of-sample Cox–Snell residuals in time-to-event forecasting

Biznes Informatika-Business Informatics (IF 0.5, JCR Q4, Business) | Pub Date: 2021-03-31 | DOI: 10.17323/2587-814X.2021.1.7.18

E. Rumyantseva, K. Furmanov


Citations: 1

Abstract

This article addresses the problem of assessing the out-of-sample forecasting performance of event-history models. Time-to-event data are usually incomplete because the event of interest may occur after the observation period ends or may not occur at all. In such cases only a lower bound on the time-to-event is observed, and the data are right-censored. Traditional accuracy measures such as mean absolute error or mean squared error cannot be applied directly to censored data, because the forecasting errors for censored observations are themselves unobserved. Instead of mean error measures, researchers use rank correlation coefficients: the concordance indices of Harrell and of Uno, and Somers’ delta. These measures characterize not the distance between actual and predicted values but the agreement between the orderings of predicted and observed times-to-event. Hence they can take almost “ideal” values even in the presence of substantial forecasting bias. Another drawback of relying on correlation measures when selecting a forecasting model is that they reduce the forecast to a point estimate of the predicted value. It is rarely possible to predict the timing of an event precisely, so it is reasonable to treat the forecast not as a point estimate but as an estimate of the whole distribution of the variable of interest. The article proposes computing Cox–Snell residuals on the test or validation dataset as a complement to rank correlation coefficients in model selection. For a correctly specified model, Cox–Snell residuals are known to follow the unit exponential distribution, which allows the observed out-of-sample performance of a forecasting model to be compared with the ideal case. The comparison can be made by plotting an estimate of the integrated (cumulative) hazard function of the residuals or by calculating the Kolmogorov distance between the observed and the ideal distribution of the residuals. The proposed approach is illustrated with an example of selecting a forecasting model for the timing of mortgage termination. Business Informatics, Vol. 15, No. 1, 2021.
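The idea in the abstract can be illustrated with a small simulation. The following Python sketch is not the authors' code: the simulated data, the exponential model, the threefold bias factor, and all function names are assumptions made for illustration. It also assumes that the distribution of right-censored residuals is estimated with the Kaplan-Meier estimator before the Kolmogorov distance to the unit exponential distribution is taken. Two models that rank observations identically receive the same Harrell concordance index, yet the out-of-sample Cox–Snell residuals separate the well-calibrated model from the biased one.

```python
import numpy as np

def kaplan_meier(values, events):
    """Kaplan-Meier survival estimate evaluated just after each distinct event value."""
    uniq = np.unique(values[events == 1])
    at_risk = np.array([(values >= u).sum() for u in uniq])
    failed = np.array([((values == u) & (events == 1)).sum() for u in uniq])
    return uniq, np.cumprod(1.0 - failed / at_risk)

def kolmogorov_distance(residuals, events):
    """Largest gap between the KM survival curve of the residuals and exp(-r).

    The gap is checked on both sides of every jump of the step function,
    up to the last uncensored residual.
    """
    r, surv = kaplan_meier(residuals, events)
    ideal = np.exp(-r)
    surv_before = np.concatenate(([1.0], surv[:-1]))
    return max(np.abs(surv - ideal).max(), np.abs(surv_before - ideal).max())

def harrell_c(pred_time, time, event):
    """Harrell's concordance index for predicted times-to-event."""
    # A pair (i, j) is comparable if i has an observed event and time[i] < time[j].
    comparable = (time[:, None] < time[None, :]) & (event[:, None] == 1)
    concordant = pred_time[:, None] < pred_time[None, :]
    tied = pred_time[:, None] == pred_time[None, :]
    return (concordant[comparable].sum() + 0.5 * tied[comparable].sum()) / comparable.sum()

# Simulated "test set" (hypothetical): exponential event times, independent censoring.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
rate = np.exp(0.5 * x)                      # true hazard of each observation
t_event = rng.exponential(1.0 / rate)       # latent time-to-event
t_cens = rng.exponential(2.0, size=n)       # censoring time
time = np.minimum(t_event, t_cens)          # what is actually observed
event = (t_event <= t_cens).astype(int)     # 1 = event observed, 0 = right-censored

# Cox-Snell residual = predicted cumulative hazard at the observed time.
# For an exponential model, H(t | x) = rate * t.
res_good = rate * time                      # correctly specified model
res_biased = 3.0 * rate * time              # same ranking, hazard overstated threefold

# Point forecasts (mean time-to-event) of the two models.
c_good = harrell_c(1.0 / rate, time, event)
c_biased = harrell_c(1.0 / (3.0 * rate), time, event)

d_good = kolmogorov_distance(res_good, event)
d_biased = kolmogorov_distance(res_biased, event)

print(f"Harrell C:           good={c_good:.3f}  biased={c_biased:.3f}")
print(f"Kolmogorov distance: good={d_good:.3f}  biased={d_biased:.3f}")
```

In this sketch the residuals inherit the censoring indicator of the underlying observations, which is why a censoring-aware estimator is used instead of the ordinary empirical distribution function. A Nelson-Aalen estimate of the cumulative hazard of the residuals, plotted against the 45-degree line, would be the graphical counterpart of the same check.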

