A relationship between the incremental values of area under the ROC curve and of area under the precision-recall curve.

Qian M Zhou, Lu Zhe, Russell J Brooke, Melissa M Hudson, Yan Yuan
Diagnostic and Prognostic Research, 2021, 13. Published 2021-07-14. DOI: 10.1186/s41512-021-00102-w
Citations: 12

Abstract

Background: Incremental value (IncV) measures the change in performance from an existing risk model to a new model. Different IncV metrics do not always agree with each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. This kind of disagreement is not uncommon and can create confusion when assessing whether the added information improves the model's prediction accuracy.
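
Both metrics in this example can be estimated empirically from risk scores and observed outcomes, and each IncV is simply the new model's value minus the existing model's. The sketch below is a minimal illustration under stated assumptions: AUC is estimated as the Mann-Whitney concordance probability, and AP as the mean precision at each event's rank, which is one common AP estimator (the article's own estimator is not specified here). The function names and toy data are hypothetical.

```python
from itertools import product

def auc(scores, labels):
    """Empirical AUC: share of (event, non-event) pairs in which the event has the
    higher risk score; tied pairs count as 1/2 (Mann-Whitney estimator)."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for e, n in product(events, nonevents):
        if e > n:
            wins += 1.0
        elif e == n:
            wins += 0.5
    return wins / (len(events) * len(nonevents))

def average_precision(scores, labels):
    """Empirical AP: mean of the precision observed at each event's position when
    subjects are ranked by descending score. Tied scores keep input order (assumption)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

def incremental_value(metric, old_scores, new_scores, labels):
    """Hypothetical helper: IncV = metric(new model) - metric(existing model)."""
    return metric(new_scores, labels) - metric(old_scores, labels)

# Hypothetical toy data: 2 events among 6 subjects.
labels = [1, 0, 1, 0, 0, 0]
old_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
new_scores = [0.9, 0.3, 0.8, 0.4, 0.2, 0.1]

delta_auc = incremental_value(auc, old_scores, new_scores, labels)
delta_ap = incremental_value(average_precision, old_scores, new_scores, labels)
print(f"dAUC = {delta_auc:+.2f}, dAP = {delta_ap:+.2f}")  # dAUC = +0.25, dAP = +0.25
```

In this toy case the two metrics agree; the disagreement described in the article arises when a new model gains in some risk regions and loses in others.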

Methods: In this article, we examine the analytical connections and differences between the AUC IncV (ΔAUC) and the AP IncV (ΔAP). We also compare the true values of these two IncV metrics in a numerical study. Additionally, because both are semi-proper scoring rules, we compare them in the numerical study with the IncV of a strictly proper scoring rule, the scaled Brier score (ΔsBrS).
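
The scaled Brier score rescales the Brier score (a strictly proper scoring rule) against a non-informative reference model. The sketch below assumes the common scaling sBrS = 1 - BrS / [p̄(1 - p̄)], where p̄ is the observed event rate; the article's exact definition may differ. Function names and data are hypothetical.

```python
def brier_score(risks, labels):
    """Mean squared difference between predicted risk and observed outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(risks, labels)) / len(labels)

def scaled_brier_score(risks, labels):
    """sBrS = 1 - BrS / BrS_ref, where BrS_ref = pbar * (1 - pbar) is the Brier score of a
    non-informative model that assigns the observed event rate pbar to everyone.
    (Assumed scaling; the article's exact definition may differ.)"""
    pbar = sum(labels) / len(labels)
    return 1.0 - brier_score(risks, labels) / (pbar * (1.0 - pbar))

# Hypothetical toy data: 1 event among 4 subjects (event rate 0.25).
labels = [1, 0, 0, 0]
existing_risks = [0.5, 0.5, 0.25, 0.25]
new_risks = [0.75, 0.25, 0.0, 0.0]

delta_sbrs = scaled_brier_score(new_risks, labels) - scaled_brier_score(existing_risks, labels)
print(f"dsBrS = {delta_sbrs:+.3f}")  # dsBrS = +0.667
```

Under this scaling, a model that predicts the event rate for everyone scores 0 and a perfect model scores 1, so ΔsBrS is on a scale comparable to ΔAUC and ΔAP.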

Results: We demonstrate that ΔAUC and ΔAP are both weighted averages of the changes (from the existing model to the new one) in the separation between the risk score distributions of events and non-events. However, ΔAP assigns heavier weights to changes in higher-risk regions, whereas ΔAUC weights all changes equally. Because of this difference, the two IncV metrics can disagree, and the numerical study shows that their disagreement becomes more pronounced as the event rate decreases. The numerical study also shows that ΔAP spans a wide range, from negative to positive values, whereas the range of ΔAUC is much narrower. In addition, ΔAP and ΔsBrS are highly consistent, but ΔAUC is negatively correlated with both ΔsBrS and ΔAP when the event rate is low.
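
The standard population-level definitions give one way to see the difference in weighting. The notation here (D for the event indicator, S for the risk score, π for the event rate, c for a risk threshold) is introduced for illustration, assumes a continuous score, and is a textbook sketch rather than the article's own weighted-average decomposition.

```latex
\begin{aligned}
\mathrm{TPR}(c) &= P(S > c \mid D = 1), \qquad \mathrm{FPR}(c) = P(S > c \mid D = 0),\\
\mathrm{PPV}(c) &= \frac{\pi\,\mathrm{TPR}(c)}{\pi\,\mathrm{TPR}(c) + (1-\pi)\,\mathrm{FPR}(c)}
               = \left[1 + \frac{1-\pi}{\pi}\cdot\frac{\mathrm{FPR}(c)}{\mathrm{TPR}(c)}\right]^{-1},\\
\mathrm{AUC} &= E\left[\mathrm{TPR}(S) \mid D = 0\right], \qquad
\mathrm{AP} = E\left[\mathrm{PPV}(S) \mid D = 1\right],\\
\Delta\mathrm{AUC} &= \mathrm{AUC}_{\text{new}} - \mathrm{AUC}_{\text{old}}, \qquad
\Delta\mathrm{AP} = \mathrm{AP}_{\text{new}} - \mathrm{AP}_{\text{old}}.
\end{aligned}
```

Read heuristically, AUC averages TPR over the non-event score distribution with no preference for any threshold, whereas AP averages PPV over the event score distribution. When π is small, the factor (1 - π)/π is large, so PPV(c) stays near zero unless FPR(c) is very small, that is, unless c lies in the high-risk tail; changes there therefore dominate ΔAP.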

Conclusions: ΔAUC treats the wins and losses of a new risk model equally across different risk regions. When neither the existing nor the new model is the true model, this equal weighting could attenuate the superior performance of the new model in a sub-region. In contrast, ΔAP accentuates changes in prediction accuracy in higher-risk regions.
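
A toy example (hypothetical ranks, not data from the article) illustrates how the two metrics can point in opposite directions at a low event rate. Among 20 subjects with 2 events (event rate 10%), the new model moves one event from rank 3 to rank 1 but drops the other from rank 4 to rank 12. The sketch uses the same assumed estimators as a minimal illustration: Mann-Whitney AUC, and AP as the mean precision at each event's rank with no ties. The helper `scores_from_ranks` is hypothetical.

```python
def auc(scores, labels):
    """Empirical AUC: share of (event, non-event) pairs ranked correctly; ties count 1/2."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))

def average_precision(scores, labels):
    """Empirical AP: mean precision at each event's rank (descending score, no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

def scores_from_ranks(event_ranks, n):
    """Hypothetical helper: give rank r the score n + 1 - r, put events at event_ranks and
    non-events in the remaining ranks. Events are listed first in the returned lists."""
    event_scores = [n + 1 - r for r in event_ranks]
    nonevent_scores = [n + 1 - r for r in range(1, n + 1) if r not in event_ranks]
    labels = [1] * len(event_scores) + [0] * len(nonevent_scores)
    return event_scores + nonevent_scores, labels

N = 20
old_scores, labels = scores_from_ranks([3, 4], N)  # existing model: events at ranks 3 and 4
new_scores, _ = scores_from_ranks([1, 12], N)      # new model: events at ranks 1 and 12

delta_auc = auc(new_scores, labels) - auc(old_scores, labels)
delta_ap = average_precision(new_scores, labels) - average_precision(old_scores, labels)
print(f"dAUC = {delta_auc:+.3f}, dAP = {delta_ap:+.3f}")  # dAUC = -0.167, dAP = +0.167
```

Counting pairs makes the mechanism explicit: the top event gains 2 of the 36 (event, non-event) pairs while the lower event loses 8, so ΔAUC = -6/36. AP instead rewards the jump to rank 1, where precision is highest, and so increases.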
