Competing Forecast Verification: Using the Power-Divergence Statistic for Testing the Frequency of "Better"

Weather and Forecasting (IF 3.0; JCR Q2, Meteorology & Atmospheric Sciences) · Pub Date: 2023-06-21 · DOI: 10.1175/waf-d-22-0201.1
E. Gilleland, D. Muñoz‐Esparza, David D. Turner
Abstract

When testing hypotheses about which of two competing models, say A and B, is better, the difference is often not significant. An alternative, complementary approach is to measure how often model A is better than model B, however slight or large the difference. The hypothesis concerns whether the percentage of time that model A is better than model B exceeds 50%. One generalized test statistic that can be used is the power-divergence statistic, which encompasses many familiar goodness-of-fit test statistics, such as the log-likelihood-ratio and Pearson X² tests. Theoretical results justify using the χ²(k − 1) distribution for the entire family of test statistics, where k is the number of categories. However, these results assume that the underlying data are independent and identically distributed, an assumption that is often violated. Empirical results demonstrate that the reduction to two categories (i.e., model A is better than model B vs. model B is better than model A) yields a test that is reasonably robust even to severe departures from temporal independence, as well as to contemporaneous correlation. The test is demonstrated on two example verification sets: 6-h forecasts of eddy dissipation rate (m^(2/3) s^(−1)) from two versions of the Graphical Turbulence Guidance model, and 12-h forecasts of 2-m temperature (°C) and 10-m wind speed (m s^(−1)) from two versions of the High-Resolution Rapid Refresh model. The novelty of this paper is in demonstrating the utility of the power-divergence statistic in the face of temporally dependent data, as well as the emphasis on testing the "frequency of better" alongside more traditional measures.
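As a rough illustration (not the authors' code), the two-category "frequency-of-better" test can be sketched in pure Python: count how often model A's error beats model B's, then compare the observed two-category counts against an even 50/50 split with a power-divergence statistic. The helper names `power_divergence` and `chi2_sf_1df` are invented for this sketch; λ = 1 recovers Pearson's X², the λ → 0 limit gives the log-likelihood-ratio statistic G, and the 1-degree-of-freedom χ² p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

def power_divergence(f_obs, f_exp, lam=0.0):
    """Power-divergence statistic 2/(lam*(lam+1)) * sum O*((O/E)^lam - 1).

    lam=1 gives Pearson's X^2; the lam -> 0 limit gives the
    log-likelihood-ratio statistic G (valid when sums of O and E match).
    """
    if abs(lam) < 1e-12:  # log-likelihood-ratio limit
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(f_obs, f_exp) if o > 0)
    return (2.0 / (lam * (lam + 1.0))) * sum(
        o * ((o / e) ** lam - 1.0) for o, e in zip(f_obs, f_exp))

def chi2_sf_1df(x):
    """Survival function of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Two categories: suppose model A is better on 290 of 500 verification times.
n, a_better = 500, 290
f_obs = [a_better, n - a_better]   # observed counts: A better / B better
f_exp = [n / 2.0, n / 2.0]         # H0: A better exactly 50% of the time

g = power_divergence(f_obs, f_exp, lam=0.0)   # log-likelihood ratio G
x2 = power_divergence(f_obs, f_exp, lam=1.0)  # Pearson X^2
print(f"G = {g:.3f} (p = {chi2_sf_1df(g):.4f}), X^2 = {x2:.3f}")
```

With this hypothetical 290/500 split, both family members are close to 12.8 and the test rejects H0: p = 0.5 at conventional levels, illustrating how the two statistics agree for the two-category reduction.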
Citations: 0

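The abstract's robustness claim can be probed with a toy Monte Carlo sketch (an assumed setup, not the paper's experiments): generate paired error series under the null that neither model is better, give each series strong AR(1) temporal dependence, apply the two-category log-likelihood-ratio test, and check how often it falsely rejects at the 5% level. The helpers `ar1_series` and `g_test_p` are illustrative names invented here.

```python
import math
import random

def ar1_series(n, rho, rng):
    """Gaussian AR(1) series with lag-1 autocorrelation rho."""
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def g_test_p(a_better, n):
    """Two-category log-likelihood-ratio test of H0: P(A better) = 0.5."""
    e = n / 2.0
    g = 2.0 * sum(v * math.log(v / e)
                  for v in (a_better, n - a_better) if v > 0)
    return math.erfc(math.sqrt(g / 2.0))  # chi-square sf with 1 df

rng = random.Random(42)
n, rho, reps, alpha = 200, 0.8, 1000, 0.05
rejections = 0
for _ in range(reps):
    # Under H0 the two error series are exchangeable, each strongly AR(1).
    err_a = [abs(v) for v in ar1_series(n, rho, rng)]
    err_b = [abs(v) for v in ar1_series(n, rho, rng)]
    a_better = sum(x < y for x, y in zip(err_a, err_b))
    if g_test_p(a_better, n) < alpha:
        rejections += 1

size = rejections / reps
print(f"empirical size at nominal alpha={alpha}: {size:.3f}")
```

Under this toy setup the empirical rejection rate stays in the neighborhood of the nominal level rather than blowing up, consistent in spirit with the abstract's finding that the two-category reduction is reasonably robust to temporal dependence.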
Source journal: Weather and Forecasting (Geosciences: Meteorology & Atmospheric Sciences)
CiteScore: 5.20
Self-citation rate: 17.20%
Articles per year: 131
Review time: 6-12 weeks
Journal description: Weather and Forecasting (WAF) (ISSN: 0882-8156; eISSN: 1520-0434) publishes research that is relevant to operational forecasting. This includes papers on significant weather events, forecasting techniques, forecast verification, model parameterizations, data assimilation, model ensembles, statistical postprocessing techniques, the transfer of research results to the forecasting community, and the societal use and value of forecasts. The scope of WAF includes research relevant to forecast lead times ranging from short-term "nowcasts" through seasonal time scales out to approximately two years.