Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study

Arnab Hazra, Pratik Nag, Rishikesh Yadav, Ying Sun
Journal of Agricultural, Biological and Environmental Statistics. Published 2024-02-08. DOI: 10.1007/s13253-024-00602-4 (https://doi.org/10.1007/s13253-024-00602-4)
Citations: 0

Abstract

Increasingly large and complex spatial datasets pose massive inferential challenges due to high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. The R package GpGp lacked support for fitting zero-mean Gaussian processes and for direct uncertainty estimation, both of which the competition required, so we developed additional R functions to overcome these constraints. We also implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team, DesiBoys, secured first position in two of the four sub-competitions and second position in the other two, validating the effectiveness of our proposed strategies. Finally, we extended our evaluation to a large real satellite-derived spatial dataset on total precipitable water, where we compared the predictive performance of different models using multiple diagnostics.
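
The Vecchia approximation named in the abstract replaces the joint Gaussian likelihood with a product of conditional densities, each conditioned on a small set of previously ordered neighbours. The following is a minimal Python sketch of that idea for a zero-mean process. It is not the authors' GpGp-based R code: the exponential covariance, the parameter names (`variance`, `range_`, `nugget`), the use of the data's given ordering, and the neighbour count `m` are all illustrative assumptions.

```python
import numpy as np


def exp_cov(a, b, variance, range_):
    """Exponential covariance between two sets of locations (rows are points)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / range_)


def vecchia_loglik(y, locs, variance, range_, nugget, m):
    """Zero-mean Gaussian log-likelihood under a Vecchia approximation.

    Observation i is conditioned on at most m nearest points that come
    earlier in the ordering (here simply the row order, an assumption).
    """
    total = 0.0
    for i in range(len(y)):
        mean_i = 0.0                      # zero-mean process: no trend term
        var_i = variance + nugget
        if i > 0 and m > 0:
            # Nearest neighbours among previously ordered points.
            dist = np.linalg.norm(locs[:i] - locs[i], axis=1)
            nb = np.argsort(dist)[:m]
            k_nn = exp_cov(locs[nb], locs[nb], variance, range_)
            k_nn += nugget * np.eye(len(nb))
            k_in = exp_cov(locs[i:i + 1], locs[nb], variance, range_)[0]
            w = np.linalg.solve(k_nn, k_in)   # kriging weights
            mean_i = w @ y[nb]                # conditional mean
            var_i = variance + nugget - w @ k_in  # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var_i) + (y[i] - mean_i) ** 2 / var_i)
    return total


def exact_loglik(y, locs, variance, range_, nugget):
    """Exact zero-mean Gaussian log-likelihood, for comparison on small n."""
    n = len(y)
    k = exp_cov(locs, locs, variance, range_) + nugget * np.eye(n)
    _, logdet = np.linalg.slogdet(k)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(k, y))


# Small simulated example (hypothetical parameter values).
rng = np.random.default_rng(0)
n = 150
locs = rng.uniform(size=(n, 2))
cov = exp_cov(locs, locs, 1.0, 0.2) + 0.05 * np.eye(n)
y = np.linalg.cholesky(cov) @ rng.standard_normal(n)

print("exact      :", exact_loglik(y, locs, 1.0, 0.2, 0.05))
print("vecchia m=10:", vecchia_loglik(y, locs, 1.0, 0.2, 0.05, m=10))
```

When `m` equals `n - 1`, every point conditions on all earlier points and the sum of conditionals coincides with the exact likelihood. Smaller `m` trades accuracy for cost, since each linear solve involves only an `m` by `m` matrix. The brute-force neighbour search here is quadratic in `n` and is kept only for brevity.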

Source journal
CiteScore: 2.70
Self-citation rate: 7.10%
Articles published: 38
Review time: >12 weeks
About the journal: The Journal of Agricultural, Biological and Environmental Statistics (JABES) publishes papers that introduce new statistical methods to solve practical problems in the agricultural sciences, the biological sciences (including biotechnology), and the environmental sciences (including those dealing with natural resources). Papers that apply existing methods in a novel context are also encouraged. Interdisciplinary papers and papers that illustrate the application of new and important statistical methods using real data are strongly encouraged. The journal does not normally publish papers that have a primary focus on human genetics, human health, or medical statistics.
Latest articles in this journal
Algorithms for Fitting the Space-Time ETAS Model to Earthquake Catalog Data: A Comparative Study
Bayesian Approaches to Proxy Uncertainty Quantification in Paleoecology: A Mathematical Justification and Practical Integration
Stopping Rule Sampling to Monitor and Protect Endangered Species
Environmental Loss Assessment via Functional Outlier Detection of Transformed Biodiversity Profiles
Expectations of Linear and Nonlinear Hawkes Processes Using a Field-Theoretical Approach