Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study

IF 16.4 1区 化学 Q1 CHEMISTRY, MULTIDISCIPLINARY Accounts of Chemical Research Pub Date : 2024-02-08 DOI:10.1007/s13253-024-00602-4
Arnab Hazra, Pratik Nag, Rishikesh Yadav, Ying Sun
{"title":"Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study","authors":"Arnab Hazra, Pratik Nag, Rishikesh Yadav, Ying Sun","doi":"10.1007/s13253-024-00602-4","DOIUrl":null,"url":null,"abstract":"<p>Increasingly large and complex spatial datasets pose massive inferential challenges due to high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. To overcome the constraints in the <span>R</span> package <span>GpGp</span>, which lacked support for fitting zero-mean Gaussian processes and direct uncertainty estimation—two things that are necessary for the competition, we developed additional <span>R</span> functions. Besides, we implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team DesiBoys secured the first position in two out of four sub-competitions and the second position in the other two, validating the effectiveness of our proposed strategies. Moreover, we extended our evaluation to a large real spatial satellite-derived dataset on total precipitable water, where we compared the predictive performances of different models using multiple diagnostics.</p>","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":null,"pages":null},"PeriodicalIF":16.4000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1007/s13253-024-00602-4","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

Increasingly large and complex spatial datasets pose massive inferential challenges due to high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. To overcome the constraints in the R package GpGp, which lacked support for fitting zero-mean Gaussian processes and direct uncertainty estimation—two things that are necessary for the competition, we developed additional R functions. Besides, we implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team DesiBoys secured the first position in two out of four sub-competitions and the second position in the other two, validating the effectiveness of our proposed strategies. Moreover, we extended our evaluation to a large real spatial satellite-derived dataset on total precipitable water, where we compared the predictive performances of different models using multiple diagnostics.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
探索大型空间数据集的统计和深度学习方法的有效性:案例研究
由于计算和存储成本高昂,日益庞大和复杂的空间数据集带来了巨大的推理挑战。我们的研究是受 KAUST 2023 年大型空间数据集竞赛的启发,该竞赛要求参赛者估算空间协方差相关参数并预测测试点的值以及不确定性估计值。我们通过交叉验证比较了各种统计和深度学习方法,最终选择了 Vecchia 近似技术进行模型拟合。R 软件包 GpGp 缺乏对零均值高斯过程拟合和直接不确定性估计的支持--而这两点正是比赛所必需的,为了克服这一限制,我们开发了额外的 R 函数。此外,我们还实现了某些基于子采样的近似和参数平滑,以处理估计器的倾斜采样分布。我们的团队 DesiBoys 在四项分赛中有两项获得第一名,另外两项获得第二名,这验证了我们提出的策略的有效性。此外,我们还将评估扩展到了一个大型真实空间卫星可降水总量数据集,并在此基础上使用多种诊断方法比较了不同模型的预测性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Accounts of Chemical Research
Accounts of Chemical Research 化学-化学综合
CiteScore
31.40
自引率
1.10%
发文量
312
审稿时长
2 months
期刊介绍: Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance. Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research affording the reader a closer look at the content and significance of an article. Through this provision of a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.
期刊最新文献
Management of Cholesteatoma: Hearing Rehabilitation. Congenital Cholesteatoma. Evaluation of Cholesteatoma. Management of Cholesteatoma: Extension Beyond Middle Ear/Mastoid. Recidivism and Recurrence.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1