Flexible basis representations for modeling large non-Gaussian spatial data

Spatial Statistics (IF 2.1; Zone 2, Mathematics; JCR Q3, Geosciences, Multidisciplinary) Pub Date: 2024-08-01 DOI: 10.1016/j.spasta.2024.100841
Remy MacDonald, Benjamin Seiyon Lee

Abstract

Nonstationary and non-Gaussian spatial data are common in various fields, including ecology (e.g., counts of animal species), epidemiology (e.g., disease incidence counts in susceptible regions), and environmental science (e.g., remotely sensed satellite imagery). Due to modern data collection methods, the size of these datasets has grown considerably. Spatial generalized linear mixed models (SGLMMs) are a flexible class of models for nonstationary and non-Gaussian datasets. Despite their utility, SGLMMs can be computationally prohibitive for even moderately large datasets (e.g., 5,000 to 100,000 observed locations). To circumvent this issue, past studies have embedded nested radial basis functions into the SGLMM. However, two crucial specifications (knot placement and bandwidth parameters), which directly affect model performance, are typically fixed prior to model fitting. We propose a novel approach to model large nonstationary and non-Gaussian spatial datasets using adaptive radial basis functions. Our approach: (1) partitions the spatial domain into subregions; (2) employs reversible-jump Markov chain Monte Carlo (RJMCMC) to infer the number and location of the knots within each partition; and (3) models the latent spatial surface using partition-varying and adaptive basis functions. Through an extensive simulation study, we show that our approach provides more accurate predictions than competing methods while preserving computational efficiency. We demonstrate our approach on two environmental datasets: incidences of plant species and counts of bird species in the United States.
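
The basis-function representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the regular grid partition of the unit square, the Poisson response, and every name below (gaussian_rbf, partition_ids, knots_per_cell, and so on) are assumptions made for illustration. The knots and bandwidths are simply drawn at random within each partition, whereas the paper infers the number and location of the knots with RJMCMC, which is not attempted here.

```python
import numpy as np

def gaussian_rbf(locs, knots, bandwidth):
    """Return the n x k matrix of Gaussian radial basis functions (assumed kernel)."""
    # Pairwise squared distances between locations (n x 2) and knots (k x 2).
    d2 = ((locs[:, None, :] - knots[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def partition_ids(locs, n_side):
    """Assign each location in the unit square to a cell of an n_side x n_side grid."""
    cells = np.minimum((locs * n_side).astype(int), n_side - 1)
    return cells[:, 0] * n_side + cells[:, 1]

rng = np.random.default_rng(0)
n, n_side, knots_per_cell = 500, 2, 3      # hypothetical sizes
locs = rng.uniform(size=(n, 2))            # observed locations in [0, 1]^2
part = partition_ids(locs, n_side)

# Build the latent surface partition by partition, each with its own
# knots, bandwidth, and basis coefficients (partition-varying bases).
w = np.zeros(n)
for p in range(n_side * n_side):
    idx = np.flatnonzero(part == p)
    row, col = divmod(p, n_side)
    lower = np.array([row, col]) / n_side  # lower-left corner of the cell
    knots = lower + rng.uniform(size=(knots_per_cell, 2)) / n_side
    bandwidth = rng.uniform(0.1, 0.3)      # fixed here; adaptive in the paper
    Phi = gaussian_rbf(locs[idx], knots, bandwidth)
    delta = rng.normal(size=knots_per_cell)
    w[idx] = Phi @ delta

# Non-Gaussian (count) response: Poisson with a log link, intercept only.
beta0 = 1.0
counts = rng.poisson(np.exp(beta0 + w))
```

Working with a small basis matrix per partition, rather than one dense n x n covariance matrix, is what keeps the cost of this kind of representation manageable at tens of thousands of locations.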

Source journal
Spatial Statistics
Categories: Geosciences, Multidisciplinary; Mathematics, Interdisciplinary Applications
CiteScore: 4.00
Self-citation rate: 21.70%
Articles published: 89
Review time: 55 days
Journal description: Spatial Statistics publishes articles on the theory and application of spatial and spatio-temporal statistics. It favours manuscripts that present theory generated by new applications, or in which new theory is applied to an important practical case. A purely theoretical study will only rarely be accepted. Pure case studies without methodological development are not acceptable for publication. Spatial statistics concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties. Methodology for spatial statistics is typically found in probability theory, stochastic modelling and mathematical statistics as well as in information science. Spatial statistics is used in mapping, assessing spatial data quality, sampling design optimisation, modelling of dependence structures, and drawing of valid inference from a limited set of spatio-temporal data.
Latest articles from this journal
Uncovering hidden alignments in two-dimensional point fields
Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach
Exploiting nearest-neighbour maps for estimating the variance of sample mean in equal-probability systematic sampling of spatial populations
Variable selection of nonparametric spatial autoregressive models via deep learning
Estimation and inference of multi-effect generalized geographically and temporally weighted regression models