Speeding up estimation of spatially varying coefficients models

IF 4.6 | Q2 | MATERIALS SCIENCE, BIOMATERIALS | ACS Applied Bio Materials | Pub Date: 2024-07-02 | DOI: 10.1007/s10109-024-00442-3
Ghislain Geniaux


Speeding up estimation of spatially varying coefficients models

Spatially varying coefficient models, such as geographically weighted regression (GWR) (Brunsdon et al. in Geogr Anal 28:281–298, 1996 and McMillen in J Urban Econ 40:100–124, 1996), find extensive applications across various fields, including housing markets, land use, population ecology, seismology, and mining research. These models are valuable for capturing the spatial heterogeneity of coefficient values. In many application areas, the continuous expansion of spatial data sample sizes, in terms of both volume and richness of explanatory variables, has given rise to new methodological challenges. The primary issues are the time required to calculate each local coefficient and the memory required to store the large hat matrix (of size \(n \times n\)) used for parameter variance estimation. Researchers have explored various approaches to address these challenges (Harris et al. in Trans GIS 14:43–61, 2010; Pozdnoukhov and Kaiser in: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011; Tran et al. in: 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE), IEEE, 2016; Geniaux and Martinetti in Reg Sci Urban Econ 72:74–85, 2018; Li et al. in Int J Geogr Inf Sci 33:155–175, 2019; Murakami et al. in Ann Am Assoc Geogr 111:459–480, 2020). While the use of a subset of target points for local regressions has been extensively studied in nonparametric econometrics, its application within the context of GWR has been relatively unexplored. In this paper, we propose an original two-stage method designed to accelerate GWR computations. We select a subset of target points based on the spatial smoothing of residuals from a first-stage regression, conducting GWR solely on this subsample. Additionally, we propose an original approach for extrapolating coefficients to non-target points.
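The two-stage idea described above can be illustrated with a minimal sketch. The abstract does not give the selection or extrapolation rules, so the ones used here are assumptions made only for illustration: target points are taken as the locations with the largest absolute smoothed residual, and non-target points simply copy the coefficients of their nearest target point. All function names are hypothetical, and none of this reproduces the mgwrsar API. The model is restricted to an intercept and one covariate so that each local weighted regression has a closed form.

```python
import math
import random

def gaussian_weight(d, h):
    """Gaussian kernel weight for distance d and bandwidth h."""
    return math.exp(-0.5 * (d / h) ** 2)

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def wls(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1

def two_stage_gwr(coords, x, y, n_targets, h):
    """Illustrative two-stage GWR: local fits at target points only."""
    n = len(coords)
    # Stage 1: global OLS (equal weights) and its residuals.
    b0, b1 = wls(x, y, [1.0] * n)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Spatial smoothing of the residuals (kernel-weighted average).
    # ASSUMPTION: the same bandwidth h is reused for smoothing.
    smooth = []
    for i in range(n):
        w = [gaussian_weight(dist(coords[i], c), h) for c in coords]
        smooth.append(sum(wi * r for wi, r in zip(w, resid)) / sum(w))
    # ASSUMED selection rule: largest absolute smoothed residuals.
    targets = sorted(range(n), key=lambda i: -abs(smooth[i]))[:n_targets]
    # Stage 2: one local weighted regression per target point.
    beta_at_target = {}
    for t in targets:
        w = [gaussian_weight(dist(coords[t], c), h) for c in coords]
        beta_at_target[t] = wls(x, y, w)
    # ASSUMED extrapolation: copy coefficients from the nearest target.
    betas = []
    for c in coords:
        nearest = min(targets, key=lambda t: dist(c, coords[t]))
        betas.append(beta_at_target[nearest])
    return targets, betas

# Synthetic data: the slope drifts from 1 to 2 along the first coordinate.
rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(200)]
x = [rng.gauss(0, 1) for _ in coords]
y = [1.0 + (1.0 + c[0]) * xi + rng.gauss(0, 0.1) for c, xi in zip(coords, x)]
targets, betas = two_stage_gwr(coords, x, y, n_targets=20, h=0.2)
```

With 20 target points out of 200, only 20 local regressions are solved instead of 200, which is where the time saving comes from; the accuracy of the result then depends on how well the chosen targets and the extrapolation step cover the coefficient surface.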
In addition to using an effective sample of target points, we explore the computational gain provided by using a truncated Gaussian kernel to create sparser matrices during computation. Our Monte Carlo experiments demonstrate that this method of target point selection outperforms methods based on point density or random selection. The results also reveal that using target points can reduce bias and root mean square error (RMSE) in estimating \(\beta\) coefficients compared to traditional GWR, as it enables the selection of a more accurate bandwidth size. We demonstrate that our estimator is scalable and exhibits superior properties in this regard compared to the estimator of Murakami et al. (Ann Am Assoc Geogr 111:459–480, 2020) under two conditions: the use of a ratio of target points that provides a satisfactory approximation of coefficients (10–20% of locations) and an optimal bandwidth that remains within a reasonable neighborhood (\(< 5000\) neighbors). All the GWR estimators with target points are now accessible in the R package mgwrsar for GWR and Mixed GWR with and without spatial autocorrelation, available from the CRAN repository at https://CRAN.R-project.org/package=mgwrsar.
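The storage argument behind the truncated kernel can be made concrete with a small sketch. It assumes an adaptive bandwidth equal to the distance to the k-th nearest neighbor and sets every weight beyond that neighbor to exactly zero; these choices, the function name, and the random placeholder used to pick target points are illustrative assumptions, not the paper's procedure or the mgwrsar implementation.

```python
import math
import random

def truncated_gaussian_row(i, coords, k):
    """Sparse row of kernel weights for point i.

    Only the k nearest neighbors (point i included) get a non-zero
    Gaussian weight; the bandwidth is the distance to the k-th neighbor.
    Returns a dict {neighbor index: weight}.
    """
    if k < 2:
        raise ValueError("k must be at least 2 so that the bandwidth is positive")
    by_distance = sorted(
        (math.hypot(coords[i][0] - c[0], coords[i][1] - c[1]), j)
        for j, c in enumerate(coords)
    )
    nearest = by_distance[:k]
    h = nearest[-1][0]  # adaptive bandwidth
    return {j: math.exp(-0.5 * (d / h) ** 2) for d, j in nearest}

rng = random.Random(1)
n, k = 500, 50
n_targets = n // 10  # 10% of locations, the lower end of the range quoted above
coords = [(rng.random(), rng.random()) for _ in range(n)]

# Placeholder selection: random target points, used here only to count entries.
targets = rng.sample(range(n), n_targets)
rows = {t: truncated_gaussian_row(t, coords, k) for t in targets}

stored = sum(len(row) for row in rows.values())  # non-zero weights kept
dense = n * n                                    # entries of a full n x n matrix
print(stored, dense, stored / dense)             # prints: 2500 250000 0.01
```

Here the stored weights scale with the number of target points times k rather than with \(n^2\), which is why both the target ratio and a bounded neighborhood size matter for scalability.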
