Speeding up estimation of spatially varying coefficients models
Ghislain Geniaux
DOI: 10.1007/s10109-024-00442-3 (https://doi.org/10.1007/s10109-024-00442-3)
Published: 2024-07-02
Abstract
Spatially varying coefficient models, such as geographically weighted regression (GWR; Brunsdon et al. in Geogr Anal 28:281–298, 1996 and McMillen in J Urban Econ 40:100–124, 1996), are widely applied in fields including housing markets, land use, population ecology, seismology, and mining research. These models are valuable for capturing the spatial heterogeneity of coefficient values. In many application areas, the continuous expansion of spatial data sample sizes, in terms of both volume and richness of explanatory variables, has given rise to new methodological challenges. The primary issues are the time required to calculate each local coefficient and the memory required to store the large hat matrix (of size \(n \times n\)) used for parameter variance estimation. Researchers have explored various approaches to address these challenges (Harris et al. in Trans GIS 14:43–61, 2010; Pozdnoukhov and Kaiser in: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011; Tran et al. in: 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE), IEEE, 2016; Geniaux and Martinetti in Reg Sci Urban Econ 72:74–85, 2018; Li et al. in Int J Geogr Inf Sci 33:155–175, 2019; Murakami et al. in Ann Am Assoc Geogr 111:459–480, 2020). While the use of a subset of target points for local regressions has been extensively studied in nonparametric econometrics, its application to GWR remains relatively unexplored. In this paper, we propose an original two-stage method designed to accelerate GWR computations. We select a subset of target points based on the spatial smoothing of residuals from a first-stage regression and conduct GWR solely on this subsample. Additionally, we propose an original approach for extrapolating coefficients to non-target points.
In addition to using an effective sample of target points, we explore the computational gain provided by using a truncated Gaussian kernel to create sparser matrices during computation. Our Monte Carlo experiments demonstrate that this method of target point selection outperforms methods based on point density or random selection. The results also reveal that using target points can reduce bias and root mean square error (RMSE) in estimating the \(\beta\) coefficients compared to traditional GWR, as it enables the selection of a more accurate bandwidth size. We demonstrate that our estimator is scalable and exhibits superior properties in this regard compared to the estimator of Murakami et al. (Ann Am Assoc Geogr 111:459–480, 2020) under two conditions: the use of a share of target points that provides a satisfactory approximation of the coefficients (10–20% of locations) and an optimal bandwidth that remains within a reasonable neighborhood (\(<\,\)5000 neighbors). All the GWR estimators with target points are now available in the R package mgwrsar, which covers GWR and Mixed GWR with and without spatial autocorrelation and is hosted on the CRAN repository at https://CRAN.R-project.org/package=mgwrsar.
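Two of the ideas named in the abstract can be illustrated with a minimal Python sketch: fitting local regressions only at a subset of target points, and truncating the Gaussian kernel so that distant observations receive exactly zero weight. This is not the mgwrsar implementation. The function names, the truncation rule (zero weight beyond `cutoff * bandwidth`), the single covariate, and the hand-picked target indices are all assumptions made for illustration. The paper's residual-based target selection and its extrapolation to non-target points are not reproduced here.

```python
import math


def truncated_gaussian_weights(dists, bandwidth, cutoff=2.0):
    """Gaussian kernel weights, set to exactly zero beyond cutoff * bandwidth.

    The truncation rule is an illustrative assumption; the zeros are what
    would make the weight matrix sparse.
    """
    radius = cutoff * bandwidth
    return [
        math.exp(-0.5 * (d / bandwidth) ** 2) if d <= radius else 0.0
        for d in dists
    ]


def local_wls(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def gwr_at_targets(coords, x, y, targets, bandwidth, cutoff=2.0):
    """Fit one local regression per target point (targets index into coords)."""
    fits = {}
    for t in targets:
        dists = [math.dist(coords[t], c) for c in coords]
        w = truncated_gaussian_weights(dists, bandwidth, cutoff)
        fits[t] = local_wls(x, y, w)
    return fits


# Toy data on a 10 x 10 grid with constant true coefficients (2 and 3).
coords = [(i, j) for i in range(10) for j in range(10)]
x = [((7 * k) % 11) / 10 for k in range(len(coords))]
y = [2.0 + 3.0 * xi for xi in x]

# Hypothetical hand-picked targets: 3 of 100 locations instead of all of them.
targets = [0, 45, 99]
fits = gwr_at_targets(coords, x, y, targets, bandwidth=2.0)

for t, (b0, b1) in fits.items():
    print(t, round(b0, 6), round(b1, 6))
```

Because the toy data are noise-free with constant coefficients, each local fit should recover the intercept and slope up to floating-point error. The saving comes from running the loop over three targets rather than all 100 locations, and from the zero weights outside the truncation radius.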