{"title":"Extending regionalization algorithms to explore spatial process heterogeneity","authors":"Hao Guo, Andre Python, Yu Liu","doi":"10.1080/13658816.2023.2266493","DOIUrl":null,"url":null,"abstract":"In spatial regression models, spatial heterogeneity may be considered with either continuous or discrete specifications. The latter is related to delineation of spatially connected regions with homogeneous relationships between variables (spatial regimes). Although various regionalization algorithms have been proposed and studied in the field of spatial analytics, methods to optimize spatial regimes have been largely unexplored. In this paper, we propose two new algorithms for spatial regime delineation, two-stage K-Models and Regional-K-Models. We also extend the classic Automatic Zoning Procedure to a spatial regression context. The proposed algorithms are applied to a series of synthetic datasets and two real-world datasets. Results indicate that all three algorithms achieve superior or comparable performance to existing approaches, while the two-stage K-Models algorithm largely outperforms existing approaches on model fitting, region reconstruction and coefficient estimation. Our work enriches the spatial analytics toolbox to explore spatial heterogeneous processes. Keywords: Regionalization; spatial heterogeneity; spatial regime; spatial regression. Acknowledgments: The authors thank members of Spatial Analysis Group, Spatio-temporal Social Sensing Lab for helpful discussion. The constructive comments from anonymous reviewers are gratefully acknowledged. Data and codes availability statement: The data and codes that support the findings of this study are available at https://github.com/Nithouson/regreg. Disclosure statement: The authors declare that they have no conflict of interest. Notes: 1 For example, the population in each region is required to be as similar as possible or above a predefined value (see Duque et al. 
(2012), Folch and Spielman (2014), Wei et al. (2021)). 2 Note that the optimization of spatial regimes differs from Openshaw (1978), where spatial units are aggregated into areas, and each area is treated as an observation in a global regression model. 3 Note that in Equation (1), L(R) = Σ_{j=1}^{p} Σ_{1≤i_1<i_2≤n} I[u_{i_1}, u_{i_2} ∈ R_j] ‖x_{i_1} − x_{i_2}‖^2, the number of considered unit pairs in the sum is Σ_{j=1}^{M} (|R_j| choose 2), which is smaller if |R_j| (j=1,…,M) are close to each other. Hence the objective function might favor solutions whose regions have similar numbers of units. 4 Throughout the paper, we describe the case of lattice data (spatial data on areal units). Our approach is also applicable to point observation data after building adjacency (with k-nearest neighbors (KNN) or Delaunay triangulation, for example). 5 This usually happens when min_obs is close to n/p, where p is the number of regions. Given min_obs ≪ n/p, this issue does not cause problems, as observed in our experiments. 6 If a region with inadequate units has two or more neighboring regions, we select the neighbor which minimizes the total SSR after the merge. 7 When min_obs is too large or K is too small (close to p), exceptions may occur that the number of regions is less than p, hence the algorithm cannot produce the required number of regions by merging ‘micro-clusters’. This issue can be solved by adjusting min_obs and K. 8 Let n_r denote the number of units in the region. The OLS estimation of the coefficient vector is β = (X^T X)^{-1} X^T y, where X is the n_r × (m+1) matrix of independent variables and y is the n_r-dimensional vector of the dependent variable. Here the intercept is included in β by adding an independent variable with constant value 1. By applying the Sherman-Morrison formula (Bartlett 1951) to update the (X^T X)^{-1} term, the time complexity can be reduced from O(m^2 (n_r + m)) to O(m (n_r + m)). 9 Let β_{i,j} denote the value of coefficient β_i in region R_j. 
In each simulation, the list (−2, −1, 0, 1, 2) is randomly shuffled twice and used as (β_{1,1}, …, β_{1,5}) and (β_{2,1}, …, β_{2,5}), respectively. 10 Helbich et al. (2013) also applied principal component analysis to the GWR coefficients. This step is skipped, as dimension reduction is not needed in our experiment. 11 Different values of min_obs may be used in the two stages of K-Models algorithm. Here min_obs=10 is used for the merge stage, while min_obs in the partition stage is the number of independent variables plus 1 throughout this paper. 12 Even considering the average SSR rather than the lowest, two-stage K-Models and AZP consistently outperform GWR-Skater and Skater-reg; Regional-K-Models is comparable to Skater-reg and superior to GWR-Skater. 13 The GWR estimation did not complete within 30 minutes on our machine. 14 Experiments on the King County house price dataset were performed on a computer with an Intel Core i5-1135G7 CPU (2.40 GHz) and 16GB of memory. Additional information. Funding: This research was supported by grants from the National Natural Science Foundation of China (41830645, 42271426, 41971331, 82273731), Smart Guangzhou Spatio-temporal Information Cloud Platform Construction (GZIT2016-A5-147) and the National Key Research and Development Program of China (2021YFC2701905). Notes on contributors: Hao Guo is currently a Ph.D. candidate at the Institute of Remote Sensing and Geographic Information Systems, Peking University. He received his B.S. in Geographic Information Science and a dual B.S. in Mathematics from Peking University in 2020. His research interests include spatial analytics, geo-spatial artificial intelligence and spatial optimization. Andre Python is ZJU100 Young Professor in Statistics at the Center for Data Science, Zhejiang University, P.R. China. He received his B.S. and M.S. from the University of Fribourg, Switzerland, and his Ph.D. from the University of St Andrews, United Kingdom. 
He develops and applies spatial models and interpretable machine learning algorithms to better understand the mechanisms behind the observed patterns of spatial phenomena. Yu Liu is currently the Boya Professor of GIScience at the Institute of Remote Sensing and Geographic Information Systems, Peking University. He received his B.S., M.S. and Ph.D. degrees from Peking University in 1994, 1997 and 2003, respectively. His research interests mainly focus on humanities and social sciences based on big geo-data.","PeriodicalId":14162,"journal":{"name":"International Journal of Geographical Information Science","volume":"130 1","pages":"0"},"PeriodicalIF":4.3000,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Geographical Information Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/13658816.2023.2266493","RegionNum":1,"RegionCategory":"Earth Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
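Note 8 in the record above mentions using the Sherman-Morrison formula to update the (X^T X)^{-1} term instead of recomputing it. The following is a minimal pure-Python sketch of that rank-one update, assuming a single unit with covariate row x (intercept included) is added to or removed from a region. The function name `sherman_morrison_update` and the toy data are illustrative assumptions and are not taken from the authors' regreg code.

```python
def sherman_morrison_update(a_inv, x, sign=1.0):
    """Return (A + sign * x x^T)^{-1} given a_inv = A^{-1}, with A symmetric.

    sign=+1.0 corresponds to adding a unit with covariate row x to a region,
    sign=-1.0 to removing it (hypothetical usage, for illustration only).
    """
    k = len(x)
    # v = A^{-1} x; because A^{-1} is symmetric, x^T A^{-1} equals v^T.
    v = [sum(a_inv[i][j] * x[j] for j in range(k)) for i in range(k)]
    denom = 1.0 + sign * sum(x[i] * v[i] for i in range(k))
    if abs(denom) < 1e-12:
        raise ValueError("update would make X^T X singular")
    return [[a_inv[i][j] - sign * v[i] * v[j] / denom for j in range(k)]
            for i in range(k)]


# Toy region with rows (1, 0), (1, 1), (1, 2): X^T X = [[3, 3], [3, 5]],
# whose inverse is [[5/6, -1/2], [-1/2, 1/2]].
a_inv = [[5.0 / 6.0, -0.5], [-0.5, 0.5]]

# Add a unit with row (1, 3): X^T X becomes [[4, 6], [6, 14]].
new_row = [1.0, 3.0]
updated = sherman_morrison_update(a_inv, new_row, sign=1.0)

# Removing the same unit should recover the original inverse.
restored = sherman_morrison_update(updated, new_row, sign=-1.0)
```

Each call costs O(k^2) operations for k = m + 1 columns, whereas recomputing X^T X and inverting it from scratch depends on both the region size and a cubic term in k.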
Citations: 0
Abstract
In spatial regression models, spatial heterogeneity may be considered with either continuous or discrete specifications. The latter is related to delineation of spatially connected regions with homogeneous relationships between variables (spatial regimes). Although various regionalization algorithms have been proposed and studied in the field of spatial analytics, methods to optimize spatial regimes have been largely unexplored. In this paper, we propose two new algorithms for spatial regime delineation, two-stage K-Models and Regional-K-Models. We also extend the classic Automatic Zoning Procedure to a spatial regression context. The proposed algorithms are applied to a series of synthetic datasets and two real-world datasets. Results indicate that all three algorithms achieve superior or comparable performance to existing approaches, while the two-stage K-Models algorithm largely outperforms existing approaches on model fitting, region reconstruction and coefficient estimation. Our work enriches the spatial analytics toolbox to explore spatial heterogeneous processes.

Keywords: Regionalization; spatial heterogeneity; spatial regime; spatial regression

Acknowledgments
The authors thank members of Spatial Analysis Group, Spatio-temporal Social Sensing Lab for helpful discussion. The constructive comments from anonymous reviewers are gratefully acknowledged.

Data and codes availability statement
The data and codes that support the findings of this study are available at https://github.com/Nithouson/regreg.

Disclosure statement
The authors declare that they have no conflict of interest.

Notes
1. For example, the population in each region is required to be as similar as possible or above a predefined value (see Duque et al. (2012), Folch and Spielman (2014), Wei et al. (2021)).
2. Note that the optimization of spatial regimes differs from Openshaw (1978), where spatial units are aggregated into areas, and each area is treated as an observation in a global regression model.
3. Note that in Equation (1),
$$L(R)=\sum_{j=1}^{p}\sum_{1\le i_1<i_2\le n} I[u_{i_1},u_{i_2}\in R_j]\,\lVert x_{i_1}-x_{i_2}\rVert^{2}, \tag{1}$$
the number of considered unit pairs in the sum is $\sum_{j=1}^{M}\binom{|R_j|}{2}$, which is smaller if $|R_j|$ ($j=1,\dots,M$) are close to each other. Hence the objective function might favor solutions whose regions have similar numbers of units.
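The remark in Note 3, that the number of within-region unit pairs is smaller when region sizes are similar, can be checked with a small calculation. The sketch below sums the binomial coefficient "size choose 2" over the regions of a partition and compares a balanced and an unbalanced split of 12 units into 3 regions. The helper name `within_region_pairs` and the example sizes are hypothetical and chosen only for illustration.

```python
from math import comb


def within_region_pairs(region_sizes):
    """Count unordered unit pairs that fall inside the same region."""
    return sum(comb(size, 2) for size in region_sizes)


# Two ways to split 12 units into 3 regions (illustrative sizes).
balanced = within_region_pairs([4, 4, 4])      # 3 * C(4, 2) = 18
unbalanced = within_region_pairs([10, 1, 1])   # C(10, 2) + 0 + 0 = 45

print(balanced, unbalanced)  # 18 45
```

With fewer squared-distance terms in the sum, a balanced partition can attain a lower objective value for reasons unrelated to attribute similarity, which is the bias the note describes.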
About the journal:
International Journal of Geographical Information Science provides a forum for the exchange of original ideas, approaches, methods and experiences in the rapidly growing field of geographical information science (GIScience). It is intended to interest those who research fundamental and computational issues of geographic information, as well as issues related to the design, implementation and use of geographical information for monitoring, prediction and decision making. Published research covers innovations in GIScience and novel applications of GIScience in natural resources, social systems and the built environment, as well as relevant developments in computer science, cartography, surveying, geography and engineering in both developed and developing countries.