This year is the 50th anniversary of Besag's classic auto-models publication, a cornerstone in the development of modern-day spatial statistics/econometrics. Besag struggled for nearly two decades to make his conceptualization collectively successful across a wide suite of random variables. But only his auto-normal, and to a lesser degree his auto-logistic/binomial, were workable. Others, like his auto-Poisson, were effectively failures, whereas still others, such as a potential auto-Weibull, defied even awkward mathematical incorporations of spatial lag terms. Besag circumvented this impediment by introducing an auto-normal random effects component (within a Bayesian estimation context), building upon his single total success. This article describes an alternative approach that partly parallels his reformulation but avoids inserting spatial lag terms directly into probability density/mass functions; instead, it implants spatial autocorrelation into cumulative distribution functions (CDFs) via a spatially autocorrelated uniform distribution. The existing probability integral transform and quantile function theorems of mathematical statistics enable this mechanism to spatialize any random variable, with the resulting models labeled sui-models.
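The mechanism in this abstract can be illustrated with a minimal sketch. It is not the article's construction: the spatially autocorrelated uniform variate is obtained here from a simple moving average of independent normal draws on a rook-adjacency grid, which is an assumption made only for illustration, and the function names (`rook_neighbors`, `spatial_uniform`, `weibull_quantile`, `morans_i`) and parameter values are hypothetical. The sketch applies the probability integral transform to obtain uniform marginals, then a Weibull quantile function to obtain a spatially autocorrelated Weibull variable without placing any spatial lag term in the density.

```python
import math
import random


def rook_neighbors(n):
    """Rook (edge-sharing) neighbors for each cell of an n x n grid."""
    nbrs = {}
    for r in range(n):
        for c in range(n):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(i, j) for i, j in cand if 0 <= i < n and 0 <= j < n]
    return nbrs


def spatial_uniform(n, rho, rng):
    """Spatially autocorrelated values with Uniform(0, 1) marginals.

    Assumption: autocorrelation comes from a moving average of iid normals.
    """
    nbrs = rook_neighbors(n)
    z = {cell: rng.gauss(0.0, 1.0) for cell in nbrs}  # independent N(0, 1) draws
    u = {}
    for cell, adj in nbrs.items():
        # Own draw plus rho times the mean of the neighboring draws.
        y = z[cell] + rho * sum(z[a] for a in adj) / len(adj)
        # Var(y) = 1 + rho^2 / k for k neighbors, so s is exactly N(0, 1).
        s = y / math.sqrt(1.0 + rho ** 2 / len(adj))
        # Probability integral transform: the normal CDF maps s to Uniform(0, 1).
        p = 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))
        u[cell] = min(max(p, 1e-12), 1.0 - 1e-12)  # keep strictly inside (0, 1)
    return u, nbrs


def weibull_quantile(u, shape, scale):
    """Quantile function (inverse CDF) of a Weibull(shape, scale) variable."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def morans_i(values, nbrs):
    """Moran's I with binary rook weights."""
    mean = sum(values.values()) / len(values)
    dev = {k: v - mean for k, v in values.items()}
    cross = sum(dev[i] * dev[j] for i, adj in nbrs.items() for j in adj)
    n_links = sum(len(adj) for adj in nbrs.values())
    return (len(values) / n_links) * cross / sum(d * d for d in dev.values())


rng = random.Random(42)
u, nbrs = spatial_uniform(n=20, rho=0.9, rng=rng)
x = {cell: weibull_quantile(p, shape=1.5, scale=2.0) for cell, p in u.items()}
print(round(morans_i(x, nbrs), 3))  # positive values indicate positive autocorrelation
```

Swapping `weibull_quantile` for the quantile function of another distribution changes the marginal distribution while leaving the spatially autocorrelated uniform variate untouched, which is the appeal of working through the CDF.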
This article describes a new spatial optimization model, the Multiple Gradual Maximal Covering Location Problem (MG-MCLP). This model is useful when coverage from multiple facilities or sensors is necessary to consider a demand to be covered, and when the quality of that coverage varies with the number of located facilities within the service distance, and the distance from the demand itself. The motivating example for this model uses a coupled GIS and optimization framework to determine the optimal locations for acoustic sensors—typically used in police applications for gunshot detection—in Tuscaloosa, AL. The results identify the optimal facility locations for allocating multiple facilities, at different locations, to cover multiple demands and evaluate those optimal locations with distance-decay. Solving the MG-MCLP over a range of values allows for comparing the performance of varying numbers of available resources, which could be used by public safety operations to demonstrate the number of resources that would be required to meet policy goals. The results illustrate the flexibility in designing alternative spatial allocation strategies and provide a tractable covering model that is solved with standard linear programming and GIS software, which in turn can improve spatial data analysis across many operational contexts.
Mete, M. O., & Yomralioglu, T. (2023) A hybrid approach for mass valuation of residential properties through geographic information systems and machine learning integration. Geographical Analysis, 55(4), 535–559.
The funding statement for this article was missing. The below funding statement has been added to the article:
“Funding for the research project was received from Scientific Research Projects Coordination Unit of Istanbul Technical University under grant MDK-2021-43080.”
We apologize for this error.
Statistical research on correlation with spatial data dates back at least to Student's (W. S. Gosset's) 1914 paper on “the elimination of spurious correlation due to position in time and space.” Since 1968, much of this work has been organized around the concept of spatial autocorrelation (SA). A growing statistical literature is now organized around the concept of “spatial confounding” (SC) but is estranged from, and often at odds with, the SA literature and its history. The SC literature is producing new, sometimes flawed, statistical techniques such as Restricted Spatial Regression (RSR). This article brings the SC literature into conversation with the SA literature and provides a theoretically grounded review of the history of research on correlation with spatial data, explaining some of its implications for the SC literature. The article builds upon principles of plausible inference to synthesize a guiding theoretical thread that runs throughout the SA literature. This leads to a concise theoretical critique of RSR and a clarification of the logic behind standard spatial-statistical models.
To minimize the disclosure of personal information, sensitive location data collected by mobile phones is often aggregated to predefined geographic units and presented as counts of devices at a given time. Grids or units created by statistical agencies for the dissemination of traditional data sets, such as censuses, are common choices for this aggregation process. However, these can result in large variations in the number of devices encapsulated within each geographic unit, resulting in over-generalization and a loss of information in some areas. To alleviate this issue, we propose a new method for the aggregation of mobile phone-generated location data sets that creates bespoke geometries that maximize the granularity of the data whilst minimizing the risks of disclosing personal information. The resulting small areas are built on Uber's H3 hexagonal indexing system by attributing activity counts and land-use features to each cell, then merging cells into geographies that contain a predetermined number of data points and respect the underlying topography and land use. This methodology has applications to widely available data sets and enables bespoke geographical units to be created for different contexts. We compare the generated units to established aggregates from the England and Wales Census and Ordnance Survey. We demonstrate that our outputs are more representative of the original mobile phone data set and minimize data omission caused by low counts. This speaks to the need for a data-driven and context-driven regionalization methodology.
Geographic shape has long been an intriguing feature of observed and defined facets of an area or region. Compactness reflects a critical element of shape with important practical and policy implications. It may suggest characteristics of urban/regional form, efficiency in trade and service provision, fairness in political representation and distributional qualities of the physical environment, among others. While there has been much study of compactness and a wealth of measures and metrics derived to reflect nuances of geographic form, questions remain about their ability to characterize shape in a meaningful manner. Given this, exploration of relationships between various categories of methods for quantifying compactness is critical. Further, recent developments of, advances in and access to physics-based spatial measures of compactness suggest an opportunity for better theoretical understanding. Assessment of 388 districts is carried out. Significant correlation is demonstrated between contemporary measures, opening the door for research advancements associated with the compactness of spatial shapes. This work is interesting, important, and of current relevance because compactness measures are not only given serious consideration in management, planning, and policy but are also regularly relied upon in legal proceedings. Further, compactness measures continue to drive automated and semi-automated approaches in districting and redistricting, often embedded in optimization approaches.
Data on neighborhood characteristics are not typically collected in epidemiological studies. They are, however, useful, for example, in the study of small-area health inequalities and may be available in social surveys. We propose to use kriging based on semi-variogram models to predict values at non-observed locations with the aim of obtaining indicators of neighborhood characteristics of epidemiological study participants. The spatial data available for kriging are usually sparse at small distances, and therefore we perform a simulation study to assess the feasibility and usability of the method as well as a case study using data from the RECORD study. Apart from having enough observed data at small distances to the non-observed locations, a well-fitting semi-variogram, a larger range and the absence of nugget effects for the semi-variogram models are factors leading to a higher reliability. Recommendations on the required number of observations within the neighborhood range are given.
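The concern about sparse data at small distances can be made concrete with the standard empirical semi-variogram estimator, which averages half the squared differences of all point pairs whose separation falls in a given distance bin. The sketch below is illustrative only and is not the study's code: the function name `empirical_semivariogram`, the binning scheme and the toy coordinates are assumptions. It returns the pair count alongside each estimate, because a bin with few pairs gives an unreliable semi-variance.

```python
import math


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Return (bin midpoint, pair count, semi-variance) for each distance bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Assign the pair to a bin by its Euclidean separation distance.
            b = int(math.dist(points[i], points[j]) // bin_width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    result = []
    for b in range(n_bins):
        # gamma(h) = sum of squared differences / (2 * number of pairs);
        # None marks a bin with no pairs at all.
        gamma = sums[b] / (2 * counts[b]) if counts[b] else None
        result.append(((b + 0.5) * bin_width, counts[b], gamma))
    return result


# Toy example (hypothetical data): three points on a line.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
vals = [0.0, 1.0, 2.0]
for midpoint, n_pairs, gamma in empirical_semivariogram(pts, vals, 1.0, 3):
    print(midpoint, n_pairs, gamma)
```

In the toy example the shortest-distance bin contains no pairs, so no semi-variance can be estimated there. This is the situation that makes the nugget and the short-range behavior of a fitted model hard to identify.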
Large cellular phone-based mobility datasets are an important new data source for research on human movement. We investigate and illustrate bias in representation in a large mobility dataset at the census block group, tract, and county levels. We paired American Community Survey (ACS) 2019 data with SafeGraph (SG) cell phone mobility data to elucidate potential bias in SG data by examining ACS-estimated population against the number of devices in the SG data, stratifying by key sociodemographic variables such as income, percent Black population, percent of population over 55 years, percent of population 18–65 years, percent of people living in crowded living conditions, and urbanization level. We evaluated whether the bias varied over time by examining a 10-month period. This bias changes with key demographic characteristics and changes over time. Specifically, we see underrepresentation in areas that have the highest percentage of Black population at all aggregation levels. We also see underrepresentation at all levels in areas with the highest percentage of working age residents as well as areas with the lowest median incomes. Researchers should be cautious when using mobility datasets because the bias differs across key sociodemographic factors and collection time.
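One simple way to express this kind of comparison is a representation ratio: a stratum's share of devices divided by its share of population, with values below 1 indicating underrepresentation. The abstract does not state the exact metric used, so the sketch below is an assumption for illustration only. The function name `stratified_representation`, the dictionary keys and the toy figures are hypothetical, and the function expects at least as many areas as groups and positive populations.

```python
def stratified_representation(areas, key, n_groups):
    """Device-share / population-share ratio for each quantile group of `key`."""
    total_pop = sum(a["population"] for a in areas)
    total_dev = sum(a["devices"] for a in areas)
    ranked = sorted(areas, key=lambda a: a[key])
    pop = [0.0] * n_groups
    dev = [0.0] * n_groups
    for rank, area in enumerate(ranked):
        g = rank * n_groups // len(ranked)  # equal-count groups, lowest values first
        pop[g] += area["population"]
        dev[g] += area["devices"]
    # A ratio below 1 means the group holds fewer devices than its population share.
    return [(dev[g] / total_dev) / (pop[g] / total_pop) for g in range(n_groups)]


# Toy example (hypothetical figures): four areas split into two income groups.
toy_areas = [
    {"population": 1000, "devices": 100, "median_income": 20},
    {"population": 1000, "devices": 50, "median_income": 10},
    {"population": 2000, "devices": 250, "median_income": 30},
    {"population": 1000, "devices": 100, "median_income": 40},
]
print(stratified_representation(toy_areas, "median_income", 2))
```

Repeating the calculation for each month of device counts, and at each aggregation level, would show whether the ratios shift over time or with the choice of geographic unit.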