Pub Date: 2024-11-17 DOI: 10.1016/j.spasta.2024.100872
Sebastian Hörning, András Bárdossy
Observed environmental variables are usually the results of physical, chemical, or biological processes. These processes often introduce asymmetries which should be considered when analysing and modelling the observed variables. In a geostatistical context, there are two main types of asymmetry. The first is rank-asymmetry, i.e., low and high values exhibit different spatial dependence structures. The second is order-asymmetry, i.e., the spatial dependence structure is distinguishable in different directions. Both asymmetries, if significant, indicate that the corresponding random field has a non-Gaussian dependence structure. These asymmetries are not part of the classical geostatistical workflow. Taking asymmetry into account, however, is likely to improve the estimation and the uncertainty assessment at unobserved locations. In this contribution, a stochastic model is presented which can be used to simulate asymmetrical random fields with either of the asymmetries or with their combination. Synthetically simulated flow fields and the well-known Walker Lake dataset are used to demonstrate the methodology.
Title: Simulation of conditional non-Gaussian random fields with directional asymmetry (Spatial Statistics, vol. 65, Article 100872)
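The two asymmetry types described in this abstract can be illustrated with simple third-moment statistics on rank-transformed pairs. The sketch below is only an assumption about how such asymmetries might be measured: the function names are hypothetical, and the two statistics are one common choice for paired ranks, not necessarily the definitions used in the paper.

```python
import numpy as np

def empirical_ranks(values):
    """Map values to (0, 1) through their ranks: u_i = (rank_i - 0.5) / n."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    return (ranks - 0.5) / len(values)

def pair_asymmetries(u_tail, u_head):
    """Third-moment asymmetry statistics for paired ranks (illustrative choice).

    rank-type:  mean((u_tail + u_head - 1)^3), differs from 0 when low and
                high values are coupled differently.
    order-type: mean((u_tail - u_head)^3), changes sign when the direction
                of the pair is reversed.
    """
    u_tail = np.asarray(u_tail, dtype=float)
    u_head = np.asarray(u_head, dtype=float)
    rank_asym = np.mean((u_tail + u_head - 1.0) ** 3)
    order_asym = np.mean((u_tail - u_head) ** 3)
    return rank_asym, order_asym

# Toy 1-D transect: smoothed Gaussian noise, so both statistics should be
# close to zero up to sampling error.
rng = np.random.default_rng(1)
z = np.convolve(rng.normal(size=2000), np.ones(10) / 10.0, mode="valid")
u = empirical_ranks(z)
lag = 5
print(pair_asymmetries(u[:-lag], u[lag:]))
```

For a Gaussian field both statistics are expected to be near zero at every lag, so clearly non-zero values on real data would hint at the non-Gaussian dependence the abstract refers to.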
Pub Date: 2024-11-16 DOI: 10.1016/j.spasta.2024.100864
J.D. Toloza-Delgado, O.O. Melo, N.A. Cruz
In spatial econometrics, it is very useful to have methodologies that model the spatial dependence of the observed variables and yield more precise predictions of both the mean and the variability of the response variable, which is valuable in territorial planning and public policy. This paper proposes a new methodology that jointly models the mean and the variance. It also allows the spatial dependence of the dependent variable to be modeled as a function of covariates, and semiparametric effects to be included in both models. The algorithms developed are based on generalized additive models that allow the inclusion of non-parametric terms in both the mean and the variance, while maintaining the traditional theoretical framework of spatial regression. The theory for estimating this model is developed, and the estimators are shown to have desirable statistical properties. A simulation study verifies that the proposed method has a remarkable predictive capacity in terms of the mean square error and shows a notable improvement in the estimation of the spatial autoregressive parameter compared with other traditional methods and some recent developments. The model is also applied to the construction of a hedonic price model for the city of Bogotá; the main results are the ability to model the variability of housing prices and the richness of the resulting analysis.
Title: Joint spatial modeling of mean and non-homogeneous variance combining semiparametric SAR and GAMLSS models for hedonic prices (Spatial Statistics, vol. 65, Article 100864)
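To make the idea of jointly modelling the mean and the variance in a spatial autoregressive (SAR) setting concrete, here is a minimal simulation sketch. It assumes the textbook SAR form y = rho W y + X beta + eps with a log-linear variance model; the ring neighbourhood, the parameter values, and the function names are illustrative assumptions, not the specification used in the paper.

```python
import numpy as np

def ring_weights(n):
    """Row-standardised weights: each unit neighbours the previous and next unit on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def simulate_sar(W, X, beta, rho, sigma, rng):
    """Draw y from y = rho W y + X beta + eps, with eps_i ~ N(0, sigma_i^2).

    The reduced form y = (I - rho W)^{-1} (X beta + eps) is solved directly.
    Returns both y and the noise eps so the structural equation can be checked.
    """
    n = W.shape[0]
    eps = rng.normal(size=n) * sigma
    y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
    return y, eps

rng = np.random.default_rng(0)
n = 200
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
# Assumed variance model: log(sigma_i) depends linearly on the covariate.
sigma = np.exp(-0.5 + 0.3 * X[:, 1])
y, eps = simulate_sar(W, X, beta, rho=0.4, sigma=sigma, rng=rng)
print(y[:5])
```

Because the weights are row-standardised, I - rho W is invertible for |rho| < 1, which is why the direct solve is safe in this toy setting.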
Pub Date : 2024-11-13DOI: 10.1016/j.spasta.2024.100871
Yi-Hung Kung
The COVID-19 pandemic has posed unprecedented public health challenges worldwide, necessitating a comprehensive understanding of its transmission dynamics. This study examines the correlation between COVID-19 transmission and various risk factors, focusing on the impact of population structure and socio-economic conditions in Taiwan. By analyzing official government databases, we explore how factors such as population density, dependency ratios, and socio-economic environment influence the spread of COVID-19. Our findings highlight that densely populated areas, along with regions characterized by higher child dependency ratios and a significant number of low- and middle-income households, exhibit higher transmission rates. This research underscores the importance of considering socio-economic disparities and healthcare access in developing effective public health strategies. Furthermore, we utilize a mixture scan statistic to identify disease hotspots, taking into account spatial correlation and covariate effects. This approach can detect clusters based on known risk factors and help to assess possible unknown geographic risks, facilitating targeted interventions and resource allocation. Our study contributes to the broader understanding of COVID-19 transmission dynamics, offering insights into the importance of integrating socio-economic factors and spatial analysis in pandemic response efforts.
Title: Epidemiological insights and geographic clusters for COVID-19 in Taiwan using a mixture scan statistic (Spatial Statistics, vol. 65, Article 100871)
Pub Date: 2024-11-12 DOI: 10.1016/j.spasta.2024.100866
Joshua S. North, Mark D. Risser, F. Jay Breidt
Statistical modeling of high-dimensional matrix-valued data motivates the use of a low-rank representation that simultaneously summarizes key characteristics of the data and enables dimension reduction. Low-rank representations commonly factor the original data into the product of orthonormal basis functions and weights, where each basis function represents an independent feature of the data. However, the basis functions in these factorizations are typically computed using algorithmic methods that cannot quantify uncertainty or account for basis function correlation structure a priori. While there exist Bayesian methods that allow for a common correlation structure across basis functions, empirical examples motivate the need for basis function-specific dependence structure. We propose a prior distribution for orthonormal matrices that can explicitly model basis function-specific structure. The prior is used within a general probabilistic model for singular value decomposition to conduct posterior inference on the basis functions while accounting for measurement error and fixed effects. We discuss how the prior specification can be used for various scenarios and demonstrate favorable model properties through synthetic data examples. Finally, we apply our method to two-meter air temperature data from the Pacific Northwest, enhancing our understanding of the Earth system’s internal variability.
Title: A flexible class of priors for orthonormal matrices with basis function-specific structure (Spatial Statistics, vol. 64, Article 100866)
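The low-rank factorisation that this abstract starts from (orthonormal basis functions times weights) can be sketched with an ordinary truncated singular value decomposition. This is the purely algorithmic baseline the abstract contrasts with, not the proposed Bayesian prior; the function name and the toy data are assumptions.

```python
import numpy as np

def truncated_svd(Y, rank):
    """Return the leading `rank` left vectors, singular values, and right vectors of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]

rng = np.random.default_rng(0)
# Toy data: an exactly rank-2 signal plus small measurement noise.
signal = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 15))
Y = signal + 0.01 * rng.normal(size=signal.shape)

U, s, Vt = truncated_svd(Y, rank=2)
approx = (U * s) @ Vt  # basis functions times weights

print("basis orthonormal:", np.allclose(U.T @ U, np.eye(2)))
print("relative error:", np.linalg.norm(Y - approx) / np.linalg.norm(Y))
```

The decomposition returns point estimates only, which is the limitation (no uncertainty quantification, no prior structure on the basis functions) that motivates the probabilistic model in the abstract.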
Pub Date: 2024-11-09 DOI: 10.1016/j.spasta.2024.100870
Michael Tsyrulnikov, Arseniy Sotskiy
The sample covariance matrix of a random vector is a good estimate of the true covariance matrix if the sample size is much larger than the length of the vector. In high-dimensional problems, this condition is never met. As a result, in high dimensions the Ensemble Kalman Filter’s (EnKF) ensemble does not contain enough information to specify the prior covariance matrix accurately. This necessitates regularization of the analysis (observation update) problem. We propose a regularization technique based on a new spatial model. The model is a constrained version of the general Gaussian process convolution model. The constraints include local stationarity and smoothness of local spectra. We regularize EnKF by postulating that its prior covariances obey the spatial model. Placing a hyperprior distribution on the model parameters and using the likelihood of the prior ensemble data allows for an optimized use of both the ensemble and the hyperprior. A linear version of the respective estimator is shown to be consistent. A more accurate nonlinear neural-Bayes implementation of the estimator is developed. In simulation experiments, the new technique led to substantially better EnKF performance than several existing techniques.
Title: Regularization of the Ensemble Kalman Filter using a non-parametric, non-stationary spatial model (Spatial Statistics, vol. 64, Article 100870)
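The opening claim of this abstract, that a small ensemble cannot pin down a high-dimensional covariance matrix, is easy to check numerically. The sketch below assumes independent standard normal state variables (true covariance equal to the identity); the dimensions and the function name are illustrative.

```python
import numpy as np

def sample_covariance(ensemble):
    """Sample covariance of an ensemble stored as (n_members, dim)."""
    return np.cov(ensemble, rowvar=False)

rng = np.random.default_rng(0)
dim = 50

# Small ensemble: the estimate has rank at most n_members - 1, far below dim.
small = rng.normal(size=(10, dim))
S_small = sample_covariance(small)
print("rank with 10 members:", np.linalg.matrix_rank(S_small))

# Large ensemble: the estimate approaches the true identity covariance.
large = rng.normal(size=(5000, dim))
S_large = sample_covariance(large)
print("max abs error with 5000 members:", np.abs(S_large - np.eye(dim)).max())
```

With 10 members in 50 dimensions the sample covariance is singular, so spurious long-range correlations and zero-variance directions appear; this rank deficiency is what regularization schemes for the EnKF have to repair.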
Pub Date: 2024-11-06 DOI: 10.1016/j.spasta.2024.100867
Martin Outzen Berild, Geir-Arne Fuglstad
We construct flexible spatio-temporal models through stochastic partial differential equations (SPDEs) where both diffusion and advection can be spatially varying. Computations are done through a Gaussian Markov random field approximation of the solution of the SPDE, which is constructed through a finite volume method. The new flexible non-separable model is compared to a flexible separable model both for reconstruction and forecasting, and evaluated in terms of root mean square errors and continuous ranked probability scores. A simulation study demonstrates that the non-separable model performs better when the data is simulated from a non-separable model with diffusion and advection. Further, we estimate surrogate models for emulating the output of an ocean model in Trondheimsfjorden, Norway, and simulate observations of autonomous underwater vehicles. The results show that the flexible non-separable model outperforms the flexible separable model for real-time prediction of unobserved locations.
Title: Non-stationary spatio-temporal modeling using the stochastic advection–diffusion equation (Spatial Statistics, vol. 64, Article 100867)
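The two evaluation metrics named in this abstract can be sketched as follows. The continuous ranked probability score is computed here with the standard sample-based identity CRPS = E|X - y| - 0.5 E|X - X'| applied to predictive draws; whether the paper uses this estimator or a closed-form Gaussian expression is not stated, so treat the choice and the function names as assumptions.

```python
import numpy as np

def rmse(predictions, observations):
    """Root mean square error between point predictions and observations."""
    predictions = np.asarray(predictions, dtype=float)
    observations = np.asarray(observations, dtype=float)
    return float(np.sqrt(np.mean((predictions - observations) ** 2)))

def crps_ensemble(members, obs):
    """Sample-based CRPS for one observation and a set of predictive draws.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with both expectations replaced
    by averages over the draws (all ordered pairs for the second term).
    """
    members = np.asarray(members, dtype=float)
    spread_to_obs = np.mean(np.abs(members - obs))
    spread_within = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return float(spread_to_obs - spread_within)

rng = np.random.default_rng(0)
draws = rng.normal(loc=1.0, scale=0.5, size=200)  # toy predictive sample
print("RMSE of the predictive mean:", rmse([draws.mean()], [1.2]))
print("CRPS:", crps_ensemble(draws, 1.2))
```

Lower values are better for both scores; unlike RMSE, the CRPS also penalises predictive distributions that are too wide or too narrow.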
Pub Date: 2024-10-31 DOI: 10.1016/j.spasta.2024.100868
Eulogio Pardo-Igúzquiza, Peter A. Dowd
The problem of mapping hidden alignments of points in data sets of two-dimensional points is of significant interest in many geoscience disciplines. In this paper, we revisit this issue and provide a new algorithm, insights, and results. The statistical significance of alignments is assessed by using percentile confidence intervals estimated by a Monte Carlo procedure in which important issues, such as the shape of the geometric support and the possible non-homogeneity of the point density (i.e., clustering effects), have been considered. The procedure is not limited to the simplest case, the chance occurrence of triads (alignments of three points in a plane), but has been extended to k-ads with k arbitrarily large. The important issue of scale, when searching for point alignments, has also been taken into account. Case studies using synthetic and real data sets are provided to illustrate the methodology and the claims.
Title: Uncovering hidden alignments in two-dimensional point fields (Spatial Statistics, vol. 64, Article 100868)
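A Monte Carlo significance test for aligned triads, in the spirit of this abstract, can be sketched as below. The sketch makes several simplifying assumptions that the paper explicitly goes beyond: the support is the unit square, the null model is a homogeneous (uniform) point pattern, only triads are counted, and a triad counts as aligned when the triangle height over its longest side is at most a tolerance `width`. All names and parameter values are hypothetical.

```python
from itertools import combinations

import numpy as np

def count_aligned_triads(points, width):
    """Count point triples whose triangle height over the longest side is <= width.

    Assumes all points are distinct, so the longest side is never zero.
    """
    pts = np.asarray(points, dtype=float)
    idx = np.array(list(combinations(range(len(pts)), 3)))
    p, q, r = pts[idx[:, 0]], pts[idx[:, 1]], pts[idx[:, 2]]
    # Twice the triangle area via the 2-D cross product.
    twice_area = np.abs((q[:, 0] - p[:, 0]) * (r[:, 1] - p[:, 1])
                        - (q[:, 1] - p[:, 1]) * (r[:, 0] - p[:, 0]))
    longest = np.maximum.reduce([
        np.linalg.norm(q - p, axis=1),
        np.linalg.norm(r - q, axis=1),
        np.linalg.norm(r - p, axis=1),
    ])
    height = twice_area / longest
    return int(np.sum(height <= width))

def monte_carlo_threshold(n_points, width, n_sim, rng, level=95.0):
    """Upper percentile of the triad count under uniform points in the unit square."""
    counts = [count_aligned_triads(rng.random((n_points, 2)), width)
              for _ in range(n_sim)]
    return float(np.percentile(counts, level))

rng = np.random.default_rng(0)
observed_points = rng.random((30, 2))  # stand-in for a real data set
observed = count_aligned_triads(observed_points, width=0.005)
threshold = monte_carlo_threshold(30, width=0.005, n_sim=200, rng=rng)
print("observed:", observed, "95th percentile under the null:", threshold)
```

An observed count above the simulated percentile would suggest more alignments than chance alone produces; handling clustered densities or irregular supports would require simulating the null pattern accordingly.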
Pub Date: 2024-10-30 DOI: 10.1016/j.spasta.2024.100863
Shiyu He, Samuel W.K. Wong
We propose a Bayesian hierarchical model to address the challenge of spatial misalignment in spatio-temporal data obtained from in situ and satellite sources. The model is fit using the INLA-SPDE approach, which provides efficient computation. Our methodology combines the different data sources in a “fusion” model via the construction of projection matrices in both spatial and temporal domains. Through simulation studies, we demonstrate that the fusion model has superior performance in prediction accuracy across space and time compared to standalone “in situ” and “satellite” models based on only in situ or satellite data, respectively. The fusion model also generally outperforms the standalone models in terms of parameter inference. Such a modeling approach is motivated by environmental problems, and our specific focus is on the analysis and prediction of harmful algae bloom (HAB) events, where the convention is to conduct separate analyses based on either in situ samples or satellite images. A real data analysis shows that the proposed model is a necessary step towards a unified characterization of bloom dynamics and identifying the key drivers of HAB events.
Title: Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach (Spatial Statistics, vol. 64, Article 100863)
Pub Date: 2024-10-24 DOI: 10.1016/j.spasta.2024.100865
Sara Franceschi, Lorenzo Fattorini, Timothy G Gregoire
Because of its ease of implementation, equal-probability systematic sampling is widely used in spatial surveys, where the sample mean is an unbiased estimator of the population mean. A serious drawback, however, is that no unbiased estimator of the variance of the sample mean is available. Because an omnibus variance estimator that provides reliable results under any spatial population is still lacking, we propose a design-consistent estimator that invariably converges to the true variance as the population and sample size increase. The proposal is based on nearest-neighbour maps, which are taken as pseudo-populations from which all the possible systematic samples can be enumerated. As nearest-neighbour maps are design-consistent under equal-probability systematic sampling and mild conditions, the variance of the sample mean obtained from all the possible systematic samples selected from the map is also a consistent estimator of the true variance. Through a simulation study based on artificial and real populations, we show that our proposal generally outperforms the familiar estimators proposed in the literature.
Title: Exploiting nearest-neighbour maps for estimating the variance of sample mean in equal-probability systematic sampling of spatial populations (Spatial Statistics, vol. 64, Article 100865)
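The estimator described in this abstract (build a nearest-neighbour map from the one observed systematic sample, then enumerate every systematic sample of that map) can be sketched on a toy grid. The sketch assumes a square grid population whose side is a multiple of the sampling step; the function names and the synthetic population are hypothetical and much simpler than the spatial populations treated in the paper.

```python
import numpy as np

def nn_map(sample_values, start, step, size):
    """Nearest-neighbour map: every grid cell takes the value of its nearest sampled cell.

    The sample is a regular sub-grid, so the nearest sampled cell can be found
    separately along rows and columns.
    """
    a, b = start
    idx = np.arange(size)
    rows = np.arange(a, size, step)
    cols = np.arange(b, size, step)
    nearest_r = np.argmin(np.abs(idx[:, None] - rows[None, :]), axis=1)
    nearest_c = np.argmin(np.abs(idx[:, None] - cols[None, :]), axis=1)
    return sample_values[np.ix_(nearest_r, nearest_c)]

def systematic_means(field, step):
    """Sample means of all step * step possible systematic samples of a grid."""
    return np.array([field[a::step, b::step].mean()
                     for a in range(step) for b in range(step)])

def nn_variance_estimate(population, start, step):
    """Variance of the systematic-sample mean, computed on the nearest-neighbour map."""
    a, b = start
    sample = population[a::step, b::step]
    pseudo_population = nn_map(sample, start, step, population.shape[0])
    return float(systematic_means(pseudo_population, step).var())

# Toy population: a smooth trend plus noise on a 60 x 60 grid, sampled every 6 cells.
rng = np.random.default_rng(0)
x, y = np.meshgrid(np.linspace(0, 1, 60), np.linspace(0, 1, 60))
population = np.sin(3 * x) + y ** 2 + 0.1 * rng.normal(size=x.shape)

true_var = systematic_means(population, step=6).var()
estimate = nn_variance_estimate(population, start=(2, 4), step=6)
print("true variance of the sample mean:", true_var)
print("nearest-neighbour map estimate:", estimate)
```

In practice only the single drawn sample is available, so `true_var` is computable here only because the toy population is fully known; it is printed for comparison.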
Pub Date: 2024-10-16 DOI: 10.1016/j.spasta.2024.100862
Xiaodi Zhang, Yunquan Song
With the development of deep learning techniques, the application of neural networks to statistical inference has dramatically increased in popularity. In this paper, we extend the deep neural network-based variable selection method to nonparametric spatial autoregressive models. Our approach incorporates feature selection and parameter learning by introducing Lasso penalties in a residual network structure with spatial effects. We transform the problem into a constrained optimization task, in which an objective function is optimized subject to constraints. Without specifying sparsity, we are also able to obtain a specific set of selected variables. The performance of the method with finite samples is demonstrated through an extensive Monte Carlo simulation study. Finally, we apply the method to California housing price data, further validating its superiority in terms of variable selection and predictive performance.
Title: Variable selection of nonparametric spatial autoregressive models via deep learning (Spatial Statistics, vol. 64, Article 100862)