
Spatial Statistics: Latest Articles

Automatic cross-validation in structured models: Is it time to leave out leave-one-out?
IF 2.3 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2024-06-12 · DOI: 10.1016/j.spasta.2024.100843
Aritz Adin, Elias Teixeira Krainski, Amanda Lenzi, Zhedong Liu, Joaquín Martínez-Minaya, Håvard Rue

Standard techniques such as leave-one-out cross-validation (LOOCV) may not be suitable for evaluating the predictive performance of models that incorporate structured random effects. In such models, correlation between the training and test sets can have a notable impact on the estimate of the model's prediction error. To address this issue, an automatic group construction procedure for leave-group-out cross-validation (LGOCV) has recently emerged as a valuable tool for measuring predictive performance in structured models. The purpose of this paper is (i) to compare LOOCV and LGOCV within structured models, with emphasis on model selection and predictive performance, and (ii) to present real-data applications in spatial statistics using complex structured models fitted with INLA, showcasing the utility of the automatic LGOCV method. First, we briefly review the key aspects of the recently proposed LGOCV method for automatic group construction in latent Gaussian models. We then demonstrate the effectiveness of this method for selecting the model with the highest predictive performance by simulating extrapolation tasks in both temporal and spatial data analyses. Finally, we provide insights into the effectiveness of the LGOCV method for modeling complex structured data, including spatio-temporal multivariate count data, spatial compositional data, and spatio-temporal geospatial data.
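
To make the LOOCV/LGOCV contrast concrete, here is a minimal Python sketch. It is not the R-INLA implementation: the group for point i is simply i plus the m-1 points with the largest absolute correlation, assuming a known correlation matrix (here a hypothetical AR(1) correlation with rho = 0.8), and the "model" is a toy nearest-neighbour predictor. Setting m = 1 recovers ordinary LOOCV.

```python
from typing import Callable, List, Sequence


def build_groups(corr: Sequence[Sequence[float]], m: int) -> List[List[int]]:
    """For each point i, return i together with the m-1 points most correlated with it.

    Hypothetical simplification: `corr` is assumed to be a known correlation
    matrix of the latent linear predictor; ties are broken by index.
    """
    n = len(corr)
    groups = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Strongest absolute correlation first, then lower index.
        others.sort(key=lambda j: (-abs(corr[i][j]), j))
        groups.append(sorted([i] + others[: m - 1]))
    return groups


def lgocv_mse(y: Sequence[float], groups: List[List[int]],
              fit_predict: Callable[[List[int], int], float]) -> float:
    """Mean squared error when the whole group of i is removed before predicting y[i]."""
    errors = []
    for i, group in enumerate(groups):
        held_out = set(group)
        train = [k for k in range(len(y)) if k not in held_out]
        errors.append((y[i] - fit_predict(train, i)) ** 2)
    return sum(errors) / len(errors)


def ar1_corr(n: int, rho: float) -> List[List[float]]:
    """Correlation matrix of a stationary AR(1) process: rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def nearest_neighbour_predictor(y: Sequence[float]) -> Callable[[List[int], int], float]:
    """Toy 'model': predict y[i] by the closest training point in time."""
    def fit_predict(train: List[int], i: int) -> float:
        k = min(train, key=lambda t: (abs(t - i), t))
        return y[k]
    return fit_predict


y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
corr = ar1_corr(len(y), 0.8)
predict = nearest_neighbour_predictor(y)
loo = lgocv_mse(y, build_groups(corr, 1), predict)   # m = 1 is LOOCV
lgo = lgocv_mse(y, build_groups(corr, 3), predict)   # groups of size 3
print(loo, lgo)
```

On this trending toy series LOOCV reports an MSE of 1.0, whereas LGOCV with groups of three reports about 5.67: removing the correlated neighbours forces the predictor to extrapolate, which is the kind of task the paper's simulations target.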

Citations: 0
The circular Matérn covariance function and its link to Markov random fields on the circle
IF 2.3 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2024-06-04 · DOI: 10.1016/j.spasta.2024.100837
Chunfeng Huang, Ao Li, Nicholas W. Bussberg, Haimeng Zhang

The connection between Gaussian random fields and Markov random fields is well established in Euclidean spaces, where Matérn covariance functions play a pivotal role. In this paper, we explore the extension of this link to the circle and obtain results that differ from the Euclidean case. Matérn covariance functions are known not always to be positive definite on the circle; the circular Matérn covariance functions, by contrast, are shown to be valid on the circle and are the focus of this paper. For these circular Matérn random fields, we show that the corresponding Markov random fields can be obtained explicitly on equidistant grids. Consequently, the equivalence between circular Matérn random fields and Markov random fields is exact, a departure from the Euclidean setting, where only approximations are available. Moreover, the key motivation for establishing such a link in Euclidean spaces rests on the assumption that the corresponding Markov random field is sparse. We show that such sparsity does not hold in general on the circle. In addition, for a sparse Markov random field on the circle, we derive the corresponding Gaussian random field.
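
The exact, finite-grid relationship can be illustrated with a generic fact: on an equidistant grid of n points on the circle, any stationary covariance or precision matrix is symmetric circulant, so it can be inverted exactly through its discrete Fourier eigenvalues. The Python sketch below is not the paper's closed-form circular Matérn result; it starts from a hypothetical first-order (nearest-neighbour) precision with an assumed parameter kappa2 = 0.5 and computes the covariance of the corresponding Gaussian field, then inverts back.

```python
import math
from typing import List, Sequence


def circulant_eigenvalues(row: Sequence[float]) -> List[float]:
    """Eigenvalues of a symmetric circulant matrix with the given first row.

    Symmetric means row[k] == row[n - k]; the eigenvalues are then real:
    lam_j = sum_k row[k] * cos(2*pi*j*k/n).
    """
    n = len(row)
    return [sum(row[k] * math.cos(2 * math.pi * j * k / n) for k in range(n))
            for j in range(n)]


def circulant_inverse_row(row: Sequence[float]) -> List[float]:
    """First row of the inverse, which is again symmetric circulant."""
    n = len(row)
    lam = circulant_eigenvalues(row)
    if min(lam) <= 0:
        raise ValueError("matrix is not positive definite")
    return [sum(math.cos(2 * math.pi * j * k / n) / lam[j] for j in range(n)) / n
            for k in range(n)]


# Hypothetical sparse precision on an 8-point equidistant circular grid:
# each node is linked only to its two neighbours (first-order Markov).
n, kappa2 = 8, 0.5
precision_row = [2.0 + kappa2, -1.0] + [0.0] * (n - 3) + [-1.0]
covariance_row = circulant_inverse_row(precision_row)   # implied covariance c(k)
recovered_row = circulant_inverse_row(covariance_row)   # back to the precision
```

Every entry of covariance_row is positive (the field is correlated at all lags even though the precision links only neighbouring nodes), and recovered_row matches precision_row to floating-point accuracy, with no approximation step involved.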

Citations: 0
Dimension reduction for spatial regression: Spatial predictor envelope
IF 2.3 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2024-06-01 · DOI: 10.1016/j.spasta.2024.100838
Paul May, Hossein Moradi Rekabdarkolaee

Natural sciences such as geology and forestry often use regression models for spatial data with many predictors and small to moderate sample sizes. In these settings, efficient estimation of the regression parameters is crucial for both model interpretation and prediction. We propose a dimension reduction approach for spatial regression that assumes certain linear combinations of the predictors are immaterial to the regression. The model and the corresponding inference provide efficient estimation of the regression parameters while accounting for spatial correlation in the data. The model parameters are estimated by maximum likelihood. The effectiveness of the proposed model is illustrated through simulation studies and the analysis of a geochemical data set in which rare earth element concentrations are predicted within an oil and gas reserve in Wyoming. Simulation results indicate that the proposed model substantially reduces the mean squared errors and the variability of the estimated regression coefficients. Furthermore, in the data analysis the method reduced the prediction variance for rare earth element concentrations by 50%.

Citations: 0
Integrated deviance information criterion for spatial autoregressive models with heteroskedasticity
IF 2.3 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2024-05-18 · DOI: 10.1016/j.spasta.2024.100842
Osman Doğan

In this study, we introduce the integrated deviance information criterion (DIC) for nested and non-nested model selection problems in heteroskedastic spatial autoregressive models. In a Bayesian estimation setting, we assume that the idiosyncratic error terms of the spatial autoregressive model follow a scale mixture of normal distributions, where the scale mixture variables are latent variables that induce heteroskedasticity. We first derive the integrated likelihood function by analytically integrating the scale mixture variables out of the complete-data likelihood function. We then use the integrated likelihood function to formulate the integrated DIC measure. In a simulation study, we investigate the finite-sample performance of the integrated DIC in selecting the true model. The simulation results show that the integrated DIC performs satisfactorily and can be useful for selecting the correct model in specification searches. Finally, in a spatially augmented economic growth model, we use the integrated DIC to choose the spatial weights matrix that yields better predictive accuracy.
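
The two steps (integrate out the mixing variables, then compute DIC from the integrated likelihood) can be sketched in Python under strong simplifying assumptions. The sketch assumes inverse-gamma mixing, so each error is marginally Student-t, and uses the usual DIC definition, DIC = mean deviance + p_D with p_D = mean deviance minus the deviance at the posterior mean. It deliberately omits the spatial lag term and the log-determinant Jacobian that a real spatial autoregressive likelihood needs, and the data and "posterior draws" are made up rather than MCMC output.

```python
import math
from typing import Callable, Sequence, Tuple


def student_t_logpdf(e: float, scale: float, nu: float) -> float:
    """log density of t_nu(0, scale^2).

    This is the marginal of e | lam ~ N(0, scale^2 * lam) after integrating
    out lam ~ InverseGamma(nu/2, nu/2), one common scale mixture of normals.
    """
    z = e / scale
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def dic(draws: Sequence[Tuple[float, ...]],
        loglik: Callable[[Tuple[float, ...]], float]) -> Tuple[float, float]:
    """DIC = mean deviance + p_D, with p_D = mean deviance - deviance at posterior mean."""
    deviances = [-2.0 * loglik(theta) for theta in draws]
    d_bar = sum(deviances) / len(deviances)
    theta_bar = tuple(sum(col) / len(draws) for col in zip(*draws))
    p_d = d_bar - (-2.0 * loglik(theta_bar))
    return d_bar + p_d, p_d


# Hypothetical toy data and posterior draws of (mu, log_sigma); nu is fixed.
residual_data = [0.3, -1.2, 0.5, 2.0, -0.4]
NU = 4.0


def integrated_loglik(theta: Tuple[float, ...]) -> float:
    """Integrated (mixing variables removed) log-likelihood of the toy data."""
    mu, log_sigma = theta
    return sum(student_t_logpdf(y - mu, math.exp(log_sigma), NU) for y in residual_data)


draws = [(0.1, -0.1), (0.2, 0.0), (0.15, 0.1), (0.25, -0.05)]
value, p_d = dic(draws, integrated_loglik)
```

Comparing candidate specifications (for example, different spatial weights matrices) would then mean computing this criterion under each model and preferring the smallest value.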

Citations: 0
A hypothesis test for detecting spatial patterns in categorical areal data
IF 2.3 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2024-05-04 · DOI: 10.1016/j.spasta.2024.100839
Stella Self, Xingpei Zhao, Anja Zgodic, Anna Overby, David White, Alexander C. McLain, Caitlin Dyckman

The vast growth of spatial datasets in recent decades has fueled the development of many statistical methods for detecting spatial patterns. Two of the most commonly studied spatial patterns are clustering, loosely defined as data points with similar attributes lying close together, and dispersion, loosely defined as the semi-regular placement of data points with similar attributes. In this work, we develop a hypothesis test to detect spatial clustering or dispersion at specific distances in categorical areal data. Such data consist of a set of spatial regions with fixed, known boundaries (e.g., counties), each associated with a categorical random variable (e.g., whether the county is rural, micropolitan, or metropolitan). We propose a method to extend the positive area proportion function (developed for detecting spatial clustering in binary areal data) to the categorical case. This proposal, referred to as the categorical positive area proportion function test, can detect various spatial patterns, including homogeneous clusters, heterogeneous clusters, and dispersion. Our approach is the first method capable of distinguishing between different types of clustering in categorical areal data. After validating the method in an extensive simulation study, we use the categorical positive area proportion function test to detect spatial patterns in biological, agricultural, built, and open conservation easements in Boulder County, Colorado, USA.
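
The categorical positive area proportion function itself is not reproduced here. As a hedged stand-in, the Python sketch below implements a generic Monte Carlo permutation test for categorical areal data at a single distance d: the statistic is the share of nearby region pairs (centroids within d) that carry the same category, and its null distribution comes from shuffling labels across regions. A large share points to clustering, a small share to dispersion. The grid, labels, and distance are hypothetical, and unlike the authors' test this statistic cannot tell homogeneous from heterogeneous clusters.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def same_label_share(coords: Sequence[Point], labels: Sequence[str], d: float) -> float:
    """Among pairs of regions whose centroids lie within distance d, the share with equal labels."""
    same = total = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= d:
                total += 1
                same += labels[i] == labels[j]
    if total == 0:
        raise ValueError("no pairs within distance d")
    return same / total


def permutation_test(coords: Sequence[Point], labels: Sequence[str], d: float,
                     n_perm: int = 199, seed: int = 0) -> Tuple[float, float, float]:
    """Monte Carlo test: shuffle labels across regions to mimic 'no spatial pattern'."""
    rng = random.Random(seed)
    observed = same_label_share(coords, labels, d)
    shuffled: List[str] = list(labels)
    at_least = at_most = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat = same_label_share(coords, shuffled, d)
        at_least += stat >= observed
        at_most += stat <= observed
    p_cluster = (at_least + 1) / (n_perm + 1)   # large share -> clustering
    p_disperse = (at_most + 1) / (n_perm + 1)   # small share -> dispersion
    return observed, p_cluster, p_disperse


# Hypothetical 4 x 4 grid of county centroids; d = 1 keeps rook neighbours only.
grid = [(float(x), float(y)) for y in range(4) for x in range(4)]
halves = ["rural" if x < 2 else "metro" for x, _ in grid]
checkerboard = ["rural" if (int(x) + int(y)) % 2 == 0 else "metro" for x, y in grid]
clustered = permutation_test(grid, halves, d=1.0)
dispersed = permutation_test(grid, checkerboard, d=1.0)
```

The half-and-half map gives an observed share of 20/24 with a small clustering p-value, while the checkerboard gives a share of 0 with a small dispersion p-value.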

Citations: 0
Summary statistics for spatio-temporal point processes on linear networks
IF 2.3 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2024-05-03 · DOI: 10.1016/j.spasta.2024.100840
Mehdi Moradi, Ali Sharifi

We propose novel second- and higher-order summary statistics for inhomogeneous spatio-temporal point processes whose spatial locations are restricted to a linear network. More specifically, with the spatial distance between events measured by a regular distance metric, we introduce appropriate forms of the K- and J-functions and study their theoretical relationships. The theoretical forms of the proposed summary statistics are investigated under homogeneity, Poissonness, and independent thinning. Moreover, we derive non-parametric estimators, which make it possible to use the proposed summary statistics to study spatio-temporal dependence between events. Through simulation studies, we demonstrate that the proposed J-function effectively identifies spatio-temporal clustering, inhibition, and randomness. Finally, we examine spatio-temporal dependence in street crimes in Valencia, Spain, and traffic accidents in New York, USA.
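
A naive spatio-temporal K-function on a network takes only a few lines once the spatial metric is fixed. This Python sketch assumes the shortest-path distance as the metric, events located at network vertices, a homogeneous process, and no edge correction; the authors' estimators for the inhomogeneous case handle intensities and network geometry that this toy omits. The network, events, and window sizes are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def network_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest-path (Dijkstra) distances from `source` along the network edges."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def naive_st_k(graph: Graph, events: Sequence[Tuple[str, float]], r: float, t: float,
               network_length: float, duration: float) -> float:
    """Unweighted spatio-temporal K estimate: no edge correction, homogeneous intensity.

    Counts ordered pairs i != j within network distance r and time lag t,
    scaled by |L| * T / (n * (n - 1)).
    """
    n = len(events)
    count = 0
    for i, (vi, ti) in enumerate(events):
        dist = network_distances(graph, vi)
        for j, (vj, tj) in enumerate(events):
            if i != j and dist.get(vj, float("inf")) <= r and abs(ti - tj) <= t:
                count += 1
    return network_length * duration * count / (n * (n - 1))


# Hypothetical path network A-B-C-D with edge lengths 1, 1, 2 (total length 4).
edges = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 2.0)]
graph: Graph = {v: [] for e in edges for v in e[:2]}
for a, b, w in edges:
    graph[a].append((b, w))
    graph[b].append((a, w))
events = [("A", 0.0), ("B", 0.5), ("C", 3.0), ("D", 3.2)]
k_small = naive_st_k(graph, events, r=1.0, t=1.0, network_length=4.0, duration=4.0)
k_large = naive_st_k(graph, events, r=2.0, t=1.0, network_length=4.0, duration=4.0)
```

Here k_small is 32/12 (only A and B are close in both space and time), and k_large is 64/12 once the pair C and D, two units apart along the network, also counts.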

Citations: 0
Rapid outlier detection, model selection and variable selection using penalized likelihood estimation for general spatial models
IF 2.3 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2024-04-27 · DOI: 10.1016/j.spasta.2024.100834
Yunquan Song, Minglu Fang, Yuanfeng Wang, Yiming Hou

Outliers in a data set can influence statistical inference and can also reveal useful information about the data, so methods for detecting and accommodating outliers have long been an important topic in data analysis. For spatial data, outliers affect not only coefficient estimation but also model selection. Traditional approaches usually perform outlier detection, model selection, and variable selection step by step, which makes data processing inefficient. To improve the efficiency and accuracy of data processing, we consider, within the general spatial model, a technique that performs outlier detection together with model and variable selection in a single step. In the general spatial model, we add a mean-shift parameter for each data point to identify outliers. Penalized likelihood estimation (PLE) is proposed to simultaneously detect outliers and select the spatial model and explanatory variables for spatial data. In numerical simulations and a case analysis, this method correctly identifies multiple outliers, provides an appropriate spatial model, and corrects coefficient estimation without removing outliers. Compared with current methods, PLE detects outliers more quickly and solves the optimization problem of selecting spatial models and explanatory variables. The computation is straightforward with the solnp optimization function in R.
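
The mean-shift idea (one shift parameter per observation plus a sparsity penalty) can be sketched for the simplest possible case. The Python code below is not the authors' PLE for the general spatial model and does not use solnp; it assumes an intercept-only model with no spatial terms, an L1 penalty on the shifts, and a hand-picked lam, and it minimises the penalized objective by alternating exact updates (a mean for mu, soft-thresholding for each shift).

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |gamma|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mean_shift_fit(y: Sequence[float], lam: float, n_iter: int = 200) -> Tuple[float, List[float]]:
    """Minimise 0.5 * sum (y_i - mu - gamma_i)^2 + lam * sum |gamma_i| by block coordinate descent.

    Intercept-only illustration: no covariates and no spatial dependence.
    Points with gamma_i != 0 are flagged as outliers.
    """
    mu, gamma = 0.0, [0.0] * len(y)
    for _ in range(n_iter):
        # Update mu given gamma: mean of the shift-corrected data.
        mu = sum(yi - gi for yi, gi in zip(y, gamma)) / len(y)
        # Update each gamma_i given mu: soft-threshold the residual.
        gamma = [soft_threshold(yi - mu, lam) for yi in y]
    return mu, gamma


y = [1.0, 1.2, 0.8, 1.1, 0.9, 10.0]   # hypothetical data, last value contaminated
mu, gamma = mean_shift_fit(y, lam=1.0)
outliers = [i for i, g in enumerate(gamma) if g != 0.0]
```

With lam = 1.0 only the last observation receives a nonzero shift (about 7.8), and mu settles at 1.2; the L1 penalty leaves mu somewhat above the inlier mean of 1.0.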

Citations: 0
Incremental transfer learning for spatial autoregressive model with linear constraints
IF 2.3 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2024-04-27 · DOI: 10.1016/j.spasta.2024.100833
Jie Li, Yunquan Song

Transfer learning is generally regarded as a beneficial technique for using external information to improve learning performance on target tasks. However, current research on transfer learning in high-dimensional regression models does not jointly account for the location information of the data and the explicit use of prior knowledge. Within the transfer learning framework, this study addresses the spatial autoregressive problem and investigates the impact of introducing linear constraints. When the source and target datasets have the same input dimensions, we propose a two-step transfer learning approach and a cross-validation-based algorithm for detecting transferable sources. When the input dimensions differ, we propose a straightforward and workable incremental transfer learning method. Additionally, for the estimation model developed under this method, we derive the Karush–Kuhn–Tucker (KKT) conditions and degrees of freedom, and construct a Bayesian information criterion (BIC) for choosing hyperparameters. Numerical experiments demonstrate the effectiveness of the proposed methods and show that adding linear constraints improves the model's transfer learning estimation performance.
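
The abstract mentions KKT conditions for the linearly constrained estimator. As a hedged illustration of that building block only, the Python sketch below solves ordinary least squares under equality constraints C beta = d by forming and solving the KKT system directly. The spatial autoregressive term, the transfer-learning steps, any penalties, and inequality constraints are all omitted, and the data and the constraint (coefficients summing to one) are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def constrained_least_squares(x: Sequence[Sequence[float]], y: Sequence[float],
                              c: Sequence[Sequence[float]], d: Sequence[float]
                              ) -> Tuple[List[float], List[float]]:
    """Minimise 0.5 * ||y - X beta||^2 subject to C beta = d via the KKT system

        [X'X  C'] [beta]   [X'y]
        [C    0 ] [nu  ] = [ d ]
    """
    p, q = len(x[0]), len(c)
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    kkt = ([xtx[i] + [c[k][i] for k in range(q)] for i in range(p)]
           + [list(c[k]) + [0.0] * q for k in range(q)])
    sol = solve_linear(kkt, xty + list(d))
    return sol[:p], sol[p:]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 2.0]
beta, nu = constrained_least_squares(X, y, c=[[1.0, 1.0]], d=[1.0])
```

For this toy problem the constrained solution is beta = (0, 1) with multiplier nu = 2, whereas the unconstrained least-squares fit would be (2/3, 5/3).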

Citations: 0
Exploring heterogeneity and dynamics of meteorological influences on US PM2.5: A distributed learning approach with spatiotemporal varying coefficient models
IF 2.3 Zone 2 (Mathematics) Q1 Mathematics Pub Date: 2024-04-25 DOI: 10.1016/j.spasta.2024.100826
Lily Wang, Guannan Wang, Annie S. Gao

Particulate matter (PM) has emerged as a primary air quality concern because of its substantial impact on human health. Many recent studies suggest that PM2.5 concentrations depend on meteorological conditions. Improving current pollution control strategies requires a more holistic understanding of PM2.5 dynamics and precise quantification of the spatiotemporal heterogeneity in the relationship between meteorological factors and PM2.5 levels. The spatiotemporal varying coefficient model is a prominent spatial regression technique for addressing this heterogeneity. To meet the challenges posed by the scale of modern spatiotemporal datasets, we propose a distributed estimation method (DEM) based on multivariate spline smoothing over a triangulation of the domain. The DEM algorithm is easy to implement, highly scalable, and communication-efficient, with the potential for almost linear speedup. Extensive simulation studies show that the DEM yields coefficient estimates comparable to those of global estimators computed from the complete dataset. Applying the proposed model and method to US daily PM2.5 and meteorological data, we investigate the influence of meteorological variables on PM2.5 concentrations and reveal both spatial and seasonal variations in this relationship.
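The abstract's central computational claim is that a distributed fit can match a global estimator. The following minimal Python sketch shows one generic way this can happen. It rests on simplifying assumptions that are not taken from the paper:

- a 1-D domain, with piecewise-linear "hat" functions standing in for bivariate splines on a triangulation;
- a first-difference roughness penalty;
- the model y = β0(s) + x·β1(s) + ε;
- workers that each send only X^T X and X^T y for their own shard.

These statistics simply add up across shards, so the centre's penalized solve should agree with a fit on the pooled data up to round-off. This is an illustration of communication-efficient aggregation, not the authors' DEM, and all function names are hypothetical.

```python
import random

def hat_basis(s, knots):
    """Piecewise-linear 'hat' basis on 1-D knots (hypothetical stand-in for
    bivariate splines on a triangulation); returns one value per knot."""
    vals = [0.0] * len(knots)
    for j in range(len(knots) - 1):
        a, b = knots[j], knots[j + 1]
        if a <= s <= b:
            w = (s - a) / (b - a)
            vals[j], vals[j + 1] = 1.0 - w, w
            break
    return vals

def design_row(s, x, knots):
    """Row for y = beta0(s) + x * beta1(s), each beta expanded in the hat basis."""
    b = hat_basis(s, knots)
    return b + [x * v for v in b]

def local_stats(shard, knots):
    """Worker step: only X^T X and X^T y for this shard are sent to the centre."""
    p = 2 * len(knots)
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for s, x, y in shard:
        row = design_row(s, x, knots)
        for i in range(p):
            if row[i] == 0.0:
                continue
            xty[i] += row[i] * y
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
    return xtx, xty

def difference_penalty(k, blocks):
    """Block-diagonal first-difference roughness penalty D^T D (one block per beta)."""
    pen = [[0.0] * (k * blocks) for _ in range(k * blocks)]
    for off in range(0, k * blocks, k):
        for j in range(off, off + k - 1):
            pen[j][j] += 1.0
            pen[j + 1][j + 1] += 1.0
            pen[j][j + 1] -= 1.0
            pen[j + 1][j] -= 1.0
    return pen

def solve(a, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_from_stats(stats, knots, lam):
    """Centre step: sum the shard statistics, then solve (X^T X + lam P) c = X^T y."""
    p = 2 * len(knots)
    xtx = [[sum(st[0][i][j] for st in stats) for j in range(p)] for i in range(p)]
    xty = [sum(st[1][i] for st in stats) for i in range(p)]
    pen = difference_penalty(len(knots), 2)
    a = [[xtx[i][j] + lam * pen[i][j] for j in range(p)] for i in range(p)]
    return solve(a, xty)

random.seed(0)
knots = [i / 10 for i in range(11)]
data = []
for _ in range(2000):
    s, x = random.random(), random.gauss(0, 1)
    # Simulated truth (hypothetical): beta0(s) = 1 + s, beta1(s) = 2s - 1.
    data.append((s, x, 1.0 + s + (2 * s - 1) * x + random.gauss(0, 0.1)))

shards = [data[i::4] for i in range(4)]  # four hypothetical workers
distributed = fit_from_stats([local_stats(sh, knots) for sh in shards], knots, lam=1.0)
global_fit = fit_from_stats([local_stats(data, knots)], knots, lam=1.0)
# Expected to be at floating-point round-off level.
print(max(abs(a - b) for a, b in zip(distributed, global_fit)))
```

Each worker sends a p × p matrix and a length-p vector no matter how many observations it holds. That fixed message size is what makes this style of aggregation communication-efficient. Realistic fits on triangulations use far larger bases, and for those the paper's own algorithm is the reference.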

Citations: 0
A nonparametric penalized likelihood approach to density estimation of space–time point patterns
IF 2.3 Zone 2 (Mathematics) Q1 Mathematics Pub Date: 2024-04-18 DOI: 10.1016/j.spasta.2024.100824
Blerta Begu, Simone Panzeri, Eleonora Arnone, Michelle Carey, Laura M. Sangalli

In this work, we consider space–time point processes and study their continuous space–time evolution. We propose an innovative nonparametric methodology to estimate the unknown space–time density of the point pattern or, equivalently, the intensity of an inhomogeneous space–time Poisson point process (the two differ only by a multiplicative constant, the expected number of points). The approach combines maximum likelihood estimation with roughness penalties based on differential operators defined over the spatial and temporal domains of interest. We first establish some important theoretical properties of the estimator, including its consistency. We then develop an efficient and flexible estimation procedure that leverages advanced numerical and computational techniques. Thanks to a discretization based on finite elements in space and B-splines in time, the proposed method can effectively capture complex, multimodal, and strongly anisotropic spatio-temporal point patterns. Moreover, these point patterns may be observed over planar or curved domains whose geometries are non-trivial because of geographic constraints, such as coastal regions with complicated shorelines or curved regions with complex orography. Beyond providing estimates, the method also includes appropriate uncertainty quantification tools. We thoroughly validate the proposed method through simulation studies and applications to real-world data. The results highlight significant advantages over state-of-the-art competing approaches.
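To make the penalized-likelihood idea concrete, here is a minimal 1-D Python sketch. It rests on assumptions that are not the paper's:

- the log-density g is piecewise constant on m equal cells of [0, 1], in place of finite elements in space and B-splines in time;
- a discrete second-difference penalty stands in for the differential-operator roughness penalties;
- the criterion takes the form −(1/n) Σ g(x_i) + ∫ e^g + λ·penalty;
- the criterion is minimized by damped Newton steps.

The penalty is unchanged when a constant is added to g. As a result, the minimizer satisfies ∫ e^g = 1 without an explicit normalization constraint. The function names, λ, and the simulated Beta-mixture data are hypothetical.

```python
import math
import random

def second_diff_gram(m):
    """D^T D for the (m-2) x m second-difference matrix D: a discrete roughness penalty."""
    q = [[0.0] * m for _ in range(m)]
    for r in range(m - 2):
        coef = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for i, ci in coef.items():
            for j, cj in coef.items():
                q[i][j] += ci * cj
    return q

def solve(a, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def penalized_density(samples, m=40, lam=0.01, iters=50, tol=1e-10):
    """Minimise J(g) = -(1/n) sum_i g(x_i) + h sum_j exp(g_j) + lam g^T Q g over
    piecewise-constant log-density values g_j on m equal cells of [0, 1]."""
    n, h = len(samples), 1.0 / m
    counts = [0] * m
    for x in samples:
        counts[min(int(x * m), m - 1)] += 1
    q = second_diff_gram(m)

    def objective(g):
        quad = sum(g[i] * q[i][j] * g[j] for i in range(m) for j in range(m))
        loglik = sum(c * gj for c, gj in zip(counts, g)) / n
        return -loglik + h * sum(math.exp(gj) for gj in g) + lam * quad

    g = [0.0] * m  # start at the uniform density on [0, 1]
    for _ in range(iters):
        qg = [sum(q[i][j] * g[j] for j in range(m)) for i in range(m)]
        grad = [-counts[i] / n + h * math.exp(g[i]) + 2 * lam * qg[i] for i in range(m)]
        hess = [[2 * lam * q[i][j] + (h * math.exp(g[i]) if i == j else 0.0)
                 for j in range(m)] for i in range(m)]
        step = solve(hess, grad)
        # Damped Newton: halve the step until the convex objective does not increase.
        t, f0 = 1.0, objective(g)
        cand = [gi - si for gi, si in zip(g, step)]
        while objective(cand) > f0 and t > 1e-10:
            t /= 2
            cand = [gi - t * si for gi, si in zip(g, step)]
        g = cand
        if max(abs(t * si) for si in step) < tol:
            break
    return [math.exp(gj) for gj in g]  # estimated density on each cell

random.seed(1)
# Hypothetical bimodal sample: equal mixture of Beta(2, 8) and Beta(8, 2).
samples = [random.betavariate(2, 8) if random.random() < 0.5 else random.betavariate(8, 2)
           for _ in range(2000)]
dens = penalized_density(samples)
print(sum(dens) / len(dens))  # Riemann sum of the density; should be close to 1
```

The final line prints the Riemann sum of the estimated density, which should be close to 1. Increasing `lam` pulls g toward a straight line, i.e. an exponential-shaped density. Decreasing it lets the estimate follow the histogram more closely.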

Open access PDF: https://www.sciencedirect.com/science/article/pii/S2211675324000150/pdfft?md5=fcab55472ed3f4b5aa4f0e9b44fe624a&pid=1-s2.0-S2211675324000150-main.pdf
Citations: 0