A generalized additive model (GAM) approach to principal component analysis of geographic data

Spatial Statistics (IF 2.1; Zone 2, Mathematics; JCR Q3, GEOSCIENCES, MULTIDISCIPLINARY) Pub Date: 2023-12-29 DOI: 10.1016/j.spasta.2023.100806
Francisco de Asís López, Celestino Ordóñez, Javier Roca-Pardiñas
Citations: 0

Abstract

A generalized additive model (GAM) approach to principal component analysis of geographic data

Geographically Weighted Principal Component Analysis (GWPCA) is an extension of classical PCA that deals with the spatial heterogeneity of geographical data. This heterogeneity results in a variance–covariance matrix that is not stationary but changes with geographical location. Despite its usefulness, the method has some unsolved issues, such as finding an appropriate bandwidth (the size of the neighbourhood) as a function of the retained components. In this work, we address the calculation of principal components for geographical data from a new perspective that overcomes this problem. Specifically, we propose a scale-location model that uses generalized additive models (GAMs) to calculate the mean of each variable and a correlation matrix relating the variables, both depending on spatial location. Note that although we deal with geographic data, our methodology cannot be considered strictly spatial, since we assume that the error term has no spatial correlation structure.
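
One way to write down the kind of scale-location model described above is sketched here. The notation (location s, p variables, the symbols μ, σ, ρ, Σ, λ and v) is an assumption made for illustration and is not taken from the paper, whose exact formulation may differ.

```latex
\begin{aligned}
X_j(\mathbf{s}) &= \mu_j(\mathbf{s}) + \sigma_j(\mathbf{s})\,\varepsilon_j(\mathbf{s}), \qquad j = 1,\dots,p,\\
\mathbb{E}[\varepsilon_j(\mathbf{s})] &= 0, \quad \operatorname{Var}[\varepsilon_j(\mathbf{s})] = 1, \quad \operatorname{Corr}[\varepsilon_j(\mathbf{s}), \varepsilon_k(\mathbf{s})] = \rho_{jk}(\mathbf{s}),\\
\Sigma(\mathbf{s}) &= D(\mathbf{s})\,R(\mathbf{s})\,D(\mathbf{s}), \quad D(\mathbf{s}) = \operatorname{diag}\{\sigma_1(\mathbf{s}),\dots,\sigma_p(\mathbf{s})\}, \quad R(\mathbf{s}) = [\rho_{jk}(\mathbf{s})],\\
\Sigma(\mathbf{s})\,\mathbf{v}_m(\mathbf{s}) &= \lambda_m(\mathbf{s})\,\mathbf{v}_m(\mathbf{s}), \qquad \lambda_1(\mathbf{s}) \ge \dots \ge \lambda_p(\mathbf{s}) \ge 0.
\end{aligned}
```

Under this reading, the means, scales and correlations are smooth functions of location, errors at different locations are taken as uncorrelated, and the local principal components at s are the eigenvectors of the location-dependent matrix (covariance or correlation, depending on whether the variables are standardized).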

Our approach does not require calculating an optimal bandwidth as a function of the number of components retained in the analysis. Instead, the covariance matrix is estimated using smooth functions adapted to the data, so the smoothness can differ for each element of the matrix. The proposed methodology was tested with simulated data and compared with GWPCA; the proposed method gave a better representation of the data structure. Finally, we illustrate the method on a real-data problem involving air pollution and socioeconomic factors.
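
The idea of estimating each element of a location-dependent covariance matrix with its own smooth function, and then extracting principal components at every location, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`poly_basis`, `smooth`, `local_pca`) are hypothetical, a low-degree polynomial least-squares surface stands in for a penalized GAM smooth, and clipping negative eigenvalues to zero is an added assumption to keep the local matrices usable.

```python
import numpy as np


def poly_basis(coords, degree=3):
    """All monomials x^a * y^b with a + b <= degree (hypothetical helper)."""
    x, y = coords[:, 0], coords[:, 1]
    cols = [x**a * y**b for a in range(degree + 1) for b in range(degree + 1 - a)]
    return np.column_stack(cols)


def smooth(coords, values, degree=3):
    """Least-squares surface fit; a crude stand-in for a GAM smooth term."""
    B = poly_basis(coords, degree)
    coef, *_ = np.linalg.lstsq(B, values, rcond=None)
    return B @ coef


def local_pca(coords, X, degree=3):
    """Location-dependent means, covariances and principal components."""
    n, p = X.shape
    # Step 1: one smooth mean surface per variable.
    mu = np.column_stack([smooth(coords, X[:, j], degree) for j in range(p)])
    resid = X - mu
    # Step 2: one smooth surface per covariance element (products of residuals).
    cov = np.empty((n, p, p))
    for j in range(p):
        for k in range(j, p):
            cov[:, j, k] = cov[:, k, j] = smooth(coords, resid[:, j] * resid[:, k], degree)
    # Step 3: eigendecomposition of the covariance matrix at every location.
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    eigvals = np.clip(eigvals[:, ::-1], 0.0, None)  # descending, clipped at zero (assumption)
    eigvecs = eigvecs[:, :, ::-1]
    return mu, cov, eigvals, eigvecs


# Simulated example: the correlation between two variables changes with x.
rng = np.random.default_rng(0)
n = 2000
coords = rng.uniform(0.0, 1.0, size=(n, 2))
rho = 0.8 * (2.0 * coords[:, 0] - 1.0)              # from -0.8 (west) to 0.8 (east)
e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack([
    coords[:, 1] + e1,                               # mean varies with y
    coords[:, 0] + rho * e1 + np.sqrt(1.0 - rho**2) * e2,
])

mu, cov, eigvals, eigvecs = local_pca(coords, X)
east = coords[:, 0] > 0.8
west = coords[:, 0] < 0.2
print("mean estimated covariance, east:", cov[east, 0, 1].mean())
print("mean estimated covariance, west:", cov[west, 0, 1].mean())
```

No bandwidth is selected anywhere in this sketch; the flexibility of each surface is governed by the smoother itself (here the polynomial degree, in a GAM the estimated smoothing parameters), which is what allows a different degree of smoothness for each matrix element.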

Source journal: Spatial Statistics
Categories: GEOSCIENCES, MULTIDISCIPLINARY; MATHEMATICS, INTERDISCIPLINARY APPLICATIONS
CiteScore: 4.00
Self-citation rate: 21.70%
Articles published: 89
Review time: 55 days
Journal description: Spatial Statistics publishes articles on the theory and application of spatial and spatio-temporal statistics. It favours manuscripts that present theory generated by new applications, or in which new theory is applied to an important practical case. A purely theoretical study will only rarely be accepted. Pure case studies without methodological development are not acceptable for publication. Spatial statistics concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties. Methodology for spatial statistics is typically found in probability theory, stochastic modelling and mathematical statistics as well as in information science. Spatial statistics is used in mapping, assessing spatial data quality, sampling design optimisation, modelling of dependence structures, and drawing of valid inference from a limited set of spatio-temporal data.