Locally varying geostatistical machine learning for spatial prediction

Artificial Intelligence in Geosciences, Volume 5, Article 100081. Pub Date: 2024-07-02. DOI: 10.1016/j.aiig.2024.100081

Francky Fouedjio, Emet Arya


Citations: 0

Abstract





Machine learning methods dealing with the spatial auto-correlation of the response variable have garnered significant attention in the context of spatial prediction. Nonetheless, under these methods, the relationship between the response variable and explanatory variables is assumed to be homogeneous throughout the entire study area. This assumption, known as spatial stationarity, is highly questionable in real-world situations due to the influence of contextual factors. Therefore, it is more reasonable to allow the relationship between the target variable and predictor variables to vary spatially within the study region. However, existing machine learning techniques that account for the spatially varying relationship between the dependent variable and the predictor variables do not capture the spatial auto-correlation of the dependent variable itself. Moreover, under these techniques, each local machine learning model is effectively built from only a subset of the observations, which can lead to well-known issues such as over-fitting and the curse of dimensionality. This paper introduces a novel geostatistical machine learning approach that explicitly considers both the spatial auto-correlation of the response variable and the spatial non-stationarity of the regression relationship between the response and predictor variables. The basic idea is to rely on the local stationarity assumption to build a collection of local machine learning models, while leveraging the local spatial auto-correlation of the response variable to augment the training dataset locally. The proposed method's effectiveness is demonstrated through experiments on synthetic spatial data with known characteristics as well as on real-world spatial data. In the synthetic (resp. real) case study, the proposed method's predictive accuracy, as measured by the Root Mean Square Error (RMSE) on the test set, is 17% (resp. 7%) better than that of popular machine learning methods dealing with the response variable's spatial auto-correlation. Additionally, the method is not only valuable for spatial prediction but also offers a deeper understanding of how the relationship between the target and predictor variables varies across space, and it can even be used to investigate the local significance of predictor variables.
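The basic idea above (local models built under local stationarity, with each small local training set augmented by exploiting the response's local spatial auto-correlation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The k-nearest-neighbour window, the Gaussian kernel with an adaptive bandwidth, the weighted linear fit standing in for a local machine learning model, and the midpoint pseudo-observations interpolated by inverse distance weighting (a stand-in for a kriging-based step) are all assumptions made for illustration. The record layout `(x, y, covariate, response)` and the function names `idw`, `augment`, `local_predict` and `rmse` are hypothetical.

```python
import math

# Hypothetical record layout (not from the paper): (x, y, covariate, response).


def idw(points, x, y, idx, power=2.0):
    """Inverse-distance-weighted value of field `idx` at (x, y).

    Assumed stand-in for a geostatistical (e.g. kriging-based) step that
    exploits local spatial auto-correlation; the paper's procedure may differ.
    """
    num = den = 0.0
    for p in points:
        d = math.hypot(p[0] - x, p[1] - y)
        if d == 0.0:
            return p[idx]  # exact hit on an existing sample
        w = d ** -power
        num += w * p[idx]
        den += w
    return num / den


def augment(neighbours):
    """Add pseudo-observations at the midpoint of every pair of neighbours."""
    extra = []
    for i in range(len(neighbours)):
        for j in range(i + 1, len(neighbours)):
            a, b = neighbours[i], neighbours[j]
            mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
            c = idw(neighbours, mx, my, 2)  # interpolated covariate
            z = idw(neighbours, mx, my, 3)  # interpolated response
            extra.append((mx, my, c, z))
    return neighbours + extra


def local_predict(data, x0, y0, c0, k=8):
    """Kernel-weighted linear fit z = b0 + b1 * c around (x0, y0).

    Returns (prediction, local slope b1); mapping b1 shows how the
    covariate-response relationship varies across space.
    """
    def dist(p):
        return math.hypot(p[0] - x0, p[1] - y0)

    nearest = sorted(data, key=dist)[:k]  # local stationarity window
    bandwidth = max(dist(nearest[-1]), 1e-12)  # adaptive: k-th neighbour distance
    local = augment(nearest)
    w = [math.exp(-0.5 * (dist(p) / bandwidth) ** 2) for p in local]  # Gaussian kernel
    sw = sum(w)
    cbar = sum(wi * p[2] for wi, p in zip(w, local)) / sw
    zbar = sum(wi * p[3] for wi, p in zip(w, local)) / sw
    sxx = sum(wi * (p[2] - cbar) ** 2 for wi, p in zip(w, local))
    sxz = sum(wi * (p[2] - cbar) * (p[3] - zbar) for wi, p in zip(w, local))
    b1 = sxz / sxx if sxx > 0 else 0.0  # weighted least-squares slope
    return zbar + b1 * (c0 - cbar), b1


def rmse(pred, obs):
    """Root Mean Square Error, the accuracy metric reported in the abstract."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


if __name__ == "__main__":
    # Toy field (not the paper's data) whose slope (1 + x) changes across space.
    def cov(x, y):
        return math.sin(x) + math.cos(y)

    grid = [(i * 0.5, j * 0.5) for i in range(9) for j in range(9)]
    train = [(x, y, cov(x, y), (1 + x) * cov(x, y)) for x, y in grid]
    tests = [(0.3 + 0.7 * t, 0.4 + 0.6 * t) for t in range(5)]
    preds = [local_predict(train, x, y, cov(x, y)) for x, y in tests]
    truth = [(1 + x) * cov(x, y) for x, y in tests]
    print("test RMSE:", rmse([p for p, _ in preds], truth))
    print("local slopes:", [round(b1, 2) for _, b1 in preds])
```

Because each call returns a local slope alongside the prediction, evaluating `local_predict` over a grid yields a map of the spatially varying relationship. Replacing the weighted linear fit with a learner such as a random forest trained on the augmented `local` set would be closer to the machine learning setting the abstract describes.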


Source journal: Artificial Intelligence in Geosciences

CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: