Locally varying geostatistical machine learning for spatial prediction

Francky Fouedjio, Emet Arya
Artificial Intelligence in Geosciences, Volume 5, Article 100081. Published 2024-07-02.
DOI: 10.1016/j.aiig.2024.100081
Full text: https://www.sciencedirect.com/science/article/pii/S2666544124000224
Citations: 0

Abstract

Locally varying geostatistical machine learning for spatial prediction

Machine learning methods that account for the spatial auto-correlation of the response variable have garnered significant attention in the context of spatial prediction. Nonetheless, these methods assume that the relationship between the response variable and the explanatory variables is homogeneous throughout the entire study area. This assumption, known as spatial stationarity, is highly questionable in real-world situations due to the influence of contextual factors. It is therefore more reasonable to allow the relationship between the target variable and the predictor variables to vary spatially within the study region. However, existing machine learning techniques that account for this spatially varying relationship do not capture the spatial auto-correlation of the dependent variable itself. Moreover, under these techniques, each local machine learning model is effectively built from fewer observations, which can lead to well-known issues such as over-fitting and the curse of dimensionality. This paper introduces a novel geostatistical machine learning approach in which both the spatial auto-correlation of the response variable and the spatial non-stationarity of the regression relationship between the response and predictor variables are explicitly considered. The basic idea is to rely on the local stationarity assumption to build a collection of local machine learning models while leveraging the local spatial auto-correlation of the response variable to locally augment the training dataset. The proposed method’s effectiveness is showcased via experiments conducted on synthetic spatial data with known characteristics as well as real-world spatial data. In the synthetic (resp. real) case study, the proposed method’s predictive accuracy, as indicated by the Root Mean Square Error (RMSE) on the test set, is 17% (resp. 7%) better than that of popular machine learning methods dealing with the response variable’s spatial auto-correlation. Additionally, this method is not only valuable for spatial prediction but also offers a deeper understanding of how the relationship between the target and predictor variables varies across space, and it can even be used to investigate the local significance of predictor variables.
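To make the idea of spatially varying regression relationships concrete, here is a minimal sketch in Python. It is not the authors' method: it fits a simple least-squares line on the k nearest training points around each prediction location and compares it with a single global line, using RMSE on a held-out test set. The local augmentation step based on spatial auto-correlation is not specified in the abstract and is not attempted here. The toy data, the function names (`fit_line`, `predict_local`, `relative_improvement`), and the choice `k=8` are all assumptions made for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_improvement(rmse_baseline, rmse_proposed):
    """Percentage by which rmse_proposed is lower than rmse_baseline."""
    return 100.0 * (rmse_baseline - rmse_proposed) / rmse_baseline


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one predictor).

    Falls back to the mean of ys when all xs are identical.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def predict_local(train, coord, x, k=8):
    """Predict at `coord` from a line fitted on the k nearest training points.

    `train` is a list of (coord, x, y) tuples; k=8 is an arbitrary choice.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], coord))[:k]
    a, b = fit_line([r[1] for r in nearest], [r[2] for r in nearest])
    return a + b * x


def make_point(i, j):
    """Hypothetical toy data: y = x in the western zone, y = -x in the eastern zone."""
    coord = (0.1 * i, 0.1 * j)
    x = ((7 * i + 3 * j) % 10) / 10
    y = x if i < 5 else -x
    return coord, x, y


# Two well-separated zones (columns 0-4 and 10-14) on a regular grid.
cells = [(i, j) for i in list(range(5)) + list(range(10, 15)) for j in range(10)]
train = [make_point(i, j) for i, j in cells if (i + j) % 4 != 0]
test = [make_point(i, j) for i, j in cells if (i + j) % 4 == 0]

# Global model: one line for the whole study area (spatial stationarity).
a, b = fit_line([x for _, x, _ in train], [y for _, _, y in train])
y_test = [y for _, _, y in test]
global_rmse = rmse(y_test, [a + b * x for _, x, _ in test])

# Local models: one line per prediction location (local stationarity).
local_rmse = rmse(y_test, [predict_local(train, c, x) for c, x, _ in test])

print(f"global RMSE: {global_rmse:.3f}, local RMSE: {local_rmse:.3f}")
```

In this toy setting the global line cannot fit both zones at once, while each local line only sees neighbours from its own zone. The helper `relative_improvement` shows how a statement such as "17% better RMSE" can be read: with illustrative values that are not taken from the paper, a baseline RMSE of 1.00 and a proposed-method RMSE of 0.83 give an improvement of about 17%.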
