Pub Date: 2025-07-15 | DOI: 10.1016/j.aiig.2025.100141
Michael Meadows, Karin Reinke, Simon Jones
Machine learning models are increasingly used to correct the vertical biases (mainly due to vegetation and buildings) in global Digital Elevation Models (DEMs), for downstream applications which need “bare earth” elevations. The predictive accuracy of these models has improved significantly as more flexible model architectures are developed and new explanatory datasets produced, leading to the recent release of three model-corrected DEMs (FABDEM, DiluviumDEM and FathomDEM). However, there has been relatively little focus so far on explaining or interrogating these models, especially important in this context given their downstream impact on many other applications (including natural hazard simulations). In this study we train five separate models (by land cover environment) to correct vertical biases in the Copernicus DEM and then explain them using SHapley Additive exPlanation (SHAP) values. Comparing the models, we find significant variation in terms of the specific input variables selected and their relative importance, suggesting that an ensemble of models (specialising by land cover) is likely preferable to a general model applied everywhere. Visualising the patterns learned by the models (using SHAP dependence plots) provides further insights, building confidence in some cases (where patterns are consistent with domain knowledge and past studies) and highlighting potentially problematic variables in others (such as proxy relationships which may not apply in new application sites). Our results have implications for future DEM error prediction studies, particularly in evaluating a very wide range of potential input variables (160 candidates) drawn from topographic, multispectral, Synthetic Aperture Radar, vegetation, climate and urbanisation datasets.
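The SHAP attributions at the heart of the study can be made concrete without any ML library: for a handful of features, exact Shapley values are computable by brute-force enumeration of feature coalitions, with absent features held at a baseline. A minimal sketch follows; the toy bias model, feature names and numbers are purely illustrative and not taken from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, enumerating all feature coalitions.
    Features absent from a coalition are held at their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Toy DEM-bias predictor: vertical bias grows with vegetation height and
# built-up fraction (coefficients are invented for illustration only).
def bias_model(feats):
    veg_height, built_frac, slope = feats
    return 0.6 * veg_height + 3.0 * built_frac + 0.05 * slope

x = [12.0, 0.3, 8.0]           # one grid cell's features
baseline = [4.0, 0.05, 5.0]    # dataset means
phi = shapley_values(bias_model, x, baseline)
# Efficiency property: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (bias_model(x) - bias_model(baseline))) < 1e-9
```

For a linear model each attribution reduces to w_i * (x_i - baseline_i); libraries such as shap replace the exponential enumeration with model-specific approximations (e.g. TreeExplainer for gradient-boosted trees), but the attribution being approximated is the quantity computed above.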
"Explaining machine learning models trained to predict Copernicus DEM errors in different land cover environments." Artificial Intelligence in Geosciences, vol. 6, no. 2, Article 100141.
Pub Date: 2025-07-13 | DOI: 10.1016/j.aiig.2025.100143
Christian Carhuancho , Edwin Villanueva , Christian Yarleque , Romel Erick Principe , Marcia Castromonte
In weather forecasting, generating atmospheric variables for regions with complex topography, such as the Andean regions with peaks reaching 6500 m above sea level, poses significant challenges. Traditional regional climate models often struggle to accurately represent atmospheric behavior in such areas. Furthermore, the capability to produce high spatio-temporal resolution data (finer than 27 km and hourly) is limited to a few institutions globally due to the substantial computational resources required. This study presents atmospheric data generated using a new class of artificial intelligence (AI) model, aimed at reducing the computational cost of producing downscaled climate data with regional climate models such as the Weather Research and Forecasting (WRF) model over the Andes. The WRF model was selected for this comparison due to its frequent use in simulating atmospheric variables in the Andes.
Our results demonstrate higher downscaling performance than WRF for the four target weather variables studied (temperature, relative humidity, and zonal and meridional wind) over coastal, mountain, and jungle regions. Moreover, the AI model offers several advantages, including lower computational costs than dynamical models like WRF and the potential for continuous improvement with additional training data.
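The flavour of statistical downscaling, as opposed to running a dynamical model such as WRF, can be sketched with the simplest possible example: shifting a coarse-cell temperature to a fine cell's elevation using the standard tropospheric lapse rate. This is a deliberately crude stand-in for the authors' AI model; the function and numbers below are illustrative only.

```python
LAPSE_RATE = -6.5 / 1000.0  # K per metre, standard tropospheric lapse rate

def downscale_temp(coarse_temp_k, coarse_elev_m, fine_elev_m, learned_bias=0.0):
    """Shift a coarse-grid temperature to a fine cell's elevation; a trained
    model would replace the constant learned_bias with a spatial correction."""
    return coarse_temp_k + LAPSE_RATE * (fine_elev_m - coarse_elev_m) + learned_bias

# One 27-km cell at 3000 m and 288 K, mapped onto two fine Andean cells:
t_valley = downscale_temp(288.0, 3000.0, 2500.0)  # lower terrain -> warmer
t_peak = downscale_temp(288.0, 3000.0, 6000.0)    # high peak -> colder
assert t_valley > 288.0 > t_peak
```

A learned model effectively replaces the fixed lapse rate and constant bias with corrections fitted to observations, which is where the accuracy gains over a purely physical shift come from.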
"Generating high-resolution climate data in the Andes using artificial intelligence: A lightweight alternative to the WRF model." Artificial Intelligence in Geosciences, vol. 6, no. 2, Article 100143.
Pub Date: 2025-07-05 | DOI: 10.1016/j.aiig.2025.100138
B.T. Gunel , Y.D. Pak , A.Ö. Herekeli , S. Gül , B. Kulga , E. Artun
Characterization and optimization of the physical and chemical properties of drilling fluids are critical for the efficiency and success of drilling operations. In particular, maintaining optimal solids content is essential for achieving the most effective fluid performance, and proper management of solids content also reduces the risk of tool failures. Traditional solids content analysis methods, such as retort analysis, require substantial human intervention and time, which can lead to inaccuracies, time-management issues, and increased operational risks. In contrast to these human-intensive methods, machine learning may offer a viable alternative for solids content estimation due to its pattern-recognition capability. In this study, a large set of laboratory reports of drilling-fluid analyses from 130 oil wells around the world was compiled to construct a comprehensive data set. The relationships among various rheological parameters were analyzed using statistical methods and machine learning algorithms. Several machine learning algorithms of diverse classes, namely linear (linear regression, ridge regression, and ElasticNet regression), kernel-based (support vector machine) and ensemble tree-based (gradient boosting, XGBoost, and random forests) algorithms, were trained and tuned to estimate solids content from other readily available drilling fluid properties. Input variables were kept consistent across all models for interpretation and comparison purposes. In the final stage, different evaluation metrics were employed to evaluate and compare the performance of the different classes of machine learning models. Among all algorithms tested, the random forests algorithm was found to be the best predictive model, yielding consistently high accuracy. Further optimization of the random forests model resulted in a mean absolute percentage error (MAPE) of 3.9% and 9.6% and R² of 0.99 and 0.93 for the training and testing sets, respectively.
Analysis of residuals, their histograms and Q-Q normality plots showed Gaussian distributions with residuals scattered around a mean of zero within error ranges of ±1% and ±4% for training and testing, respectively. The selected model was further validated by applying rheological measurements from mud samples taken from an offshore well in the Gulf of Mexico. The model estimated total solids content in those four mud samples with an average absolute error of 1.08% of total solids content. The model was then used to develop a web-based graphical-user-interface (GUI) application, which can be used practically at the rig site by engineers to optimize drilling fluid programs. The proposed model can complement automation workflows designed to measure fundamental rheological properties in real time during drilling operations. While a st
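The reported error metric is worth pinning down, since MAPE and absolute error in solids-content percentage points are easy to conflate. A minimal MAPE implementation, with invented solids-content values:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative measured vs. predicted total solids content (% by volume);
# these numbers are not from the paper's dataset.
measured = [10.0, 12.5, 8.0, 11.0]
predicted = [10.4, 12.0, 8.2, 10.5]
error = mape(measured, predicted)
assert error < 5.0  # comparable in spirit to the 3.9% training MAPE reported
```

Note that MAPE is relative to the measured value, so the same prediction error counts more heavily for low-solids samples; the paper's separate ±1% / ±4% residual ranges are absolute, in percentage points of solids content.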
"Machine learning assisted estimation of total solids content of drilling fluids." Artificial Intelligence in Geosciences, vol. 6, no. 2, Article 100138.
Pub Date: 2025-07-05 | DOI: 10.1016/j.aiig.2025.100144
Oriyomi Raheem , Misael M. Morales , Wen Pan , Carlos Torres-Verdín
Capillary pressure plays a crucial role in determining the spatial distribution of oil and gas, particularly in medium-to-low permeability reservoirs, where it is closely linked to the rock's pore structure and wettability. In these environments, pore structure is the primary factor influencing capillary pressure, with different pore types affecting fluid transport through varying degrees of hydrocarbon saturation. One of the main challenges in characterizing pore structure is how to use data from core plugs to establish a relationship with microscopic pore and throat properties, enabling more accurate predictions of capillary pressure. While special core analysis laboratory experiments are effective, they are time-consuming and expensive. In contrast, nuclear magnetic resonance (NMR) measurements, which provide information on pore body size distribution, are faster and can be leveraged to estimate capillary pressure using machine learning algorithms. Recently, artificial intelligence methods have also been applied to capillary pressure prediction (Qi et al., 2024).
Currently, no readily applicable predictive model exists for estimating an entire capillary pressure curve directly from standard petrophysical logs and core data. Although pore-scale imaging and network modeling techniques can compute capillary pressure from micro-CT rock images (Øren and Bakke, 2003; Valvatne and Blunt, 2004), these approaches are time-consuming, limited to small sample volumes, and not yet practical for routine reservoir evaluation. In this study, we introduce rock classification techniques and implement a data-driven machine learning (ML) method to estimate saturation-dependent capillary pressure from core petrophysical properties. The new model integrates cumulative NMR data and densely resampled core measurements as training data, with prediction errors quantified throughout the process. To address the common condition of sparsely sampled training data, we transformed the prediction problem into an overdetermined one by applying composite fitting to both capillary pressure and pore throat size distribution, and Gaussian cumulative distribution fitting to the NMR T2 measurements, generating evenly sampled data points. Using these preprocessed input features, we performed classification based on the natural logarithm of the permeability-to-porosity ratio, ln(k/φ), to cluster distinct rock types. For each rock class, we applied regression techniques, such as random forest (RF), k-nearest neighbors (k-NN), extreme gradient boosting (XGB), and artificial neural networks (ANN), to estimate the logarithm of capillary pressure. The methods were tested on blind core samples, and performance comparisons among the different estimation methods were based on the relative standard error of the predictions. Results show that NMR data are sensitive to rock pore structure and markedly improve predictions of capillary pressure and pore throat size distribution. For capillary pressure and pore throat size distribution, the extreme gradient boosting and random forest models performed best, with average estimation errors of 5% and 10%, respectively. By contrast, when NMR T2 data were excluded as input features, prediction errors increased to 25%. Conventional Gaussian model fitting and higher-resolution resampling ensured that the training data covered a wide range of variability. Including NMR T2 data as input features enhanced the models' ability to capture multimodality in unconventional rocks and made the prediction problem overdetermined. Predicting a vector function from vector input features effectively reduced the prediction error. The interpretation workflow can be used to build representative classification models and to estimate capillary pressure over a wide range of saturations.
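The first stage of the workflow, grouping samples by the natural log of the permeability-to-porosity ratio, can be sketched as a simple classifier. The class boundaries and sample values below are invented for illustration; the study derives its classes from the data rather than from fixed thresholds.

```python
from math import log

def rock_class(perm_md, porosity, boundaries=(3.0, 7.0)):
    """Assign a rock class from ln(k/phi); permeability in mD, porosity as a
    fraction. The boundary values here are hypothetical, not from the paper."""
    r = log(perm_md / porosity)
    for cls, bound in enumerate(boundaries):
        if r < bound:
            return cls
    return len(boundaries)

# Tight, intermediate and high-quality samples (k in mD, phi as a fraction):
samples = [(0.5, 0.08), (50.0, 0.15), (900.0, 0.22)]
classes = [rock_class(k, phi) for k, phi in samples]
assert classes == [0, 1, 2]  # each sample lands in its own class
```

The per-class regressors (RF, k-NN, XGB, ANN) are then trained only on samples from the matching class, which is what lets each model specialise to one pore-structure regime.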
"Improved estimation of two-phase capillary pressure with nuclear magnetic resonance measurements via machine learning." Artificial Intelligence in Geosciences, vol. 6, no. 2, Article 100144.
Pub Date: 2025-07-04 | DOI: 10.1016/j.aiig.2025.100140
Vinh V. Le , HongGiang Nguyen , Nguyen Huu Ngu
This paper presents a deep learning architecture combined with exploratory data analysis to estimate maximum wall deflection in deep excavations. Six major geotechnical parameters were studied. Statistical methods, such as pair plots and Pearson correlation, highlighted excavation depth (correlation coefficient = 0.82) as the most significant factor. For prediction, five deep learning models (CNN, LSTM, BiLSTM, CNN-LSTM, and CNN-BiLSTM) were built. The CNN-BiLSTM model excelled in training performance (R² = 0.98, RMSE = 0.02), while BiLSTM reached superior testing results (R² = 0.85, RMSE = 0.06), suggesting greater generalization ability. Based on feature importance analysis of the model weights, excavation depth, stiffness ratio, and bracing spacing were ranked as the highest contributors. Residual plots showed no systematic prediction bias, and Taylor diagrams showed high agreement between model outputs and measured values (correlation coefficient 0.92), supporting the reliability of the integrated techniques for predicting wall deformation. This approach facilitates more accurate and efficient geotechnical design and provides engineers with improved tools for risk evaluation and decision-making in deep excavation projects.
"Deep learning approaches for estimating maximum wall deflection in excavations with inconsistent clay stratigraphy." Artificial Intelligence in Geosciences, vol. 6, no. 2, Article 100140.
Pub Date: 2025-06-30 | DOI: 10.1016/j.aiig.2025.100142
Baoling Gui, Anshuman Bhardwaj, Lydia Sam
Rapid urbanization and land-use changes are placing immense pressure on resources, infrastructure, and environmental sustainability. To address these, accurate urban simulation models are essential for sustainable development and governance. Among them, Cellular Automata (CA) models have become key tools for predicting urban expansion, optimizing land-use planning, and supporting data-driven decision-making. This review provides a comprehensive examination of the development of urban cellular automata (UCA) models, presenting a new framework to enhance individual UCA sub-modules within the context of emerging technologies, sustainable environments, and public governance. By addressing gaps in prior UCA modelling reviews—particularly in the integration and optimization of UCA sub-module technologies—this framework is designed to simplify UCA model understanding and development. We systematically review pioneering case studies, deconstruct current UCA operational processes, and explore modern technologies, such as big data and artificial intelligence, to optimize these sub-modules further. We discuss current limitations within UCA models and propose future pathways, emphasizing the necessity of comprehensive analyses for effective UCA simulations. Proposed solutions include strengthening our understanding of urban growth mechanisms, examining spatial positioning and temporal evolution dynamics, and enhancing urban geographic simulations with deep learning techniques to support sustainable transitions in public governance. These improvements offer data-driven decision support for environmental management, advancing policies that foster sustainable urban development.
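The transition rule at the core of any urban CA can be shown in a few lines: each cell updates from the state of its Moore neighbourhood. The rule below (urbanize when at least two of eight neighbours are urban) is a deliberately minimal stand-in; real UCA models weight suitability, accessibility and stochastic perturbation, as the review discusses.

```python
def ca_step(grid, threshold=2):
    """One synchronous update of a toy urban-growth cellular automaton:
    a non-urban cell (0) becomes urban (1) when at least `threshold` of its
    eight Moore neighbours are urban."""
    rows, cols = len(grid), len(grid[0])
    nxt = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue  # urban cells stay urban in this toy rule
            urban_neighbours = sum(
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c))
            if urban_neighbours >= threshold:
                nxt[r][c] = 1
    return nxt

# A 2x2 urban seed expands along its edges after one step:
seed = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
grown = ca_step(seed)
assert sum(map(sum, grown)) == 12  # 4 original cells plus 8 edge-adjacent ones
```

The sub-modules the review decomposes (transition rules, neighbourhood definition, calibration, constraint layers) all correspond to pieces of this loop; replacing the hard threshold with a learned suitability score is where deep learning enters modern UCA models.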
"Cellular automata models for simulation and prediction of urban land use change: Development and prospects." Artificial Intelligence in Geosciences, vol. 6, no. 2, Article 100142.
Pub Date: 2025-06-24 | DOI: 10.1016/j.aiig.2025.100139
Paul Theophily Nsulangi , Werneld Egno Ngongi , John Mbogo Kafuku , Guan Zhen Liang
This study compared the predictive performance and processing speed of an artificial neural network (ANN) and a hybrid of a numerical reservoir simulation (NRS) and artificial neural network (NRS-ANN) models in estimating the oil production rate of the ZH86 reservoir block under waterflood recovery. The historical input variables: reservoir pressure, reservoir pore volume containing hydrocarbons, reservoir pore volume containing water and reservoir water injection rate used as inputs for ANN models. To create the NRS-ANN hybrid models, 314 data sets extracted from the NRS model, which included reservoir pressure, reservoir pore volume containing hydrocarbons, reservoir pore volume containing water and reservoir water injection rate were used. The output of the models was the historical oil production rate (HOPR in m3 per day) recorded from the ZH86 reservoir block. Models were developed using MATLAB R2021a and trained with 25 models in three replicate conditions (2, 4 and 6), each at 1000 epochs. A comparative analysis indicated that, for all 25 models, the ANN outperformed the NRS-ANN in terms of processing speed and prediction performance. ANN models achieved an average of R2 and MAE of 0.8433 and 8.0964 m3/day values, respectively, while NRS-ANN hybrid models achieved an average of R2 and MAE of 0.7828 and 8.2484 m3/day values, respectively. In addition, ANN models achieved a processing speed of 49 epochs/sec, 32 epochs/sec, and 24 epochs/sec after 2, 4, and 6 replicates, respectively. Whereas the NRS-ANN hybrid models achieved lower average processing speeds of 45 epochs/sec, 23 epochs/sec and 20 epochs/sec. In addition, the ANN optimal model outperforms the NRS-ANN model in terms of both processing speed and accuracy. The ANN optimal model achieved a speed of 336.44 epochs/sec, compared to the NRS-ANN hybrid optimal model, which achieved a speed of 52.16 epochs/sec. 
The ANN optimal model also achieved lower RMSE and MAE values of 7.9291 m3/day and 5.3855 m3/day on the validation dataset, compared with 13.6821 m3/day and 9.2047 m3/day for the NRS-ANN hybrid optimal model. The ANN optimal model consistently achieved higher R2 values of 0.9472, 0.9284 and 0.9316 on the training, test and validation datasets, whereas the NRS-ANN hybrid optimal model yielded lower R2 values of 0.8030, 0.8622 and 0.7776. The study showed that ANN models are a more effective and reliable tool, as they balance processing speed and accuracy in estimating the oil production rate of the ZH86 reservoir block under the waterflooding recovery method.
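The comparison above rests on standard regression metrics. A minimal sketch of how R2, MAE and RMSE are computed, using hypothetical oil-production rates rather than the study's ZH86 data:

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error, here in m3/day."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, here in m3/day."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical production rates (m3/day), not the ZH86 measurements
y_true = [100.0, 120.0, 90.0, 110.0, 105.0]
y_pred = [98.0, 125.0, 92.0, 108.0, 101.0]
print(round(r2_score(y_true, y_pred), 4))  # 0.894
print(round(mae(y_true, y_pred), 4))       # 3.0
print(round(rmse(y_true, y_pred), 4))      # 3.2558
```

RMSE penalises large errors more heavily than MAE, which is why the paper reports both.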
{"title":"Comparison of processing speed of NRS-ANN hybrid and ANN models for oil production rate estimation of reservoir under waterflooding","authors":"Paul Theophily Nsulangi , Werneld Egno Ngongi , John Mbogo Kafuku , Guan Zhen Liang","doi":"10.1016/j.aiig.2025.100139","DOIUrl":"10.1016/j.aiig.2025.100139","url":null,"abstract":"<div><div>This study compared the predictive performance and processing speed of an artificial neural network (ANN) and a hybrid of a numerical reservoir simulation (NRS) and artificial neural network (NRS-ANN) models in estimating the oil production rate of the ZH86 reservoir block under waterflood recovery. The historical input variables: reservoir pressure, reservoir pore volume containing hydrocarbons, reservoir pore volume containing water and reservoir water injection rate used as inputs for ANN models. To create the NRS-ANN hybrid models, 314 data sets extracted from the NRS model, which included reservoir pressure, reservoir pore volume containing hydrocarbons, reservoir pore volume containing water and reservoir water injection rate were used. The output of the models was the historical oil production rate (HOPR in m<sup>3</sup> per day) recorded from the ZH86 reservoir block. Models were developed using MATLAB R2021a and trained with 25 models in three replicate conditions (2, 4 and 6), each at 1000 epochs. A comparative analysis indicated that, for all 25 models, the ANN outperformed the NRS-ANN in terms of processing speed and prediction performance. ANN models achieved an average of R<sup>2</sup> and MAE of 0.8433 and 8.0964 m<sup>3</sup>/day values, respectively, while NRS-ANN hybrid models achieved an average of R<sup>2</sup> and MAE of 0.7828 and 8.2484 m<sup>3</sup>/day values, respectively. In addition, ANN models achieved a processing speed of 49 epochs/sec, 32 epochs/sec, and 24 epochs/sec after 2, 4, and 6 replicates, respectively. 
Whereas the NRS-ANN hybrid models achieved lower average processing speeds of 45 epochs/sec, 23 epochs/sec and 20 epochs/sec. In addition, the ANN optimal model outperforms the NRS-ANN model in terms of both processing speed and accuracy. The ANN optimal model achieved a speed of 336.44 epochs/sec, compared to the NRS-ANN hybrid optimal model, which achieved a speed of 52.16 epochs/sec. The ANN optimal model achieved lower RMSE and MAE values of 7.9291 m<sup>3</sup>/day and 5.3855 m<sup>3</sup>/day in the validation dataset compared with the hybrid ANS optimal model, which achieved 13.6821 m<sup>3</sup>/day and 9.2047 m<sup>3</sup>/day, respectively. The study also showed that the ANN optimal model consistently achieved higher R<sup>2</sup> values: 0.9472, 0.9284 and 0.9316 in the training, test and validation data sets. Whereas the NRS-ANN hybrid optimal yielded lower R<sup>2</sup> values of 0.8030, 0.8622 and 0.7776 for the training, testing and validation datasets. The study showed that ANN models are a more effective and reliable tool, as they balance both processing speed and accuracy in estimating the oil production rate of the ZH86 reservoir block under the waterflooding recovery method.</div></div>","PeriodicalId":100124,"journal":{"name":"Artificial Intelligence in Geosciences","volume":"6 2","pages":"Article 100139"},"PeriodicalIF":0.0,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144653505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-23 | DOI: 10.1016/j.aiig.2025.100128
Junfei Zhang , Huisheng Cheng , Ninghui Sun , Zehui Huo , Junlin Chen
Ternary geopolymers incorporating multiple solid wastes such as steel slag (SS), fly ash (FA), and granulated blast furnace slag (GBFS) are considered environmentally friendly and exhibit enhanced performance. However, the mechanisms governing strength development and the design of optimal mixtures are not fully understood due to the complexity of their components. This study presents the development of four machine learning models—Artificial Neural Network (ANN), Support Vector Regression (SVR), Extremely Randomized Tree (ERT), and Gradient Boosting Regression (GBR)—for predicting the unconfined compressive strength (UCS) of ternary geopolymers. The models were trained using a dataset comprising 120 mixtures derived from laboratory tests. Shapley Additive Explanations analysis was employed to interpret the machine learning models and elucidate the influence of different components on the properties of ternary geopolymers. The results indicate that ANN exhibits the highest predictive accuracy for UCS (R = 0.949). Furthermore, the UCS of ternary geopolymers is most sensitive to the content of GBFS. This study provides valuable insights for optimizing the mix proportions in ternary blended geopolymer mixtures.
{"title":"Interpretable machine learning models for evaluating strength of ternary geopolymers","authors":"Junfei Zhang , Huisheng Cheng , Ninghui Sun , Zehui Huo , Junlin Chen","doi":"10.1016/j.aiig.2025.100128","DOIUrl":"10.1016/j.aiig.2025.100128","url":null,"abstract":"<div><div>Ternary geopolymers incorporating multiple solid wastes such as steel slag (SS), fly ash (FA), and granulated blast furnace slag (GBFS) are considered environmentally friendly and exhibit enhanced performance. However, the mechanisms governing strength development and the design of optimal mixtures are not fully understood due to the complexity of their components. This study presents the development of four machine learning models—Artificial Neural Network (ANN), Support Vector Regression (SVR), Extremely Randomized Tree (ERT), and Gradient Boosting Regression (GBR)—for predicting the unconfined compressive strength (UCS) of ternary geopolymers. The models were trained using a dataset comprising 120 mixtures derived from laboratory tests. Shapley Additive Explanations analysis was employed to interpret the machine learning models and elucidate the influence of different components on the properties of ternary geopolymers. The results indicate that ANN exhibits the highest predictive accuracy for UCS (R = 0.949). Furthermore, the UCS of ternary geopolymers is most sensitive to the content of GBFS. 
This study provides valuable insights for optimizing the mix proportions in ternary blended geopolymer mixtures.</div></div>","PeriodicalId":100124,"journal":{"name":"Artificial Intelligence in Geosciences","volume":"6 2","pages":"Article 100128"},"PeriodicalIF":0.0,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144523230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-03 | DOI: 10.1016/j.aiig.2025.100126
Andrey V. Soromotin , Dmitriy A. Martyushev , João Luiz Junho Pereira
Permeability is one of the main oil reservoir characteristics. It affects potential oil production, well-completion technologies, the choice of enhanced oil recovery methods, and more. The methods traditionally used to determine and predict reservoir permeability have serious shortcomings. This article aims to refine and adapt machine learning techniques, using historical data from hydrocarbon field development, to evaluate and predict parameters such as the skin factor and the permeability of the remote reservoir zone. The article analyzes data from 4045 well tests in oil fields in the Perm Krai (Russia). The performance of different Machine Learning (ML) algorithms in predicting well permeability is evaluated. Three different real datasets are used to train more than 20 machine learning regressors, whose hyperparameters are optimized using Bayesian Optimization (BO). The resulting models demonstrate significantly better predictive performance than traditional methods, and the best ML model found had never before been applied to this problem. The permeability prediction model achieves a high adjusted R2 value of 0.799. A promising approach is the integration of machine learning methods with pressure recovery curves to estimate permeability in real time. The work is unique in its approach to predicting pressure recovery curves during well operation, without shutting in wells, providing primary data for interpretation. These innovations can improve the accuracy of permeability forecasts and reduce the well downtime associated with traditional well-testing procedures. The proposed methods pave the way for more efficient and cost-effective reservoir development, ultimately supporting better decision-making and resource optimization in oil production.
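The adjusted R2 reported above penalises the plain R2 for the number of predictors, so a model cannot improve its score simply by adding variables. A minimal sketch of the formula, with hypothetical numbers (the sample size echoes the 4045 well tests, but the R2 and predictor count are illustrative assumptions):

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1),
    which shrinks R2 as the number of predictors p grows."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical example: R2 = 0.82 over 4045 well tests with 10 predictors
print(round(adjusted_r2(0.82, 4045, 10), 4))  # 0.8196
```

With thousands of samples the penalty is small, but it grows quickly for small datasets or large feature sets, which is why adjusted R2 is the fairer metric when comparing regressors with different numbers of inputs.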
{"title":"On the application of machine learning algorithms in predicting the permeability of oil reservoirs","authors":"Andrey V. Soromotin , Dmitriy A. Martyushev , João Luiz Junho Pereira","doi":"10.1016/j.aiig.2025.100126","DOIUrl":"10.1016/j.aiig.2025.100126","url":null,"abstract":"<div><div>Permeability is one of the main oil reservoir characteristics. It affects potential oil production, well-completion technologies, the choice of enhanced oil recovery methods, and more. The methods used to determine and predict reservoir permeability have serious shortcomings. This article aims to refine and adapt machine learning techniques using historical data from hydrocarbon field development to evaluate and predict parameters such as the skin factor and permeability of the remote reservoir zone. The article analyzes data from 4045 wells tests in oil fields in the Perm Krai (Russia). An evaluation of the performance of different Machine Learning (ML) algorithms in the prediction of the well permeability is performed. Three different real datasets are used to train more than 20 machine learning regressors, whose hyperparameters are optimized using Bayesian Optimization (BO). The resulting models demonstrate significantly better predictive performance compared to traditional methods and the best ML model found is one that never was applied before to this problem. The permeability prediction model is characterized by a high R<sup>2</sup> adjusted value of 0.799. A promising approach is the integration of machine learning methods and the use of pressure recovery curves to estimate permeability in real-time. The work is unique for its approach to predicting pressure recovery curves during well operation without stopping wells, providing primary data for interpretation. These innovations are exclusive and can improve the accuracy of permeability forecasts. It also reduces well downtime associated with traditional well-testing procedures. 
The proposed methods pave the way for more efficient and cost-effective reservoir development, ultimately supporting better decision-making and resource optimization in oil production.</div></div>","PeriodicalId":100124,"journal":{"name":"Artificial Intelligence in Geosciences","volume":"6 2","pages":"Article 100126"},"PeriodicalIF":0.0,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144330611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-01 | DOI: 10.1016/j.aiig.2025.100120
M. Giselle Fernández-Godino , Wai Tong Chung , Akshay A. Gowardhan , Matthias Ihme , Qingkai Kong , Donald D. Lucas , Stephen C. Myers
High-resolution spatiotemporal simulations effectively capture the complexities of atmospheric plume dispersion in complex terrain. However, their high computational cost makes them impractical for applications requiring rapid responses or iterative processes, such as optimization, uncertainty quantification, or inverse modeling. To address this challenge, this work introduces the Dual-Stage Temporal Three-dimensional UNet Super-resolution (DST3D-UNet-SR) model, a highly efficient deep learning model for plume dispersion predictions. DST3D-UNet-SR is composed of two sequential modules: the temporal module (TM), which predicts the transient evolution of a plume in complex terrain from low-resolution temporal data, and the spatial refinement module (SRM), which subsequently enhances the spatial resolution of the TM predictions. We train DST3D-UNet-SR on a comprehensive dataset derived from high-resolution large eddy simulations (LES) of plume transport. DST3D-UNet-SR accelerates LES of three-dimensional (3D) plume dispersion by three orders of magnitude. Additionally, the model can dynamically adapt to evolving conditions by incorporating new observational data, substantially improving prediction accuracy in high-concentration regions near the source.
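The SRM's learned upsampling can be contrasted with the naive baseline it improves upon. A minimal nearest-neighbour 3D upsampling sketch — illustrative only, not the DST3D-UNet-SR architecture, and the concentration values are assumed:

```python
def upsample3d_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 3D grid (nested lists) by an
    integer factor along each axis; a naive baseline for spatial refinement."""
    return [[[grid[i // factor][j // factor][k // factor]
              for k in range(len(grid[0][0]) * factor)]
             for j in range(len(grid[0]) * factor)]
            for i in range(len(grid) * factor)]

# Hypothetical low-resolution 2x2x2 block of plume concentrations
low_res = [[[0.0, 1.0], [2.0, 3.0]],
           [[4.0, 5.0], [6.0, 7.0]]]
high_res = upsample3d_nearest(low_res, 2)
# high_res is 4x4x4; each coarse voxel is replicated into a 2x2x2 block
```

A learned super-resolution module replaces this blocky replication with sharp, physically plausible fine-scale structure, which is where the SRM earns its accuracy gains near the source.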
{"title":"A staged deep learning approach to spatial refinement in 3D temporal atmospheric transport","authors":"M. Giselle Fernández-Godino , Wai Tong Chung , Akshay A. Gowardhan , Matthias Ihme , Qingkai Kong , Donald D. Lucas , Stephen C. Myers","doi":"10.1016/j.aiig.2025.100120","DOIUrl":"10.1016/j.aiig.2025.100120","url":null,"abstract":"<div><div>High-resolution spatiotemporal simulations effectively capture the complexities of atmospheric plume dispersion in complex terrain. However, their high computational cost makes them impractical for applications requiring rapid responses or iterative processes, such as optimization, uncertainty quantification, or inverse modeling. To address this challenge, this work introduces the Dual-Stage Temporal Three-dimensional UNet Super-resolution (DST3D-UNet-SR) model, a highly efficient deep learning model for plume dispersion predictions. DST3D-UNet-SR is composed of two sequential modules: the temporal module (TM), which predicts the transient evolution of a plume in complex terrain from low-resolution temporal data, and the spatial refinement module (SRM), which subsequently enhances the spatial resolution of the TM predictions. We train DST3D-UNet-SR using a comprehensive dataset derived from high-resolution large eddy simulations (LES) of plume transport. We propose the DST3D-UNet-SR model to significantly accelerate LES of three-dimensional (3D) plume dispersion by three orders of magnitude. 
Additionally, the model demonstrates the ability to dynamically adapt to evolving conditions through the incorporation of new observational data, substantially improving prediction accuracy in high-concentration regions near the source.</div></div>","PeriodicalId":100124,"journal":{"name":"Artificial Intelligence in Geosciences","volume":"6 1","pages":"Article 100120"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144185115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}