Weihang Chen, Chao Shi, Jianwen Ding, Tengfei Wang, David P. Connolly
Data-driven sparse learning of three-dimensional subsurface properties incorporating random field theory
Engineering Geology, Volume 349, Article 107972. Published 2025-02-20. DOI: 10.1016/j.enggeo.2025.107972
https://www.sciencedirect.com/science/article/pii/S0013795225000687
Abstract
Geotechnical engineers rely on accurate soil property information for engineering analyses. Learning the spatial distribution of soil attributes is challenging, however, because in-situ geotechnical testing is typically performed sparsely at discrete locations and soil properties exhibit inherent spatial variability. Traditional geostatistical methods for predicting properties at unsampled locations have high computational complexity and require hyper-parameters to be determined in advance, while purely data-driven methods fail to integrate geotechnical knowledge. In this study, a hybrid, parameter-free framework that combines random field theory and machine learning is proposed to model 3D subsurface fields with reduced computational complexity. The framework constructs site-specific basis functions that characterize the spatial variation of soil properties by decomposing a correlation matrix through principal component analysis. To further reduce the computational cost of processing high-dimensional correlation matrices, a sparse sampling strategy is adopted to map the correlation matrix onto a lower-rank principal component space. A series of synthetic random field examples is generated to illustrate how the scale of fluctuation and the autocorrelation function affect the accuracy and sensitivity of subsurface modeling. The performance of the proposed method is further validated using both synthetic cases and two real case histories. The proposed method generally achieves higher R² and lower root mean square error (RMSE) and mean absolute percentage error (MAPE) than state-of-the-art methods such as Kriging and Bayesian compressive sensing. Moreover, the proposed method allows the uncertainty associated with the subsurface models to be quantified explicitly, providing valuable insights for engineering design and analysis. 
The data and code used in this study are available at https://github.com/Data-Driven-RFT/Sparse-Learning.
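As a rough illustration of the idea described in the abstract (basis functions obtained by decomposing a correlation matrix through principal component analysis, then fitted to sparse measurements), the following minimal Python sketch may help. It is not the authors' implementation and does not reproduce their sparse sampling strategy or uncertainty quantification. The single-exponential autocorrelation function, the isotropic scale of fluctuation `theta`, the grid size, the number of components, the least-squares fit, and all function names are assumptions chosen for the example.

```python
import numpy as np


def exponential_correlation(coords, theta):
    """Isotropic single-exponential autocorrelation, rho = exp(-2 * d / theta).

    theta plays the role of the scale of fluctuation (assumed isotropic here).
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-2.0 * dist / theta)


def pca_basis(corr, n_components):
    """Return the leading eigenvectors of the symmetric correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


def fit_and_predict(basis, sample_idx, sample_values):
    """Least-squares fit of basis weights to sparse data, then full-grid prediction."""
    weights, *_ = np.linalg.lstsq(basis[sample_idx], sample_values, rcond=None)
    return basis @ weights


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical 6 x 6 x 5 grid of 3D locations (180 points, unit spacing).
rng = np.random.default_rng(0)
axes = [np.arange(6.0), np.arange(6.0), np.arange(5.0)]
coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

# Synthetic zero-mean random field drawn from the assumed correlation model.
corr = exponential_correlation(coords, theta=4.0)
chol = np.linalg.cholesky(corr + 1e-10 * np.eye(len(coords)))
truth = chol @ rng.standard_normal(len(coords))

# Sparse "measurements" at 40 of the 180 locations.
sample_idx = rng.choice(len(coords), size=40, replace=False)

# Site-specific basis from the correlation matrix, fitted to the sparse data.
basis = pca_basis(corr, n_components=10)
prediction = fit_and_predict(basis, sample_idx, truth[sample_idx])

print("RMSE:", rmse(truth, prediction))
print("R^2 :", r_squared(truth, prediction))
```

In this sketch the full correlation matrix is eigendecomposed directly, which is only practical for small grids; the number of retained components trades smoothness against fidelity to the measurements. MAPE is omitted because the synthetic field is zero-mean, so percentage errors would be ill-conditioned.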
About the journal:
Engineering Geology, an international interdisciplinary journal, serves as a bridge between earth sciences and engineering, focusing on geological and geotechnical engineering. It welcomes studies with relevance to engineering, environmental concerns, and safety, catering to engineering geologists with backgrounds in geology or civil/mining engineering. Topics include applied geomorphology, structural geology, geophysics, geochemistry, environmental geology, hydrogeology, land use planning, natural hazards, remote sensing, soil and rock mechanics, and applied geotechnical engineering. The journal provides a platform for research at the intersection of geology and engineering disciplines.