Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge
Timothy Dai, Kate Maher, Zach Perzan
arXiv - PHYS - Geophysics · Published 2024-07-30 · DOI: arxiv-2407.20902 (https://doi.org/arxiv-2407.20902)
Citations: 0
Abstract
Process-based hydrologic models are invaluable tools for understanding the
terrestrial water cycle and addressing modern water resources problems.
However, many hydrologic models are computationally expensive and, depending on
the resolution and scale, simulations can take on the order of hours to days to
complete. While techniques such as uncertainty quantification and optimization
have become valuable tools for supporting management decisions, these analyses
typically require hundreds of model simulations, which are too computationally
expensive to perform with a process-based hydrologic model. To address this
gap, we propose a hybrid modeling workflow in which a process-based model is
used to generate an initial set of simulations and a machine learning (ML)
surrogate model is then trained to perform the remaining simulations required
for downstream analysis. As a case study, we apply this workflow to simulations
of variably saturated groundwater flow at a prospective managed aquifer
recharge (MAR) site. We compare the accuracy and computational efficiency of
several ML architectures, including deep convolutional networks, recurrent
neural networks, vision transformers, and networks with Fourier transforms. Our
results demonstrate that ML surrogate models can achieve under 10% mean
absolute percentage error and yield order-of-magnitude runtime savings over
process-based models. We also offer practical recommendations for training
hydrologic surrogate models, including implementing data normalization to
improve accuracy, using a normalized loss function to improve training
stability, and downsampling input features to decrease memory requirements.
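Two of the training recommendations above, input normalization and a normalized loss function, can be illustrated with a minimal sketch. This is not the paper's implementation; the function names and the specific choices of min-max scaling and a target-magnitude-scaled L1 loss are illustrative assumptions.

```python
import numpy as np

def minmax_normalize(x, eps=1e-8):
    """Scale each feature column of x to [0, 1].

    Illustrative min-max normalization; the eps term guards against
    division by zero for constant-valued features.
    """
    x_min = x.min(axis=0, keepdims=True)
    x_max = x.max(axis=0, keepdims=True)
    return (x - x_min) / (x_max - x_min + eps)

def normalized_l1_loss(pred, target, eps=1e-8):
    """Mean absolute error scaled by the target's magnitude.

    Scaling by |target| keeps output fields with very different
    magnitudes (e.g., pressure head vs. saturation) on comparable
    footing, which can stabilize training.
    """
    return np.mean(np.abs(pred - target) / (np.abs(target) + eps))
```

A perfect prediction yields zero loss, and because each residual is divided by the corresponding target magnitude, the loss is interpretable as an average relative error, closely related to the mean absolute percentage error the abstract reports.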