Interpretable machine learning on large samples for supporting runoff estimation in ungauged basins

Journal of Hydrology | Impact factor: 5.9 | CAS Zone 1 (Earth Sciences) | JCR Q1 (Engineering, Civil) | Pub Date: 2024-07-05 | DOI: 10.1016/j.jhydrol.2024.131598

Yuanhao Xu, Kairong Lin, Caihong Hu, Shuli Wang, Qiang Wu, Jingwen Zhang, Mingzhong Xiao, Yufu Luo

Abstract

Flowmeter data and basin characteristic information are unevenly distributed, with most flow observations recorded at a limited number of well-monitored locations. Achieving reliable and robust hydrological modeling in ungauged catchments through regionalization remains a perennial challenge. The increasing availability of large-scale hydrological datasets, coupled with recent advances in machine learning, offers new opportunities to explore patterns of association between basin attributes and hydrological parameters and thereby enhance streamflow predictions. A novel cross-regional parameter transfer approach based on interpretable machine learning (XGBoost) is proposed to predict runoff processes in ungauged regions by leveraging well-trained models across numerous basins within climate zones. We validate the framework across 5,764 basins in a large-sample dataset (Caravan), using Nash-Sutcliffe Efficiency (NSE), RMSE and bias to assess performance, and compare it with deep transfer learning based on LSTM and Transformer models. Results indicate that the proposed method achieves NSE values exceeding 0.2 for 75% of the ungauged basins, demonstrating superior performance and more stable accuracy than pure deep learning models, owing to its incorporation of physical constraints. Furthermore, SHAP values are used to elucidate how parameters respond to basin attributes within different climatic zones in the large-sample context, enriching the understanding of hydrological features through data-driven inverse inference. These findings underscore the capability of interpretable machine learning to leverage hydro-physical regularities extracted from abundant basin features, thereby enhancing the accuracy of runoff predictions in ungauged regions.
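For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how NSE, RMSE and bias are commonly computed, plus the share of basins whose NSE exceeds a threshold. The function names and sample values are hypothetical, and the abstract does not state the paper's exact bias definition, so mean error (simulated minus observed) is assumed here.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - (sum of squared errors) /
    (sum of squared deviations of observations from their mean).
    Undefined (division by zero) if all observations are identical."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, sim):
    """Root mean squared error between observed and simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def bias(obs, sim):
    """Mean error (sim - obs). ASSUMPTION: the paper may define bias
    differently, e.g. as a percentage of observed volume."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def fraction_above(scores, threshold):
    """Share of basins whose score strictly exceeds the threshold."""
    return sum(1 for x in scores if x > threshold) / len(scores)


# Hypothetical runoff values for one basin (illustrative only).
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.0, 2.5, 4.0]
print(nse(obs, sim), rmse(obs, sim), bias(obs, sim))

# Hypothetical per-basin NSE scores (illustrative only).
basin_nse = [0.9, 0.1, 0.5, 0.3]
print(fraction_above(basin_nse, 0.2))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the simulation is no better than predicting the observed mean, which is why a threshold such as 0.2 is a meaningful, if modest, bar for ungauged basins.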

Source journal

Journal of Hydrology (Geosciences, Multidisciplinary)

CiteScore: 11.00

Self-citation rate: 12.50%

Articles published: 1309

Review time: 7.5 months

About the journal: The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, and civil and environmental engineering are included. Social science perspectives on hydrological problems, such as resource and ecological economics, environmental sociology, psychology and behavioural science, and management and policy analysis, are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.