Interpretable machine learning on large samples for supporting runoff estimation in ungauged basins

IF 5.9 | Zone 1, Earth Sciences | JCR Q1, ENGINEERING, CIVIL | Journal of Hydrology | Pub Date: 2024-07-05 | DOI: 10.1016/j.jhydrol.2024.131598
Yuanhao Xu, Kairong Lin, Caihong Hu, Shuli Wang, Qiang Wu, Jingwen Zhang, Mingzhong Xiao, Yufu Luo

Abstract

The distribution of flowmeter data and basin characteristic information exhibits substantial disparities, with most flow observations recorded at a limited number of well-monitored locations. Achieving reliable and robust hydrological modeling in ungauged catchments through regionalization remains a perennial challenge. The increasing availability of large-scale hydrological datasets, coupled with recent advances in machine learning techniques, offers new opportunities to explore patterns of association between basin attributes and hydrological parameters and thereby enhance streamflow predictions. A novel cross-regional parameter transfer approach based on interpretable machine learning (XGBoost) is proposed to accurately predict runoff processes in ungauged regions by leveraging well-trained models across numerous basins within climate zones. We validate the effectiveness of this framework across 5,764 basins in a large-sample dataset (Caravan), employing Nash-Sutcliffe Efficiency (NSE), RMSE and bias to assess performance, and compare it with deep transfer learning based on LSTM and Transformer models. Results indicate that the proposed method achieves NSE values exceeding 0.2 for 75% of the ungauged basins, demonstrating superior performance and more stable accuracy than pure deep learning models, owing to its incorporation of physical constraints. Furthermore, the response of parameters to basin attributes within different climatic zones in the large-sample context is elucidated through SHAP values, enriching the understanding of hydrological features through data-driven inverse inference. These findings underscore the capability of interpretable machine learning to leverage hydro-physical regularities extracted from abundant basin features, thereby enhancing the accuracy of runoff predictions in ungauged regions.
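
As an illustration of the evaluation metrics named in the abstract, a minimal Python sketch follows. The NSE and RMSE formulas are the standard ones. The abstract does not define "bias", so the mean error (simulated minus observed) is assumed here. All function names, basin names and runoff values are hypothetical and are not taken from the paper.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, sim):
    """Root mean square error between observed and simulated runoff."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def bias(obs, sim):
    """Mean error, simulated minus observed (assumed definition of 'bias')."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def fraction_above(scores, threshold=0.2):
    """Share of basins whose score exceeds the threshold."""
    return sum(1 for x in scores if x > threshold) / len(scores)


# Made-up runoff series (observed, simulated) for two hypothetical basins.
basins = {
    "basin_a": ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
    "basin_b": ([0.5, 0.5, 2.0, 1.0], [1.5, 0.2, 0.8, 1.6]),
}

scores = {name: nse(obs, sim) for name, (obs, sim) in basins.items()}
for name, (obs, sim) in basins.items():
    print(name, "NSE:", scores[name], "RMSE:", rmse(obs, sim), "bias:", bias(obs, sim))

# Share of basins with NSE above 0.2, the threshold quoted in the abstract.
print("fraction with NSE > 0.2:", fraction_above(list(scores.values())))
```

An NSE of 1 indicates a perfect fit, and 0 means the simulation is no better than predicting the observed mean, so a statement such as "NSE exceeding 0.2 for 75% of basins" corresponds to `fraction_above` returning at least 0.75 over the per-basin scores.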

Source journal
Journal of Hydrology (Earth Sciences: Geosciences, Multidisciplinary)
CiteScore: 11.00
Self-citation rate: 12.50%
Articles published: 1309
Review time: 7.5 months
About the journal: The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, and civil and environmental engineering are included. Social science perspectives on hydrological problems, such as resource and ecological economics, environmental sociology, psychology and behavioural science, and management and policy analysis, are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.
Latest articles from this journal
Dam-break flood hazard and risk assessment of large dam for emergency preparedness: A study of Ukai Dam, India
Analytical model of contaminant advection, diffusion and degradation in capped sediments and sensitivity to flow and sediment properties
High-resolution monitoring of soil infiltration using distributed fiber optic
A hydro-geomorphologic assessment of flood generation potentiality in ungauged sub-basins and their prioritization based on traditional, statistical, MCDM and Nash-GIUH models of a tropical plateau-fringe River
The causes of algal blooms exist significant scale effect in tributary of the Three Gorges Reservoir