Authors: Nureehan Salaeh, Pakorn Ditthakit, Sirimon Pinthong, Warit Wipulanusat, Uruya Weesakul, Ismail Elkhrachy, Krishna Kumar Yadav, Ghadah Shukri Albakri, Maha Awjan Alreshidi, Nand Lal Kushwaha, Mohamed Elsahabi
Journal: Physics and Chemistry of the Earth, volume 138, Article 103840
DOI: 10.1016/j.pce.2024.103840
Published: 2025-06-01 (Epub 2024-12-20)
Impact factor: 4.6; JCR: Q2 (Geosciences, Multidisciplinary)
Open access: no
URL: https://www.sciencedirect.com/science/article/pii/S1474706524002985
Citations: 0
Utilizing machine learning to estimate monthly streamflow in ungauged basins of Thailand's southern basin
Predicting streamflow in ungauged basins is a challenging hydrological problem, and accurate estimates are needed for effective water resource management. This article evaluates five Machine Learning (ML) models for predicting monthly streamflow in ungauged basins: the M5 model tree (M5), Random Forest (RF), Support Vector Regression with a polynomial kernel (SVR-poly), Support Vector Regression with a radial basis function kernel (SVR-rbf), and the Multilayer Perceptron (MLP). The ML models were compared with the method of regionalized GR2M model parameters. Data were collected from 37 streamflow stations in the southern basin of Thailand. They comprised hydrological information (monthly rainfall, potential evapotranspiration, and streamflow) and physical watershed characteristics (basin size, river length, distance from the hydrometric station to the area's centroid, and slope). The study evaluated the methods in two scenarios: (a) estimating average monthly streamflow and (b) estimating monthly streamflow. It proceeded in four phases: selection of input data, hyperparameter tuning, performance comparison of the models, and assessment of the chosen model's suitability for predicting monthly streamflow in ungauged basins. Model performance was compared using five-fold cross-validation and four statistical indicators: the Nash-Sutcliffe Efficiency (NSE), Overall Index (OI), Coefficient of Determination (r²), and Combined Index (CI). The RF model performed best among the ML models and outperformed the regionalized GR2M model parameters in both scenarios, with NSE > 0.6, OI > 0.6, r² > 0.6, and CI > 2.0.
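As a rough illustration of two of the indicators and of the five-fold split named in the abstract, the sketch below computes NSE and r² in plain Python and partitions 37 station indices into five folds. It is not the authors' code. The function names are hypothetical, r² is taken here as the squared Pearson correlation, and splitting by station in contiguous blocks is an assumption, since the abstract does not say how the folds were formed. OI and CI are omitted because the abstract does not define them.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of squared error
    to the squared deviation of observations from their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of r^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists for k contiguous folds of
    near-equal size. Folding by station is an assumption."""
    indices = list(range(n_items))
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional item.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: hold out each group of stations in turn (37 stations, 5 folds).
folds = list(k_fold_indices(37, k=5))
```

Holding out whole stations, rather than individual months, mimics the ungauged case: the model is scored on basins whose streamflow it never saw during training. An NSE of 1 indicates a perfect fit, and an NSE of 0 means the model does no better than predicting the observed mean.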
About the journal:
Physics and Chemistry of the Earth is an international interdisciplinary journal for the rapid publication of collections of refereed communications in separate thematic issues, either stemming from scientific meetings or specially compiled for the occasion. There is no restriction on the length of articles published in the journal. Physics and Chemistry of the Earth incorporates the separate Parts A, B and C, which existed until the end of 2001.
Please note: the Editors are unable to consider submissions that are not invited or linked to a thematic issue. Please do not submit unsolicited papers.
The journal covers the following subject areas:
-Solid Earth and Geodesy:
(geology, geochemistry, tectonophysics, seismology, volcanology, palaeomagnetism and rock magnetism, electromagnetism and potential fields, marine and environmental geosciences as well as geodesy).
-Hydrology, Oceans and Atmosphere:
(hydrology and water resources research, engineering and management, oceanography and oceanic chemistry, shelf, sea, lake and river sciences, meteorology and atmospheric sciences incl. chemistry as well as climatology and glaciology).
-Solar-Terrestrial and Planetary Science:
(solar, heliospheric and solar-planetary sciences, geology, geophysics and atmospheric sciences of planets, satellites and small bodies as well as cosmochemistry and exobiology).