Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia

Applied Computing and Geosciences · IF 2.6 · JCR Q2 (Computer Science, Interdisciplinary Applications) · Pub Date: 2023-12-01 · DOI: 10.1016/j.acags.2023.100145
Zulfaqar Sa’adi, Zulkifli Yusop, Nor Eliza Alias, Ming Fai Chow, Mohd Khairul Idlan Muhammad, Muhammad Wafiy Adli Ramli, Zafar Iqbal, Mohammed Sanusi Shiru, Faizal Immaddudin Wira Rohmat, Nur Athirah Mohamad, Mohamad Faizal Ahmad
Volume 20, Article 100145. Full text: https://www.sciencedirect.com/science/article/pii/S2590197423000344
Citations: 0

Abstract

Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia

Missing values in rainfall records might result in erroneous predictions and inefficient management practices with significant economic, environmental, and social consequences. This is particularly important for rainfall datasets in Peninsular Malaysia (PM), where the high level of missingness can affect the inherent pattern in the highly variable time series. In this work, 21 target rainfall stations in the Johor River Basin (JRB) with daily data between 1970 and 2015 were used to examine 19 different multiple imputation methods, carried out using the Multivariate Imputation by Chained Equations (MICE) package in R. For each station, artificial missing data were added at rates of up to 5%, 10%, 20%, and 30% for different types of missingness, namely Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), leaving the original missing data intact. The imputation quality was evaluated using several statistical performance metrics, namely mean absolute error (MAE), root mean square error (RMSE), normalized root mean square error (NRMSE), Nash-Sutcliffe efficiency (NSE), modified degree of agreement (MD), coefficient of determination (R²), Kling-Gupta efficiency (KGE), and volumetric efficiency (VE), which were then ranked and aggregated using the compromise programming index (CPI) to select the best method. The results showed that linear regression predicted values (norm.predict) consistently ranked the highest under all types and levels of missingness. For example, under MAR, MNAR, and MCAR, this method showed the lowest MAE values, ranging over 0.78–2.25, 0.93–2.57, and 0.87–2.43, respectively. It also consistently showed higher NSE values of 0.71–0.92, 0.6–0.92, and 0.66–0.91 and higher R² values of 0.77–0.92, 0.71–0.93, and 0.75–0.92 under MAR, MCAR, and MNAR, respectively. The mean, rf, and cart methods also appear to be efficient. Using the CPI as a decision-support tool enabled an objective assessment of the output from the multiple performance metrics for ranking and selecting the top-performing method. During validation, the Probability Density Function (PDF) demonstrated that even with up to 30% missingness, the shape of the distribution after imputation was retained compared to the actual data. The methodology proposed in this study can help in choosing suitable imputation methods for other tropical rainfall datasets, leading to improved accuracy in rainfall estimation and prediction.
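The evaluation loop described in the abstract (hide known values, impute them, score the imputed values against the hidden truth) can be illustrated with a small sketch. This is not the authors' code: the study used the MICE package in R, whereas the Python below uses only the standard library. The function names, the toy rainfall series, and the plain mean fill (a stand-in loosely comparable to the `mean` method) are all assumptions made for illustration. Only MCAR masking and three of the eight metrics (MAE, RMSE, NSE) are shown, using their standard textbook definitions.

```python
import math
import random


def mask_mcar(values, rate, seed=0):
    """Pick indices to hide completely at random (MCAR); hypothetical helper."""
    rng = random.Random(seed)
    n_hide = int(round(rate * len(values)))
    return sorted(rng.sample(range(len(values)), n_hide))


def mae(obs, sim):
    """Mean absolute error between observed and imputed values."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    if den == 0:
        return float("nan")  # undefined when the hidden observations are constant
    return 1.0 - num / den


def demo():
    # Toy daily rainfall values in mm (made up for illustration only).
    rainfall = [0.0, 12.5, 3.2, 0.0, 45.1, 7.8, 0.0, 22.4, 1.1, 9.6]
    hidden = mask_mcar(rainfall, rate=0.3, seed=42)
    kept = [v for i, v in enumerate(rainfall) if i not in hidden]
    fill = sum(kept) / len(kept)  # mean of the values that remain visible
    obs = [rainfall[i] for i in hidden]
    sim = [fill] * len(hidden)
    return {"MAE": mae(obs, sim), "RMSE": rmse(obs, sim), "NSE": nse(obs, sim)}
```

Calling `demo()` returns the three scores for the artificially hidden entries only, which mirrors the idea of scoring imputations against values that were known before masking. MAR and MNAR masking would need the hiding probability to depend on other stations or on the rainfall value itself, and is not sketched here.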
