GIS-based machine learning algorithm for flood susceptibility analysis in the Pagla river basin, Eastern India
Authors: Nur Islam Saikh, Prolay Mondal
Journal: Natural Hazards Research, Volume 3, Issue 3, Pages 420-436
Publication date: 2023-09-01
Publication type: Journal Article
DOI: 10.1016/j.nhres.2023.05.004
URL: https://www.sciencedirect.com/science/article/pii/S2666592123000513
Citations: 5
Abstract
The unique drainage conditions of the Pagla river basin cause flooding and harm the socioeconomic environment. The main purpose of this study is to compare the utility of six machine learning algorithms for flood susceptibility analysis, and to assess the capability of ensemble techniques to elucidate the underlying patterns of floods and predict flood susceptibility in the basin more accurately. At present, flooding in the study area becomes frequent under heavy and sudden rainfall, so it is essential to study flood mitigation measures. First, a spatial flood database was built with 200 flood locations and sixteen flood-influencing factors and processed in a Geographic Information System (GIS) environment, and different models were then built using machine learning techniques. Flood susceptibility zones were delineated in the GIS environment using an Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), Reduced Error Pruning Tree (REPTree), Logistic Regression (LR), and Bagging, and the models were validated using the Receiver Operating Characteristic (ROC) curve. Afterward, all the models were ensembled to obtain a comparative accuracy for the flood zones. The calculated areas under the very high flood susceptibility zone are 8.69%, 14.92%, 14.17%, 12.98%, 14.65%, 13.24% and 13.41% for ANN, SVM, RF, REPTree, LR and Bagging, respectively. Finally, the ROC curve, the Standard Error (SE), and the 95% Confidence Interval (CI) were used to assess and compare the performance of the models. The results indicate that all models achieve a highly acceptable Area Under the Curve (AUC) of the ROC, ranging from 0.889 (LR) to 0.926 (Ensemble). The ROC-based accuracy estimates show that the Ensemble model has a higher capability than the other applied models for projecting flood susceptibility in the study area. For the training and validation datasets, respectively, it has the highest AUC (0.918 and 0.926), with SE values of 0.023 and 0.034 and the narrowest 95% CI (0.873–0.962 and 0.859–0.993), whereas Bagging has AUC values of 0.914 and 0.919. After ensembling, the results show that the highly flood-susceptible area is located in the lower part of the study area, where the very high flood susceptibility zone takes values between 4.46 and 6.00 in the ensemble result. These areas are low-lying and belong to the Murarai I, Murarai II, Suti I and Suti II C.D. blocks of West Bengal. The study will help policymakers and researchers identify flood-conditioning problems for the future.
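
The link between the reported AUC, SE and 95% CI values can be checked with a short calculation. The sketch below assumes the intervals are normal-approximation intervals (AUC ± z × SE), which the abstract does not state, and it reads the second SE value as 0.034. The function name auc_confidence_interval is hypothetical and is not taken from the paper.

```python
from statistics import NormalDist


def auc_confidence_interval(auc, se, level=0.95):
    """Normal-approximation confidence interval for an AUC, clipped to [0, 1].

    Assumption: the interval is AUC +/- z * SE. The abstract does not say
    which method the authors used to derive their intervals.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # roughly 1.96 for a 95% level
    lower = max(0.0, auc - z * se)
    upper = min(1.0, auc + z * se)
    return lower, upper


# AUC and SE pairs quoted in the abstract for the Ensemble model.
# The validation SE is read here as 0.034 (an assumption).
reported = {
    "training": (0.918, 0.023),
    "validation": (0.926, 0.034),
}

for name, (auc, se) in reported.items():
    lower, upper = auc_confidence_interval(auc, se)
    print(f"{name}: AUC={auc:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Under these assumptions the validation interval comes out at about 0.859 to 0.993, and the training interval at about 0.873 to 0.963, which is within rounding of the reported 0.873–0.962. The agreement suggests, but does not prove, that the reported intervals were derived this way.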