Robustness of Optimized Decision Tree-Based Machine Learning Models to Map Gully Erosion Vulnerability

Soil Systems (IF 2.9, Q2, Soil Science). Pub Date: 2023-05-16. DOI: 10.3390/soilsystems7020050
Hasna Eloudi, Mohammed Hssaisoune, H. Reddad, M. Namous, Maryem Ismaili, S. Krimissa, Mustapha Ouayah, L. Bouchaou
Citations: 2

Abstract

Gully erosion is a worldwide threat with numerous environmental, social, and economic impacts. This research evaluates the performance and robustness of six ensemble machine learning models based on decision trees, namely Random Forest (RF), C5.0, XGBoost, treebag, Gradient Boosting Machines (GBMs), and AdaBoost, for mapping and predicting gully erosion-prone areas in a semi-arid mountain context. The first step was to prepare the inventory data, which consisted of 217 gully points. This database was then randomly split into training and test sets at five ratios (50/50, 60/40, 70/30, 80/20, and 90/10) to assess the stability and robustness of the models. Seventeen geo-environmental variables were used as potential controlling factors, and several metrics were examined to evaluate the performance of the six models. All of the models performed well in predicting vulnerability to gully erosion. The C5.0 and RF models had the best prediction performance (AUC = 90.8 and AUC = 90.1, respectively). However, across the random subdivisions of the database, these models exhibited small but noticeable instability, with high performance for the 80/20 and 70/30 splits. This demonstrates the importance of refining the database and of testing several data splits to ensure efficient and reliable results.
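The split-ratio robustness check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' workflow: it uses scikit-learn in Python, a synthetic dataset in place of the real gully inventory (the equal number of non-gully points is an assumption), and only four of the six model families, since C5.0 and XGBoost are not part of scikit-learn. The names `make_models`, `evaluate_splits`, and `TEST_FRACTIONS` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the inventory (assumption): 434 samples, as if
# 217 gully points were paired with 217 non-gully points, and 17 predictors
# playing the role of the geo-environmental variables.
X, y = make_classification(
    n_samples=434, n_features=17, n_informative=8, random_state=0
)

# Test-set fractions matching the 50/50, 60/40, 70/30, 80/20, 90/10 splits.
TEST_FRACTIONS = [0.5, 0.4, 0.3, 0.2, 0.1]


def make_models(seed=0):
    """Return fresh, untrained tree-based ensembles (default hyperparameters)."""
    return {
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        # Bagged decision trees, a rough analogue of "treebag".
        "treebag": BaggingClassifier(n_estimators=50, random_state=seed),
        "GBM": GradientBoostingClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }


def evaluate_splits(X, y, seed=0):
    """Fit every model on every split ratio and return test-set AUC values."""
    results = {}
    for test_frac in TEST_FRACTIONS:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_frac, stratify=y, random_state=seed
        )
        for name, model in make_models(seed).items():
            model.fit(X_tr, y_tr)
            # Probability of the positive class, used as the ranking score.
            scores = model.predict_proba(X_te)[:, 1]
            results[(name, test_frac)] = roc_auc_score(y_te, scores)
    return results


results = evaluate_splits(X, y)
for (name, test_frac), auc in sorted(results.items()):
    train_pct = round((1 - test_frac) * 100)
    print(f"{name:8s} {train_pct}/{100 - train_pct}  AUC = {auc:.3f}")
```

Comparing the AUC of one model across the five ratios gives a rough sense of how sensitive it is to the split. Repeating the loop with several values of `seed` would separate the effect of the ratio from the effect of one particular random draw.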
Source journal: Soil Systems (Earth and Planetary Sciences: Earth-Surface Processes). CiteScore: 5.30. Self-citation rate: 5.70%. Articles published: 80. Review time: 11 weeks.