Evaluating Landslide Susceptibility Using Sampling Methodology and Multiple Machine Learning Models

Yingze Song, Degang Yang, Weicheng Wu, Xin Zhang, Jie Zhou, Zhaoxu Tian, Chencan Wang, Yingxu Song

ISPRS Int. J. Geo Inf. Published 2023-05-13. DOI: 10.3390/ijgi12050197 (https://doi.org/10.3390/ijgi12050197). Citations: 4

Abstract

Landslide susceptibility assessment (LSA) based on machine learning is widely used in landslide hazard management and research. However, the problem of sample imbalance, where landslide samples are far fewer than non-landslide samples, is often overlooked, even though it is one of the important factors affecting the performance of landslide susceptibility models. In this paper, we take the Wanzhou district of Chongqing city as an example, where the dataset contains more than 580,000 samples and the ratio of positive (landslide) to negative (non-landslide) samples is 1:19. We oversample or undersample the imbalanced data to balance the two classes, and then compare the performance of machine learning models trained under the different sampling strategies. Three classic machine learning algorithms, logistic regression, random forest and LightGBM, are used for LSA modeling. The results show that the models trained directly on the imbalanced dataset perform the worst: their recall is extremely low, which indicates that they can barely identify landslide samples and cannot be applied in practice. Compared with the original dataset, the resampled datasets yield better predictive performance across all classifiers, reflected in higher AUC values and recall. The best model was the random forest trained on the oversampled data (O_RF), with AUC = 0.932.
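
The balancing step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not say which oversampling or undersampling algorithm was used, so plain random resampling is assumed here, and the function names (`random_oversample`, `random_undersample`, `recall`, `accuracy`) and the toy data are hypothetical. The sketch also shows why recall, rather than accuracy, exposes the weakness of a model trained on 1:19 data.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows (drawn with replacement) until every
    class has as many rows as the largest class. Assumed technique."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        extra = [rng.choice(rows) for _ in range(target - n)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset (without replacement) of each class, sized to
    match the smallest class. Assumed technique."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions recover."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy data with the 1:19 positive-to-negative ratio from the abstract
# (1 = landslide, 0 = non-landslide); the single feature is a placeholder.
y = [1] * 50 + [0] * 950
X = [[float(i)] for i in range(len(y))]

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
over_counts = Counter(y_over)    # both classes raised to the majority size
under_counts = Counter(y_under)  # both classes cut to the minority size

# A degenerate model that always predicts "non-landslide" looks accurate
# on imbalanced data but finds no landslides at all.
always_negative = [0] * len(y)
baseline_accuracy = accuracy(y, always_negative)
baseline_recall = recall(y, always_negative)
```

In practice, resampling is normally applied only to the training split so that evaluation metrics such as recall and AUC are computed on data that keeps the original class ratio.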