Sample size effects on landslide susceptibility models: A comparative study of heuristic, statistical, machine learning, deep learning and ensemble learning models with SHAP analysis

Computers & Geosciences | Published 2024-09-13 | DOI: 10.1016/j.cageo.2024.105723
Impact factor 4.2 | Zone 2, Earth Sciences | JCR Q1, Computer Science, Interdisciplinary Applications
Shilong Yang, Jiayao Tan, Danyuan Luo, Yuzhou Wang, Xu Guo, Qiuyu Zhu, Chuanming Ma, Hanxiang Xiong
Computers & Geosciences, Volume 193, Article 105723
Citations: 0

Abstract

In landslide susceptibility assessment (LSA), inventory incompleteness affects the accuracy of different models to varying degrees, yet this effect remains under-researched. This study compared six LSA models spanning heuristic, statistical, machine learning, deep learning and ensemble learning approaches (analytical hierarchy process (AHP), frequency ratio (FR), logistic regression (LR), Keras-based deep learning (KBDL), XGBoost and LightGBM) across six sample sizes (100%, 90%, 75%, 50%, 25% and 10%). Results revealed that XGBoost and LightGBM consistently outperformed the other models at all sample sizes, followed by the LR and KBDL models, while the FR model was the most affected by sample size variations. AHP, an empirical model, was unaffected by sample size. SHapley Additive exPlanations (SHAP) analysis identified elevation, NDVI, slope, land use, and distance to roads and rivers as pivotal indicators of landslide occurrence in the study area, suggesting that human activities significantly influence these events. Five time-varying indicators of human activity and climate corroborated this inference; the approach offers a new way to identify landslide triggering factors, especially in areas of intense human activity. Based on these findings, a comprehensive LSA framework is proposed to help landslide managers make informed decisions.

Future research should focus on expanding model diversity when addressing sample size effects, enhancing the adaptability of the LSA framework, deepening the analysis of human activity impacts on landslides with explainable machine learning techniques, addressing temporal inventory incompleteness in LSA, and critically evaluating model sensitivity to sample size variations across multiple disciplines.
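The abstract does not describe how the sample-size experiment was implemented. As a rough illustration only, the sketch below randomly retains 100%, 90%, 75%, 50%, 25% and 10% of a landslide inventory and recomputes frequency ratio (FR) values for one classified conditioning factor, using the standard FR definition (a class's share of landslide cells divided by its share of all cells). The helper names `subsample_inventory` and `frequency_ratio`, the uniform random subsampling, and the synthetic raster are assumptions made for this example, not the authors' procedure.

```python
import numpy as np

# Inventory fractions listed in the abstract (100%, 90%, 75%, 50%, 25%, 10%)
FRACTIONS = [1.0, 0.9, 0.75, 0.5, 0.25, 0.1]


def subsample_inventory(landslide_idx, fraction, rng):
    """Hypothetical helper: keep a random fraction of landslide cells (at least one)."""
    n_keep = max(1, int(round(fraction * len(landslide_idx))))
    return rng.choice(landslide_idx, size=n_keep, replace=False)


def frequency_ratio(factor_classes, landslide_idx):
    """Frequency ratio per class of one classified factor.

    FR_i = (landslide cells in class i / all landslide cells)
           / (cells in class i / all cells)
    FR > 1 marks classes where landslides are over-represented.
    """
    total_cells = factor_classes.size
    ls_classes = factor_classes[landslide_idx]
    total_ls = ls_classes.size
    fr = {}
    for c in np.unique(factor_classes):
        class_share = np.count_nonzero(factor_classes == c) / total_cells
        ls_share = np.count_nonzero(ls_classes == c) / total_ls
        fr[int(c)] = ls_share / class_share
    return fr


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic raster: 10,000 cells, one factor with 5 classes (illustration only)
    factor = rng.integers(0, 5, size=10_000)
    inventory = rng.choice(factor.size, size=200, replace=False)
    for frac in FRACTIONS:
        subset = subsample_inventory(inventory, frac, rng)
        fr = frequency_ratio(factor, subset)
        print(f"{frac:>4.0%}", {k: round(v, 2) for k, v in fr.items()})
```

Because each FR value is a ratio of landslide counts per class, classes with few landslides can swing sharply, or drop to zero, as the inventory shrinks. This is one plausible reason a purely count-based model such as FR would be more sensitive to sample size than the AHP weights, which do not depend on the inventory at all.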

Source journal
Computers & Geosciences (Geosciences, Multidisciplinary)
CiteScore: 9.30
Self-citation rate: 6.80%
Articles published: 164
Review time: 3.4 months
About the journal: Computers & Geosciences publishes high-impact, original research at the interface between computer science and the geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.