Sample size effects on landslide susceptibility models: A comparative study of heuristic, statistical, machine learning, deep learning and ensemble learning models with SHAP analysis
Shilong Yang, Jiayao Tan, Danyuan Luo, Yuzhou Wang, Xu Guo, Qiuyu Zhu, Chuanming Ma, Hanxiang Xiong
Computers & Geosciences, Volume 193, Article 105723. Published 2024-09-13. DOI: 10.1016/j.cageo.2024.105723
https://www.sciencedirect.com/science/article/pii/S0098300424002061
Citations: 0
Abstract
In landslide susceptibility assessment (LSA), an incomplete landslide inventory reduces the accuracy of different models to varying degrees, yet this effect remains under-researched. This study compared six LSA models spanning heuristic, statistical, machine learning, deep learning and ensemble learning approaches: the analytical hierarchy process (AHP), frequency ratio (FR), logistic regression (LR), Keras-based deep learning (KBDL), XGBoost, and LightGBM. Each model was evaluated at six sample sizes (100%, 90%, 75%, 50%, 25%, and 10%). XGBoost and LightGBM consistently outperformed the other models at all sample sizes, followed by LR and KBDL, while the FR model was the most affected by changes in sample size. AHP, an empirical model, was unaffected by sample size. SHapley Additive exPlanations (SHAP) analysis identified elevation, NDVI, slope, land use, and distance to roads and rivers as the key indicators of landslide occurrence in the study area, suggesting that human activities significantly influence these events. Five time-varying indicators of human activity and climate validated this inference; the approach offers a new way to identify landslide triggering factors, particularly in areas of intense human activity. Based on these findings, a comprehensive LSA framework is proposed to help landslide managers make informed decisions. Future research should expand model diversity when studying sample size effects, improve the adaptability of the LSA framework, deepen the analysis of human impacts on landslides with explainable machine learning techniques, address temporal inventory incompleteness in LSA, and critically evaluate model sensitivity to sample size variations across multiple disciplines.
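To make the sample-size experiment more concrete, here is a minimal sketch of the frequency ratio (FR) idea under a reduced inventory. For each class of a conditioning factor, FR is the class's share of landslide cells divided by its share of all cells; values above 1 mark classes where landslides are over-represented. Because FR is estimated directly from class-wise landslide counts, dropping landslides shifts each ratio, which is one intuitive reason such a model could be sensitive to sample size. The abstract does not describe how incompleteness was simulated, so the subsampling here is an assumption: a random fraction of mapped landslide cells is retained and the rest are treated as unmapped (0). The function names `frequency_ratio` and `subsample_inventory` and the toy slope-class data are hypothetical and are not the authors' implementation or data.

```python
import random
from collections import Counter


def frequency_ratio(classes, is_landslide):
    """Frequency ratio per factor class.

    FR(class) = (landslide cells in class / all landslide cells)
                / (cells in class / all cells)
    `classes` holds one factor class label per cell; `is_landslide` holds 0/1 flags.
    """
    total_cells = len(classes)
    total_slides = sum(is_landslide)
    if total_slides == 0:
        raise ValueError("inventory contains no landslide cells")
    cell_counts = Counter(classes)
    slide_counts = Counter(c for c, s in zip(classes, is_landslide) if s)
    return {
        c: (slide_counts.get(c, 0) / total_slides) / (n / total_cells)
        for c, n in cell_counts.items()
    }


def subsample_inventory(is_landslide, fraction, seed=0):
    """Hypothetical incompleteness model: keep a random `fraction` of landslide
    cells and relabel the dropped ones as 0 (unmapped)."""
    slide_idx = [i for i, s in enumerate(is_landslide) if s]
    if not slide_idx:
        return list(is_landslide)
    k = max(1, round(len(slide_idx) * fraction))  # keep at least one landslide
    kept = set(random.Random(seed).sample(slide_idx, k))
    return [1 if i in kept else 0 for i in range(len(is_landslide))]


if __name__ == "__main__":
    # Toy slope-class map (illustrative only, not data from the study).
    classes = ["low"] * 6 + ["high"] * 4
    is_landslide = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
    # The six sample sizes listed in the abstract.
    for frac in (1.0, 0.9, 0.75, 0.5, 0.25, 0.1):
        sub = subsample_inventory(is_landslide, frac, seed=42)
        print(f"{frac:>4.0%}", frequency_ratio(classes, sub))
```

Seeding the random generator keeps each reduced inventory reproducible, so several models could in principle be trained and compared on identical subsamples; retraining each model and scoring it (for example with AUC) is left out of this sketch.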
About the journal:
Computers & Geosciences publishes high impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.