An interpretable (explainable) model based on machine learning and SHAP interpretation technique for mapping wind erosion hazard
Authors: Hamid Gholami, Ehsan Darvishi, Navazollah Moradi, Aliakbar Mohammadifar, Yougui Song, Yue Li, Baicheng Niu, Dimitris Kaskaoutis, Biswajeet Pradhan
Journal: Environmental Science and Pollution Research (impact factor 5.8; JCR category: Environmental Sciences)
Publication type: Journal Article
Publication date: 2024-11-15
DOI: 10.1007/s11356-024-35521-x
URL: https://doi.org/10.1007/s11356-024-35521-x
Citations: 0
Abstract
Soil erosion by wind poses a significant threat to many regions across the globe, including drylands in the Middle East and Iran. Wind erosion hazard maps help identify the regions at highest risk and are a valuable tool for mitigating the destructive consequences of wind erosion. This study maps wind erosion hazards by developing an interpretable (explainable) model based on machine learning (ML) and the SHapley Additive exPlanations (SHAP) interpretation technique. Four ML models were used: random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), and quadratic discriminant analysis (QDA). Thirteen features associated with wind erosion were mapped spatially and then subjected to a multivariate adaptive regression spline (MARS) feature selection algorithm; tolerance coefficient (TC) and variance inflation factor (VIF) statistical tests were then used to check for multicollinearity among the variables. The MARS analysis showed that eight features were the most effective for wind erosion: elevation (or DEM), soil bulk density, precipitation, aspect, slope, soil sand content, vegetation cover (or NDVI), and lithology. No collinearity existed among these variables. The ML models were used to rank the effective features, and the research introduces an interpretable ML model for interpreting the predictive model's output. The ranking by RF, taken as the most typical ML model, revealed that elevation and soil bulk density were the two most important features. According to the area under the receiver operating characteristic curve (AUROC) and the precision-recall (PR) curve, both with values > 90%, all four ML models performed with high accuracy. According to the PR curve, the SVM model performed slightly better than the others, and its results showed that 20.9%, 23%, and 16.6% of the total area of Hormozgan Province fall into the moderate, high, and very high wind erosion hazard classes, respectively. SHAP revealed that soil sand content and elevation are the variables contributing most to the predictive model output. Overall, our research is one of the pioneering applications of interpretable ML models to mapping wind erosion hazards in Southern Iran. We recommend that future research address interpretability in order to better understand predictive model outputs.
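SHAP attributes a model's prediction to its input features using Shapley values from cooperative game theory. As a rough illustration of that idea only, the sketch below computes exact Shapley values for a tiny made-up function by enumerating every feature coalition. The function `toy_model`, the three-feature input, and the all-zero baseline are hypothetical and are not the study's models or data. Practical SHAP implementations rely on approximations or model-specific algorithms rather than this brute-force enumeration, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical stand-in for a fitted hazard model (not from the paper)."""
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[2]


def coalition_value(model, x, baseline, coalition):
    """Evaluate the model with features outside the coalition set to the baseline."""
    mixed = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return model(mixed)


def exact_shapley(model, x, baseline):
    """Exact Shapley value of each feature, by enumerating all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1, excluding feature i
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                # Marginal contribution of feature i when it joins the coalition
                gain = (coalition_value(model, x, baseline, members | {i})
                        - coalition_value(model, x, baseline, members))
                phi[i] += weight * gain
    return phi


# Illustrative input and baseline (assumed values, chosen only for the demo)
x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)
```

For this toy function the attributions come out at about 3.5, 2.0, and 1.5. They sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that lets SHAP values be read as each feature's share of a prediction. The interaction term `x[0] * x[2]` is split equally between the two features involved.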
About the journal:
Environmental Science and Pollution Research (ESPR) serves the international community in all areas of Environmental Science and related subjects with emphasis on chemical compounds. This includes:
- Terrestrial Biology and Ecology
- Aquatic Biology and Ecology
- Atmospheric Chemistry
- Environmental Microbiology/Biobased Energy Sources
- Phytoremediation and Ecosystem Restoration
- Environmental Analyses and Monitoring
- Assessment of Risks and Interactions of Pollutants in the Environment
- Conservation Biology and Sustainable Agriculture
- Impact of Chemicals/Pollutants on Human and Animal Health
It reports from a broad interdisciplinary outlook.