THE EXPLAINABILITY OF GRADIENT-BOOSTED DECISION TREES FOR DIGITAL ELEVATION MODEL (DEM) ERROR PREDICTION

Journal: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (JCR Q2, Social Sciences)
Publication date: 2023-09-05
DOI: 10.5194/isprs-archives-xlviii-m-3-2023-161-2023

C. Okolie, J. Mills, A. Adeleke, J. Smit, I. Maduako


Abstract

Gradient-boosted decision trees (GBDTs) have repeatedly outperformed several machine learning and deep learning algorithms in competitive data science. However, the explainability of GBDT predictions, especially with earth observation data, is still an open issue that requires more attention from researchers. In this study, we investigate the explainability of Bayesian-optimised GBDT algorithms for modelling and predicting the vertical error in the Copernicus GLO-30 digital elevation model (DEM). Three GBDT algorithms are investigated: extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost). SHapley Additive exPlanations (SHAP) are adopted for the explainability analysis. The assessment sites are selected from urban/industrial and mountainous landscapes in Cape Town, South Africa. The training datasets comprise eleven predictor variables that are known to influence elevation error: elevation, slope, aspect, surface roughness, topographic position index, terrain ruggedness index, terrain surface texture, vector ruggedness measure, forest cover, bare ground cover, and urban footprints. The target variable (elevation error) was calculated with respect to accurate airborne LiDAR data. After model training and testing, the GBDTs were applied to predict the elevation error at model implementation sites. The SHAP plots showed varying levels of emphasis on the predictors depending on the land cover and terrain. For example, in the urban area, the influence of the vector ruggedness measure surpassed that of first-order derivatives such as slope and aspect. Thus, it is recommended that machine learning modelling procedures and workflows incorporate model explainability to ensure robust interpretation and understanding of model predictions by both technical and non-technical users.
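To make the SHAP idea in the abstract concrete, the following is a minimal sketch of how Shapley values attribute one prediction to its input features. It is not the authors' pipeline: `toy_error_model` is a hypothetical hand-written stand-in for a trained GBDT, only three of the eleven predictors are used, the coefficients and sample values are invented for illustration, and features outside a coalition are assumed to be filled in from a small background set. Real workflows would normally rely on a tree-specific SHAP implementation rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def toy_error_model(x):
    """Hypothetical stand-in for a trained GBDT: returns an elevation error in metres.

    x = (slope in degrees, vector ruggedness measure, urban footprint flag).
    The coefficients are invented for illustration only.
    """
    slope, vrm, urban = x
    return 0.05 * slope + 40.0 * vrm + 1.5 * urban + 0.02 * slope * urban


def shapley_values(model, x, background):
    """Exact Shapley values for one sample by enumerating all feature coalitions.

    Features outside a coalition take their values from the background rows,
    and the model output is averaged over those rows.
    Returns (per-feature attributions, base value).
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            z = [x[i] if i in subset else row[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi, value(set())


# Hypothetical background rows and one urban sample (values are made up).
background = [
    (2.0, 0.001, 0.0),
    (15.0, 0.010, 0.0),
    (5.0, 0.020, 1.0),
    (30.0, 0.030, 0.0),
]
sample = (4.0, 0.050, 1.0)

phi, base = shapley_values(toy_error_model, sample, background)
for name, contribution in zip(("slope", "vrm", "urban"), phi):
    print(f"{name}: {contribution:+.3f} m")
print(f"base value: {base:.3f} m, prediction: {toy_error_model(sample):.3f} m")
```

The attributions plus the base value add up to the model's prediction for the sample, which is the additivity property that lets SHAP plots rank predictors such as the vector ruggedness measure against slope and aspect. The enumeration grows exponentially with the number of features, so it is only practical here because the toy model has three inputs.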


