The Explainability of Gradient-Boosted Decision Trees for Digital Elevation Model (DEM) Error Prediction

C. Okolie, J. Mills, A. Adeleke, J. Smit, I. Maduako
{"title":"用于数字高程模型误差预测的梯度决策树的可解释性","authors":"C. Okolie, J. Mills, A. Adeleke, J. Smit, I. Maduako","doi":"10.5194/isprs-archives-xlviii-m-3-2023-161-2023","DOIUrl":null,"url":null,"abstract":"Abstract. Gradient boosted decision trees (GBDTs) have repeatedly outperformed several machine learning and deep learning algorithms in competitive data science. However, the explainability of GBDT predictions especially with earth observation data is still an open issue requiring more focus by researchers. In this study, we investigate the explainability of Bayesian-optimised GBDT algorithms for modelling and prediction of the vertical error in Copernicus GLO-30 digital elevation model (DEM). Three GBDT algorithms are investigated (extreme gradient boosting - XGBoost, light boosting machine – LightGBM, and categorical boosting – CatBoost), and SHapley Additive exPlanations (SHAP) are adopted for the explainability analysis. The assessment sites are selected from urban/industrial and mountainous landscapes in Cape Town, South Africa. Training datasets are comprised of eleven predictor variables which are known influencers of elevation error: elevation, slope, aspect, surface roughness, topographic position index, terrain ruggedness index, terrain surface texture, vector roughness measure, forest cover, bare ground cover, and urban footprints. The target variable (elevation error) was calculated with respect to accurate airborne LiDAR. After model training and testing, the GBDTs were applied for predicting the elevation error at model implementation sites. The SHAP plots showed varying levels of emphasis on the parameters depending on the land cover and terrain. For example, in the urban area, the influence of vector ruggedness measure surpassed that of first-order derivatives such as slope and aspect. Thus, it is recommended that machine learning modelling procedures and workflows incorporate model explainability to ensure robust interpretation and understanding of model predictions by both technical and non-technical users.\n","PeriodicalId":30634,"journal":{"name":"The International Archives of the Photogrammetry Remote Sensing and Spatial Information Sciences","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"THE EXPLAINABILITY OF GRADIENT-BOOSTED DECISION TREES FOR DIGITAL ELEVATION MODEL (DEM) ERROR PREDICTION\",\"authors\":\"C. Okolie, J. Mills, A. Adeleke, J. Smit, I. Maduako\",\"doi\":\"10.5194/isprs-archives-xlviii-m-3-2023-161-2023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract. Gradient boosted decision trees (GBDTs) have repeatedly outperformed several machine learning and deep learning algorithms in competitive data science. However, the explainability of GBDT predictions especially with earth observation data is still an open issue requiring more focus by researchers. In this study, we investigate the explainability of Bayesian-optimised GBDT algorithms for modelling and prediction of the vertical error in Copernicus GLO-30 digital elevation model (DEM). Three GBDT algorithms are investigated (extreme gradient boosting - XGBoost, light boosting machine – LightGBM, and categorical boosting – CatBoost), and SHapley Additive exPlanations (SHAP) are adopted for the explainability analysis. The assessment sites are selected from urban/industrial and mountainous landscapes in Cape Town, South Africa. 
Training datasets are comprised of eleven predictor variables which are known influencers of elevation error: elevation, slope, aspect, surface roughness, topographic position index, terrain ruggedness index, terrain surface texture, vector roughness measure, forest cover, bare ground cover, and urban footprints. The target variable (elevation error) was calculated with respect to accurate airborne LiDAR. After model training and testing, the GBDTs were applied for predicting the elevation error at model implementation sites. The SHAP plots showed varying levels of emphasis on the parameters depending on the land cover and terrain. For example, in the urban area, the influence of vector ruggedness measure surpassed that of first-order derivatives such as slope and aspect. Thus, it is recommended that machine learning modelling procedures and workflows incorporate model explainability to ensure robust interpretation and understanding of model predictions by both technical and non-technical users.\\n\",\"PeriodicalId\":30634,\"journal\":{\"name\":\"The International Archives of the Photogrammetry Remote Sensing and Spatial Information Sciences\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The International Archives of the Photogrammetry Remote Sensing and Spatial Information Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5194/isprs-archives-xlviii-m-3-2023-161-2023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"Social Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International Archives of the Photogrammetry Remote Sensing and Spatial Information Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5194/isprs-archives-xlviii-m-3-2023-161-2023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Social Sciences","Score":null,"Total":0}
引用次数: 0

Abstract

Gradient-boosted decision trees (GBDTs) have repeatedly outperformed several machine learning and deep learning algorithms in competitive data science. However, the explainability of GBDT predictions, especially with earth observation data, is still an open issue requiring more focus from researchers. In this study, we investigate the explainability of Bayesian-optimised GBDT algorithms for modelling and predicting the vertical error in the Copernicus GLO-30 digital elevation model (DEM). Three GBDT algorithms are investigated (extreme gradient boosting – XGBoost, light gradient boosting machine – LightGBM, and categorical boosting – CatBoost), and SHapley Additive exPlanations (SHAP) are adopted for the explainability analysis. The assessment sites are selected from urban/industrial and mountainous landscapes in Cape Town, South Africa. The training datasets comprise eleven predictor variables known to influence elevation error: elevation, slope, aspect, surface roughness, topographic position index, terrain ruggedness index, terrain surface texture, vector ruggedness measure, forest cover, bare ground cover, and urban footprints. The target variable (elevation error) was calculated with respect to accurate airborne LiDAR. After model training and testing, the GBDTs were applied to predict the elevation error at the model implementation sites. The SHAP plots showed varying levels of emphasis on the parameters depending on the land cover and terrain. For example, in the urban area, the influence of the vector ruggedness measure surpassed that of first-order derivatives such as slope and aspect. It is therefore recommended that machine learning modelling procedures and workflows incorporate model explainability to ensure robust interpretation and understanding of model predictions by both technical and non-technical users.
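
The paper itself does not publish code, but the pipeline the abstract describes (Bayesian-optimised GBDT regression of DEM vertical error against terrain and land-cover predictors) is straightforward to sketch. The following is a minimal, illustrative Python sketch, not the authors' implementation: it assumes Optuna as the Bayesian hyperparameter optimiser and uses random stand-in data in place of the study's eleven real predictors and LiDAR-derived error target.

```python
# Minimal sketch (not the authors' code): Bayesian-optimised XGBoost
# regression of DEM vertical error, with Optuna driving the search.
import numpy as np
import optuna
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
n = 5000
# Stand-ins for the eleven predictors listed in the abstract
# (elevation, slope, aspect, roughness, TPI, TRI, texture, VRM,
# forest cover, bare ground cover, urban footprints).
X = rng.normal(size=(n, 11))
# Hypothetical target: DEM elevation error (GLO-30 minus LiDAR reference).
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 3] + rng.normal(scale=0.2, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

def objective(trial):
    # Search space is illustrative; the paper does not report its exact ranges.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = xgb.XGBRegressor(objective="reg:squarederror", **params)
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print("Best test MSE:", study.best_value)
print("Best hyperparameters:", study.best_params)
```

Only XGBoost is shown for brevity; LightGBM (lightgbm.LGBMRegressor) and CatBoost (catboost.CatBoostRegressor) drop into the same objective function with their own hyperparameter spaces.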
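For the explainability step, SHAP's TreeExplainer supports all three GBDT families named in the abstract. A short continuation of the sketch above, again illustrative rather than the authors' actual workflow; the feature names simply mirror the predictor list in the abstract:

```python
# Continuing the sketch: SHAP analysis of the tuned model.
import shap

best_model = xgb.XGBRegressor(objective="reg:squarederror", **study.best_params)
best_model.fit(X_train, y_train)

feature_names = [
    "elevation", "slope", "aspect", "surface_roughness",
    "topographic_position_index", "terrain_ruggedness_index",
    "terrain_surface_texture", "vector_ruggedness_measure",
    "forest_cover", "bare_ground_cover", "urban_footprints",
]

explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)

# Global importance view (mean |SHAP| per feature), analogous to the
# summary plots the paper uses to compare predictor influence per site.
shap.summary_plot(shap_values, X_test, feature_names=feature_names)
```

Running this per site (e.g. separately for the urban/industrial and mountainous subsets) is what allows the kind of comparison reported in the abstract, where the vector ruggedness measure outranked slope and aspect in the urban area.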