Marija Kopanja, Stefan Hačko, Sanja Brdar, Miloš Savić
{"title":"用于解释基于成本敏感树模型的成本敏感树 SHAP","authors":"Marija Kopanja, Stefan Hačko, Sanja Brdar, Miloš Savić","doi":"10.1111/coin.12651","DOIUrl":null,"url":null,"abstract":"<p>Cost-sensitive ensemble learning as a combination of two approaches, ensemble learning and cost-sensitive learning, enables generation of cost-sensitive tree-based ensemble models using the cost-sensitive decision tree (CSDT) learning algorithm. In general, tree-based models characterize nice graphical representation that can explain a model's decision-making process. However, the depth of the tree and the number of base models in the ensemble can be a limiting factor in comprehending the model's decision for each sample. The CSDT models are widely used in finance (e.g., credit scoring and fraud detection) but lack effective explanation methods. We previously addressed this gap with cost-sensitive tree Shapley Additive Explanation Method (CSTreeSHAP), a cost-sensitive tree explanation method for the single-tree CSDT model. Here, we extend the introduced methodology to cost-sensitive ensemble models, particularly cost-sensitive random forest models. The paper details the theoretical foundation and implementation details of CSTreeSHAP for both single CSDT and ensemble models. The usefulness of the proposed method is demonstrated by providing explanations for single and ensemble CSDT models trained on well-known benchmark credit scoring datasets. Finally, we apply our methodology and analyze the stability of explanations for those models compared to the cost-insensitive tree-based models. Our analysis reveals statistically significant differences between SHAP values despite seemingly similar global feature importance plots of the models. 
This highlights the value of our methodology as a comprehensive tool for explaining CSDT models.</p>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2024-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cost-sensitive tree SHAP for explaining cost-sensitive tree-based models\",\"authors\":\"Marija Kopanja, Stefan Hačko, Sanja Brdar, Miloš Savić\",\"doi\":\"10.1111/coin.12651\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Cost-sensitive ensemble learning as a combination of two approaches, ensemble learning and cost-sensitive learning, enables generation of cost-sensitive tree-based ensemble models using the cost-sensitive decision tree (CSDT) learning algorithm. In general, tree-based models characterize nice graphical representation that can explain a model's decision-making process. However, the depth of the tree and the number of base models in the ensemble can be a limiting factor in comprehending the model's decision for each sample. The CSDT models are widely used in finance (e.g., credit scoring and fraud detection) but lack effective explanation methods. We previously addressed this gap with cost-sensitive tree Shapley Additive Explanation Method (CSTreeSHAP), a cost-sensitive tree explanation method for the single-tree CSDT model. Here, we extend the introduced methodology to cost-sensitive ensemble models, particularly cost-sensitive random forest models. The paper details the theoretical foundation and implementation details of CSTreeSHAP for both single CSDT and ensemble models. The usefulness of the proposed method is demonstrated by providing explanations for single and ensemble CSDT models trained on well-known benchmark credit scoring datasets. 
Finally, we apply our methodology and analyze the stability of explanations for those models compared to the cost-insensitive tree-based models. Our analysis reveals statistically significant differences between SHAP values despite seemingly similar global feature importance plots of the models. This highlights the value of our methodology as a comprehensive tool for explaining CSDT models.</p>\",\"PeriodicalId\":55228,\"journal\":{\"name\":\"Computational Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/coin.12651\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/coin.12651","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Cost-sensitive tree SHAP for explaining cost-sensitive tree-based models
Cost-sensitive ensemble learning, a combination of two approaches, ensemble learning and cost-sensitive learning, enables the generation of cost-sensitive tree-based ensemble models using the cost-sensitive decision tree (CSDT) learning algorithm. In general, tree-based models admit a clear graphical representation that can explain a model's decision-making process. However, the depth of the tree and the number of base models in the ensemble can be a limiting factor in comprehending the model's decision for each sample. CSDT models are widely used in finance (e.g., credit scoring and fraud detection) but lack effective explanation methods. We previously addressed this gap with the cost-sensitive tree Shapley Additive exPlanation method (CSTreeSHAP), a cost-sensitive tree explanation method for the single-tree CSDT model. Here, we extend that methodology to cost-sensitive ensemble models, particularly cost-sensitive random forest models. The paper presents the theoretical foundation and implementation details of CSTreeSHAP for both single CSDT and ensemble models. The usefulness of the proposed method is demonstrated by providing explanations for single and ensemble CSDT models trained on well-known benchmark credit scoring datasets. Finally, we apply our methodology and analyze the stability of explanations for those models compared to their cost-insensitive counterparts. Our analysis reveals statistically significant differences between the SHAP values of the two model types despite seemingly similar global feature importance plots. This highlights the value of our methodology as a comprehensive tool for explaining CSDT models.
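The abstract does not reproduce the CSTreeSHAP algorithm itself, but the Shapley values it builds on can be illustrated in isolation. The sketch below computes exact Shapley values by brute-force enumeration of feature coalitions, with absent features set to a single background (baseline) sample; the toy prediction function `f` is a hypothetical stand-in for a tree model's output, not the authors' method. Efficient variants such as TreeSHAP (and, per the paper, CSTreeSHAP) compute the same quantities without the exponential enumeration.

```python
from itertools import combinations
from math import factorial

def f(x):
    # Hypothetical toy "model": linear terms plus one interaction,
    # standing in for a tree-based model's prediction for sample x.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[2]

def shapley_values(f, x, baseline):
    """Exact Shapley values via enumeration of all feature coalitions.

    Features outside the coalition keep their baseline value, mimicking
    the interventional expectation with a single background sample.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Classical Shapley weight |S|! (n-|S|-1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                x_with = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                x_without = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += w * (f(x_with) - f(x_without))
    return phi

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)

# Efficiency (additivity) property: contributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (f(x) - f(baseline))) < 1e-9
```

The additivity check at the end is the property that makes SHAP values useful for local explanation: the per-feature contributions exactly decompose the gap between the model's prediction for a sample and the baseline prediction, which is also what allows the paper to compare SHAP values across cost-sensitive and cost-insensitive models sample by sample.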
Journal Introduction:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.