Empirical Analysis of Methods for Evaluating Faithfulness of Explanations by Feature Attribution

Yuya Asazuma, Kazuaki Hanawa, Kentaro Inui
Published 2023-11-01. DOI: 10.1527/tjsai.38-6_c-n22 (https://doi.org/10.1527/tjsai.38-6_c-n22)

Abstract

Many high-performance machine learning models used in practice are black boxes. It is widely recognized that addressing this problem requires reliable outputs and transparent models, and XAI (Explainable AI) is the research field that pursues this goal. Within XAI, feature attribution methods, which quantify the importance of input features irrespective of the task or model type, have become a central focus. When a new method is proposed, its efficacy must be evaluated on empirical evidence. However, there is extensive debate about which properties importance scores should possess, and no consensus has emerged on specific evaluation methods. As a result, many existing studies adopt their own evaluation techniques, which fragments the discussion. This study aims to “evaluate the evaluation methods,” focusing mainly on faithfulness, which is regarded as an especially important evaluation criterion. We conducted empirical experiments on existing evaluation techniques from two angles: correlation-based comparative evaluation and property verification using random sequences. In the former experiment, we investigated the correlation between faithfulness evaluation tests across numerous models and feature attribution methods. We found that very few test combinations exhibited high correlation, and many showed low or no correlation. In the latter experiment, we used random sequences in place of feature attribution methods to verify the properties of the faithfulness tests, and observed that the measured faithfulness varied depending on the model and dataset.
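The two experimental angles can be illustrated with a minimal sketch. The abstract does not name the specific faithfulness tests, models, or datasets, so everything below is an assumption for illustration only: `toy_model` is a hypothetical linear scorer, `deletion_score` is a generic deletion-style test (the output drop after removing the top-k attributed features), the two "tests" differ only in k, and random attribution vectors stand in for the random sequences. It is not the authors' protocol.

```python
import random


def toy_model(x, weights):
    """Hypothetical linear scorer standing in for a trained model."""
    return sum(w * v for w, v in zip(weights, x))


def deletion_score(x, weights, attribution, k):
    """Output drop after zeroing the k features ranked most important."""
    top = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)[:k]
    masked = [0.0 if i in top else v for i, v in enumerate(x)]
    return toy_model(x, weights) - toy_model(masked, weights)


def average_ranks(values):
    """Zero-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation (Pearson correlation of average ranks)."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    if va == 0 or vb == 0:
        return float("nan")  # undefined when one list is constant
    return cov / (va * vb) ** 0.5


# Illustrative input and weights (made-up numbers, not from the paper).
x = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, -1.0, 2.0, 0.1]

# "Exact" attribution for a linear model: each feature's contribution.
exact = [w * v for w, v in zip(weights, x)]

# Random attributions play the role of the random-sequence baseline.
rng = random.Random(0)
attributions = [exact] + [[rng.random() for _ in x] for _ in range(5)]

# Two hypothetical faithfulness tests that differ only in k.
test_a = [deletion_score(x, weights, a, 1) for a in attributions]
test_b = [deletion_score(x, weights, a, 2) for a in attributions]

if __name__ == "__main__":
    print("test A scores:", test_a)
    print("test B scores:", test_b)
    print("rank correlation between tests:", spearman(test_a, test_b))
```

In this toy setting, a low rank correlation between `test_a` and `test_b` would mean the two tests order the same attribution methods differently, which is the kind of disagreement the first experiment measures. The random attributions show what score a test assigns when the explanation carries no information about the model.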
Source journal: Transactions of The Japanese Society for Artificial Intelligence (Computer Science: Artificial Intelligence)
CiteScore: 0.40
Self-citation rate: 0.00%
Articles published: 36