Evaluation of Explanation Methods of AI - CNNs in Image Classification Tasks with Reference-based and No-reference Metrics

A. Zhukov, J. Benois-Pineau, R. Giot
{"title":"Evaluation of Explanation Methods of AI - CNNs in Image Classification Tasks with Reference-based and No-reference Metrics","authors":"A. Zhukov, J. Benois-Pineau, R. Giot","doi":"10.54364/AAIML.2023.1143","DOIUrl":null,"url":null,"abstract":"The most popular methods in AI-machine learning paradigm are mainly black boxes. This is why explanation of AI decisions is of emergency. Although dedicated explanation tools have been massively developed, the evaluation of their quality remains an open research question. In this paper, we generalize the methodologies of evaluation of post-hoc explainers of CNNs’ decisions in visual classification tasks with reference and no-reference based metrics. We apply them on our previously developed explainers (FEM1 , MLFEM), and popular Grad-CAM. The reference-based metrics are Pearson correlation coefficient and Similarity computed between the explanation map and its ground truth represented by a Gaze Fixation Density Map obtained with a psycho-visual experiment. As a no-reference metric, we use stability metric, proposed by Alvarez-Melis and Jaakkola. We study its behaviour, consensus with reference-based metrics and show that in case of several kinds of degradation on input images, this metric is in agreement with reference-based ones. Therefore, it can be used for evaluation of the quality of explainers when the ground truth is not available.","PeriodicalId":373878,"journal":{"name":"Adv. Artif. Intell. Mach. Learn.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Adv. Artif. Intell. Mach. Learn.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.54364/AAIML.2023.1143","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The most popular methods in the AI/machine-learning paradigm are mainly black boxes, which is why explaining AI decisions is a matter of urgency. Although dedicated explanation tools have been developed on a massive scale, the evaluation of their quality remains an open research question. In this paper, we generalize methodologies for evaluating post-hoc explainers of CNN decisions in visual classification tasks with reference-based and no-reference metrics. We apply them to our previously developed explainers (FEM, MLFEM) and to the popular Grad-CAM. The reference-based metrics are the Pearson correlation coefficient and Similarity, computed between the explanation map and its ground truth, represented by a Gaze Fixation Density Map obtained in a psycho-visual experiment. As a no-reference metric, we use the stability metric proposed by Alvarez-Melis and Jaakkola. We study its behaviour and its consensus with the reference-based metrics, and show that, under several kinds of degradation of the input images, this metric agrees with the reference-based ones. It can therefore be used to evaluate the quality of explainers when no ground truth is available.
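To make the metric definitions in the abstract concrete, below is a minimal NumPy sketch of the two reference-based metrics (Pearson correlation coefficient and Similarity between an explanation map and a Gaze Fixation Density Map) and of a local stability estimate in the spirit of Alvarez-Melis and Jaakkola. The function names and the assumed `explainer(image) -> 2D map` interface are illustrative assumptions, not the authors' published code.

```python
# Hedged sketch of the metric families described in the abstract.
# The explainer interface and function names are assumptions for illustration.
import numpy as np

def pearson_cc(expl_map, gfdm):
    """Pearson correlation coefficient between an explanation map and its
    ground truth (Gaze Fixation Density Map), both 2D arrays of equal shape."""
    x = expl_map.ravel().astype(np.float64)
    y = gfdm.ravel().astype(np.float64)
    x = (x - x.mean()) / (x.std() + 1e-12)
    y = (y - y.mean()) / (y.std() + 1e-12)
    return float(np.mean(x * y))

def similarity(expl_map, gfdm):
    """SIM metric: histogram intersection of the two maps after each is
    normalized to sum to 1 (1 = identical distributions, 0 = disjoint)."""
    p = expl_map.astype(np.float64)
    q = gfdm.astype(np.float64)
    p /= p.sum() + 1e-12
    q /= q.sum() + 1e-12
    return float(np.minimum(p, q).sum())

def local_stability(explainer, image, eps=0.01, n_samples=20, seed=None):
    """No-reference stability in the spirit of Alvarez-Melis & Jaakkola:
    an empirical local Lipschitz estimate, i.e. the worst-case ratio of
    explanation change to input change over random perturbations of the
    image inside an eps-ball. Lower values mean a more stable explainer."""
    rng = np.random.default_rng(seed)
    base = explainer(image)
    worst = 0.0
    for _ in range(n_samples):
        noisy = image + rng.uniform(-eps, eps, size=image.shape)
        ratio = (np.linalg.norm(explainer(noisy) - base)
                 / (np.linalg.norm(noisy - image) + 1e-12))
        worst = max(worst, ratio)
    return worst
```

The first two functions require a ground-truth gaze map, while the stability estimate only queries the explainer on perturbed inputs, which is what makes it usable when no ground truth is available.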