Improving Network Interpretability via Explanation Consistency Evaluation

IF 8.4 · CAS Tier 1 (Computer Science) · JCR Q1 (Computer Science, Information Systems) · IEEE Transactions on Multimedia, Vol. 26, pp. 11261-11273 · Published: 2024-09-16 · DOI: 10.1109/TMM.2024.3453058
Hefeng Wu;Hao Jiang;Keze Wang;Ziyi Tang;Xianghuan He;Liang Lin
{"title":"Improving Network Interpretability via Explanation Consistency Evaluation","authors":"Hefeng Wu;Hao Jiang;Keze Wang;Ziyi Tang;Xianghuan He;Liang Lin","doi":"10.1109/TMM.2024.3453058","DOIUrl":null,"url":null,"abstract":"While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some works strive to improve both interpretability and performance, but they primarily depend on meticulously imposed conditions. In this paper, we propose a simple yet effective framework that acquires more explainable activation heatmaps and simultaneously increases the model performance, without the need for any extra supervision. Specifically, our concise framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning. The explanation consistency metric is utilized to measure the similarity between the model's visual explanations of the original samples and those of semantic-preserved adversarial samples, whose background regions are perturbed by using image adversarial attack techniques. Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations (i.e., low explanation consistency), for which the current model cannot provide robust interpretations. Comprehensive experimental results on various benchmarks demonstrate the superiority of our framework in multiple aspects, including higher recognition accuracy, greater data debiasing capability, stronger network robustness, and more precise localization ability on both regular networks and interpretable networks. We also provide extensive ablation studies and qualitative analyses to unveil the detailed contribution of each component.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"11261-11273"},"PeriodicalIF":8.4000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10680614/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

While deep neural networks have achieved remarkable performance, they tend to lack transparency in their predictions. The pursuit of greater interpretability in neural networks often degrades their original performance. Some works strive to improve both interpretability and performance, but they depend primarily on meticulously imposed conditions. In this paper, we propose a simple yet effective framework that produces more explainable activation heatmaps while simultaneously improving model performance, without requiring any extra supervision. Specifically, our concise framework introduces a new metric, explanation consistency, to adaptively reweight the training samples during model learning. The explanation consistency metric measures the similarity between the model's visual explanations of the original samples and those of semantics-preserving adversarial samples, whose background regions are perturbed using image adversarial attack techniques. Our framework then promotes model learning by paying closer attention to training samples whose explanations differ greatly (i.e., have low explanation consistency), for which the current model cannot provide robust interpretations. Comprehensive experimental results on various benchmarks demonstrate the superiority of our framework in multiple aspects, including higher recognition accuracy, greater data-debiasing capability, stronger network robustness, and more precise localization on both regular and interpretable networks. We also provide extensive ablation studies and qualitative analyses to unveil the detailed contribution of each component.
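To make the reweighting idea concrete, below is a minimal PyTorch sketch of one training step under this scheme. The choice of Grad-CAM as the heatmap extractor, cosine similarity as the consistency measure, the (2 - consistency) sample weighting, and the helper names (gradcam_heatmap, training_step, feature_layer, adv_images) are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def gradcam_heatmap(model, feature_layer, images, labels):
    # Capture the target layer's activations and their gradients via hooks.
    feats, grads = [], []
    h1 = feature_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = feature_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    logits = model(images)
    model.zero_grad()
    logits.gather(1, labels.unsqueeze(1)).sum().backward()
    h1.remove(); h2.remove()
    # Weight channels by spatially pooled gradients, ReLU, normalize per image.
    w = grads[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((w * feats[0]).sum(dim=1))                       # (B, H, W)
    return cam / (cam.flatten(1).amax(dim=1).view(-1, 1, 1) + 1e-8)

def training_step(model, feature_layer, optimizer, images, adv_images, labels):
    # adv_images: semantics-preserving samples whose background regions were
    # perturbed by an adversarial attack (their construction is not shown here).
    cam_o = gradcam_heatmap(model, feature_layer, images, labels).detach()
    cam_a = gradcam_heatmap(model, feature_layer, adv_images, labels).detach()
    consistency = F.cosine_similarity(
        cam_o.flatten(1), cam_a.flatten(1), dim=1).clamp(min=0.0)  # (B,) in [0, 1]
    weights = 2.0 - consistency          # low consistency -> larger sample weight
    optimizer.zero_grad()
    per_sample = F.cross_entropy(model(images), labels, reduction="none")
    loss = (weights * per_sample).mean()
    loss.backward()
    optimizer.step()
    return loss.item(), consistency.mean().item()

Detaching the heatmaps keeps the weights outside the gradient path, so in this sketch the reweighting only changes how much each sample contributes to the classification loss.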
Source journal
IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles per year: 576
Review time: 5.5 months
Journal description: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.
Latest articles in this journal
Improving Network Interpretability via Explanation Consistency Evaluation
Deep Mutual Distillation for Unsupervised Domain Adaptation Person Re-identification
Collaborative License Plate Recognition via Association Enhancement Network With Auxiliary Learning and a Unified Benchmark
VLDadaptor: Domain Adaptive Object Detection With Vision-Language Model Distillation
Camera-Incremental Object Re-Identification With Identity Knowledge Evolution