Improving interpretability via regularization of neural activation sensitivity

Machine Learning (IF 4.3; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence) Pub Date: 2024-06-19 DOI: 10.1007/s10994-024-06549-4

Ofir Moshe, Gil Fidel, Ron Bitton, Asaf Shabtai

Citations: 0

Abstract

State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their widespread adoption in mission-critical contexts is limited by two major weaknesses: their susceptibility to adversarial attacks and their opaqueness. The former raises concerns about DNNs’ security and generalization in real-world conditions, while the latter directly impacts interpretability. The lack of interpretability diminishes user trust, as it is difficult to have confidence in a model’s decision when its reasoning is not aligned with human perspectives. In this research, we (1) examine the effect of adversarial robustness on interpretability, and (2) present a novel approach for improving DNNs’ interpretability that is based on the regularization of neural activation sensitivity. We compare the interpretability of models trained using our method with that of standard models and models trained using state-of-the-art adversarial robustness techniques. Our results show that adversarially robust models are more interpretable than standard models, and that models trained using our proposed method are more interpretable still than adversarially robust models. (Code provided in supplementary material.)
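The abstract does not say how activation sensitivity is measured or which layers are regularized, so the following is only an illustrative sketch and not the authors' method. It assumes one possible reading: sensitivity is the squared Frobenius norm of the Jacobian of a hidden layer's activations with respect to the input, added to the task loss with a weight. The single tanh layer, the function names, and the weight `lam` are all hypothetical choices made for the example.

```python
import math

def hidden_activations(W, b, x):
    """Hidden layer h = tanh(W x + b) for a single input vector x."""
    return [math.tanh(sum(w * x_i for w, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

def activation_sensitivity(W, b, x):
    """Squared Frobenius norm of the Jacobian dh/dx (assumed sensitivity measure).

    For h = tanh(W x + b), dh_j/dx_i = (1 - h_j^2) * W[j][i], so the
    squared norm is sum_j (1 - h_j^2)^2 * sum_i W[j][i]^2.
    """
    h = hidden_activations(W, b, x)
    return sum((1.0 - h_j ** 2) ** 2 * sum(w ** 2 for w in row)
               for h_j, row in zip(h, W))

def regularized_loss(task_loss, W, b, x, lam=0.1):
    """Task loss plus a weighted sensitivity penalty (lam is a hypothetical weight)."""
    return task_loss + lam * activation_sensitivity(W, b, x)

# Tiny example: a 2x2 weight matrix evaluated at the origin.
W = [[1.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
x = [0.0, 0.0]
penalty = activation_sensitivity(W, b, x)
total = regularized_loss(1.0, W, b, x, lam=0.1)
```

Under this assumed formulation, minimizing the combined loss pushes the hidden activations to change less when the input is perturbed slightly, which is the general intuition that links such penalties to adversarial robustness.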

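Source journal

Machine Learning (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 11.00
Self-citation rate: 2.70%
Articles published: 162
Review time: 3 months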

Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.