Hardening Interpretable Deep Learning Systems: Investigating Adversarial Threats and Defenses

IF 7.0 · CAS Tier 2 (Computer Science) · JCR Q1 (Computer Science, Hardware & Architecture) · IEEE Transactions on Dependable and Secure Computing · Pub Date: 2024-07-01 · DOI: 10.1109/TDSC.2023.3341090
Eldor Abdukhamidov, Mohammad Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed
{"title":"加固可解释深度学习系统:对抗性威胁与防御调查","authors":"Eldor Abdukhamidov, Mohammad Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed","doi":"10.1109/TDSC.2023.3341090","DOIUrl":null,"url":null,"abstract":"Deep learning methods have gained increasing attention in various applications due to their outstanding performance. For exploring how this high performance relates to the proper use of data artifacts and the accurate problem formulation of a given task, interpretation models have become a crucial component in developing deep learning-based systems. Interpretation models enable the understanding of the inner workings of deep learning models and offer a sense of security in detecting the misuse of artifacts in the input data. Similar to prediction models, interpretation models are also susceptible to adversarial inputs. This work introduces two attacks, AdvEdge and AdvEdge<inline-formula><tex-math notation=\"LaTeX\">$^{+}$</tex-math><alternatives><mml:math><mml:msup><mml:mrow/><mml:mo>+</mml:mo></mml:msup></mml:math><inline-graphic xlink:href=\"abuhmed-ieq1-3341090.gif\"/></alternatives></inline-formula>, which deceive both the target deep learning model and the coupled interpretation model. We assess the effectiveness of proposed attacks against four deep learning model architectures coupled with four interpretation models that represent different categories of interpretation models. Our experiments include the implementation of attacks using various attack frameworks. We also explore the attack resilience against three general defense mechanisms and potential countermeasures. Our analysis shows the effectiveness of our attacks in terms of deceiving the deep learning models and their interpreters, and highlights insights to improve and circumvent the attacks.","PeriodicalId":13047,"journal":{"name":"IEEE Transactions on Dependable and Secure Computing","volume":null,"pages":null},"PeriodicalIF":7.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Hardening Interpretable Deep Learning Systems: Investigating Adversarial Threats and Defenses\",\"authors\":\"Eldor Abdukhamidov, Mohammad Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed\",\"doi\":\"10.1109/TDSC.2023.3341090\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning methods have gained increasing attention in various applications due to their outstanding performance. For exploring how this high performance relates to the proper use of data artifacts and the accurate problem formulation of a given task, interpretation models have become a crucial component in developing deep learning-based systems. Interpretation models enable the understanding of the inner workings of deep learning models and offer a sense of security in detecting the misuse of artifacts in the input data. Similar to prediction models, interpretation models are also susceptible to adversarial inputs. This work introduces two attacks, AdvEdge and AdvEdge<inline-formula><tex-math notation=\\\"LaTeX\\\">$^{+}$</tex-math><alternatives><mml:math><mml:msup><mml:mrow/><mml:mo>+</mml:mo></mml:msup></mml:math><inline-graphic xlink:href=\\\"abuhmed-ieq1-3341090.gif\\\"/></alternatives></inline-formula>, which deceive both the target deep learning model and the coupled interpretation model. 
We assess the effectiveness of proposed attacks against four deep learning model architectures coupled with four interpretation models that represent different categories of interpretation models. Our experiments include the implementation of attacks using various attack frameworks. We also explore the attack resilience against three general defense mechanisms and potential countermeasures. Our analysis shows the effectiveness of our attacks in terms of deceiving the deep learning models and their interpreters, and highlights insights to improve and circumvent the attacks.\",\"PeriodicalId\":13047,\"journal\":{\"name\":\"IEEE Transactions on Dependable and Secure Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.0000,\"publicationDate\":\"2024-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Dependable and Secure Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/TDSC.2023.3341090\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Dependable and Secure Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/TDSC.2023.3341090","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 5

Abstract

Deep learning methods have gained increasing attention in various applications due to their outstanding performance. To explore how this high performance relates to the proper use of data artifacts and the accurate formulation of a given task, interpretation models have become a crucial component in developing deep learning-based systems. Interpretation models enable an understanding of the inner workings of deep learning models and offer a sense of security by detecting the misuse of artifacts in the input data. Like prediction models, however, interpretation models are also susceptible to adversarial inputs. This work introduces two attacks, AdvEdge and AdvEdge$^{+}$, which deceive both the target deep learning model and its coupled interpretation model. We assess the effectiveness of the proposed attacks against four deep learning model architectures coupled with four interpretation models that represent different categories of interpreters. Our experiments include implementations of the attacks using various attack frameworks. We also explore the attacks' resilience against three general defense mechanisms and discuss potential countermeasures. Our analysis demonstrates the effectiveness of our attacks in deceiving deep learning models and their interpreters, and highlights insights for improving and circumventing the attacks.
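The abstract gives no implementation detail, but the core idea, crafting a perturbation that simultaneously misleads the classifier and keeps the interpretation map looking benign, can be sketched. The following is a minimal illustration only, assuming PyTorch: it uses a plain input-gradient saliency map as a stand-in interpreter (the paper evaluates four interpreter categories, not necessarily this one), and a precomputed edge mask to confine the perturbation to edge regions in the spirit of AdvEdge. The names `advedge_like_attack`, `saliency_map`, `lam`, and `edge_mask` are illustrative assumptions, not the paper's API.

```python
# Minimal sketch (assumptions labeled): a joint prediction+interpretation attack.
import torch
import torch.nn.functional as F

def saliency_map(logits, x_adv, label):
    # Stand-in interpreter: input-gradient saliency for class `label`.
    # create_graph=True keeps the map differentiable w.r.t. the perturbation.
    g, = torch.autograd.grad(logits[0, label], x_adv, create_graph=True)
    return g.abs().amax(dim=1, keepdim=True)               # shape (1, 1, H, W)

def advedge_like_attack(model, x, y_true, edge_mask, target_map,
                        eps=8 / 255, alpha=1 / 255, steps=40, lam=1.0):
    # PGD on a joint loss: misclassify x while keeping the saliency map close
    # to `target_map` (e.g., the benign image's map). `edge_mask` is a
    # hypothetical input, e.g., from a Sobel/Canny edge detector.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        x_adv = (x + delta * edge_mask).clamp(0, 1)
        logits = model(x_adv)
        loss_cls = -F.cross_entropy(logits, y_true)        # push away from the true class
        m = saliency_map(logits, x_adv, y_true.item())
        loss_int = F.mse_loss(m, target_map)               # keep interpretation benign-looking
        loss = loss_cls + lam * loss_int
        grad, = torch.autograd.grad(loss, delta)           # requires double backward
        with torch.no_grad():
            delta -= alpha * grad.sign()                   # descend on the joint loss
            delta.clamp_(-eps, eps)                        # L-infinity budget
    return (x + delta * edge_mask).clamp(0, 1).detach()
```

Setting `edge_mask` to all ones recovers an unconstrained joint attack, while restricting it strictly to detected edge pixels approximates the stricter AdvEdge$^{+}$ variant; the paper's exact optimization and interpreter coupling may differ.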
Source Journal
IEEE Transactions on Dependable and Secure Computing
Category: Engineering & Technology - Computer Science: Software Engineering
CiteScore: 11.20
Self-citation rate: 5.50%
Annual articles: 354
Review time: 9 months
About the Journal
The "IEEE Transactions on Dependable and Secure Computing (TDSC)" is a prestigious journal that publishes high-quality, peer-reviewed research in the field of computer science, specifically targeting the development of dependable and secure computing systems and networks. This journal is dedicated to exploring the fundamental principles, methodologies, and mechanisms that enable the design, modeling, and evaluation of systems that meet the required levels of reliability, security, and performance. The scope of TDSC includes research on measurement, modeling, and simulation techniques that contribute to the understanding and improvement of system performance under various constraints. It also covers the foundations necessary for the joint evaluation, verification, and design of systems that balance performance, security, and dependability. By publishing archival research results, TDSC aims to provide a valuable resource for researchers, engineers, and practitioners working in the areas of cybersecurity, fault tolerance, and system reliability. The journal's focus on cutting-edge research ensures that it remains at the forefront of advancements in the field, promoting the development of technologies that are critical for the functioning of modern, complex systems.
Latest Articles in This Journal
- DSChain: A Blockchain System for Complete Lifecycle Security of Data in Internet of Things
- Privacy-Preserving and Energy-Saving Random Forest-Based Disease Detection Framework for Green Internet of Things in Mobile Healthcare Networks
- IvyRedaction: Enabling Atomic, Consistent and Accountable Cross-Chain Rewriting
- Multi-Adjustable Join Schemes With Adaptive Indistinguishably Security
- User Authentication on Earable Devices via Bone-Conducted Occlusion Sounds