神经网络的雕刻、秘密和可解释性

IF 5.1 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS IEEE Transactions on Emerging Topics in Computing Pub Date : 2024-01-31 DOI:10.1109/TETC.2024.3358759

Nathaniel Hobbs;Periklis A. Papakonstantinou;Jaideep Vaidya

{"title":"神经网络的雕刻、秘密和可解释性","authors":"Nathaniel Hobbs;Periklis A. Papakonstantinou;Jaideep Vaidya","doi":"10.1109/TETC.2024.3358759","DOIUrl":null,"url":null,"abstract":"This work proposes a definition and examines the problem of undetectably engraving special input/output information into a Neural Network (NN). Investigation of this problem is significant given the ubiquity of neural networks and society's reliance on their proper training and use. We systematically study this question and provide (1) definitions of security for secret engravings, (2) machine learning methods for the construction of an engraved network, (3) a threat model that is instantiated with state-of-the-art interpretability methods to devise distinguishers/attackers. In this work, there are two kinds of algorithms. First, the constructions of engravings through machine learning training methods. Second, the distinguishers associated with the threat model. The weakest of our engraved NN constructions are insecure and can be broken by our distinguishers, whereas other, more systematic engravings are resilient to each of our distinguishing attacks on three prototypical image classification datasets. Our threat model is of independent interest, as it provides a concrete quantification/benchmark for the “goodness” of interpretability methods.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 4","pages":"1093-1104"},"PeriodicalIF":5.1000,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Engravings, Secrets, and Interpretability of Neural Networks\",\"authors\":\"Nathaniel Hobbs;Periklis A. Papakonstantinou;Jaideep Vaidya\",\"doi\":\"10.1109/TETC.2024.3358759\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This work proposes a definition and examines the problem of undetectably engraving special input/output information into a Neural Network (NN). Investigation of this problem is significant given the ubiquity of neural networks and society's reliance on their proper training and use. We systematically study this question and provide (1) definitions of security for secret engravings, (2) machine learning methods for the construction of an engraved network, (3) a threat model that is instantiated with state-of-the-art interpretability methods to devise distinguishers/attackers. In this work, there are two kinds of algorithms. First, the constructions of engravings through machine learning training methods. Second, the distinguishers associated with the threat model. The weakest of our engraved NN constructions are insecure and can be broken by our distinguishers, whereas other, more systematic engravings are resilient to each of our distinguishing attacks on three prototypical image classification datasets. Our threat model is of independent interest, as it provides a concrete quantification/benchmark for the “goodness” of interpretability methods.\",\"PeriodicalId\":13156,\"journal\":{\"name\":\"IEEE Transactions on Emerging Topics in Computing\",\"volume\":\"12 4\",\"pages\":\"1093-1104\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2024-01-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Emerging Topics in Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10418129/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10418129/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

这项工作提出了一个定义，并研究了不可检测地将特殊输入/输出信息雕刻到神经网络（NN）中的问题。考虑到神经网络的普遍存在以及社会对其适当训练和使用的依赖，对这个问题的研究是非常重要的。我们系统地研究了这个问题，并提供了(1)秘密雕刻的安全定义，(2)构建雕刻网络的机器学习方法，(3)用最先进的可解释性方法实例化的威胁模型，以设计区分者/攻击者。在这项工作中，有两种算法。第一，通过机器学习训练方法构造版画。其次，与威胁模型相关的区分符。我们最弱的雕刻NN结构是不安全的，可以被我们的区分器打破，而其他更系统的雕刻对我们对三个原型图像分类数据集的每种区分攻击都有弹性。我们的威胁模型具有独立的意义，因为它为可解释性方法的“优点”提供了具体的量化/基准。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Engravings, Secrets, and Interpretability of Neural Networks

This work proposes a definition and examines the problem of undetectably engraving special input/output information into a Neural Network (NN). Investigation of this problem is significant given the ubiquity of neural networks and society's reliance on their proper training and use. We systematically study this question and provide (1) definitions of security for secret engravings, (2) machine learning methods for the construction of an engraved network, (3) a threat model that is instantiated with state-of-the-art interpretability methods to devise distinguishers/attackers. In this work, there are two kinds of algorithms. First, the constructions of engravings through machine learning training methods. Second, the distinguishers associated with the threat model. The weakest of our engraved NN constructions are insecure and can be broken by our distinguishers, whereas other, more systematic engravings are resilient to each of our distinguishing attacks on three prototypical image classification datasets. Our threat model is of independent interest, as it provides a concrete quantification/benchmark for the “goodness” of interpretability methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Emerging Topics in Computing Computer Science-Computer Science (miscellaneous)

CiteScore

12.10

自引率

5.10%

发文量

113

期刊介绍： IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, & Health-care IT.

期刊最新文献

Front Cover Table of Contents Guest Editorial: Special Section on “Approximate Data Processing: Computing, Storage and Applications” IEEE Transactions on Emerging Topics in Computing Information for Authors Table of Contents