{"title":"神经网络稀疏后门攻击","authors":"Nan Zhong, Zhenxing Qian, Xinpeng Zhang","doi":"10.1093/comjnl/bxad100","DOIUrl":null,"url":null,"abstract":"Abstract Recent studies show that neural networks are vulnerable to backdoor attacks, in which compromised networks behave normally for clean inputs but make mistakes when a pre-defined trigger appears. Although prior studies have designed various invisible triggers to avoid causing visual anomalies, they cannot evade some trigger detectors. In this paper, we consider the stealthiness of backdoor attacks from input space and feature representation space. We propose a novel backdoor attack named sparse backdoor attack, and investigate the minimum required trigger to induce the well-trained networks to make incorrect results. A U-net-based generator is employed to create triggers for each clean image. Considering the stealthiness of the trigger, we restrict the elements of the trigger between −1 and 1. In the aspect of the feature representation domain, we adopt an entanglement cost function to minimize the gap between feature representations of benign and malicious inputs. The inseparability of benign and malicious feature representations contributes to the stealthiness of our attack against various model diagnosis-based defences. We validate the effectiveness and generalization of our method by conducting extensive experiments on multiple datasets and networks.","PeriodicalId":50641,"journal":{"name":"Computer Journal","volume":"79 1","pages":"0"},"PeriodicalIF":1.5000,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Sparse Backdoor Attack Against Neural Networks\",\"authors\":\"Nan Zhong, Zhenxing Qian, Xinpeng Zhang\",\"doi\":\"10.1093/comjnl/bxad100\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Recent studies show that neural networks are vulnerable to backdoor attacks, in which compromised networks behave normally for clean inputs but make mistakes when a pre-defined trigger appears. Although prior studies have designed various invisible triggers to avoid causing visual anomalies, they cannot evade some trigger detectors. In this paper, we consider the stealthiness of backdoor attacks from input space and feature representation space. We propose a novel backdoor attack named sparse backdoor attack, and investigate the minimum required trigger to induce the well-trained networks to make incorrect results. A U-net-based generator is employed to create triggers for each clean image. Considering the stealthiness of the trigger, we restrict the elements of the trigger between −1 and 1. In the aspect of the feature representation domain, we adopt an entanglement cost function to minimize the gap between feature representations of benign and malicious inputs. The inseparability of benign and malicious feature representations contributes to the stealthiness of our attack against various model diagnosis-based defences. 
We validate the effectiveness and generalization of our method by conducting extensive experiments on multiple datasets and networks.\",\"PeriodicalId\":50641,\"journal\":{\"name\":\"Computer Journal\",\"volume\":\"79 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2023-10-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/comjnl/bxad100\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/comjnl/bxad100","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
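To make the pipeline described in the abstract concrete, the sketch below shows one way it could be instantiated in PyTorch. Everything here is an assumption for illustration, not the authors' released code: the paper's actual U-Net and entanglement cost may differ, and `TriggerGenerator`, `entanglement_loss`, `victim`, and `victim_features` are hypothetical names. The tanh output layer enforces the stated [−1, 1] bound on trigger elements, and the loss term pulls the feature representations of poisoned inputs toward those of benign inputs.

```python
# Hypothetical sketch of a sparse-backdoor training objective (assumed
# names and architecture; a stand-in for the paper's U-Net generator).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriggerGenerator(nn.Module):
    """Minimal encoder-decoder stand-in for the U-Net-based generator."""
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.enc1 = nn.Conv2d(channels, width, 3, stride=2, padding=1)
        self.enc2 = nn.Conv2d(width, width * 2, 3, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1)
        self.dec2 = nn.ConvTranspose2d(width * 2, channels, 4, stride=2, padding=1)

    def forward(self, x):
        e1 = F.relu(self.enc1(x))
        e2 = F.relu(self.enc2(e1))
        d1 = F.relu(self.dec1(e2))
        d1 = torch.cat([d1, e1], dim=1)   # skip connection, U-Net style
        # tanh bounds every trigger element in [-1, 1], as the abstract requires
        return torch.tanh(self.dec2(d1))

def entanglement_loss(feat_benign, feat_poisoned):
    # One plausible instantiation of the entanglement cost: mean-squared
    # distance between feature representations of clean and triggered inputs.
    return F.mse_loss(feat_poisoned, feat_benign)

# Usage sketch: craft a poisoned input and combine the attack objective with
# the entanglement term. `victim` and `victim_features` are assumed handles
# to the compromised network and to a feature-extraction hook on it.
#   gen = TriggerGenerator()
#   trigger = gen(clean_batch)
#   poisoned = torch.clamp(clean_batch + trigger, 0, 1)
#   loss = F.cross_entropy(victim(poisoned), target_labels) \
#          + lam * entanglement_loss(victim_features(clean_batch),
#                                    victim_features(poisoned))
```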
Journal information:
The Computer Journal is one of the longest-established journals serving all branches of the academic computer science community. It is currently published in four sections.