In recent years, security researchers have shifted their attention to the underlying processor architecture, proposing Hardware-Based Malware Detection (HMD) countermeasures to address the inefficiencies of software-based detection methods. HMD techniques apply standard Machine Learning (ML) algorithms to low-level processor events collected from Hardware Performance Counter (HPC) registers. However, despite promising results for detecting known malware, accurate zero-day (unknown) malware detection remains an unresolved problem for existing HPC-based countermeasures. Our comprehensive analysis shows that standard ML classifiers are not effective at recognizing zero-day malware traces from HPC events. In response, we propose Deep-HMD, a two-stage, intelligent, and flexible approach based on deep neural networks and transfer learning for accurate zero-day malware detection from image-based hardware events. The experimental results indicate that our proposed solution outperforms existing ML-based methods, achieving a 97% detection rate (F-measure and Area Under the Curve) for detecting zero-day malware signatures at run-time using the top 4 hardware events, with a minimal false-positive rate and no hardware redesign overhead.
"Deep Neural Network and Transfer Learning for Accurate Hardware-Based Zero-Day Malware Detection," Z. He, Amin Rezaei, H. Homayoun, H. Sayadi. Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530326
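The abstract describes mapping HPC event traces to images before feeding them to a deep network. The paper does not spell out the encoding, so the sketch below is only one plausible illustration: a fixed-size grayscale image built from a window of the top-4 event counts. The function name, window size, and min-max normalization are all assumptions, not the authors' design.

```python
import numpy as np

def hpc_window_to_image(samples, side=8):
    """Turn a time window of HPC event counts into a grayscale image.

    samples: array-like of shape (T, 4), i.e. T samples of the top-4 events.
    Returns a (side, side) uint8 image, min-max normalized per window.
    """
    flat = np.asarray(samples, dtype=np.float64).ravel()
    flat = np.resize(flat, side * side)              # pad/trim to the image size
    lo, hi = flat.min(), flat.max()
    norm = (flat - lo) / (hi - lo) if hi > lo else np.zeros_like(flat)
    return (norm * 255.0).astype(np.uint8).reshape(side, side)

# 16 samples of 4 events -> one 8x8 "HPC image" for the CNN input
window = np.arange(64).reshape(16, 4)
img = hpc_window_to_image(window)
```

An image like this could then be passed to a pretrained vision backbone for transfer learning, which is the rough pipeline the abstract implies.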
Aibin Yan, Zhihui He, Jing Xiang, Jie Cui, Yong Zhou, Zhengfeng Huang, P. Girard, X. Wen
Aggressive scaling of CMOS technologies demands attention to circuit reliability. This paper presents two highly reliable RHBD 10T and 12T SRAM cells that protect against single-node upsets (SNUs) and double-node upsets (DNUs). The 10T cell mainly consists of two cross-coupled input-split inverters, and it robustly retains stored values through a feedback mechanism among its internal nodes. It also has low area and power cost, since it uses only a few transistors. Based on the 10T cell, a 12T cell is proposed that uses four parallel access transistors; it reduces read/write access time while providing the same soft-error tolerance as the 10T cell. Simulation results demonstrate that the proposed cells can recover from SNUs and from a subset of DNUs. Moreover, compared with state-of-the-art hardened SRAM cells, the proposed 10T cell saves 28.59% write access time, 55.83% read access time, and 4.46% power dissipation at the cost of 4.04% more silicon area, on average.
"Two 0.8 V, Highly Reliable RHBD 10T and 12T SRAM Cells for Aerospace Applications." Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530312
Niklas Bruns, V. Herdt, Daniel Große, R. Drechsler
In this paper, we propose a novel simulation-based cross-level approach for processor verification at the Register-Transfer Level (RTL). We leverage state-of-the-art coverage-guided fuzzing techniques from the software domain to generate processor-level input stimuli. An Instruction Set Simulator (ISS) is used as a reference model for the RTL processor under test in an efficient co-simulation setting. To further boost fuzzing effectiveness, we devised custom mutation procedures tailored to the processor verification domain. Our experiments on the popular open-source RISC-V-based VexRiscv processor demonstrate the effectiveness of our approach in finding intricate bugs at the processor level.
"Efficient Cross-Level Processor Verification using Coverage-guided Fuzzing." Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530340
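Coverage-guided fuzzing of the kind described above keeps a mutated stimulus only when it exercises behavior not seen before. The sketch below is a generic illustration of that loop, not the authors' tool: `mutate_word`, the toy coverage function, and all parameters are assumptions made for the example.

```python
import random

def mutate_word(instr, rng):
    """One mutation of a 32-bit instruction word (illustrative operators)."""
    op = rng.choice(["flip", "imm", "replace"])
    if op == "flip":                                  # single-bit flip
        return instr ^ (1 << rng.randrange(32))
    if op == "imm":                                   # perturb the upper bits
        return (instr + (rng.randrange(-8, 8) << 20)) & 0xFFFFFFFF
    return rng.getrandbits(32)                        # fresh random word

def fuzz(seed_program, coverage_of, rounds=200, seed=0):
    """Keep a mutated program in the corpus only if it reaches new coverage."""
    rng = random.Random(seed)
    corpus = [list(seed_program)]
    seen = set(coverage_of(seed_program))
    for _ in range(rounds):
        child = list(rng.choice(corpus))
        i = rng.randrange(len(child))
        child[i] = mutate_word(child[i], rng)
        cov = coverage_of(child)
        if not cov <= seen:                           # new coverage point found
            seen |= cov
            corpus.append(child)
    return corpus, seen

# Toy coverage metric: the top nibble of each word in the program.
toy_cov = lambda prog: {w >> 28 for w in prog}
corpus, seen = fuzz([0x00000013] * 4, toy_cov)        # seed: four RV32I NOPs
```

In the real setting, `coverage_of` would come from instrumenting the ISS/RTL co-simulation rather than from the program text itself.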
M. Rios, Flavio Ponzina, G. Ansaloni, A. Levisse, David Atienza Alonso
The growing popularity of edge computing has fostered the development of diverse solutions to support Artificial Intelligence (AI) in energy-constrained devices. Nonetheless, comparatively few efforts have focused on the resiliency exhibited by AI workloads (such as Convolutional Neural Networks, CNNs) as an avenue towards increasing their run-time efficiency, and even fewer have proposed strategies to increase such resiliency. We herein address this challenge in the context of Bit-line Computing architectures, an embodiment of the in-memory computing paradigm tailored towards CNN applications. We show that little additional hardware is required to add highly effective error detection and mitigation in such platforms. In turn, our proposed scheme can cope with high error rates when performing memory accesses with no impact on CNN accuracy, allowing for very aggressive voltage scaling. As a complement, we also show that CNN resiliency can be increased by algorithmic optimizations in addition to architectural ones, adopting a combined ensembling-and-pruning strategy that increases robustness without inflating workload requirements. Experiments on different quantized CNN models reveal that our combined hardware/software approach enables the supply voltage to be reduced to just 650 mV, decreasing the energy per inference by up to 51.3% without affecting the baseline CNN classification accuracy.
"Error Resilient In-Memory Computing Architecture for CNN Inference on the Edge." Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530351
Most modern VLSI applications demand energy-efficient, high-speed computing solutions. Approximate computing is considered a suitable design methodology that satisfies current hardware and performance requirements without significantly compromising the outcome. Many arithmetic operations have been realized using approximate computing techniques, and many successful implementations are reported at the system level. Dividers, however, are rarely realized in hardware, and they deserve more attention given the surge in hardware implementations of neural networks. In this paper, a novel approximate divider is proposed that offers better accuracy and hardware efficiency than competing dividers. The proposed divider builds on a logarithmic divider and approximates the exponent part to achieve the desired hardware characteristics. The proposed 8-bit and 16-bit divider designs were realized in 45-nm CMOS technology for different input and output data formats, including integer, fixed-point, and floating-point. The proposed divider was characterized for error and hardware metrics and compared with other dividers. The novel divider was validated on a K-means color quantization algorithm, showcasing improved quantization results.
"LEAD: Logarithmic Exponent Approximate Divider For Image Quantization Application," Omkar G. Ratnaparkhi, M. Rao. Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530323
In this paper, we design a spiking neural network (SNN) accelerator based on the Liquid State Machine (LSM), a lightweight and biologically inspired model. The accelerator integrates 512 leaky integrate-and-fire (LIF) neurons with configurable biological parameters. To exploit the sparsity of the LSM's computation and memory accesses, we use zero-skipping and weight compression to maximize performance. The quantized 4-bit model deployed on the accelerator achieves a classification accuracy of 97.42% on the DVS128 gesture dataset. We implement the accelerator on an FPGA. Results indicate an end-to-end average inference latency of 3.97 ms, 26 times better than a TrueNorth-based gesture recognition system.
"An Event Based Gesture Recognition System Using a Liquid State Machine Accelerator," Jing Zhu, Lei Wang, Xun Xiao, Zhijie Yang, Ziyang Kang, Shiming Li, LingHui Peng. Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530357
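The LIF neurons named above follow standard leaky integrate-and-fire dynamics: the membrane potential decays each timestep, integrates the input current, and the neuron spikes and resets on crossing a threshold. A minimal software model of one such neuron, with illustrative parameter values rather than the accelerator's actual configuration, could be:

```python
def lif_step(v, i_in, leak=0.9, v_th=1.0, v_reset=0.0):
    """One timestep of a leaky integrate-and-fire neuron.

    The membrane potential v decays by the `leak` factor, integrates the
    input current i_in, and the neuron fires (and resets) at threshold v_th.
    Returns (new_potential, spike_bit).
    """
    v = leak * v + i_in
    if v >= v_th:
        return v_reset, 1                  # emit a spike and reset
    return v, 0

# Drive one neuron with a constant input current for six timesteps.
v, spikes = 0.0, []
for t in range(6):
    v, s = lif_step(v, 0.4)
    spikes.append(s)
```

Zero-skipping, as used by the accelerator, would simply avoid evaluating synaptic updates for timesteps in which the incoming spike bit is 0.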
Session details: Session 7B: Microelectronic Systems Education, B. Skromme. Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3542695
Hardware accelerator-based CNN inference improves performance and latency but increases time-to-market. As a result, CNN deployment on hardware is often outsourced to untrusted third parties (3Ps), with security risks such as hardware Trojans (HTs). During outsourcing, designers therefore conceal information about the initial and final CNN layers from 3Ps. However, this paper shows that this defense is ineffective by proposing a hardware-intrinsic attack (HIA), Layer-based Noise Injection (LaBaNI), which successfully causes misclassification without knowledge of the initial and final layers. LaBaNI uses the statistical properties of the CNN's feature maps to design a trigger with a very low triggering probability and a payload that causes misclassification. To show its effectiveness, we demonstrated LaBaNI on LeNet and LeNet-3D CNN models deployed on Xilinx's PYNQ board. In the experiments, the attack is successful, non-periodic, and random, and hence difficult to detect. Results show that LaBaNI uses up to 4% extra LUTs, 5% extra DSPs, and 2% extra FFs.
"LaBaNI: Layer-based Noise Injection Attack on Convolutional Neural Networks," Tolulope A. Odetola, Faiq Khalid, S. R. Hasan. Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530385
Hardware Trojans (HTs) have introduced serious security concerns into the integrated circuit design flow, as they can undermine circuit operation by leaking sensitive information, causing malfunctions, or mounting similar attacks. An earlier HT detection technique for gate-level netlists, Controllability and Observability for hardware Trojan Detection (COTD), detects HTs based on the controllability and observability of signals in a circuit, using a static analysis built on an unsupervised machine learning model to identify HT signals. While COTD detects the existence of HTs in a circuit, prior work has highlighted its shortcoming in detecting HT signals whose features resemble those of genuine signals. To address this shortcoming, this paper presents an improved COTD technique. It introduces an iterative unsupervised machine learning technique to isolate HT signals, and it is equipped with the Gradual-N-Justification (GNJ) technique to reduce the false-positive rate in detecting HT signals. The improved COTD technique is applied to several combinations of full-scan and partial-scan circuits tampered with hard-to-detect sequential HTs; to realize valid and hard-to-detect HTs, a configurable HT insertion platform is used. Comprehensive results show that the improved COTD is highly scalable. Furthermore, it does not miss an HT circuit if one exists, and it offers a false-positive rate as low as 3.4% on average.
"The Improved COTD Technique for Hardware Trojan Detection in Gate-level Netlist," H. Salmani. Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530835
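COTD-style detection clusters signals by their testability values and treats the outlying cluster as suspicious. The toy sketch below illustrates that idea with a minimal k-means over (controllability, observability) pairs; the values are made up, and this is not the paper's SCOAP-based analysis or its iterative refinement.

```python
import math
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Minimal k-means clustering over (controllability, observability) pairs."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                   # assign each point to its nearest center
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        centers = [                        # recompute centers as cluster means
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters

# Made-up testability values: genuine signals sit together, while
# hard-to-control Trojan trigger signals are far-off outliers.
genuine = [(1, 1), (2, 1), (1, 2), (2, 2)]
trojan = [(40, 0.5), (42, 0.3)]
clusters = kmeans(genuine + trojan)
```

The cluster whose center lies far from the genuine population would then be flagged for further (e.g., GNJ-style) justification.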
Deep Convolutional Neural Networks (DCNNs) have revolutionized and improved many aspects of modern life. However, these models are increasingly complex, and training them to perform at desirable levels is a difficult undertaking; hence, the trained parameters represent a valuable intellectual property (IP) asset that a motivated attacker may wish to steal. To better protect this IP, we propose a method of lightweight input obfuscation that is undone prior to inference: input data must be obfuscated with the correct key in order to use the model to specification. Without the correct key and unlocking sequence, the accuracy of the classifier is reduced to a random guess, protecting the input/output interface and mitigating model extraction attacks that rely on such access. We evaluate the system using a VGG-16 network trained on CIFAR-10 and demonstrate that with an incorrect deobfuscation key or sequence, classification accuracy drops to a random guess, with an inference timing overhead of 4.4% on an Nvidia-based evaluation platform. The system avoids the costs associated with retraining and has no impact on model accuracy for authorized users.
"Protecting Deep Neural Network Intellectual Property with Architecture-Agnostic Input Obfuscation," Brooks Olney, Robert Karam. Proceedings of the Great Lakes Symposium on VLSI 2022, June 6, 2022. doi:10.1145/3526241.3530386
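As a rough illustration of keyed input obfuscation that is undone prior to inference, the sketch below permutes pixels with a key-seeded shuffle; only the matching key restores the original input. This is an assumption-laden stand-in, not the authors' scheme.

```python
import random

def permute_pixels(pixels, key, invert=False):
    """(De)obfuscate a flat pixel list with a key-seeded permutation.

    Applying the function again with the same key and invert=True
    restores the original pixel order.
    """
    n = len(pixels)
    order = list(range(n))
    random.Random(key).shuffle(order)      # permutation derived from the key
    out = [0] * n
    for dst, src in enumerate(order):
        if invert:
            out[src] = pixels[dst]         # undo the permutation
        else:
            out[dst] = pixels[src]         # apply the permutation
    return out

img = list(range(16))                       # stand-in for a flattened image
obf = permute_pixels(img, key=1234)
restored = permute_pixels(obf, key=1234, invert=True)
```

In a deployment of this style, the model would be trained (or its input stage wrapped) so that only correctly deobfuscated inputs yield specification-level accuracy; a wrong key leaves the input scrambled and the classifier guessing.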