Sayandeep Saha, S. N. Kumar, Sikhar Patranabis, Debdeep Mukhopadhyay, P. Dasgupta
Assessment of the security provided by a fault attack countermeasure is challenging, given that a protected cipher may leak the key if the countermeasure is not designed correctly. This paper proposes, for the first time, a statistical framework to detect information leakage in fault attack countermeasures. Based on the concept of non-interference, we formalize the leakage for fault attacks and provide a t-test based methodology for leakage assessment. One major strength of the proposed framework is that leakage can be detected without the complete knowledge of the countermeasure algorithm, solely by observing the faulty ciphertext distributions. Experimental evaluation over a representative set of countermeasures establishes the efficacy of the proposed methodology.
{"title":"ALAFA","authors":"Sayandeep Saha, S. N. Kumar, Sikhar Patranabis, Debdeep Mukhopadhyay, P. Dasgupta","doi":"10.1145/3316781.3317763","DOIUrl":"https://doi.org/10.1145/3316781.3317763","url":null,"abstract":"Assessment of the security provided by a fault attack countermeasure is challenging, given that a protected cipher may leak the key if the countermeasure is not designed correctly. This paper proposes, for the first time, a statistical framework to detect information leakage in fault attack countermeasures. Based on the concept of non-interference, we formalize the leakage for fault attacks and provide a t-test based methodology for leakage assessment. One major strength of the proposed framework is that leakage can be detected without the complete knowledge of the countermeasure algorithm, solely by observing the faulty ciphertext distributions. Experimental evaluation over a representative set of countermeasures establishes the efficacy of the proposed methodology.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"688 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117237898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daimeng Wang, Zhiyun Qian, Nael B. Abu-Ghazaleh, S. Krishnamurthy
CPU memory prefetchers can substantially interfere with prime and probe cache side-channel attacks, especially on in-order CPUs which use aggressive prefetching. This interference is not accounted for in previous attacks. In this paper, we propose PAPP, a PrefetcherAware Prime Probe attack that can operate even in the presence of aggressive prefetchers. Specifically, we reverse engineer the prefetcher and replacement policy on several CPUs and use these insights to design a prime and probe attack that minimizes the impact of the prefetcher. We evaluate PAPP using Cache Side-channel Vulnerability (CSV) metric and demonstrate the substantial improvements in the quality of the channel under different conditions.
{"title":"PAPP","authors":"Daimeng Wang, Zhiyun Qian, Nael B. Abu-Ghazaleh, S. Krishnamurthy","doi":"10.1145/3316781.3317877","DOIUrl":"https://doi.org/10.1145/3316781.3317877","url":null,"abstract":"CPU memory prefetchers can substantially interfere with prime and probe cache side-channel attacks, especially on in-order CPUs which use aggressive prefetching. This interference is not accounted for in previous attacks. In this paper, we propose PAPP, a PrefetcherAware Prime Probe attack that can operate even in the presence of aggressive prefetchers. Specifically, we reverse engineer the prefetcher and replacement policy on several CPUs and use these insights to design a prime and probe attack that minimizes the impact of the prefetcher. We evaluate PAPP using Cache Side-channel Vulnerability (CSV) metric and demonstrate the substantial improvements in the quality of the channel under different conditions.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122389862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The requirement of high performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. This problem has been addressed by suitably designing the architecture, implementation, and programming models, of the Kalray MPPA (Multi-Purpose Processor Array) family of single-chip many-core processors. We introduce the third-generation MPPA processor, whose key features are motivated by the high-performance and high-integrity functions of automated vehicles. High-performance computing functions, represented by deep learning inference and by computer vision, need to execute under soft real-time constraints. High-integrity functions are developed under model-based design, and must meet hard real-time constraints. Finally, the third-generation MPPA processor integrates a hardware root of trust, and its security architecture is able to support a security kernel for implementing the trusted execution environment functions required by applications.
{"title":"Consolidating High-Integrity, High-Performance, and Cyber-Security Functions on a Manycore Processor","authors":"B. D. de Dinechin","doi":"10.1145/3316781.3323473","DOIUrl":"https://doi.org/10.1145/3316781.3323473","url":null,"abstract":"The requirement of high performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. This problem has been addressed by suitably designing the architecture, implementation, and programming models, of the Kalray MPPA (Multi-Purpose Processor Array) family of single-chip many-core processors. We introduce the third-generation MPPA processor, whose key features are motivated by the high-performance and high-integrity functions of automated vehicles. High-performance computing functions, represented by deep learning inference and by computer vision, need to execute under soft real-time constraints. High-integrity functions are developed under model-based design, and must meet hard real-time constraints. Finally, the third-generation MPPA processor integrates a hardware root of trust, and its security architecture is able to support a security kernel for implementing the trusted execution environment functions required by applications.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117295334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the increasing integration of semiconductor design, many problems have emerged. Row-hammering is one of these problems. The row-hammering effect is a critical issue for reliable memory operation because it can cause some unexpected errors. Hence, it is necessary to address this problem. Mainly, there are two different methods to deal with the row-hammering problem. One is a counter based method, and the other is a probabilistic method. This paper proposes the improved version of the latter method and compares it with other probabilistic methods, PARA and PRoHIT. According to the evaluation results, comparing the proposed method with conventional ones, the proposed one has increased row-hammering reduction per refresh 1.82 and 7.78 times against PARA and PRoHIT in average, respectively. CCS CONCEPTS •Security and privacy $rightarrow$ Hardware attacks and countermeasures; •Hardware $rightarrow$ Dynamic memory;
{"title":"MRLoc","authors":"Jung Min You, Joon-Sung Yang","doi":"10.1145/3316781.3317866","DOIUrl":"https://doi.org/10.1145/3316781.3317866","url":null,"abstract":"With the increasing integration of semiconductor design, many problems have emerged. Row-hammering is one of these problems. The row-hammering effect is a critical issue for reliable memory operation because it can cause some unexpected errors. Hence, it is necessary to address this problem. Mainly, there are two different methods to deal with the row-hammering problem. One is a counter based method, and the other is a probabilistic method. This paper proposes the improved version of the latter method and compares it with other probabilistic methods, PARA and PRoHIT. According to the evaluation results, comparing the proposed method with conventional ones, the proposed one has increased row-hammering reduction per refresh 1.82 and 7.78 times against PARA and PRoHIT in average, respectively. CCS CONCEPTS •Security and privacy $rightarrow$ Hardware attacks and countermeasures; •Hardware $rightarrow$ Dynamic memory;","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115349436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speculative execution is an essential performance enhancing technique in modern processors, but it has been shown to be insecure. In this paper, we propose SpectreGuard, a novel defense mechanism against Spectre attacks. In our approach, sensitive memory blocks (e.g., secret keys) are marked using simple OS/library API, which are then selectively protected by hardware from Spectre attacks via low-cost micro-architecture extension. This technique allows microprocessors to maintain high performance, while restoring the control to software developers to make security and performance trade-offs.
{"title":"SpectreGuard","authors":"Jacob Fustos, F. Farshchi, H. Yun","doi":"10.1145/3316781.3317914","DOIUrl":"https://doi.org/10.1145/3316781.3317914","url":null,"abstract":"Speculative execution is an essential performance enhancing technique in modern processors, but it has been shown to be insecure. In this paper, we propose SpectreGuard, a novel defense mechanism against Spectre attacks. In our approach, sensitive memory blocks (e.g., secret keys) are marked using simple OS/library API, which are then selectively protected by hardware from Spectre attacks via low-cost micro-architecture extension. This technique allows microprocessors to maintain high performance, while restoring the control to software developers to make security and performance trade-offs.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115804463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present two contrasting approaches to achieve time predictability in the embedded compute engine, the basic building block of any Internet of Things (IoT) or Cyber-Physical (CPS) system. The traditional approach offers predictability on top of unpredictable processors with numerous optimizations for enhanced performance and programmability at the cost of huge variability in timing. Approaches such as Worst-Case Execution Time (WCET) analysis of software have been struggling to model the complex timing behavior of the underlying processor to provide guarantees. On the other hand, the inevitable slowdown of Moore's Law and the end of Dennard scaling have curtailed the performance and energy scaling of the processors. This stagnation in conjunction with the importance of cognitive computing have motivated widespread adoption of non-von Neumann accelerators and architectures. We argue that these emerging architectures are inherently time-predictable as they depend on software to orchestrate the computation and data movement and are an excellent match for the real-time processing needs.
{"title":"Time-Predictable Computing by Design: Looking Back, Looking Forward","authors":"T. Mitra","doi":"10.1145/3316781.3323489","DOIUrl":"https://doi.org/10.1145/3316781.3323489","url":null,"abstract":"We present two contrasting approaches to achieve time predictability in the embedded compute engine, the basic building block of any Internet of Things (IoT) or Cyber-Physical (CPS) system. The traditional approach offers predictability on top of unpredictable processors with numerous optimizations for enhanced performance and programmability at the cost of huge variability in timing. Approaches such as Worst-Case Execution Time (WCET) analysis of software have been struggling to model the complex timing behavior of the underlying processor to provide guarantees. On the other hand, the inevitable slowdown of Moore's Law and the end of Dennard scaling have curtailed the performance and energy scaling of the processors. This stagnation in conjunction with the importance of cognitive computing have motivated widespread adoption of non-von Neumann accelerators and architectures. We argue that these emerging architectures are inherently time-predictable as they depend on software to orchestrate the computation and data movement and are an excellent match for the real-time processing needs.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124827017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning systems have had enormous success in a wide range of fields from computer vision, natural language processing, and anomaly detection. However, such systems are vulnerable to attackers who can cause deliberate misclassification by introducing small perturbations. With machine learning systems being proposed for cyber attack detection such attackers are cause for serious concern. Despite this the vast majority of adversarial machine learning security research is focused on the image domain. This work gives a brief overview of adversarial machine learning and machine learning used in cyber attack detection and suggests key differences between the traditional image domain of adversarial machine learning and the cyber domain. Finally we show an adversarial machine learning attack on an industrial control system.
{"title":"Adversarial Machine Learning Beyond the Image Domain","authors":"Giulio Zizzo, C. Hankin, S. Maffeis, K. Jones","doi":"10.1145/3316781.3323470","DOIUrl":"https://doi.org/10.1145/3316781.3323470","url":null,"abstract":"Machine learning systems have had enormous success in a wide range of fields from computer vision, natural language processing, and anomaly detection. However, such systems are vulnerable to attackers who can cause deliberate misclassification by introducing small perturbations. With machine learning systems being proposed for cyber attack detection such attackers are cause for serious concern. Despite this the vast majority of adversarial machine learning security research is focused on the image domain. This work gives a brief overview of adversarial machine learning and machine learning used in cyber attack detection and suggests key differences between the traditional image domain of adversarial machine learning and the cyber domain. Finally we show an adversarial machine learning attack on an industrial control system.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124849282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dylan C. Stow, Itir Akgun, Wenqin Huangfu, Yuan Xie, Xueqi Li, G. Loh
Emerging Monolithic Three-Dimensional (M3D) integration technology will not only provide improved circuit density through the high-bandwidth coupling of multiple vertically-stacked layers, but it can also provide new architectural opportunities for on-chip computation, memory, and communication that are beyond the capabilities of existing process and packaging technologies. For example, with massive parallel communication between heterogeneous memory and compute layers, existing processing-in-memory architectures can be optimized and expanded, developing into efficient and flexible near-data processors. Additionally, multiple tiers of interconnect can be dynamically leveraged to provide an efficient, scalable interconnect fabric that spans the three-dimensional system. This work explores some of the challenges and opportunities presented by M3D technology for emerging computer architectures, with focus on improving efficiency and increasing system flexibility.
{"title":"Efficient System Architecture in the Era of Monolithic 3D","authors":"Dylan C. Stow, Itir Akgun, Wenqin Huangfu, Yuan Xie, Xueqi Li, G. Loh","doi":"10.1145/3316781.3323475","DOIUrl":"https://doi.org/10.1145/3316781.3323475","url":null,"abstract":"Emerging Monolithic Three-Dimensional (M3D) integration technology will not only provide improved circuit density through the high-bandwidth coupling of multiple vertically-stacked layers, but it can also provide new architectural opportunities for on-chip computation, memory, and communication that are beyond the capabilities of existing process and packaging technologies. For example, with massive parallel communication between heterogeneous memory and compute layers, existing processing-in-memory architectures can be optimized and expanded, developing into efficient and flexible near-data processors. Additionally, multiple tiers of interconnect can be dynamically leveraged to provide an efficient, scalable interconnect fabric that spans the three-dimensional system. This work explores some of the challenges and opportunities presented by M3D technology for emerging computer architectures, with focus on improving efficiency and increasing system flexibility.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125545011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Basu, Rana Elnaggar, Krishnendu Chakrabarty, R. Karri
Anti-virus software (AVS) tools are used to detect Malware in a system. However, software-based AVS are vulnerable to attacks. A malicious entity can exploit these vulnerabilities to subvert the AVS. Recently, hardware components such as Hardware Performance Counters (HPC) have been used for Malware detection. In this paper, we propose PREEMPT, a zero overhead, high-accuracy and low-latency technique to detect Malware by re-purposing the embedded trace buffer (ETB), a debug hardware component available in most modern processors. The ETB is used for post-silicon validation and debug and allows us to control and monitor the internal activities of a chip, beyond what is provided by the Input/Output pins. PREEMPT combines these hardware-level observations with machine learning-based classifiers to preempt Malware before it can cause damage. There are many benefits of re-using the ETB for Malware detection. It is difficult to hack into hardware compared to software, and hence, PREEMPT is more robust against attacks than AVS. PREEMPT does not incur performance penalties. Finally, PREEMPT has a high True Positive value of 94% and maintains a low False Positive value of 2%.
{"title":"PREEMPT","authors":"K. Basu, Rana Elnaggar, Krishnendu Chakrabarty, R. Karri","doi":"10.1145/3316781.3317883","DOIUrl":"https://doi.org/10.1145/3316781.3317883","url":null,"abstract":"Anti-virus software (AVS) tools are used to detect Malware in a system. However, software-based AVS are vulnerable to attacks. A malicious entity can exploit these vulnerabilities to subvert the AVS. Recently, hardware components such as Hardware Performance Counters (HPC) have been used for Malware detection. In this paper, we propose PREEMPT, a zero overhead, high-accuracy and low-latency technique to detect Malware by re-purposing the embedded trace buffer (ETB), a debug hardware component available in most modern processors. The ETB is used for post-silicon validation and debug and allows us to control and monitor the internal activities of a chip, beyond what is provided by the Input/Output pins. PREEMPT combines these hardware-level observations with machine learning-based classifiers to preempt Malware before it can cause damage. There are many benefits of re-using the ETB for Malware detection. It is difficult to hack into hardware compared to software, and hence, PREEMPT is more robust against attacks than AVS. PREEMPT does not incur performance penalties. Finally, PREEMPT has a high True Positive value of 94% and maintains a low False Positive value of 2%.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121026276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PIM (Processing-in-memory)-based CNN (Convolutional neural network) accelerators leverage the characteristics of basic memory cells to enable simple logic and arithmetic operations so that the bandwidth constraint can be effectively alleviated. However, it remains a major challenge to support multiplication operations efficiently on PIM accelerators, in particular, DRAM-based PIM accelerators. This has prevented PIM-based accelerators from being immediately adopted for accurate CNN inference.In this paper, we propose LAcc, a DRAM-based PI M accelerator to support LUT-(lookup table) based fast and accurate multiplication. By enabling LUT based vector multiplication in DRAM, LAcc effectively decreases LUT size and improve its reuse. LAcc further adopts a hybrid mapping of weights and inputs to improve the hardware utilization rate. LAcc achieves 95 FPS at 5.3 W for Alexnet and 6.3× efficiency improvement over the state-of-the-art.
{"title":"LAcc","authors":"Quan Deng, Youtao Zhang, Minxuan Zhang, Jun Yang","doi":"10.1145/3316781.3317845","DOIUrl":"https://doi.org/10.1145/3316781.3317845","url":null,"abstract":"PIM (Processing-in-memory)-based CNN (Convolutional neural network) accelerators leverage the characteristics of basic memory cells to enable simple logic and arithmetic operations so that the bandwidth constraint can be effectively alleviated. However, it remains a major challenge to support multiplication operations efficiently on PIM accelerators, in particular, DRAM-based PIM accelerators. This has prevented PIM-based accelerators from being immediately adopted for accurate CNN inference.In this paper, we propose LAcc, a DRAM-based PI M accelerator to support LUT-(lookup table) based fast and accurate multiplication. By enabling LUT based vector multiplication in DRAM, LAcc effectively decreases LUT size and improve its reuse. LAcc further adopts a hybrid mapping of weights and inputs to improve the hardware utilization rate. LAcc achieves 95 FPS at 5.3 W for Alexnet and 6.3× efficiency improvement over the state-of-the-art.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"36 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121160234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}