Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238483
Deepak Puthal, S. Mohanty, Amit Kumar Mishra, C. Yeun, Ernesto Damiani
The integration of machine learning (ML) and logical reasoning (LR) in cyber security is an emerging field that shows great potential for improving the efficiency and effectiveness of security systems. While ML can detect anomalies and patterns in large amounts of data, LR can provide a higher-level understanding of threats and enable better decision-making. This paper explores the future of ML and LR in cyber security and highlights how the integration of these two approaches can lead to more robust security systems. We discuss several use cases that demonstrate the effectiveness of the integrated approach, such as threat detection and response, vulnerability assessment, and security policy enforcement. Finally, we identify several research directions that will help advance the field, including the development of more explainable ML models and the integration of human-in-the-loop approaches.
Title: Revolutionizing Cyber Security: Exploring the Synergy of Machine Learning and Logical Reasoning for Cyber Threats and Mitigation | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238551
Marcello M. Muñoz, Denis Maass, Murilo R. Perleberg, Luciano Agostini, M. Porto
Affine Motion Estimation (AME) is a new, high-complexity tool of the Versatile Video Coding (VVC) standard. AME requires Affine Motion Compensation (MC) to be performed on 4×4 subblocks, where one of fifteen 6-tap interpolation filters is selected to interpolate each sample of a 4×4 subblock according to that subblock's motion vector. This work presents two dedicated hardware implementations of the VVC Affine MC: the first targets reduced power dissipation and the second a reduced area. ASIC synthesis results for TSMC 40 nm standard cells show an area of 54.43k gates and a power dissipation of 12.8 mW for the power-efficient variant, and an area of 21.91k gates and a power dissipation of 14.41 mW for the hardware-efficient variant.
Title: Efficient Hardware Design for the VVC Affine Motion Compensation Exploiting Multiple Constant Multiplication | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
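Because interpolation-filter coefficients are fixed constants, each coefficient multiplication can be built from shifts and additions instead of a general multiplier; multiple constant multiplication (MCM) goes further by sharing partial sums across several constants. A minimal sketch of the single-constant building block follows, assuming canonical signed digit (CSD) recoding. The function names are hypothetical, the coefficients in the demo are illustrative values rather than VVC's filter table, and the cross-constant sharing that defines MCM is not shown.

```python
def csd_terms(c: int) -> list[tuple[int, int]]:
    """Recode integer c into canonical signed digit (CSD) form.

    Returns (sign, shift) pairs with sign in {+1, -1} such that
    c == sum(sign * 2**shift); no two non-zero digits are adjacent.
    """
    terms = []
    shift = 0
    while c != 0:
        if c & 1:
            # c % 4 == 1 -> digit +1, c % 4 == 3 -> digit -1; either way the
            # remainder becomes divisible by 4, so the next digit is zero.
            digit = 2 - (c & 3)
            terms.append((digit, shift))
            c -= digit
        c >>= 1
        shift += 1
    return terms


def shift_add_multiply(x: int, c: int) -> int:
    """Compute x * c using only shifts and additions/subtractions."""
    acc = 0
    for sign, shift in csd_terms(c):
        term = x << shift
        acc = acc + term if sign > 0 else acc - term
    return acc


if __name__ == "__main__":
    for coeff in (-10, 35, 58):  # illustrative constants, not the VVC filter taps
        print(coeff, csd_terms(coeff), shift_add_multiply(13, coeff))
```

For example, 58 is recoded as 2 - 8 + 64, so multiplying by 58 costs two adders/subtractors instead of a multiplier.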
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238639
M. Jordan, Guilherme Korol, Tiago Knorst, M. B. Rutzig, A. C. S. Beck
Cloud warehouses have been adopting CPU-FPGA environments to accelerate clients' applications scalably. On the CPU side, dynamic voltage and frequency scaling (DVFS) improves energy efficiency. On the FPGA side, High-Level Synthesis (HLS) enables hardware optimizations that yield design versions with different characteristics (e.g., latency and power). Although both techniques are in use, they have never been exploited cooperatively to improve execution efficiency. To that end, we propose RAHD, a framework that bridges the gap between DVFS, multiple HLS design versions, and CPU-FPGA environments. RAHD automatically fine-tunes the selection of design versions and DVFS settings to balance the workload efficiently, achieving a 32.86× energy improvement over a standard provisioning strategy.
Title: Resource Provisioning for CPU-FPGA Environments with Adaptive HLS-Versioning and DVFS | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
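Choosing an HLS design version together with a CPU DVFS level is a joint search over configurations. A minimal sketch for a single job follows, assuming hypothetical latency and power profiles and a back-to-back CPU/FPGA execution model in which energy is the sum of power × time per stage. The names and numbers are invented for illustration; RAHD's actual workload-balancing policy is not described in the abstract.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Config:
    name: str
    latency_s: float  # time to process one job with this setting
    power_w: float    # average power while processing


def pick_min_energy(fpga_versions, cpu_levels, deadline_s) -> Optional[tuple]:
    """Try every (HLS design version, CPU DVFS level) pair and return
    (energy_j, fpga_name, cpu_name) for the lowest-energy pair whose
    end-to-end latency meets the deadline, or None if no pair does.

    Assumes the FPGA and CPU stages of a job run back to back, so their
    latencies add and energy is the sum of power * time per stage.
    """
    best = None
    for hw, cpu in product(fpga_versions, cpu_levels):
        if hw.latency_s + cpu.latency_s > deadline_s:
            continue  # violates the deadline
        energy = hw.power_w * hw.latency_s + cpu.power_w * cpu.latency_s
        if best is None or energy < best[0]:
            best = (energy, hw.name, cpu.name)
    return best


# Hypothetical profiles, for illustration only.
FPGA_VERSIONS = [Config("hls_fast", 0.010, 20.0), Config("hls_small", 0.030, 5.0)]
CPU_LEVELS = [Config("cpu_2.4GHz", 0.020, 30.0), Config("cpu_1.2GHz", 0.040, 10.0)]

if __name__ == "__main__":
    for deadline in (0.02, 0.06, 0.10):
        print(deadline, pick_min_energy(FPGA_VERSIONS, CPU_LEVELS, deadline))
```

With these numbers, a 60 ms deadline favours the fast HLS version with the slow CPU level, while a 100 ms deadline lets the smaller HLS version win; a framework like RAHD would make such choices across a whole workload rather than per job.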
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238569
Nidhi Anantharajaiah, Yunhe Xu, Fabian Lesniak, T. Harbaum, Jürgen Becker
Applications of different criticality increasingly share the same System-on-Chip (SoC) platform to reduce overall cost. Spatial and temporal isolation techniques are used to limit inter-application interference and to ensure that real-time requirements are met. Spatial isolation partitions the communication resources, and such partitions can result in irregular topologies. The on-chip interconnect of such systems should therefore support communication within every possible partition shape using efficient routing. To improve flexibility, adaptivity, and reliability, it is desirable to use topology-agnostic routing algorithms that can compute optimal routes at runtime. For this purpose, we present DREAM, a Distributed Reinforcement learning Enabled Adaptive Mixed-Critical Network-on-Chip (NoC), together with a supporting framework. DREAM is a distributed NoC that uses a topology-agnostic, reinforcement-learning-enabled routing algorithm based on the Ant Colony Optimization (ACO) metaheuristic. The DREAM framework discovers paths at runtime and selects optimal routes over time as traffic fluctuates. We compare its performance against other topology-agnostic algorithms under uniform random traffic and the application traffic of an MPEG-4 video decoder. Under uniform random traffic, the results show up to a 63% decrease in latency and a 25% increase in throughput for certain irregular topologies.
Title: DREAM: Distributed Reinforcement Learning Enabled Adaptive Mixed-Critical NoC | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
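ACO-based routing is commonly realised as a per-router table of forwarding probabilities that "ants" (probe packets) reinforce after measuring a trip. The sketch below shows an AntNet-style reinforcement rule under that assumption. The class and method names, the reward formula, and the two-port demo are hypothetical; the abstract does not specify DREAM's actual update rule or how it handles mixed-criticality traffic.

```python
import random


class AntRoutingTable:
    """Per-router pheromone table: for each destination, one probability per output port."""

    def __init__(self, ports, destinations):
        self.ports = list(ports)
        uniform = 1.0 / len(self.ports)
        # No topology knowledge at start: every port is equally likely.
        self.prob = {d: {p: uniform for p in self.ports} for d in destinations}

    def choose_port(self, dest, rng=random):
        """Sample an output port for `dest` in proportion to its probability."""
        row = self.prob[dest]
        return rng.choices(list(row), weights=list(row.values()))[0]

    def reinforce(self, dest, port, reward):
        """Backward-ant update with reward in (0, 1): the used port gains
        reward * (1 - p) and every other port is scaled by (1 - reward),
        so the row still sums to 1."""
        row = self.prob[dest]
        for p in row:
            if p == port:
                row[p] += reward * (1.0 - row[p])
            else:
                row[p] *= 1.0 - reward

    def best_port(self, dest):
        row = self.prob[dest]
        return max(row, key=row.get)


if __name__ == "__main__":
    # Hypothetical two-port router: "east" reaches D in 4 cycles, "north" in 10.
    latency = {"east": 4, "north": 10}
    table = AntRoutingTable(["east", "north"], ["D"])
    rng = random.Random(1)
    for _ in range(300):
        port = table.choose_port("D", rng)
        table.reinforce("D", port, reward=0.5 / latency[port])  # faster trips earn more
    print(table.prob["D"])
```

Because the faster port earns a larger reward each time it is used, its probability drifts upward; no topology information is needed, which is what makes this family of algorithms topology-agnostic.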
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238626
Grant Brown, Ganesh Gore, P. Gaillardon
Field Programmable Gate Arrays (FPGAs) have grown in popularity across a wide range of applications because of their reconfigurability and their lower non-recurring engineering costs compared with application-specific integrated circuits (ASICs). To keep pace with growing application needs and process technology improvements, commercial FPGAs have traditionally relied on full-custom chip design. However, embedded FPGAs (eFPGAs) have made FPGA usage more application-specific, creating the need for an agile design approach that accelerates the eFPGA design process. Recent agile FPGA design methods have therefore introduced automation into the design process, enabling semi-automated fine-tuning of physical and architectural parameters and reducing the physical design iteration time for FPGAs. These novel grid-based design methods, however, render commercially available Clock Tree Synthesis (CTS) algorithms ineffective on modern FPGA fabrics. To overcome this deficiency, we propose a novel clock tree embedding algorithm that uses a symmetrical clock tree to minimize skew, followed by an efficient pruning method that leverages traditional Static Timing Analysis (STA) to improve clock latency.
Experimental results on 2×2, 7×7, 8×8, 29×29, and 32×32 FPGAs show that the proposed CTS algorithm achieves up to a 50% improvement in latency and more than a 10× reduction in skew compared with an implementation using a commercial CTS methodology.
Title: Performance Optimized Clock Tree Embedding for Auto-Generated FPGAs | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
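A symmetric clock tree such as an H-tree gives every sink the same root-to-sink path length, which is why it minimizes skew; fabrics whose size is not a power of two (e.g., 7×7 or 29×29) leave branches that drive nothing. The following is a geometric sketch under the assumptions that tiles sit on a unit grid and wire length stands in for delay. It prunes empty branches purely geometrically, whereas the abstract describes an STA-driven pruning step, so the function name and metric are illustrative only.

```python
def pruned_h_tree(n: int):
    """Embed a symmetric H-tree over the smallest 2**k x 2**k tile grid that
    covers an n x n fabric, then prune branches that would feed no real tile.

    Returns (leaves, wirelength): leaves maps (row, col) of each clocked tile
    to its root-to-tile wire length, and wirelength is the total kept wire,
    both in Manhattan units of one tile pitch.
    """
    side = 1
    while side < n:
        side *= 2
    leaves = {}

    def build(cx, cy, span, dist):
        """Subtree rooted at the centre (cx, cy) of a span x span region.
        Returns the kept wirelength below this node, or None if pruned."""
        x0, y0 = cx - span / 2, cy - span / 2  # lower-left corner of the region
        if x0 >= n or y0 >= n:
            return None  # region lies wholly outside the real fabric
        if span == 1:
            leaves[(int(y0), int(x0))] = dist
            return 0.0
        q = span / 4  # child centres sit a quarter span away in x and y
        total = 0.0
        for dx in (-q, q):
            for dy in (-q, q):
                sub = build(cx + dx, cy + dy, span / 2, dist + 2 * q)
                if sub is not None:
                    total += 2 * q + sub  # wire to the child plus its subtree
        return total

    wirelength = build(side / 2, side / 2, side, 0.0)
    return leaves, wirelength


if __name__ == "__main__":
    for n in (2, 7, 8):
        leaves, wl = pruned_h_tree(n)
        print(n, len(leaves), sorted(set(leaves.values())), wl)
```

For n = 7, all 49 kept tiles share a path length of 7 tile pitches (zero geometric skew), and pruning reduces total wire from 112 for the full 8×8 tree to 97. Real clock latency also depends on buffering and load capacitance, which is what STA captures.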
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238642
Gabriel Ammes, P. Butzen, A. Reis, Renato P. Ribas
Approximate circuits are emerging as a way to save area, delay, and power in error-resilient applications such as machine learning, computer vision, and signal processing. This work evaluates a logic synthesis approach that explores both two-level and multi-level topologies in approximate digital circuit design. In this strategy, two-level (2L) approximate logic synthesis (ALS) enables robust function optimization, whereas multi-level (ML) ALS performs structural simplification. Experimental results show that combining 2L- and ML-ALS improves average area and delay compared with state-of-the-art ML-ALS at a 5% error rate, with reductions of up to 37% in circuit area and up to 31% in delay under the same error constraint.
Title: Evaluation of Digital Circuit Design by Combining Two- and Multi-Level Approximate Logic Synthesis | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
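The error rate used as the constraint in approximate logic synthesis is commonly defined as the fraction of input patterns on which the approximate circuit's output differs from the exact one. A minimal sketch follows, assuming that definition and exhaustive enumeration, which is practical only for small input counts (larger circuits are usually evaluated by simulating sampled vectors). The majority function and the dropped cube are hypothetical examples of a two-level move, not taken from the paper.

```python
from itertools import product


def error_rate(exact, approx, n_inputs: int) -> float:
    """Fraction of all 2**n_inputs input vectors on which approx differs from exact."""
    vectors = list(product((0, 1), repeat=n_inputs))
    mismatches = sum(exact(*v) != approx(*v) for v in vectors)
    return mismatches / len(vectors)


def majority(a, b, c):
    # Exact two-level (sum-of-products) form: ab + ac + bc
    return (a & b) | (a & c) | (b & c)


def majority_approx(a, b, c):
    # Hypothetical 2L-ALS move: drop the cube bc to save one AND gate
    return (a & b) | (a & c)


def within_budget(exact, approx, n_inputs: int, budget: float) -> bool:
    """Accept an approximation only if its error rate stays within the budget."""
    return error_rate(exact, approx, n_inputs) <= budget


if __name__ == "__main__":
    print(error_rate(majority, majority_approx, 3))
    print(within_budget(majority, majority_approx, 3, budget=0.05))
```

Dropping the cube bc is wrong only for the input a=0, b=1, c=1, i.e. a 12.5% error rate, so under the 5% budget cited in the abstract this move would be rejected.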
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238563
Sachin Bhat, S. Kulkarni, C. A. Moritz
Compact models are an integral part of large-scale integrated circuit simulation and of the validation of new technologies. With technology scaling, however, compact models have become complex and involve many parameters, which makes parameter extraction for a new device technology challenging. In this paper, we propose a probabilistic approach to compact model parameter extraction: a Bayesian optimization technique specifically tailored to the efficient extraction of BSIM-CMG parameters for fitting nanowire junctionless transistors and 14 nm FinFETs. The Bayesian-optimization-based extraction shows an excellent fit to drain current data, with a 6.5% normalized root-mean-square error (NRMSE) for nanowire junctionless transistors. For a 14 nm FinFET, the technique achieves an NRMSE of 6.3% for drain current data and 1.5% for capacitance data. These results compare favourably with, and improve on, currently available tools, including industrial ones.
Title: Compact Model Parameter Extraction using Bayesian Machine Learning | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
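Bayesian optimization treats the fitting error as an expensive black-box function, fits a Gaussian-process (GP) surrogate to the evaluations made so far, and picks the next parameter value by maximizing an acquisition function such as expected improvement (EI). The sketch below illustrates that loop in pure Python under strong simplifying assumptions: a toy square-law transistor model stands in for BSIM-CMG, only one parameter (a threshold voltage) is extracted, the "measured" data are synthetic, NRMSE is normalized by the data range, and the kernel length, jitter, and candidate grid are arbitrary choices. It is not the authors' tailored technique.

```python
import math

# Toy square-law model standing in for BSIM-CMG (assumption): Id = K*(Vg - Vth)**2 above Vth.
K = 2e-4
VG = [0.1 * i for i in range(13)]  # gate sweep 0.0..1.2 V
MEASURED = [K * (v - 0.35) ** 2 if v > 0.35 else 0.0 for v in VG]  # synthetic, Vth = 0.35 V


def nrmse(vth):
    """Range-normalized RMSE between the model with this Vth and MEASURED."""
    sim = [K * (v - vth) ** 2 if v > vth else 0.0 for v in VG]
    rmse = math.sqrt(sum((s - m) ** 2 for s, m in zip(sim, MEASURED)) / len(VG))
    return rmse / (max(MEASURED) - min(MEASURED))


def rbf(a, b, length=0.1):
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def gp_posterior(xs, ys, queries, jitter=1e-4):
    """Zero-mean GP (unit-variance RBF kernel): posterior (mean, std) at each query."""
    n = len(xs)
    A = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):  # Cholesky factorization A = L L^T
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]

    def solve_lower(b):  # solve L z = b
        z = []
        for i in range(n):
            z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        return z

    def solve_upper(b):  # solve L^T z = b
        z = [0.0] * n
        for i in reversed(range(n)):
            z[i] = (b[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
        return z

    alpha = solve_upper(solve_lower(ys))
    out = []
    for q in queries:
        kq = [rbf(q, x) for x in xs]
        v = solve_lower(kq)
        var = max(1.0 - sum(t * t for t in v), 1e-12)
        out.append((sum(k * a for k, a in zip(kq, alpha)), math.sqrt(var)))
    return out


def expected_improvement(mean, std, best):
    """EI for minimization under a Gaussian posterior."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(f, candidates, init, n_iter=20):
    """Minimize f over a finite candidate set, starting from the init points."""
    xs, ys = list(init), [f(x) for x in init]
    for _ in range(n_iter):
        pool = [c for c in candidates if c not in xs]
        if not pool:
            break
        mu = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
        post = gp_posterior(xs, [(y - mu) / sd for y in ys], pool)  # standardized targets
        best = (min(ys) - mu) / sd
        scores = [expected_improvement(m, s, best) for m, s in post]
        x_next = pool[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(f(x_next))
    i = ys.index(min(ys))
    return xs[i], ys[i]


if __name__ == "__main__":
    grid = [round(0.10 + 0.01 * i, 2) for i in range(51)]  # Vth candidates 0.10..0.60 V
    vth, err = bayes_opt(nrmse, grid, init=[0.1, 0.3, 0.6])
    print(f"extracted Vth = {vth:.2f} V, NRMSE = {err:.4f}")
```

A real extraction would optimize many BSIM-CMG parameters jointly against I-V and C-V data, which is where a sample-efficient surrogate pays off compared with grid search.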
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238542
L. H. Brendler, H. Lapuyade, Y. Deval, Ricardo Reis, F. Rivet
This work extends a new method for detecting Multiple-Cell Upsets (MCUs) in SRAM memories for space applications. The method spatially interleaves a memory plane with a network of memory radiation detectors. A 32 kb interleaved data/detection SRAM was designed in 28 nm FD-SOI technology and tested using post-layout simulations. The results confirm the correct operation of both the data cells and the detection cells of the memory, which detect single and multiple events injected at different positions in the memory array. With the ratio between data and detection cells used in this work (50%), the probability of detecting MCUs in a memory plane can approach 100%.
Title: A MCU-robust Interleaved Data/Detection SRAM for Space Environments | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
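The abstract does not detail the interleaving pattern, so the sketch below assumes two hypothetical layouts with a 50% detector ratio and counts, by exhaustive placement, how often a two-cell upset touches at least one detector cell. It illustrates why interleaving can push MCU detection toward 100%: an upset spanning adjacent cells in the interleaved direction always hits a detector. The function and layout names are illustrative.

```python
def detection_fraction(n, is_detector, offset):
    """Fraction of all placements of a two-cell upset, cells (r, c) and
    (r + dr, c + dc) with dr, dc >= 0, inside an n x n array that hit at
    least one detector cell."""
    dr, dc = offset
    hits = total = 0
    for r in range(n - dr):
        for c in range(n - dc):
            total += 1
            if is_detector(r, c) or is_detector(r + dr, c + dc):
                hits += 1
    return hits / total


def checkerboard(r, c):
    # Hypothetical layout: detectors on alternate cells in both directions (50%).
    return (r + c) % 2 == 1


def column_interleaved(r, c):
    # Hypothetical layout: every other column is a detector column (50%).
    return c % 2 == 1


if __name__ == "__main__":
    shapes = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}
    for layout in (checkerboard, column_interleaved):
        for name, off in shapes.items():
            print(layout.__name__, name, round(detection_fraction(8, layout, off), 3))
```

With the checkerboard layout every horizontal or vertical pair is detected, while diagonal pairs are detected only about half the time (24 of 49 placements on an 8×8 array); column interleaving instead misses half of the vertical pairs. A larger cluster is detected whenever any two-cell pair it contains is.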
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238634
Indranee Kashyap, Dipika Deb, Nityananda Sarma
Memory latency and off-chip bandwidth have struggled to keep up with computing performance in modern computer systems. Prefetching helps mask long memory access latency at various cache levels by continuously monitoring an application's memory access pattern; upon detecting a pattern, it prefetches cache blocks ahead of their use. However, complex access patterns such as direct or indirect pointer accesses and linked lists do not follow any specific pattern, which makes pattern-based prefetching impossible. This paper proposes Grep, an adaptive graph-based data prefetcher that monitors L1D cache misses and prefetches blocks into the L2 cache. Unlike state-of-the-art prefetchers, Grep does not search for patterns in the miss stream. Instead, it establishes predecessor-successor relationships among cache misses by constructing an occurrence graph that stores the frequency and sequence of subsequent cache block accesses. As a result, both regular and irregular patterns in the miss stream can be predicted. Upon an address match in the occurrence graph, Grep prefetches blocks with an associated confidence value.
Experimentally, it improves prefetch coverage and accuracy by 35.5% and 18.8%, respectively, compared with the Signature Path Prefetcher (SPP).
Title: Grep: Performance Enhancement in MultiCore Processors using an Adaptive Graph Prefetcher | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
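A simplified sketch of the occurrence-graph idea follows, assuming that nodes are cache-block addresses, that an edge a -> b counts how often a miss to b immediately followed a miss to a, and that confidence is an edge's share of its node's outgoing count. The class name, the confidence threshold, the prefetch degree, and the unbounded Python dictionaries are all assumptions; Grep's hardware tables and confidence computation are not described in the abstract.

```python
from collections import defaultdict


class OccurrenceGraphPrefetcher:
    """Hypothetical, simplified occurrence graph: edge a -> b counts how often an
    L1D miss to block b immediately followed a miss to block a."""

    def __init__(self, min_confidence=0.5, degree=2):
        self.edges = defaultdict(lambda: defaultdict(int))
        self.last_miss = None
        self.min_confidence = min_confidence  # drop weak predictions
        self.degree = degree                  # max prefetches issued per miss

    def on_l1d_miss(self, block):
        """Update the graph with this miss and return (block, confidence) pairs
        to prefetch into L2, most frequent successor first."""
        if self.last_miss is not None:
            self.edges[self.last_miss][block] += 1
        self.last_miss = block
        successors = self.edges.get(block)  # .get avoids creating empty nodes
        if not successors:
            return []
        total = sum(successors.values())
        ranked = sorted(successors.items(), key=lambda kv: kv[1], reverse=True)
        return [(b, cnt / total) for b, cnt in ranked[: self.degree]
                if cnt / total >= self.min_confidence]


if __name__ == "__main__":
    # Irregular, pointer-chasing-like miss stream that repeats (block addresses).
    stream = [0x1A40, 0x7C80, 0x0300, 0x1A40, 0x7C80, 0x0300]
    pf = OccurrenceGraphPrefetcher()
    for blk in stream:
        print(hex(blk), [(hex(b), c) for b, c in pf.on_l1d_miss(blk)])
```

Because the graph records which miss follows which, the repeated but non-strided stream is predicted from its second occurrence onward, which a stride-based prefetcher could not do.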
Pub Date: 2023-06-20 | DOI: 10.1109/ISVLSI59464.2023.10238603
Anand Menon, Amisha Srivastava, Shamik Kundu, K. Basu
Kleptographic attacks are a class of security threat that weakens a cryptographic implementation in order to extract sensitive information from a computer system. They can be particularly harmful when they target cryptographic keys or other security-critical information. Because software-based defenses are not robust against these threats, prior studies have explored trusted hardware-based solutions built on tailor-made Hardware Performance Counters (HPCs). However, these tailor-made HPCs lack the fine-grained characterization needed to differentiate correctly between individual applications, so a large number of HPCs is required to monitor an application, which incurs high overhead on the system. To this end, we propose Register-Instruction Hardware Performance Counters (RIHPCs), a bespoke set of special-purpose registers designed to characterize applications, and thereby detect Kleptographic attacks, at fine granularity and with low performance overhead. To assess RIHPCs against Kleptographic attacks, we profile NIST's Post-Quantum Cryptography (PQC) Key Encapsulation Mechanism (KEM) algorithms.
Our results show that RIHPC traces can distinguish between PQC algorithms with over 99% accuracy, while reducing performance overhead by up to 67% compared with tailor-made HPCs.
Title: Application Profiling Using Register-Instruction Hardware Performance Counters | Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)