Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238682
Kamal Danouchi, G. Prenat, Philippe Talatchian, Louis Hutin, Lorena Anghel
The efficiency of known algorithms for solving NP-hard problems is constrained by the limitations of conventional von Neumann architectures. Recurrent networks of stochastic neurons are an appealing alternative to conventional computing architectures, as they potentially allow exploring the binary search space of NP-hard problems with limited resources and overheads. In this study, we consider the case of Boolean Satisfiability on small logic functions, with technological implementations based on Spin-Orbit Torque Magnetic Tunnel Junctions. We propose innovative circuit-level implementations of invertible logic architectures for an AND gate and a Full Adder, emphasizing the design constraints of such invertible logic operations. Simulation results demonstrate the feasibility of SOT-based implementations and their robustness against process variations. The realistic implementation enables identifying the main power efficiency trade-offs.
Title: Robustness and Power Efficiency in Spin-Orbit Torque-Based Probabilistic Logic Circuits. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
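To make the idea of invertible logic concrete, the following is a minimal software sketch of an AND gate built from three stochastic binary units in bipolar (+1/-1) encoding. The coupling matrix `J`, the bias vector `h`, the inverse temperature `beta`, and all function names are assumptions chosen for illustration (one commonly quoted weight set for an invertible AND gate); they are not taken from the paper, and the code emulates the sampling behaviour only, not the SOT-MTJ circuit.

```python
import math
import random
from itertools import product

# Assumed couplings and biases for units (A, B, C) with C = A AND B.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
h = [1, 1, -2]

def energy(m):
    """Ising-style energy of a bipolar state m = (A, B, C)."""
    pair = sum(J[i][j] * m[i] * m[j] for i in range(3) for j in range(i + 1, 3))
    return -(pair + sum(h[i] * m[i] for i in range(3)))

def ground_states():
    """Enumerate all 8 states and return those with minimal energy."""
    states = list(product((-1, 1), repeat=3))
    e_min = min(energy(s) for s in states)
    return sorted(s for s in states if energy(s) == e_min)

def sample(clamp, beta=2.0, sweeps=2000, seed=0):
    """Gibbs-sample the unclamped units; clamp maps unit index -> fixed value."""
    rng = random.Random(seed)
    m = [clamp.get(i, rng.choice((-1, 1))) for i in range(3)]
    counts = {}
    for _ in range(sweeps):
        for i in range(3):
            if i in clamp:
                continue
            drive = h[i] + sum(J[i][j] * m[j] for j in range(3))
            p_up = 0.5 * (1.0 + math.tanh(beta * drive))
            m[i] = 1 if rng.random() < p_up else -1
        counts[tuple(m)] = counts.get(tuple(m), 0) + 1
    return counts

if __name__ == "__main__":
    print(ground_states())   # the four rows of the AND truth table
    print(sample({2: 1}))    # inverse mode: output clamped to logic 1
```

Clamping the output (index 2) and letting the inputs fluctuate is what "inverse" operation means here: with the output held at +1 the sampler should spend nearly all of its time in the state where both inputs are +1, while clamping it to -1 lets the three consistent input pairs share the visits.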
Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238669
Vedika Saravanan, Mohammad Walid Charrwi, S. Saeed
The distributed supply chain of the semiconductor industry has given rise to several attacks at different stages of Integrated Circuit (IC) design and manufacturing. Hardware Trojans (HTs) injected into the IC by a malicious foundry can lead to catastrophic consequences. Recent research efforts have shown the power of reinforcement learning not only in detecting HTs but also in bypassing these detection mechanisms. However, they do not take into account the detailed circuit structural information. In this paper, we explore new strategies for triggering HTs to evaluate the most recently proposed post-silicon HT detection techniques. Specifically, we develop automated and scalable rare net selection techniques to construct HT trigger conditions informed by the circuit structure. We evaluate our approaches on different benchmarks against the most recently proposed reinforcement learning and other state-of-the-art logic testing HT detection techniques.
Title: Revisiting Trojan Insertion Techniques for Post-Silicon Trojan Detection Evaluation. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238633
Taixin Li, Hongtao Zhong, Sumitha George, N. Vijaykrishnan, Liang Shi, Huazhong Yang, Xueqing Li
Multi-Level Ternary Content Addressable Memories (ML-TCAMs) are a type of TCAM that can calculate the Hamming distance between the stored data and the input vector, which can be used to accelerate several specific applications. Several current-domain and charge-domain ML-TCAMs based on SRAMs and nonvolatile memories (NVMs) already exist. However, they fail to strike a good balance between area and computational accuracy. In this paper, for the first time, we explore the design of dynamic ML-TCAMs that achieve both high cell density and high accuracy, and propose DyLAN, a current-domain dynamic ML-TCAM using 4-terminal nanoelectromechanical (NEM) relays. Specifically, exploiting the nearly zero OFF-state leakage and stable ON-state current of the 4-terminal NEM relays, this paper proposes DyLAN-W with ultra-long retention time and DyLAN-S with ultra-low single refresh overhead and high density. Results show that DyLAN achieves up to 2.7x and 4.9x area reduction compared with the 16T SRAM ML-TCAM and the charge-domain ML-TCAMs, respectively, and increases the few-shot learning accuracy by 13.7% (from 79.9% to 93.6%) on average compared with the state-of-the-art nonvolatile ML-TCAM, i.e., the 2FeFET ML-TCAM.
Title: Design Exploration of Dynamic Multi-Level Ternary Content-Addressable Memory Using Nanoelectromechanical Relays. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
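As a purely functional illustration of what a multi-level TCAM search computes, the sketch below counts mismatches between a query and stored ternary words, where `X` marks a don't-care cell. The string encoding and the function names `ternary_hamming` and `nearest_row` are assumptions made for this example; the snippet models the search result only and says nothing about the DyLAN circuit itself.

```python
def ternary_hamming(stored, query):
    """Count mismatching positions; 'X' in the stored word matches anything."""
    if len(stored) != len(query):
        raise ValueError("stored word and query must have the same length")
    return sum(1 for s, q in zip(stored, query) if s != "X" and s != q)

def nearest_row(rows, query):
    """Return the index of the stored row with the smallest distance.

    Ties are resolved in favour of the lowest index.
    """
    return min(range(len(rows)), key=lambda i: ternary_hamming(rows[i], query))

if __name__ == "__main__":
    table = ["0000", "10X1", "1111"]
    query = "1101"
    for row in table:
        print(row, ternary_hamming(row, query))
    print("best match:", nearest_row(table, query))
```

In a few-shot learning setting, such as the one the abstract evaluates, the stored rows would typically hold encoded class prototypes and the nearest row would give the predicted class; that mapping is again an assumption of this sketch.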
Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238574
Hui Wang, Jinming Lu, Jun Lin, Zhongfeng Wang
Convolutional neural networks (CNNs) have been widely used in computer vision in recent years. However, the huge amount of computation involved in CNN training limits its application on embedded devices. To solve this dilemma, this paper proposes an FPGA-based reconfigurable CNN training accelerator. First, we explore the possibility of using the Winograd algorithm to accelerate convolutions. An input-aligned decomposable Winograd method is proposed that broadens the scope of application of Winograd and simplifies its implementation on a unified processing element. Second, we propose a reconfigurable training architecture consisting of a transposable Winograd processing element array that can perform different training phases with high parallelism under limited resource costs. A series of unified data transformation units are designed to support various Winograd operations. The hierarchical barrel shift networks provide flexible and complex data access without bank conflicts. Evaluated on VGG16 and ResNet18, our method reduces multiplications by up to $2.4\times$ compared to conventional convolution. Additionally, our accelerator implemented on Alveo U200 achieves up to 918.57 GOPS in terms of throughput and shows a $3.18\times$ improvement in resource efficiency over the prior art.
Title: An FPGA-Based Reconfigurable CNN Training Accelerator Using Decomposable Winograd. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
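The multiplication savings that Winograd offers can be seen in its smallest one-dimensional instance, F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. The sketch below is the standard textbook F(2,3) only; it is not the input-aligned decomposable variant proposed in the paper, and the function names are chosen for this example.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap correlation, six multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]

def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs using four multiplications.

    The filter-side terms (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 depend
    only on the weights, so they can be precomputed once per filter.
    """
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

if __name__ == "__main__":
    data = [1, 2, 3, 4]
    taps = [1, 0, -1]
    print(direct_f23(data, taps), winograd_f23(data, taps))
```

Two-dimensional tiles such as F(2x2, 3x3) nest this transform along rows and columns, which is where the larger reductions reported for convolution layers come from.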
Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238545
Sajjad Parvin, Mehran Goli, Thilo Krachenfels, Shahin Tajik, Jean-Pierre Seifert, Frank Sill, R. Drechsler
The insertion of a Hardware Trojan (HT) into a chip after the in-house layout design is outsourced to a chip manufacturer for fabrication is a major concern, especially for mission-critical applications. While several HT detection methods have been developed based on side-channel analysis and physical measurements to overcome this problem, there exist stealthy analog HTs, i.e., capacitive and dopant-level HTs, which have negligible or even zero overhead on the chip. Thus, these stealthy HTs cannot be detected using the aforementioned methods. In this work, we propose a novel analytical approach to detect these Layout-level Analog Trojans (LAT). Our proposed method uses an extension of Optical Probing (OP) for LAT detection, namely, the Laser Logic State Imaging (LLSI) technique. In principle, to detect LATs using LLSI, we only need the golden design and not a golden chip, which is not typically available. As we take advantage of LLSI to detect HTs, our approach is non-invasive, less costly, and scalable to larger designs. We report experimental results on a malicious RISC-V to demonstrate the effectiveness of our approach in detecting LATs.
Title: LAT-UP: Exposing Layout-Level Analog Hardware Trojans Using Contactless Optical Probing. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238618
Sadia Anjum Tumpa, Sonali Singh, Md Fahim Faysal Khan, M. Kandemir, N. Vijaykrishnan, Chita R. Das
With the advances in IoT and edge computing, Federated Learning is ever more popular as it offers data privacy. Low-power spiking neural networks (SNNs) are ideal candidates for local nodes in such a federated setup. Most prior works assume that the participating nodes have uniform compute resources, which may not be practical. In this work, we propose a federated SNN learning framework for a realistic heterogeneous environment consisting of nodes with diverse memory-compute capabilities. Through activation checkpointing and time-skipping, it offers a ~$4\times$ reduction in effective memory requirement for low-memory nodes while improving the accuracy by up to 10% for non-independent and identically distributed (non-IID) data.
Title: Federated Learning with Spiking Neural Networks in Heterogeneous Systems. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238526
Omkar G. Ratnaparkhi, M. Rao
The proposed work adopts a clustering method to approximate and segment normalized non-linear functions towards realizing arithmetic units such as divider, square-root, squarer, and inverse-of-squarer. The novel use of the K-Means Clustering algorithm to build nonlinear partitions makes it possible to realize arithmetic units with different error characteristics as per the designer's demand. In this paper, the IEEE half-precision floating-point format (fp16) is used to implement and validate the novel arithmetic units. Accuracy improves for arithmetic units with more partitions, and conversely hardware metrics improve with fewer partitions. A maximum silicon footprint saving of 60.97% and a power benefit of 66.70% were achieved for the proposed approximate divider over state-of-the-art (SOTA) dividers. The proposed square-rooter showed maximum footprint savings of 55.12% when compared with the SOTA design. Besides, the proposed arithmetic functions, especially dividers and square-rooters, showed accelerated performance when compared with the respective SOTA implementations. The proposed cluster-wise approximation for computing designs was validated on two image processing applications: color quantization and edge detection. A maximum of 38.84% improvement in PSNR was realized using the Sobel edge detection algorithm built with the proposed square-rooter over its counterpart built with the SOTA design.
Title: CWAHA: Cluster-Wise Approximation for Hardware implementation of Arithmetic functions. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
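A rough software sketch of cluster-wise approximation, under several assumptions, is given below for the reciprocal of a normalized operand x in [1, 2). Sampled values of 1/x are grouped with a plain one-dimensional K-Means, each cluster is represented by its centroid, and the cluster boundaries are mapped back to thresholds on x so that evaluation becomes a threshold compare plus a table lookup. The constant-per-cluster approximation, the evenly spread initialisation, the sample count, and all function names are choices made for this example and are not claimed to match the paper's design.

```python
import bisect

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd iterations on scalars with a deterministic, evenly spread start."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            buckets[nearest].append(v)
        # Keep the previous centroid if a bucket happens to be empty.
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return sorted(centroids)

def build_reciprocal_table(k, samples=1024):
    """Return (x thresholds, per-segment outputs) for 1/x on [1, 2)."""
    xs = [1.0 + i / samples for i in range(samples)]
    # 1/x decreases with x, so reverse the centroids to order them by x.
    centroids = kmeans_1d([1.0 / x for x in xs], k)[::-1]
    # A boundary lies where 1/x equals the midpoint of two adjacent centroids.
    bounds = [2.0 / (a + b) for a, b in zip(centroids, centroids[1:])]
    return bounds, centroids

def approx_reciprocal(x, table):
    """Look up the segment containing x and return its stored value."""
    bounds, centroids = table
    return centroids[bisect.bisect_right(bounds, x)]

def max_error(k, samples=1024):
    """Largest absolute error over the sampled operands."""
    table = build_reciprocal_table(k, samples)
    return max(abs(1.0 / x - approx_reciprocal(x, table))
               for x in (1.0 + i / samples for i in range(samples)))

if __name__ == "__main__":
    for k in (2, 4, 8):
        print(k, max_error(k))
```

Running the loop at the bottom shows the trade-off the abstract describes: more partitions lower the worst-case error, at the price of a larger table and more comparators.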
Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238558
Tiago Da Silva Almeida, Lucas Wanner
FPGA-based architectures have emerged as a versatile acceleration solution for various applications, aided by High-Level Synthesis (HLS) tools. For applications with some level of error resilience, the use of approximate logic components such as imprecise multipliers and adders can improve resource usage and energy efficiency. Nevertheless, these components must be carefully composed and combined to prevent error accumulation and to ensure that the application produces valid outputs. In this work, we explore approximate multiplier and adder designs used in Multiply-accumulate (MAC) operations for accelerators implemented in HLS, aiming to find combinations of components that can save power and resources while effectively mitigating errors in application outputs. We show that the best combinations of components can improve the Power Area Product (PAP) of a Sobel filter accelerator design by 36-49% compared to a precise design while limiting errors and maintaining an acceptable quality of results.
Title: Efficient Accelerator Design in High-Level Synthesis Using Approximate Logic Components. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
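To illustrate how an imprecise adder perturbs a multiply-accumulate result, the sketch below models a lower-part OR adder (LOA), a well-known approximate adder, and uses it to accumulate the horizontal Sobel response of one 3x3 patch. The abstract does not say which approximate components were evaluated, so the choice of LOA, the parameter `k` (number of approximated low bits), the split into positive and negative accumulators, and the function names are all assumptions of this example.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def loa_add(a, b, k):
    """Lower-part OR adder for non-negative integers.

    The k low bits are OR-ed instead of added; an AND of the top low bits
    supplies the carry into the exact upper addition. k = 0 is an exact add.
    """
    if k == 0:
        return a + b
    mask = (1 << k) - 1
    low = (a | b) & mask
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry
    return (high << k) | low

def sobel_gx(patch, k):
    """Horizontal Sobel response of a 3x3 patch using LOA accumulation.

    Positive and negative taps are accumulated separately so that the
    approximate adder only ever sees non-negative operands.
    """
    pos = neg = 0
    for kernel_row, pixel_row in zip(SOBEL_X, patch):
        for weight, pixel in zip(kernel_row, pixel_row):
            if weight > 0:
                pos = loa_add(pos, weight * pixel, k)
            elif weight < 0:
                neg = loa_add(neg, -weight * pixel, k)
    return pos - neg

if __name__ == "__main__":
    patch = [[10, 10, 200],
             [10, 10, 200],
             [10, 10, 200]]
    for k in (0, 3, 5):
        print(k, sobel_gx(patch, k))
```

Sweeping `k` over a whole image and comparing against the `k = 0` output is one simple way to explore the kind of accuracy versus cost trade-off the abstract refers to; the hardware cost side would have to come from synthesis reports, which this sketch does not model.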
Pub Date: 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238662
Priyanka Panigrahi, C. Karfa
A compiler optimization can be functionally correct but not secure. Register allocation (RA) is an essential optimization performed by a compiler. This paper analyzes the security threat of RA with respect to information flow. We define the relative security between two programs with respect to information flow. According to our definition of relative security, we show that RA is secure when there is no splitting and no spilling into memory. We also show that register allocation with splitting is secure under our attack model. Then, we show that RA can lead to information leaks during spilling, as it introduces new leaks through memory. Further, our experimental results on various benchmarks show that RA in LLVM is actually leaky. To address this vulnerability, we propose a secure RA approach in LLVM that mitigates the risk of new leaks during spilling. Our experimental evaluation on various benchmarks shows the effectiveness of our proposed approach.
Title: An Investigation into the Security of Register Allocation with Spilling and Splitting. Venue: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).