This paper presents a novel hardware-software co-design consisting of a Processing in-Memory (PiM) architecture with embedded neural processing elements (NPE) that are highly reconfigurable. The PiM platform and proposed approximation strategies are employed for various image filtering applications while providing the user with fine-grain dynamic control over energy efficiency, precision, and throughput (EPT). The proposed co-design can change the Peak Signal to Noise Ratio (PSNR, output quality metric for image filtering applications) from 25dB to 50dB (acceptable PSNR range for image filtering applications) without incurring any extra cost in terms of energy or latency. While switching from accurate to approximate mode of computation in the proposed co-design, the maximum improvement in energy efficiency and throughput is 2X. However, the gains in energy efficiency against a MAC-based PE array with the proposed memory platform are 3X-6X. The corresponding improvements in throughput are 2.26X-4.52X, respectively.
{"title":"Tunable Precision Control for Approximate Image Filtering in an In-Memory Architecture with Embedded Neurons","authors":"Ayushi Dube, Ankit Wagle, G. Singh, S. Vrudhula","doi":"10.1145/3508352.3549385","DOIUrl":"https://doi.org/10.1145/3508352.3549385","url":null,"abstract":"This paper presents a novel hardware-software co-design consisting of a Processing in-Memory (PiM) architecture with embedded neural processing elements (NPE) that are highly reconfigurable. The PiM platform and proposed approximation strategies are employed for various image filtering applications while providing the user with fine-grain dynamic control over energy efficiency, precision, and throughput (EPT). The proposed co-design can change the Peak Signal to Noise Ratio (PSNR, output quality metric for image filtering applications) from 25dB to 50dB (acceptable PSNR range for image filtering applications) without incurring any extra cost in terms of energy or latency. While switching from accurate to approximate mode of computation in the proposed co-design, the maximum improvement in energy efficiency and throughput is 2X. However, the gains in energy efficiency against a MAC-based PE array with the proposed memory platform are 3X-6X. The corresponding improvements in throughput are 2.26X-4.52X, respectively.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127653229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coarse-Grained Reconfigurable Architectures (CGRA) is a promising solution to accelerate domain applications due to its good combination of energy-efficiency and flexibility. Loops, as computation-intensive parts of applications, are often mapped onto CGRA and modulo scheduling is commonly used to improve the execution performance. However, the actual performance using modulo scheduling is highly dependent on the mapping ability of the Data Dependency Graph (DDG) extracted from a loop. As existing approaches usually separate routing exploration of multi-cycle dependence from mapping for fast compilation, they may easily suffer from poor mapping quality. In this paper, we integrate the routing explorations into the mapping process and make it have more opportunities to find a globally optimized solution. Meanwhile, with a reduced resource graph defined, the searching space of the new mapping problem is not greatly increased. To efficiently solve the problem, we introduce graph neural network based reinforcement learning to predict a placement distribution over different resource nodes for all operations in a DDG. Using the routing connectivity as the reward signal, we optimize the parameters of neural network to find a valid mapping solution with a policy gradient method. Without much engineering and heuristic designing, our approach achieves 1.57× mapping quality, as compared to the state-of-the-art heuristic.
{"title":"Towards High-Quality CGRA Mapping with Graph Neural Networks and Reinforcement Learning","authors":"Yan Zhuang, Zhihao Zhang, Dajiang Liu","doi":"10.1145/3508352.3549458","DOIUrl":"https://doi.org/10.1145/3508352.3549458","url":null,"abstract":"Coarse-Grained Reconfigurable Architectures (CGRA) is a promising solution to accelerate domain applications due to its good combination of energy-efficiency and flexibility. Loops, as computation-intensive parts of applications, are often mapped onto CGRA and modulo scheduling is commonly used to improve the execution performance. However, the actual performance using modulo scheduling is highly dependent on the mapping ability of the Data Dependency Graph (DDG) extracted from a loop. As existing approaches usually separate routing exploration of multi-cycle dependence from mapping for fast compilation, they may easily suffer from poor mapping quality. In this paper, we integrate the routing explorations into the mapping process and make it have more opportunities to find a globally optimized solution. Meanwhile, with a reduced resource graph defined, the searching space of the new mapping problem is not greatly increased. To efficiently solve the problem, we introduce graph neural network based reinforcement learning to predict a placement distribution over different resource nodes for all operations in a DDG. Using the routing connectivity as the reward signal, we optimize the parameters of neural network to find a valid mapping solution with a policy gradient method. Without much engineering and heuristic designing, our approach achieves 1.57× mapping quality, as compared to the state-of-the-art heuristic.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114394819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In advanced packages, redistribution layers (RDLs) are extra metal layers for high interconnections among the chips and printed circuit board (PCB). To better utilize the routing resources of RDLs, published works adopted flexible vias such that they can place the vias everywhere. Furthermore, some regions may be blocked for signal integrity protection or manually prerouted nets (such as power/ground nets or feeding lines of antennas) to achieve higher performance. These blocked regions will be treated as obstacles in the routing process. Since the positions of pads, obstacles, and vias can be arbitrary, the structures of RDLs become irregular. The obstacles and irregular structures substantially increase the difficulty of the routing process. This paper proposes a three-stage algorithm: First, the layout is partitioned by a method based on constrained Delaunay triangulation (CDT). Then we present a global routing graph model and generate routing guides for unified-assignment netlists. Finally, a novel tile routing method is developed to obtain detailed routes. Experiment results demonstrate the robustness and effectiveness of our proposed algorithm.
{"title":"Obstacle-Avoiding Multiple Redistribution Layer Routing with Irregular Structures*","authors":"Yen-Ting Chen, Yao-Wen Chang","doi":"10.1145/3508352.3549419","DOIUrl":"https://doi.org/10.1145/3508352.3549419","url":null,"abstract":"In advanced packages, redistribution layers (RDLs) are extra metal layers for high interconnections among the chips and printed circuit board (PCB). To better utilize the routing resources of RDLs, published works adopted flexible vias such that they can place the vias everywhere. Furthermore, some regions may be blocked for signal integrity protection or manually prerouted nets (such as power/ground nets or feeding lines of antennas) to achieve higher performance. These blocked regions will be treated as obstacles in the routing process. Since the positions of pads, obstacles, and vias can be arbitrary, the structures of RDLs become irregular. The obstacles and irregular structures substantially increase the difficulty of the routing process. This paper proposes a three-stage algorithm: First, the layout is partitioned by a method based on constrained Delaunay triangulation (CDT). Then we present a global routing graph model and generate routing guides for unified-assignment netlists. Finally, a novel tile routing method is developed to obtain detailed routes. Experiment results demonstrate the robustness and effectiveness of our proposed algorithm.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115295626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amin Rezaei, Raheel Afsharmazayejani, Jordan Maynard
Hardware IP owners must envision procedures to avoid piracy and overproduction of their designs under a fabless paradigm. A newly proposed technique to obfuscate critical components in a logic design is called eFPGA-based redaction, which replaces a sensitive sub-circuit with an embedded FPGA, and the eFPGA is configured to perform the same functionality as the missing sub-circuit. In this case, the configuration bitstream acts as a hidden key only known to the hardware IP owner. In this paper, we first evaluate the security promise of the existing eFPGA-based redaction algorithms as a preliminary study. Then, we break eFPGA-based redaction schemes by an initial but not necessarily efficient attack named DIP Exclusion that excludes problematic input patterns from checking in a brute-force manner. Finally, by combining cycle breaking and unrolling, we propose a novel and powerful attack called Break & Unroll that is able to recover the bitstream of state-of-the-art eFPGA-based redaction schemes in a relatively short time even with the existence of hard cycles and large size keys. This study reveals that the common perception that eFPGA-based redaction is by default secure against oracle-guided attacks, is prejudice. It also shows that additional research on how to systematically create an exponential number of non-combinational hard cycles is required to secure eFPGA-based redaction schemes.
{"title":"Evaluating the Security of eFPGA-based Redaction Algorithms","authors":"Amin Rezaei, Raheel Afsharmazayejani, Jordan Maynard","doi":"10.1145/3508352.3549425","DOIUrl":"https://doi.org/10.1145/3508352.3549425","url":null,"abstract":"Hardware IP owners must envision procedures to avoid piracy and overproduction of their designs under a fabless paradigm. A newly proposed technique to obfuscate critical components in a logic design is called eFPGA-based redaction, which replaces a sensitive sub-circuit with an embedded FPGA, and the eFPGA is configured to perform the same functionality as the missing sub-circuit. In this case, the configuration bitstream acts as a hidden key only known to the hardware IP owner. In this paper, we first evaluate the security promise of the existing eFPGA-based redaction algorithms as a preliminary study. Then, we break eFPGA-based redaction schemes by an initial but not necessarily efficient attack named DIP Exclusion that excludes problematic input patterns from checking in a brute-force manner. Finally, by combining cycle breaking and unrolling, we propose a novel and powerful attack called Break & Unroll that is able to recover the bitstream of state-of-the-art eFPGA-based redaction schemes in a relatively short time even with the existence of hard cycles and large size keys. This study reveals that the common perception that eFPGA-based redaction is by default secure against oracle-guided attacks, is prejudice. It also shows that additional research on how to systematically create an exponential number of non-combinational hard cycles is required to secure eFPGA-based redaction schemes.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123083049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Chiu, Xin-Ping Chen, Jennifer Shueh-Inn Hu, C. Li
The demand for low-power, high-performance neuromorphic chips is increasing. However, conventional testing is not applicable to neuromorphic chips due to three reasons: (1) lack of scan DfT, (2) stochastic characteristic, and (3) configurable functionality. In this paper, we present an automatic test configuration and pattern generation (ATCPG) method for testing a configurable stochastic neuromorphic chip without using scan DfT. We use machine learning to generate test configurations. Then, we apply a modified fast gradient sign method to generate test patterns. Finally, we determine test repetitions with statistical power of test. We conduct experiments on one of the neuromorphic architectures, spiking neural network, to evaluate the effectiveness of our ATCPG. The experimental results show that our ATCPG can achieve 100% fault coverage for the five fault models we use. For testing a 3-layer model at 0.05 significant level, we produce 5 test configurations and 67 test patterns. The average test repetitions of neuron faults and synapse faults are 2,124 and 4,557, respectively. Besides, our simulation results show that the overkill matched our significance level perfectly.
{"title":"Automatic Test Configuration and Pattern Generation (ATCPG) for Neuromorphic Chips","authors":"I. Chiu, Xin-Ping Chen, Jennifer Shueh-Inn Hu, C. Li","doi":"10.1145/3508352.3549422","DOIUrl":"https://doi.org/10.1145/3508352.3549422","url":null,"abstract":"The demand for low-power, high-performance neuromorphic chips is increasing. However, conventional testing is not applicable to neuromorphic chips due to three reasons: (1) lack of scan DfT, (2) stochastic characteristic, and (3) configurable functionality. In this paper, we present an automatic test configuration and pattern generation (ATCPG) method for testing a configurable stochastic neuromorphic chip without using scan DfT. We use machine learning to generate test configurations. Then, we apply a modified fast gradient sign method to generate test patterns. Finally, we determine test repetitions with statistical power of test. We conduct experiments on one of the neuromorphic architectures, spiking neural network, to evaluate the effectiveness of our ATCPG. The experimental results show that our ATCPG can achieve 100% fault coverage for the five fault models we use. For testing a 3-layer model at 0.05 significant level, we produce 5 test configurations and 67 test patterns. The average test repetitions of neuron faults and synapse faults are 2,124 and 4,557, respectively. Besides, our simulation results show that the overkill matched our significance level perfectly.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122121026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rongjian Liang, Hua Xiang, Jinwook Jung, Jiang Hu, Gi-Joon Nam
Deep learning is a promising approach to early DRV (Design Rule Violation) prediction. However, non-deterministic parallel routing hampers model training and degrades prediction accuracy. In this work, we propose a stochastic approach, called LGC-Net, to solve this problem. In this approach, we develop new techniques of Gaussian random field layer and focal likelihood loss function to seamlessly integrate Log Gaussian Cox process with deep learning. This approach provides not only statistical regression results but also classification ones with different thresholds without retraining. Experimental results with noisy training data on industrial designs demonstrate that LGC-Net achieves significantly better accuracy of DRV density prediction than prior arts.
{"title":"A Stochastic Approach to Handle Non-Determinism in Deep Learning-Based Design Rule Violation Predictions","authors":"Rongjian Liang, Hua Xiang, Jinwook Jung, Jiang Hu, Gi-Joon Nam","doi":"10.1145/3508352.3549347","DOIUrl":"https://doi.org/10.1145/3508352.3549347","url":null,"abstract":"Deep learning is a promising approach to early DRV (Design Rule Violation) prediction. However, non-deterministic parallel routing hampers model training and degrades prediction accuracy. In this work, we propose a stochastic approach, called LGC-Net, to solve this problem. In this approach, we develop new techniques of Gaussian random field layer and focal likelihood loss function to seamlessly integrate Log Gaussian Cox process with deep learning. This approach provides not only statistical regression results but also classification ones with different thresholds without retraining. Experimental results with noisy training data on industrial designs demonstrate that LGC-Net achieves significantly better accuracy of DRV density prediction than prior arts.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126277495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zishen Wan, Karthik Swaminathan, Pin-Yu Chen, Nandhini Chandramoorthy, A. Raychowdhury
Autonomous systems have reached a tipping point, with a myriad of self-driving cars, unmanned aerial vehicles (UAVs), and robots being widely applied and revolutionizing new applications. The continuous deployment of autonomous systems reveals the need for designs that facilitate increased resiliency and safety. The ability of an autonomous system to tolerate, or mitigate against errors, such as environmental conditions, sensor, hardware and software faults, and adversarial attacks, is essential to ensure its functional safety. Application-aware resilience metrics, holistic fault analysis frameworks, and lightweight fault mitigation techniques are being proposed for accurate and effective resilience and robustness assessment and improvement. This paper explores the origination of fault sources across the computing stack of autonomous systems, discusses the various fault impacts and fault mitigation techniques of different scales of autonomous systems, and concludes with challenges and opportunities for assessing and building next-generation resilient and robust autonomous systems.
{"title":"Analyzing and Improving Resilience and Robustness of Autonomous Systems (Invited Paper)","authors":"Zishen Wan, Karthik Swaminathan, Pin-Yu Chen, Nandhini Chandramoorthy, A. Raychowdhury","doi":"10.1145/3508352.3561111","DOIUrl":"https://doi.org/10.1145/3508352.3561111","url":null,"abstract":"Autonomous systems have reached a tipping point, with a myriad of self-driving cars, unmanned aerial vehicles (UAVs), and robots being widely applied and revolutionizing new applications. The continuous deployment of autonomous systems reveals the need for designs that facilitate increased resiliency and safety. The ability of an autonomous system to tolerate, or mitigate against errors, such as environmental conditions, sensor, hardware and software faults, and adversarial attacks, is essential to ensure its functional safety. Application-aware resilience metrics, holistic fault analysis frameworks, and lightweight fault mitigation techniques are being proposed for accurate and effective resilience and robustness assessment and improvement. This paper explores the origination of fault sources across the computing stack of autonomous systems, discusses the various fault impacts and fault mitigation techniques of different scales of autonomous systems, and concludes with challenges and opportunities for assessing and building next-generation resilient and robust autonomous systems.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130547189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamical Decoupling (DD)-based protocols have been shown to reduce the idling errors encountered in quantum circuits. However, the current research in suppressing idling qubit errors suffers from scalability issues due to the large number of tuning quantum circuits that should be executed first to find the locations of the DD sequences in the target quantum circuit, which boost the output state fidelity. This process becomes tedious as the size of the quantum circuit increases. To address this challenge, we propose a Graph Neural Network (GNN) framework, which mitigates idling errors through an efficient insertion of DD sequences into quantum circuits by modeling their impact at different idle qubit windows. Our paper targets maximizing the benefit of DD sequences using a limited number of tuning circuits. We propose to classify the idle qubit windows into critical and non-critical (benign) windows using a data-driven reliability model. Our results obtained from IBM Lagos quantum computer show that our proposed GNN models, which determine the locations of DD sequences in the quantum circuits, significantly improve the output state fidelity by a factor of 1.4x on average and up to 2.6x compared to the adaptive DD approach, which searches for the best locations of DD sequences at run-time.
{"title":"Graph Neural Networks for Idling Error Mitigation","authors":"Vedika Servanan, S. Saeed","doi":"10.1145/3508352.3549444","DOIUrl":"https://doi.org/10.1145/3508352.3549444","url":null,"abstract":"Dynamical Decoupling (DD)-based protocols have been shown to reduce the idling errors encountered in quantum circuits. However, the current research in suppressing idling qubit errors suffers from scalability issues due to the large number of tuning quantum circuits that should be executed first to find the locations of the DD sequences in the target quantum circuit, which boost the output state fidelity. This process becomes tedious as the size of the quantum circuit increases. To address this challenge, we propose a Graph Neural Network (GNN) framework, which mitigates idling errors through an efficient insertion of DD sequences into quantum circuits by modeling their impact at different idle qubit windows. Our paper targets maximizing the benefit of DD sequences using a limited number of tuning circuits. We propose to classify the idle qubit windows into critical and non-critical (benign) windows using a data-driven reliability model. Our results obtained from IBM Lagos quantum computer show that our proposed GNN models, which determine the locations of DD sequences in the quantum circuits, significantly improve the output state fidelity by a factor of 1.4x on average and up to 2.6x compared to the adaptive DD approach, which searches for the best locations of DD sequences at run-time.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128868935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eviction-based cache side-channel attacks take advantage of inclusive cache hierarchies and shared cache hardware. Processors with the template ARM big.LITTLE architecture do not guarantee such preconditions and therefore will not usually allow cross-core attacks let alone cross-cluster attacks. This work reveals a new side-channel based on the snoop filter (SF), an unexplored directory structure embedded in template ARM big.LITTLE processors. Our systematic reverse engineering unveils the undocumented structure and property of the SF, and we successfully utilize it to bootstrap cross-core and cross-cluster cache eviction. We demonstrate a comprehensive methodology to exploit the SF side-channel, including the construction of eviction sets, the covert channel, and attacks against RSA and AES. When attacking TrustZone, we conduct an interrupt-based side-channel attack to extract the key of RSA by a single profiling trace, despite the strict cache clean defense. Supported by detailed experiments, the SF side-channel not only achieves competitive performance but also overcomes the main challenge of cache side-channel attacks on ARM big.LITTLE processors.
{"title":"Attack Directories on ARM big.LITTLE Processors","authors":"Zili Kou, Sharad Sinha, Wenjian He, W. Zhang","doi":"10.1145/3508352.3549340","DOIUrl":"https://doi.org/10.1145/3508352.3549340","url":null,"abstract":"Eviction-based cache side-channel attacks take advantage of inclusive cache hierarchies and shared cache hardware. Processors with the template ARM big.LITTLE architecture do not guarantee such preconditions and therefore will not usually allow cross-core attacks let alone cross-cluster attacks. This work reveals a new side-channel based on the snoop filter (SF), an unexplored directory structure embedded in template ARM big.LITTLE processors. Our systematic reverse engineering unveils the undocumented structure and property of the SF, and we successfully utilize it to bootstrap cross-core and cross-cluster cache eviction. We demonstrate a comprehensive methodology to exploit the SF side-channel, including the construction of eviction sets, the covert channel, and attacks against RSA and AES. When attacking TrustZone, we conduct an interrupt-based side-channel attack to extract the key of RSA by a single profiling trace, despite the strict cache clean defense. Supported by detailed experiments, the SF side-channel not only achieves competitive performance but also overcomes the main challenge of cache side-channel attacks on ARM big.LITTLE processors.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121031571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinshi Zang, Fangzhou Wang, Jinwei Liu, Martin D. F. Wong
Placement and routing are two crucial steps in the physical design of integrated circuits (ICs). To close the gap between placement and routing, the routing with cell movement problem has attracted great attention recently. In this problem, a certain number of cells can be moved to new positions and the nets can be rerouted to improve the total wire length. In this work, we advance the study on this problem by proposing a two-level layer-aware scheme, named ATLAS. A coarse-level cluster-based cell movement is first performed to optimize via usage and provides a better starting point for the next fine-level single cell movement. To further encourage routing on the upper metal layers, we utilize a set of adjusted layer weights to increase the routing cost on lower layers. Experimental results on the ICCAD 2020 contest benchmarks show that ATLAS achieves much more wire length reduction compared with the state-of-the-art routing with cell movement engine. Furthermore, applied on the ICCAD 2021 contest benchmarks, ATLAS outperforms the first place team of the contest with much better solution quality while being 3× faster.
{"title":"ATLAS: A Two-Level Layer-Aware Scheme for Routing with Cell Movement","authors":"Xinshi Zang, Fangzhou Wang, Jinwei Liu, Martin D. F. Wong","doi":"10.1145/3508352.3549470","DOIUrl":"https://doi.org/10.1145/3508352.3549470","url":null,"abstract":"Placement and routing are two crucial steps in the physical design of integrated circuits (ICs). To close the gap between placement and routing, the routing with cell movement problem has attracted great attention recently. In this problem, a certain number of cells can be moved to new positions and the nets can be rerouted to improve the total wire length. In this work, we advance the study on this problem by proposing a two-level layer-aware scheme, named ATLAS. A coarse-level cluster-based cell movement is first performed to optimize via usage and provides a better starting point for the next fine-level single cell movement. To further encourage routing on the upper metal layers, we utilize a set of adjusted layer weights to increase the routing cost on lower layers. Experimental results on the ICCAD 2020 contest benchmarks show that ATLAS achieves much more wire length reduction compared with the state-of-the-art routing with cell movement engine. Furthermore, applied on the ICCAD 2021 contest benchmarks, ATLAS outperforms the first place team of the contest with much better solution quality while being 3× faster.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122347264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}