McPAT-Calib: A Microarchitecture Power Modeling Framework for Modern CPUs
Jianwang Zhai, Chen Bai, Binwu Zhu, Yici Cai, Qiang Zhou, Bei Yu
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643508

Energy efficiency has become a core concern for modern CPUs, and existing power models struggle to balance speed, generality, and accuracy. This paper introduces McPAT-Calib, a microarchitecture power modeling framework that combines McPAT with machine learning (ML) calibration methods. McPAT-Calib can quickly and accurately estimate the power of different benchmarks running on different CPU configurations, providing an effective evaluation tool for modern CPU design. First, McPAT-7nm is introduced to support analytical power modeling for the 7nm technology node. Then, a wide range of modeling features is identified, and automatic feature selection and advanced regression methods are used to calibrate the McPAT-7nm modeling results, which greatly improves generality and accuracy. Moreover, a sampling algorithm based on active learning (AL) is leveraged to effectively reduce the labeling cost. We use up to 15 configurations of the 7nm RISC-V Berkeley Out-of-Order Machine (BOOM) along with 80 benchmarks to extensively evaluate the proposed framework. Compared with state-of-the-art microarchitecture power models, McPAT-Calib reduces the mean absolute percentage error (MAPE) of shuffle-split cross-validation by 5.95%. More importantly, the MAPE is reduced by 6.14% and 3.64% for evaluations of unknown CPU configurations and benchmarks, respectively. The AL sampling algorithm can reduce the number of labeled samples required by 50%, with an accuracy loss of only 0.44%.

GPU-accelerated Critical Path Generation with Path Constraints
Guannan Guo, Tsung-Wei Huang, Yibo Lin, Martin D. F. Wong
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643504

Path-based Analysis (PBA) is a pivotal step in Static Timing Analysis (STA) for reducing slack pessimism and improving quality of results. Optimization flows often invoke PBA repeatedly with different critical path constraints to verify correct timing behavior under certain logic cones. However, PBA is extremely time-consuming, and state-of-the-art PBA algorithms hardly scale beyond a few CPU threads under a constrained search space. To reach a new performance milestone, in this work we propose a new GPU-accelerated PBA algorithm that can handle extensive path constraints and quickly report an arbitrary number of critical paths within a constrained search space. Experimental results show that our algorithm generates identical path reports and achieves up to 102x speedup on a million-gate design compared to the state-of-the-art algorithm.

Lower Voltage for Higher Security: Using Voltage Overscaling to Secure Deep Neural Networks
Shohidul Islam, Ihsen Alouani, Khaled N. Khasawneh
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643551

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks: carefully crafted additive noise that undermines DNN integrity. Previously proposed defenses against these attacks incur substantial overheads, making them challenging to deploy on power- and computational-resource-constrained devices, such as embedded systems and the edge. In this paper, we explore the use of voltage over-scaling (VOS) as a lightweight defense against adversarial attacks. Specifically, we exploit the stochastic timing violations of VOS to implement a moving-target defense for DNNs. Our experimental results demonstrate that VOS guarantees effective defense against different attack methods, does not require any software/hardware modifications, and offers a by-product reduction in power consumption.

From Specification to Topology: Automatic Power Converter Design via Reinforcement Learning
Shaoze Fan, N. Cao, Shun Zhang, Jing Li, Xiaoxiao Guo, Xin Zhang
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643552

The tidal wave of modern electronic/electrical devices has led to increasing demand for ubiquitous application-specific power converters. The conventional manual design procedure for such power converters is computation- and labor-intensive: it involves selecting and connecting component devices, tuning component-wise parameters and control schemes, and iteratively evaluating and optimizing the design. To automate and speed up this design process, we propose an automatic framework that designs custom power converters from design specifications using reinforcement learning. Specifically, the framework embraces upper-confidence-bound-tree-based (UCT-based) reinforcement learning to automate topology space exploration with reward signals encoded from the circuit design specification. Moreover, our UCT-based approach can exploit small offline datasets via a specially designed default policy to accelerate topology space exploration. Further, it utilizes a hybrid circuit evaluation strategy to substantially reduce design evaluation costs. Empirically, we demonstrate that our framework can generate energy-efficient circuit topologies for various target voltage conversion ratios. Compared to existing automatic topology optimization strategies, the proposed method is much more computationally efficient: it can generate topologies of the same quality while being up to 67% faster. Additionally, we discuss some interesting circuits discovered by our framework.

An Efficient Two-phase Method for Prime Compilation of Non-clausal Boolean Formulae
Weilin Luo, Hai Wan, Hongzhen Zhong, Ou Wei, Biqing Fang, Xiaotong Song
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643520

Prime compilation aims to generate all prime implicates/implicants of a Boolean formula. Recently, prime compilation of non-clausal formulae has received great attention. Since the problem is hard for $\Sigma_{2}^{P}$, existing methods have performance issues. We argue that the main performance bottleneck stems from enlarging the search space using dual rail (DR) encoding and computing a minimal clausal formula as a by-product. To address this issue, we propose a two-phase approach, named CoAPI, for prime compilation of non-clausal formulae. Thanks to the two-phase framework, we construct a clausal formula without using DR encoding. In addition, the key to improving performance in our work is a novel bounded prime extraction (BPE) method that, by interleaving the extraction of prime implicates with the extraction of small implicates, enables constructing a succinct clausal formula rather than a minimal one. Following the assessment methodology of the state-of-the-art (SOTA) work, we show that CoAPI achieves SOTA performance. In particular, for generating all prime implicates, CoAPI is up to about one order of magnitude faster. Moreover, we evaluate CoAPI on a benchmark drawn from real-world industrial designs. The results also confirm that CoAPI outperforms prior work. Our code and benchmarks are publicly available at https://github.com/LuoWeiLinWillam/CoAPI.

DevelSet: Deep Neural Level Set for Instant Mask Optimization
Guojin Chen, Ziyang Yu, Hongduo Liu, Yuzhe Ma, Bei Yu
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643464

With feature sizes continuously shrinking in advanced technology nodes, mask optimization is increasingly crucial in the conventional design flow, accompanied by explosive growth in the prohibitive computational overhead of optical proximity correction (OPC) methods. Recently, the inverse lithography technique (ILT) has drawn significant attention and is becoming prevalent in emerging OPC solutions. However, existing ILT methods are either time-consuming or deliver weak mask printability and manufacturability. In this paper, we present DevelSet, a GPU- and deep neural network (DNN)-accelerated level-set OPC framework for metal layers. We first improve the conventional level-set-based ILT algorithm by introducing a curvature term to reduce mask complexity and applying GPU acceleration to overcome computational bottlenecks. To further enhance printability and speed up iterative convergence, we propose a novel deep neural network, carefully designed around level-set intrinsic principles, to facilitate the joint optimization of the DNN and the GPU-accelerated level-set optimizer. Experimental results show that the DevelSet framework surpasses state-of-the-art methods in printability and boosts runtime performance to an instant level (around 1 second).

Pub Date : 2021-11-01DOI: 10.1109/ICCAD51958.2021.9643509
Chengyu Zhang, Minquan Sun, Jianwen Li, Ting Su, G. Pu
We introduce Circuit Structure Mutation, a simple but effective mutation-based testing approach, for testing hardware model checkers. The key idea is to mutate the existing And-Inverter Graph (AIG) circuit by manipulating the relations among the components in the graph while preserving the validity of the mutant. Based on Circuit Structure Mutation, we implemented a feedback-guided testing tool named Hammer. In our evaluation, Hammer shows its effectiveness on finding bugs, increasing test coverage, and finding performance optimization chances, which can help the hardware model checker developers improve the reliability and the performance of their tools.
{"title":"Feedback-Guided Circuit Structure Mutation for Testing Hardware Model Checkers","authors":"Chengyu Zhang, Minquan Sun, Jianwen Li, Ting Su, G. Pu","doi":"10.1109/ICCAD51958.2021.9643509","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643509","url":null,"abstract":"We introduce Circuit Structure Mutation, a simple but effective mutation-based testing approach, for testing hardware model checkers. The key idea is to mutate the existing And-Inverter Graph (AIG) circuit by manipulating the relations among the components in the graph while preserving the validity of the mutant. Based on Circuit Structure Mutation, we implemented a feedback-guided testing tool named Hammer. In our evaluation, Hammer shows its effectiveness on finding bugs, increasing test coverage, and finding performance optimization chances, which can help the hardware model checker developers improve the reliability and the performance of their tools.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114773088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
2021 CAD Contest Problem A: Functional ECO with Behavioral Change Guidance (Invited Paper)
Yen-Chun Fang, Shao-Lun Huang, Chi-An Wu, Chung-Han Chou, Chih-Jen Hsu, WoeiTzy Jong, Kei-Yong Khoo
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643492

Functional ECO is an essential step in the VLSI design flow. The technique realizes functional changes with a minimal patch netlist in the gate-level netlist. As design complexity increases, it becomes more and more difficult for functional ECO to generate a minimal patch. The ICCAD 2021 CAD Contest calls for a feasible and efficient ECO algorithm with behavioral change guidance. Beyond ordinary functional ECO problems, the RTL designs are provided, so contestants can utilize the behavioral changes in the RTL designs to minimize the patch for G1.

Overcoming the Memory Hierarchy Inefficiencies in Graph Processing Applications
Jilan Lin, Shuangchen Li, Yufei Ding, Yuan Xie
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643434

Graph processing plays a vital role in mining relational data. However, intensive yet inefficient memory accesses leave graph processing applications severely bottlenecked by the conventional memory hierarchy. In this work, we focus on inefficiencies that exist in both the on-chip cache and the off-chip memory. First, graph processing is known to be dominated by expensive random accesses, which are difficult to capture with conventional cache and prefetcher architectures, leading to low cache hit rates and excessive main memory visits. Second, off-chip bandwidth is further underutilized by the small data granularity: each vertex/edge datum in the graph needs only 4-8 B, much smaller than the 64 B memory access granularity, so a large fraction of the bandwidth is wasted fetching unnecessary data. We therefore present G-MEM, a customized memory hierarchy design for graph processing applications. First, we propose a coherence-free scratchpad as the on-chip memory, which leverages the power-law characteristic of graphs and stores only the hot, frequently accessed data. We equip the scratchpad memory with a degree-aware mapping strategy to better manage it across applications. On the other hand, we design an elastic-granularity DRAM (EG-DRAM) to facilitate main memory access. EG-DRAM is based on a near-data processing architecture, which processes and coalesces multiple fine-grained memory accesses to maximize bandwidth efficiency. Putting these together, G-MEM demonstrates a 2.48× overall speedup over a vanilla CPU, with 1.44× and 1.79× speedups over the state-of-the-art cache architecture and memory subsystem, respectively.

ReIGNN: State Register Identification Using Graph Neural Networks for Circuit Reverse Engineering
Subhajit Dutta Chowdhury, Kaixin Yang, P. Nuzzo
Pub Date: 2021-11-01, DOI: 10.1109/ICCAD51958.2021.9643498

Reverse engineering an integrated circuit netlist is a powerful tool for detecting malicious logic and counteracting design piracy. A critical challenge in this domain is the correct classification of data-path and control-logic registers in a design. We present ReIGNN, a novel learning-based register classification methodology that combines graph neural networks (GNNs) with structural analysis to classify the registers in a circuit with high accuracy and generalize well across different designs. GNNs are particularly effective at processing circuit netlists as graphs, leveraging the properties of nodes and their neighborhoods to learn to efficiently discriminate between different types of nodes. Structural analysis can further rectify registers misclassified as state registers by the GNN by analyzing the strongly connected components of the netlist graph. Numerical results on a set of benchmarks show that ReIGNN achieves, on average, 96.5% balanced accuracy and 97.7% sensitivity across different designs.
