Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473881
Xu Cheng, Yuyang Ye, Guoqing He, Qianqian Song, Peng Cao
Statistical timing characterization for standard cell libraries poses significant challenges in accuracy and runtime cost. Prior analytical and machine learning-based methods neglect the profound influence of the layout-dependent parasitic resistor and capacitor (RC) network in the cell netlist, as well as the timing correlation between the topological structures of cells and process, voltage, and temperature (PVT) corners, resulting in tremendous simulation effort and/or poor accuracy. In this work, an accurate and efficient statistical cell timing library characterization framework is proposed based on a heterogeneous graph attention network (HGAT) assisted by a parasitic RC reduction approach, where the transistors and parasitic RC elements in a cell are represented as heterogeneous nodes for graph learning, and redundant RC nodes are removed to alleviate the node imbalance issue and improve prediction accuracy. The proposed framework was validated with TSMC 22nm standard cells under multiple PVT corners to predict the standard deviation of cell delay, with an average error of 2.67% over all validated cells in terms of relative Root Mean Squared Error (rRMSE) and a $3\times$ characterization runtime speedup, achieving a $2.7\sim 6.9\times$ accuracy improvement compared with prior works. The predicted statistical timing libraries were further validated with ISCAS’89 benchmark circuits for statistical static timing analysis (SSTA), where the critical path delay at the $3\sigma$ percentile point is reported with an average mismatch of $1.34$ ps compared with the foundry-provided library, showing $10.7\sim 14.5\times$ better accuracy than competing approaches.
Title: Heterogeneous Graph Attention Network Based Statistical Timing Library Characterization with Parasitic RC Reduction. Published in: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 171-176. https://doi.org/10.1109/ASP-DAC58780.2024.10473881
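The accuracy figure above is reported as rRMSE, but the abstract does not say which normalization is used. As a minimal sketch, assuming the per-sample relative form (each error divided by its own reference value before squaring), the metric could be computed as follows. The function name `rrmse` and the sample values are illustrative only and are not taken from the paper.

```python
import math


def rrmse(predicted, reference):
    """Relative RMSE, assuming per-sample normalization by the reference value.

    This is one common definition; the paper may use a different one
    (for example, RMSE divided by the mean of the reference values).
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared relative error of each prediction against its reference.
    squared = [((p - r) / r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical delay standard deviations (arbitrary units), not paper data.
predicted_sigma = [1.1, 1.9]
reference_sigma = [1.0, 2.0]
error = rrmse(predicted_sigma, reference_sigma)
print(f"rRMSE = {error:.4f}")
```

With per-sample normalization, a cell with a small reference sigma contributes as much to the score as a cell with a large one, which is why the choice of definition matters when comparing against other reported numbers.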
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473887
Yuan-chun Luo, James Read, A. Lu, Shimeng Yu
Using non-volatile “capacitive” crossbar arrays for compute-in-memory (CIM) offers higher energy and area efficiency compared to “resistive” crossbar arrays. However, the impact of device-to-device (D2D) variation and temporal noise on the system-level performance has not been explored yet. In this work, we provide an end-to-end methodology that incorporates experimentally measured D2D variation into the design space exploration from capacitive weight cell design, CIM array with peripheral circuits, to the inference accuracy of SwinV2-T vision transformer and ResNet-50 on the ImageNet dataset. Our framework further assesses the system’s power, performance, and area (PPA) by considering cell design, circuit structure, and model selection. We explore the design space using an early stopping algorithm to produce optimal designs while meeting strict inference accuracy requirements. Overall findings suggest that the capacitive CIM system is robust against D2D variation and noise, outperforming its resistive counterpart by $6.95\times$ and $14.1\times$ for the optimal design in the figure of merit (TOPS/W $\times$ TOPS/mm$^{2}$) for ResNet-50 and SwinV2-T, respectively.
Title: A Cross-layer Framework for Design Space and Variation Analysis of Non-Volatile Ferroelectric Capacitor-Based Compute-in-Memory Accelerators. Published in: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 159-164. https://doi.org/10.1109/ASP-DAC58780.2024.10473887
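The figure of merit quoted above is the product of energy efficiency (TOPS/W) and area efficiency (TOPS/mm$^{2}$). The sketch below shows how such a product and the resulting improvement ratio could be computed. The function names and all numeric inputs are hypothetical placeholders chosen for easy arithmetic; they are not measurements from the paper.

```python
def figure_of_merit(tops, power_w, area_mm2):
    """Return (TOPS/W) * (TOPS/mm^2) for one design point."""
    energy_efficiency = tops / power_w   # TOPS per watt
    area_efficiency = tops / area_mm2    # TOPS per square millimetre
    return energy_efficiency * area_efficiency


def improvement(fom_candidate, fom_baseline):
    """Ratio by which the candidate design beats the baseline."""
    return fom_candidate / fom_baseline


# Placeholder design points (not paper data).
capacitive = figure_of_merit(tops=8.0, power_w=2.0, area_mm2=4.0)
resistive = figure_of_merit(tops=4.0, power_w=2.0, area_mm2=4.0)
print(f"improvement = {improvement(capacitive, resistive):.2f}x")
```

Because throughput appears in both factors, the product grows quadratically with TOPS at fixed power and area, so a design that doubles throughput shows a fourfold gain in this metric.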
Pub Date: 2024-01-22 DOI: 10.1109/ASP-DAC58780.2024.10473801
Ching-Yao Huang, Wai-Kei Mak
Traditionally, a standard cell library is composed of pre-designed cells, all of which have identical height so that the cells can be placed in rows of uniform height on a chip. The desire to integrate more logic gates onto a single chip has led over the years to a continuous reduction of row height, with a reduced number of routing tracks. It has reached the point where not all cells can be designed with the minimum row height due to internal routability issues. Hybrid-row-height IC design, with placement rows of different heights, has emerged and offers a better sweet spot for performance and area optimization. [7] proposed the first row planning algorithm for hybrid-row-height design, based on k-means clustering, to determine the row configuration so that the cells in an initial placement can be moved to rows of matching height with as little cell displacement as possible. The biggest limitation of the k-means clustering method is that it only works for designs without any macros. Here we propose an effective and highly flexible dynamic programming approach to determine an optimized row configuration for designs with or without macros. The experimental results show that for designs without any macros, our approach resulted in a 30.7% reduction in total cell displacement and a 7.4% reduction in the final routed wirelength on average compared to the k-means clustering approach while satisfying the timing constraints. Additional experimental results show that our approach can comfortably handle designs with macros while satisfying the timing constraints.
Title: Row Planning and Placement for Hybrid-Row-Height Designs. Published in: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 306-311. https://doi.org/10.1109/ASP-DAC58780.2024.10473801
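The abstract does not give the dynamic programming formulation, so the following is only a simplified illustration of how row planning can be cast as a DP, not the authors' algorithm. It assumes integer heights (for example, in routing tracks), ignores macros and the horizontal dimension, and uses a proxy cost: the number of cells whose initial vertical center falls inside a row of a different height. The function name `plan_rows` and the sample data are hypothetical.

```python
from math import inf


def plan_rows(chip_height, row_heights, cells):
    """Tile [0, chip_height) with rows, minimizing a mismatch proxy cost.

    cells is a list of (y_center, cell_height) from an initial placement.
    Returns (cost, rows), where rows is a bottom-up list of (y_bottom, height).
    """
    def mismatch(y, h):
        # Cells centered in [y, y + h) whose height differs from the row's.
        return sum(1 for yc, ch in cells if y <= yc < y + h and ch != h)

    best = [inf] * (chip_height + 1)   # best[y]: min cost to tile [0, y)
    prev = [None] * (chip_height + 1)  # height of the last row ending at y
    best[0] = 0
    for y in range(chip_height):
        if best[y] == inf:
            continue  # no valid tiling ends exactly at y
        for h in row_heights:
            top = y + h
            if top > chip_height:
                continue
            cost = best[y] + mismatch(y, h)
            if cost < best[top]:
                best[top], prev[top] = cost, h

    if best[chip_height] == inf:
        return inf, []  # the heights cannot tile the chip exactly

    # Walk back from the top of the chip to recover the row stack.
    rows, y = [], chip_height
    while y > 0:
        h = prev[y]
        rows.append((y - h, h))
        y -= h
    rows.reverse()
    return best[chip_height], rows


# Hypothetical toy instance: one short cell low on the chip, one tall cell above.
cost, rows = plan_rows(5, (2, 3), [(1, 2), (3, 3)])
print(cost, rows)
```

A realistic cost would measure the actual displacement needed to move each cell to the nearest row of matching height and would account for macro blockages, which is where a formulation of this kind would need to differ most from the toy version here.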