Time-Division Multiplexing Based System-Level FPGA Routing
Wei-Kai Liu, Ming-Hung Chen, Chia-Ming Chang, Chen-Chia Chang, Yao-Wen Chang
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643558
Multi-FPGA system prototyping has become popular for modern VLSI logic verification, but such a system is often limited by its number of inter-FPGA connections. As a result, time-division multiplexing (TDM) is employed to accommodate more inter-FPGA signals than physical connections. However, the inter-FPGA signal delay induced by TDM becomes significant due to time-multiplexing. Researchers have shown that TDM ratios (signal time-multiplexing ratios) significantly affect the performance of a multi-FPGA system and that inter-FPGA routing strongly influences its quality. This paper presents a framework that minimizes the system clock period of a multi-FPGA system while considering the inter-FPGA routing topology and the timing criticality of nets. The framework consists of two stages: (1) a distributed profiling scheme that generates the desired net ordering and alleviates routing congestion, and (2) a net-/edge-based refinement that assigns TDM ratios efficiently while strictly decreasing the ratios. On the ICCAD 2019 CAD Contest benchmarks, using the contest evaluation metric that combines quality and efficiency, our framework achieves the best overall score among all participating teams and published works.
DAPA: A Dataflow-Aware Analytical Placement Algorithm for Modern Mixed-Size Circuit Designs
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643441
Jai-Ming Lin, Weikang Huang, Yao-Chieh Chen, Yi-Ting Wang, Po-Wen Wang
This article presents an analytical placement algorithm that handles the dataflow constraint for mixed-size circuits. To quickly obtain a good placement at an early stage, engineers often reference the dataflow of a design to determine the relative locations of cells and macros. To achieve this, the paper presents two methods that make a placement follow this constraint. First, we initially give larger weights to nets that connect datapath-oriented objects and then gradually shrink these weights with a modified Gompertz curve according to the placement utilization, shortening their distances without interfering with the object distribution. Second, we define desirable placement regions for each datapath-oriented object and propose a novel sigmoid function that adds a penalty to the analytical placement formulation for objects outside their regions. Experiments demonstrate that our methodology obtains better results than an approach that does not consider the dataflow constraint: both wirelength and routability are improved in the resulting placement. Furthermore, our placer outperforms the RTL-aware dataflow-driven macro placer.
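To make the two ingredients concrete, here is a minimal sketch of (a) a Gompertz-shaped schedule that starts datapath nets at a large weight and shrinks it as utilization grows, and (b) a smooth sigmoid region penalty suitable for a differentiable placement objective. The curve constants and region bounds are illustrative assumptions, not the paper's calibrated values.

```python
import math

def gompertz_net_weight(utilization, w_max=5.0, w_min=1.0, b=-4.0, c=-6.0):
    """Shrink the extra weight on datapath nets as placement utilization grows,
    following a Gompertz-shaped curve: large pull early, ~w_min near the end."""
    g = math.exp(b * math.exp(c * utilization))   # rises from ~0 to ~1 on [0, 1]
    return w_min + (w_max - w_min) * (1.0 - g)

def region_penalty(x, region_lo, region_hi, alpha=8.0):
    """Smooth penalty that is ~0 inside the desirable region [region_lo,
    region_hi] and rises toward 1 outside it, so it can be added to an
    analytical placement formulation without breaking differentiability."""
    left = 1.0 / (1.0 + math.exp(alpha * (x - region_lo)))    # ~1 when x << lo
    right = 1.0 / (1.0 + math.exp(-alpha * (x - region_hi)))  # ~1 when x >> hi
    return left + right

# Early placement (low utilization): datapath nets get a strong pull.
print(gompertz_net_weight(0.1), gompertz_net_weight(0.9))
# An object at x=12 outside its desirable region [0, 10] is penalized.
print(region_penalty(5.0, 0.0, 10.0), region_penalty(12.0, 0.0, 10.0))
```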
{"title":"DAPA: A Dataflow-Aware Analytical Placement Algorithm for Modern Mixed-Size Circuit Designs","authors":"Jai-Ming Lin, Weikang Huang, Yao-Chieh Chen, Yi-Ting Wang, Po-Wen Wang","doi":"10.1109/ICCAD51958.2021.9643441","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643441","url":null,"abstract":"This article presents an analytical-based placement algorithm to handle dataflow constraint for mixed-size circuits. To quickly obtain a better placement at an early stage, engineers often reference dataflow of a design to determine the relative locations of cells and macros. To achieve this target, this paper presents two methods to make a placement follow this constraint. First, we give larger weights to those nets which connect to datapath-oriented objects in the beginning, and then gradually shrink the values by the modified Gompertz curve according to the status of placement utilization in order to shorten their distances without interfering with object distribution. Second, we define desirable placement regions for each datapath-oriented object and propose a novel sigmoid function to give additional penalties to these objects in the analytical placement formulation if they are not in the regions. The experiment demonstrates that our methodology can obtain better results than the other approach which does not consider dataflow constraint. Not only wirelength but also routability will be improved in the resulting placement. Furthermore, our placer outperforms the RTL-aware dataflow-driven macro placer.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123759291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heuristics for Million-scale Two-level Logic Minimization
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643572
M. Nazemi, Hitarth Kanakia, M. Pedram
Existing two-level logic minimization methods suffer from scalability problems: they cannot handle the optimization of Boolean functions with more than about 50,000 product terms. However, applications have arisen that produce Boolean functions with hundreds of thousands to millions of minterms. To ameliorate this scalability problem, this work presents a suite of heuristics that enables exact or approximate two-level logic minimization of such large Boolean functions through a divide-and-conquer technique. All proposed heuristics first deploy a decision tree to iteratively partition the original specification of a given Boolean function. Next, they apply one of several leaf optimization techniques (e.g., those based on support vector machines or error budgets) to each leaf node of the tree, and finally they merge the locally optimized leaves at the root of the tree to perform one round of global optimization. We show that our support-vector-machine-based heuristic compresses Boolean functions with 300,000 minterms by a factor of about 100 (i.e., 3,000 cubes in the optimized function) and achieves 98% accuracy. Similarly, our error-budget-driven heuristic compresses a Boolean function with about 3,000,000 minterms by a factor of 1,273 and achieves 95% accuracy while taking only 67 seconds for the whole optimization process. This is a significant improvement over well-known two-level logic minimization tools such as ESPRESSO-II and BOOM, which fail to optimize the same Boolean functions even after running for a few days.
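The partition-optimize-merge flow can be pictured with a toy example. The sketch below partitions a small minterm list with a balance-driven decision tree, minimizes each leaf with simple Quine-McCluskey-style cube merging (a tiny stand-in for the paper's SVM- or error-budget-based leaf optimizers), and runs one merging round at the root. Variable count, leaf size, and the minterm list are all made up for illustration.

```python
from itertools import combinations

# Minterms of an assumed single-output function over N_VARS variables,
# written as '0'/'1' strings; '-' marks a don't-care literal in a cube.
N_VARS = 4
minterms = ["0000", "0001", "0011", "0111", "1111", "1110", "1100", "1000"]

def split_var(terms):
    """Pick the variable whose 0/1 split is most balanced (a crude stand-in
    for the decision-tree partitioning heuristic)."""
    best, best_score = 0, None
    for v in range(N_VARS):
        ones = sum(t[v] == "1" for t in terms)
        score = abs(2 * ones - len(terms))
        if best_score is None or score < best_score:
            best, best_score = v, score
    return best

def build_leaves(terms, leaf_size):
    """Recursively partition the minterms until each leaf is small."""
    if len(terms) <= leaf_size:
        return [terms]
    v = split_var(terms)
    lo = [t for t in terms if t[v] == "0"]
    hi = [t for t in terms if t[v] == "1"]
    if not lo or not hi:            # no useful split left: keep as one leaf
        return [terms]
    return build_leaves(lo, leaf_size) + build_leaves(hi, leaf_size)

def merge_once(cubes):
    """One pass of pairwise cube merging: combine cubes whose dashes align
    and that differ in exactly one literal."""
    merged, used = set(), set()
    for a, b in combinations(cubes, 2):
        if any((a[i] == "-") != (b[i] == "-") for i in range(N_VARS)):
            continue
        diff = [i for i in range(N_VARS) if a[i] != b[i]]
        if len(diff) == 1:
            i = diff[0]
            merged.add(a[:i] + "-" + a[i + 1:])
            used.update((a, b))
    return sorted(merged | (set(cubes) - used))

def minimize(terms, leaf_size=4):
    cover = []
    for leaf in build_leaves(terms, leaf_size):
        cubes = sorted(leaf)
        while True:                  # iterate leaf merging to a fixed point
            nxt = merge_once(cubes)
            if nxt == cubes:
                break
            cubes = nxt
        cover.extend(cubes)
    # "Root" step: one more merging round over the union of leaf covers.
    return merge_once(sorted(set(cover)))

print(minimize(minterms))
```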
{"title":"Heuristics for Million-scale Two-level Logic Minimization","authors":"M. Nazemi, Hitarth Kanakia, M. Pedram","doi":"10.1109/ICCAD51958.2021.9643572","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643572","url":null,"abstract":"Existing two-level logic minimization methods suffer from scalability problems, i.e. they cannot handle the optimization of Boolean functions with more than about 50k or so product terms. However, applications have arisen that produce Boolean functions with hundreds of thousands to millions of minterms. To ameliorate the aforesaid scalability problem, this work presents a suite of heuristics that enables exact or approximate two-level logic minimization of such large Boolean functions by employing a divide and conquer technique. All proposed heuristics first deploy a decision tree to iteratively partition the original specification of a given Boolean function. Next, they apply one of different leaf optimization techniques (e.g., those based on support vector machines or error budgets) to each leaf node of the tree, and, finally, they merge the locally optimized leaves at the root of the tree to perform one round of the global optimization. We show that our support vector machine-based heuristic compresses Boolean functions with 300,000 minterms by a factor of about 100 (i.e. 3,000 cubes in the optimized function), and achieves 98% accuracy. Similarly, our error-budget-driven heuristic compresses a Boolean function with about 3,000,000 minterms by a factor of 1,273, and achieves 95 % accuracy while it only takes 67 seconds to complete the whole optimization process. This is a significant improvement compared to well-known two-level logic minimization tools such as ESPRESSO-II and BOOM, which fail to optimize the same Boolean functions even after running for a few days.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126333222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BeGAN: Power Grid Benchmark Generation Using a Process-portable GAN-based Methodology
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643566
Vidya A. Chhabria, K. Kunal, Masoud Zabihi, S. Sapatnekar
Evaluating CAD solutions to physical implementation problems has been extremely challenging due to the unavailability of modern benchmarks in the public domain. This work addresses this challenge by proposing a process-portable machine learning (ML)-based methodology for synthesizing synthetic power delivery network (PDN) benchmarks that obfuscate intellectual property (IP) information. In particular, the proposed approach leverages generative adversarial networks (GANs) and transfer learning to create realistic PDN benchmarks from a small set of available real circuit data. BeGAN generates thousands of PDN benchmarks with significant histogram correlation (p-value ≤ 0.05), demonstrating their realism, and an average L1 norm of more than 7.1%, highlighting its IP obfuscation capability. The original and thousands of ML-generated synthetic PDN benchmarks for four different open-source technologies are released in the public domain to advance research in this field.
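The two quality metrics quoted above (histogram correlation for realism, L1 distance for obfuscation) are easy to reproduce in spirit. The sketch below compares a "real" and a "synthetic" per-tile current map; both maps are random stand-ins, and the exact metric definitions here are assumptions rather than BeGAN's published evaluation script.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Stand-ins for a real and a GAN-generated current map of a power grid
# (2-D arrays of per-tile current draw); real data would come from designs.
real_map = rng.gamma(shape=2.0, scale=1.0, size=(64, 64))
synthetic_map = real_map * rng.normal(1.0, 0.15, size=(64, 64)) + 0.1

def histogram_correlation(a, b, bins=50):
    """Pearson correlation (and p-value) between the two maps' histograms,
    a realism check similar in spirit to the one quoted in the abstract."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    ha, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    hb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    return pearsonr(ha, hb)

def normalized_l1_distance(a, b):
    """Average per-tile L1 difference, normalized by the real map's mean;
    a larger value suggests the synthetic map does not leak the original."""
    return float(np.abs(a - b).mean() / a.mean())

r, p = histogram_correlation(real_map, synthetic_map)
print(f"histogram correlation r={r:.3f}, p={p:.3g}")
print(f"normalized L1 distance = {normalized_l1_distance(real_map, synthetic_map):.3f}")
```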
{"title":"BeGAN: Power Grid Benchmark Generation Using a Process-portable GAN-based Methodology","authors":"Vidya A. Chhabria, K. Kunal, Masoud Zabihi, S. Sapatnekar","doi":"10.1109/ICCAD51958.2021.9643566","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643566","url":null,"abstract":"Evaluating CAD solutions to physical implementation problems has been extremely challenging due to the unavailability of modern benchmarks in the public domain. This work aims to address this challenge by proposing a process-portable machine learning (ML)-based methodology for synthesizing synthetic power delivery network (PDN) benchmarks that obfuscate intellectual property information. In particular, the proposed approach leverages generative adversarial networks (GAN) and transfer learning techniques to create realistic PDN benchmarks from a small set of available real circuit data. BeGAN generates thousands of PDN benchmarks with significant histogram correlation (p-value ≤ 0.05) demonstrating its realism and an average L1 Norm of more than 7.1 %, highlighting its IP obfuscation capabilities. The original and thousands of ML-generated synthetic PDN benchmarks for four different open-source technologies are released in the public domain to advance research in this field.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125615626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OpenSAR: An Open Source Automated End-to-end SAR ADC Compiler
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643494
Mingjie Liu, Xiyuan Tang, Keren Zhu, Hao Chen, Nan Sun, D. Pan
Despite recent developments in automated analog sizing and analog layout generation, there is doubt whether analog design automation techniques can scale to system-level designs. Meanwhile, analog designs are considered major roadblocks for open source hardware because of the limited design automation tools available. In this work, we present OpenSAR, the first open source automated end-to-end successive approximation register (SAR) analog-to-digital converter (ADC) compiler. OpenSAR requires only system performance specifications as input and outputs DRC- and LVS-clean layouts. Compared with prior work, we leverage automated placement and routing to generate analog building blocks, removing the need to design layout templates or libraries. We optimize the redundant non-binary capacitor digital-to-analog converter (CDAC) array for yield with a template-based layout generator that interleaves capacitor rows and columns to reduce process-gradient mismatch. Post-layout simulations demonstrate that the generated prototype designs achieve state-of-the-art resolution, speed, and energy efficiency.
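The interleaving idea behind the CDAC generator can be caricatured in one dimension: spread each capacitor's unit cells over rows visited in an outside-in order so that a linear process gradient along that axis averages out for every weight. The row ordering, unit-cell counts, and the function below are illustrative assumptions, not OpenSAR's actual template generator.

```python
def interleaved_cdac_rows(weights, n_rows):
    """Assign each DAC bit's unit capacitors to rows in an interleaved,
    outside-in order; 'weights' lists unit-cell counts per bit (including
    redundant, non-binary ones)."""
    # Visit rows from the two ends toward the middle: 0, n-1, 1, n-2, ...
    order = []
    lo, hi = 0, n_rows - 1
    while lo <= hi:
        order.append(lo)
        if hi != lo:
            order.append(hi)
        lo, hi = lo + 1, hi - 1

    placement = {bit: [] for bit in range(len(weights))}
    cursor = 0
    for bit, count in enumerate(weights):
        for _ in range(count):
            placement[bit].append(order[cursor % len(order)])
            cursor += 1
    return placement

# Example: a 5-bit binary section plus one redundant weight (unit-cell counts).
print(interleaved_cdac_rows([16, 8, 8, 4, 2, 1], n_rows=8))
```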
{"title":"OpenSAR: An Open Source Automated End-to-end SAR ADC Compiler","authors":"Mingjie Liu, Xiyuan Tang, Keren Zhu, Hao Chen, Nan Sun, D. Pan","doi":"10.1109/ICCAD51958.2021.9643494","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643494","url":null,"abstract":"Despite recent developments in automated analog sizing and analog layout generation, there is doubt whether analog design automation techniques could scale to system-level designs. On the other hand, analog designs are considered major roadblocks for open source hardware with limited available design automation tools. In this work, we present OpenSAR, the first open source automated end-to-end successive approximation register (SAR) analog-to-digital converter (ADC) compiler. OpenSAR only requires system performance specifications as the minimal input and outputs DRC and LVS clean layouts. Compared with prior work, we leverage automated placement and routing to generate analog building blocks, removing the need to design layout templates or libraries. We optimize the redundant non-binary capacitor digital-to-analog converter (CDAC) array design for yield considerations with a template-based layout generator that interleaves capacitor rows and columns to reduce process gradient mismatch. Post layout simulations demonstrate that the generated prototype designs achieve state-of-the-art resolution, speed, and energy efficiency.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131866471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Routability-driven Global Placer Target on Removing Global and Local Congestion for VLSI Designs
Jai-Ming Lin, Chung-Wei Huang, Liang-Chi Zane, Min-Chia Tsai, Che-Li Lin, Chen-Fa Tsai
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643544
Cell placement remains a major challenge in modern VLSI design, especially with respect to routability. Routing overflow in a placement may come from both global and local routing congestion. To resolve these problems, this paper proposes two techniques within a global placement algorithm based on an analytical placement formulation and the multilevel framework. To remove global routing congestion, we treat each net as a movable soft module and propose a novel congestion-aware net penalty model in which a net receives a larger penalty if it covers more routing-congested regions. Consequently, our formulation moves nets away from congested regions more easily than other approaches and has less impact on wirelength. In addition, to relieve local congestion, we propose an inflation technique that expands the area of a cluster according to its internal connectivity intensity and the routing congestion it occupies. Experimental results demonstrate that our approach achieves better routability and wirelength than other approaches such as NTUplace4h, NTUplace4dr, and RePlAce.
Doomed Run Prediction in Physical Design by Exploiting Sequential Flow and Graph Learning
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643435
Yi-Chen Lu, S. Nath, Vishal Khandelwal, S. Lim
Modern designs increasingly rely on physical design (PD) tools to derive the full technology-scaling benefits of Moore's Law. Designers often perform power, performance, and area (PPA) exploration through parallel PD runs with different tool configurations. Efficient PPA exploration is mission-critical for chip designers working with stringent time-to-market constraints and finite compute resources. Therefore, a framework that can accurately predict a "doomed run" (i.e., one that will not meet the PPA targets) in the early phases of the PD flow can provide a significant productivity boost by enabling early termination of such runs. Multiple QoR metrics can be leveraged to classify successful or doomed PD runs. In this paper, we focus specifically on timing: our goal is to identify PD runs that cannot achieve the end-of-flow timing targets by predicting post-route total negative slack (TNS) values in early PD phases. To achieve this goal, we develop an end-to-end machine learning (ML) framework that predicts TNS by modeling PD implementation as a sequential flow. In particular, our framework leverages graph neural networks (GNNs) to encode netlist graphs extracted from the various PD phases and long short-term memory (LSTM) networks to perform sequential modeling over the GNN-encoded features. Experimental results on seven industrial designs with a 5:2 train/test split demonstrate that our framework predicts post-route TNS values with high fidelity, within 5.2% normalized root mean squared error (NRMSE), in early design stages (e.g., placement, CTS) on the two validation designs unseen during training.
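The GNN-per-stage plus LSTM-over-stages structure can be sketched with plain tensors. The model below is an assumption-laden miniature (hand-rolled mean-aggregation message passing, random toy graphs, arbitrary dimensions), not the paper's architecture or feature set.

```python
import torch
import torch.nn as nn

class MeanAggGNNLayer(nn.Module):
    """Minimal message-passing layer: each node averages its neighbors'
    features (via a normalized adjacency) and mixes them with its own."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, x, adj_norm):
        neigh = adj_norm @ x                      # mean of neighbor features
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

class StageTNSPredictor(nn.Module):
    """Encode the netlist graph seen at each PD stage with a small GNN, pool
    to one embedding per stage, and run an LSTM over the stage sequence to
    regress end-of-flow TNS."""
    def __init__(self, feat_dim=8, hid_dim=32):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hid_dim)
        self.gnn = nn.ModuleList([MeanAggGNNLayer(hid_dim) for _ in range(2)])
        self.lstm = nn.LSTM(hid_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, 1)

    def forward(self, stage_graphs):
        # stage_graphs: list of (node_features [N, feat_dim], adj_norm [N, N])
        stage_embs = []
        for x, adj_norm in stage_graphs:
            h = torch.relu(self.embed(x))
            for layer in self.gnn:
                h = layer(h, adj_norm)
            stage_embs.append(h.mean(dim=0))      # graph-level pooling
        seq = torch.stack(stage_embs).unsqueeze(0)  # [1, n_stages, hid_dim]
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])              # predicted post-route TNS

# Toy run: three early stages (e.g., placement, CTS, early route) of a
# 10-cell netlist with random features and a self-looped adjacency.
n = 10
adj = torch.eye(n)
graphs = [(torch.randn(n, 8), adj) for _ in range(3)]
print(StageTNSPredictor()(graphs))
```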
{"title":"Doomed Run Prediction in Physical Design by Exploiting Sequential Flow and Graph Learning","authors":"Yi-Chen Lu, S. Nath, Vishal Khandelwal, S. Lim","doi":"10.1109/ICCAD51958.2021.9643435","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643435","url":null,"abstract":"Modern designs are increasingly reliant on physical design (PD) tools to derive full technology scaling benefits of Moore's Law. Designers often perform power, performance, and area (PPA) exploration through parallel PD runs with different tool configurations. Efficient exploration of PPA is mission-critical for chip designers who are working with stringent time-to-market constraints and finite compute resources. Therefore, a framework that can accurately predict a “doomed run” (i.e., will not meet the PPA targets) at early phases of the PD flow can provide a significant productivity boost by enabling early termination of such runs. Multiple QoR metrics can be leveraged to classify successful or doomed PD runs. In this paper, we specifically focus on the aspect of timing, where our goal is to identify the PD runs that cannot achieve end-of-flow timing results by predicting the post-route total negative slack (TNS) values in early PD phases. To achieve our goal, we develop an end-to-end machine learning (ML) framework that performs TNS prediction by modeling PD implementation as a sequential flow. Particularly, our framework leverages graph neural networks (GNNs) to encode netlist graphs extracted from various PD phases, and utilize long short-term memory (LSTM) networks to perform sequential modeling based on the GNN-encoded features. Experimental results on seven industrial designs with 5:2 train/test split ratio demonstrate that our framework predicts post-route TNS values in high fidelity within 5.2% normalized root mean squared error (NRMSE) in early design stages (e.g., placement, CTS) on the two validation designs that are unseen during training.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131737454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compatible Equivalence Checking of X-Valued Circuits
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643515
Yu-Neng Wang, Yun-Rong Luo, Po-Chun Chien, Ping-Lun Wang, Hao-Ren Wang, Wan-Hsuan Lin, J. H. Jiang, Chung-Yang Huang
The X-value arises in various contexts of system design, often representing an unknown or don't-care value depending on the application. Verification of X-valued circuits is a crucial but relatively unaddressed task. The challenge of equivalence checking for X-valued circuits, named compatible equivalence checking, was posed in the 2020 ICCAD CAD Contest. In this paper, we present our winning method, based on an X-value-preserving dual-rail encoding and incremental identification of the compatible equivalence relation. Experimental results demonstrate the effectiveness of the proposed techniques: our approach solves more cases than both the commercial tool and the other top-3 teams of the contest.
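One common way to reduce three-valued reasoning to ordinary Boolean circuits is a dual-rail encoding in which each wire carries a "can be 1" and a "can be 0" rail; the sketch below uses that encoding as an assumed illustration of the idea, not necessarily the paper's exact construction.

```python
# Dual-rail encoding of a 3-valued signal: each wire v becomes a pair (p, n)
# with p = "v can be 1" and n = "v can be 0".
#   constant 0 -> (0, 1),  constant 1 -> (1, 0),  X -> (1, 1)
ZERO, ONE, X = (0, 1), (1, 0), (1, 1)

def NOT(a):
    p, n = a
    return (n, p)                       # swapping the rails negates the value

def AND(a, b):
    return (a[0] & b[0], a[1] | b[1])   # can be 1 iff both can; can be 0 if either can

def OR(a, b):
    return (a[0] | b[0], a[1] & b[1])

def show(v):
    return {ZERO: "0", ONE: "1", X: "X"}[v]

# X propagates as expected: X AND 0 = 0 (controlling input), X AND 1 = X.
print(show(AND(X, ZERO)), show(AND(X, ONE)), show(OR(X, ONE)), show(NOT(X)))
```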
{"title":"Compatible Equivalence Checking of X-Valued Circuits","authors":"Yu-Neng Wang, Yun-Rong Luo, Po-Chun Chien, Ping-Lun Wang, Hao-Ren Wang, Wan-Hsuan Lin, J. H. Jiang, Chung-Yang Huang","doi":"10.1109/ICCAD51958.2021.9643515","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643515","url":null,"abstract":"The X-value arises in various contexts of system design. It often represents an unknown value or a don't-care value depending on the application. Verification of X-valued circuits is a crucial task but relatively unaddressed. The challenge of equivalence checking for X-valued circuits, named compatible equivalence checking, is posed in the 2020 ICCAD CAD Contest. In this paper, we present our winning method based on X-value preserving dual-rail encoding and incremental identification of compatible equivalence relation. Experimental results demonstrate the effectiveness of the proposed techniques and the outperformance of our approach in solving more cases than the commercial tool and the other teams among the top 3 of the contest.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130426382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliable Memristor-based Neuromorphic Design Using Variation- and Defect-Aware Training
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643468
Di Gao, Grace Li Zhang, Xunzhao Yin, Bing Li, Ulf Schlichtmann, Cheng Zhuo
The memristor crossbar provides a unique opportunity to develop neuromorphic computing systems (NCSs) with high scalability and energy efficiency. However, reliability issues arising from the immature fabrication process and physical device limitations, namely variations and stuck-at faults (SAFs), severely limit its practical application. Specifically, variations make the programmed weights deviate from their expected values, and defective memristors cannot represent the weights at all. In this work, we propose a variation- and defect-aware framework that improves the reliability of memristor-based NCSs while minimizing the loss in inference performance. We develop analytical weight models that characterize the non-ideal effects of variations and SAFs, which are then incorporated into a Bayesian neural network as priors and constraints. Reliability improvement is thus cast as neural network training for optimal weights that accommodate variations and defects across chips, requiring neither computation-intensive retraining nor expensive testing. Extensive experimental results confirm that the proposed framework effectively improves NCS reliability while significantly mitigating inference accuracy degradation, even under severe variations and SAFs.
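A simplified way to fold such non-idealities into training is to perturb the weights in the forward pass with a multiplicative variation model and apply stuck-at-fault masks, so the learned weights remain accurate under them. The layer below is a sketch under those assumptions (lognormal variation, fault maps sampled at construction), standing in for the paper's Bayesian formulation rather than reproducing it.

```python
import torch
import torch.nn as nn

class VariationAwareLinear(nn.Module):
    """Linear layer whose forward pass applies multiplicative lognormal weight
    variation and stuck-at-fault masks, so training sees the non-ideal device."""
    def __init__(self, in_f, out_f, sigma=0.1, saf0_rate=0.01, saf1_rate=0.01):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.1)
        self.sigma = sigma
        # Fixed per-chip fault maps would normally come from testing; here we
        # sample them once at construction for illustration.
        self.register_buffer("saf0", torch.rand(out_f, in_f) < saf0_rate)
        self.register_buffer("saf1", torch.rand(out_f, in_f) < saf1_rate)

    def forward(self, x):
        w = self.weight
        if self.training:
            # Lognormal conductance variation around the programmed value.
            w = w * torch.exp(self.sigma * torch.randn_like(w))
        w = torch.where(self.saf0, torch.zeros_like(w), w)               # stuck off
        w = torch.where(self.saf1, torch.full_like(w, w.abs().max().item()), w)  # stuck on
        return x @ w.t()

# One noisy training step on random data, just to show the mechanics.
layer = VariationAwareLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(32, 16), torch.randn(32, 4)
loss = nn.functional.mse_loss(layer(x), y)
loss.backward()
opt.step()
print(float(loss))
```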
{"title":"Reliable Memristor-based Neuromorphic Design Using Variation- and Defect-Aware Training","authors":"Di Gao, Grace Li Zhang, Xunzhao Yin, Bing Li, Ulf Schlichtmann, Cheng Zhuo","doi":"10.1109/ICCAD51958.2021.9643468","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643468","url":null,"abstract":"The memristor crossbar provides a unique opportunity to develop a neuromorphic computing system (NCS) with high scalability and energy efficiency. However, the reliability issues that arise from the immature fabrication process and physical device limitations, i.e., variations and stuck-at-faults (SAF), dramatically prevent its wide application in practice. Specifically, variations make the programmed weights deviate from their expected values. On the other hand, defective mem-ristors cannot even represent the weights effectively. In this work, we propose a variation- and defect-aware framework to improve the reliability of memristor-based NCS while minimizing the inference performance loss. We propose to develop analytical weight models to characterize the non-ideal effects of variations and SAFs, which can then be incorporated into a Bayesian neural network as priori and constraint. We then convert the reliability improvement to the neural network training for optimal weights that can accommodate variations and defects across the chips, which does not require computation-intensive retraining or cost-expensive testing. Extensive experimental results with the proposed framework confirm its effective capability of improving the reliability of NCS, while significantly mitigating the inference accuracy degradation under even severe variations and SAFs.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130794707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Convergence Monitoring Method for DNN Training of On-Device Task Adaptation
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643522
Seungkyu Choi, Jaekang Shin, L. Kim
DNN training has become a major on-device workload for executing various vision tasks with high performance. Accordingly, training architectures incorporating approximate computing have been steadily studied for efficient acceleration. However, most of these works examine their schemes on from-scratch training, where inaccurate computing is not tolerable. Moreover, previous solutions are mostly extensions of inference-oriented techniques, e.g., sparsity/pruning, quantization, and dataflow. Therefore, issues that hinder the overall speed of the DNN training process in practical workloads remain unresolved. In this work, targeting transfer-learning-based task adaptation, a practical on-device training workload, we propose a convergence monitoring method that removes the redundancy in massive training iterations. By utilizing the network's output values, we detect the training intensity of incoming tasks and monitor the prediction convergence under the given intensity to provide early exits within the scheduled training iterations. As a result, accurate approximation over various tasks is performed with minimal overhead. Unlike sparsity-driven approximation, our method enables runtime optimization and is easily applicable to off-the-shelf accelerators, achieving significant speedup. Evaluation results on various datasets show a geomean speedup of 2.2× over the baseline and 1.8× over the latest convergence-related training method.
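An output-driven early exit can be sketched as a fine-tuning loop that watches a convergence proxy and stops once it plateaus. In the sketch below the proxy is the mean top-1 softmax confidence and the patience/threshold values are arbitrary; keying the exit criterion to a detected task intensity, as the paper does, is outside this toy example.

```python
import torch
import torch.nn as nn

def train_with_convergence_monitor(model, loader, epochs, patience=3, delta=1e-3):
    """Fine-tune until the monitored quantity stops improving; remaining
    scheduled epochs are treated as redundant and skipped (early exit)."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    best, stale = 0.0, 0
    for epoch in range(epochs):
        confidences = []
        for x, y in loader:
            opt.zero_grad()
            out = model(x)
            loss_fn(out, y).backward()
            opt.step()
            confidences.append(out.softmax(dim=-1).max(dim=-1).values.mean().item())
        score = sum(confidences) / len(confidences)   # convergence proxy
        if score > best + delta:
            best, stale = score, 0
        else:
            stale += 1
        if stale >= patience:
            return epoch + 1                          # early exit
    return epochs

# Toy task-adaptation run on random data.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 3))
data = [(torch.randn(16, 8), torch.randint(0, 3, (16,))) for _ in range(5)]
print("epochs run:", train_with_convergence_monitor(model, data, epochs=50))
```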
{"title":"A Convergence Monitoring Method for DNN Training of On-Device Task Adaptation","authors":"Seungkyu Choi, Jaekang Shin, L. Kim","doi":"10.1109/ICCAD51958.2021.9643522","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643522","url":null,"abstract":"DNN training has become a major workload in on-device situations to execute various vision tasks with high performance. Accordingly, training architectures accompanying approximate computing have been steadily studied for efficient acceleration. However, most of the works examine their scheme on from-the-scratch training where inaccurate computing is not tolerable. Moreover, previous solutions are mostly provided as an extended version of the inference works, e.g., sparsity/pruning, quantization, dataflow, etc. Therefore, unresolved issues in practical workloads that hinder the total speed of the DNN training process remain still. In this work, with targeting the transfer learning-based task adaptation of the practical on-device training workload, we propose a convergence monitoring method to resolve the redundancy in massive training iterations. By utilizing the network's output value, we detect the training intensity of incoming tasks and monitor the prediction convergence with the given intensity to provide early-exits in the scheduled training iteration. As a result, an accurate approximation over various tasks is performed with minimal overhead. Unlike the sparsity-driven approximation, our method enables runtime optimization and can be easily applicable to off-the-shelf accelerators achieving significant speedup. Evaluation results on various datasets show a geomean of $2.2times$ speedup over baseline and $1.8times$ speedup over the latest convergence-related training method.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130766808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}