Parallelized Technology Mapping to General PLBs by Adaptive Circuit Partitioning
Xiaoxi Wang, Moucheng Yang, Zhen Li, Lingli Wang
2021 International Conference on Field-Programmable Technology (ICFPT)
Pub Date: 2021-12-06 | DOI: 10.1109/ICFPT52863.2021.9609877
Abstract: Technology mapping from logic netlists to programmable logic blocks (PLBs) plays an important role in the FPGA EDA flow, especially for architecture exploration of PLBs. However, technology mapping becomes time-consuming due to the booming scale and complexity of IC designs as well as the growing complexity of PLB architectures. To speed up this process, a parallelized technology mapping approach based on adaptive circuit partitioning is proposed in this paper to perform fast multi-threaded technology mapping. First, we choose the best of three candidate partitioning strategies for the given netlist by circuit analysis to partition the original netlist into several independent sub-netlists. Second, these sub-netlists are mapped to the given PLB architecture simultaneously in their corresponding mapping threads. Finally, the complete mapped netlist is generated by merging the mapped sub-netlists. The proposed approach is implemented in ABC and is independent of the detailed mapping algorithm. Thirteen large circuits from the Titan23 benchmark set are used to evaluate the proposed approach. Experimental results show that the proposed approach achieves an average speedup of 5.76× over the single-thread version (up to 8.21× for individual circuits) with no delay loss and less than 0.57% average area penalty.
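The partition → parallel-map → merge flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's ABC implementation; the partitioning strategies, the mapper, and the balance heuristic below are all hypothetical stand-ins, and a thread pool is used only to show the parallel structure:

```python
from concurrent.futures import ThreadPoolExecutor

def choose_partitioning(netlist, strategies):
    # Hypothetical circuit analysis: run each candidate strategy and keep
    # the partition whose sub-netlist sizes are most balanced, as a proxy
    # for the best-performing strategy.
    def imbalance(parts):
        sizes = [len(p) for p in parts]
        return max(sizes) - min(sizes)
    return min((s(netlist) for s in strategies), key=imbalance)

def map_subnetlist(sub):
    # Stand-in for per-thread technology mapping of one sub-netlist
    # to the target PLB architecture.
    return [("plb", node) for node in sub]

def parallel_map(netlist, strategies, workers=4):
    # 1) Adaptive partitioning into independent sub-netlists.
    parts = choose_partitioning(netlist, strategies)
    # 2) Map all sub-netlists concurrently.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        mapped = list(ex.map(map_subnetlist, parts))
    # 3) Merge the mapped sub-netlists into one complete netlist.
    return [cell for sub in mapped for cell in sub]
```

Because the sub-netlists are independent, the mapping step is embarrassingly parallel; the merge step only concatenates results, which matches the near-linear speedups reported.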
Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator
Martin Ferianc, Zhiqiang Que, Hongxiang Fan, W. Luk, Miguel L. Rodrigues
2021 International Conference on Field-Programmable Technology (ICFPT)
Pub Date: 2021-06-04 | DOI: 10.1109/ICFPT52863.2021.9609847
Abstract: Neural networks have demonstrated outstanding performance in a wide range of tasks. Specifically, recurrent architectures based on long short-term memory (LSTM) cells have manifested an excellent capability to model time dependencies in real-world data. However, standard recurrent architectures cannot estimate their uncertainty, which is essential for safety-critical applications such as in medicine. In contrast, Bayesian recurrent neural networks (RNNs) are able to provide uncertainty estimation with improved accuracy. Nonetheless, Bayesian RNNs are computationally and memory demanding, which limits their practicality despite their advantages. To address this issue, we propose an FPGA-based hardware design to accelerate Bayesian LSTM-based RNNs. To further improve the overall algorithmic-hardware performance, a co-design framework is proposed to explore the most fitting algorithmic-hardware configurations for Bayesian RNNs. We conduct extensive experiments on healthcare applications to demonstrate the improvement of our design and the effectiveness of our framework. Compared with a GPU implementation, our FPGA-based design can achieve up to 10 times speedup with nearly 106 times higher energy efficiency. To the best of our knowledge, this is the first work targeting acceleration of Bayesian RNNs on FPGAs.
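The computational cost the abstract refers to comes from Monte Carlo inference: a Bayesian RNN keeps distributions over its weights and must run many sampled forward passes per input, with the spread across samples giving the uncertainty estimate. A minimal sketch of this idea, using a simple tanh recurrent cell as a hypothetical stand-in for the paper's LSTM and Gaussian weight posteriors as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_weights(mean, std):
    # Draw one realization of the weight posterior (assumed Gaussian here).
    return mean + std * rng.standard_normal(mean.shape)

def rnn_forward(xs, w_mean, w_std, u_mean, u_std):
    # One stochastic forward pass: sample weights once, then unroll the cell.
    w = sample_weights(w_mean, w_std)
    u = sample_weights(u_mean, u_std)
    h = np.zeros(w.shape[1])
    for x in xs:
        h = np.tanh(x @ w + h @ u)
    return h

def predict_with_uncertainty(xs, params, n_samples=100):
    # Monte Carlo estimate: mean over samples is the prediction,
    # standard deviation over samples is the uncertainty.
    outs = np.stack([rnn_forward(xs, *params) for _ in range(n_samples)])
    return outs.mean(axis=0), outs.std(axis=0)
```

Each of the `n_samples` passes is independent, which is exactly the parallelism an FPGA accelerator can exploit, and why the repeated sampling makes Bayesian RNNs so much more demanding than their deterministic counterparts.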