A scalable routability-driven analytical placer with global router integration for FPGAs (abstract only)

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays. Pub Date: 2014-02-26. DOI: 10.1145/2554688.2554711

Ka-Chun Lam, W. Tang, Evangeline F. Y. Young

Abstract

As modern circuits grow ever larger, implementing them on FPGAs becomes arduous. The state-of-the-art academic FPGA place-and-route tool, VPR, produces high-quality results but needs around a whole day to place a circuit containing millions of lookup tables, excluding routing runtime. To expedite placement, we propose a routability-driven FPGA placement algorithm that adopts techniques from ASIC global placers. Our placer follows the iterative lower-bound/upper-bound optimization process used in ASIC placers such as Ripple. In the lower-bound computation, the total half-perimeter wirelength (HPWL), modeled with the Bound2Bound net model, is minimized using the conjugate gradient method. In the upper-bound computation, an almost-legalized result is produced by spreading cells linearly over the placement area. These positions then serve as fixed-point anchors in the next lower-bound computation. In addition, global routing is performed during the upper-bound computation to estimate routing-segment usage, as a means of accounting for congestion during placement. We evaluated the performance and scalability of our approach on 20 MCNC benchmarks and 4 large benchmarks. Experimental results show that, on the island-style architecture for which VPR is most optimized, our approach obtains placements 8x faster than VPR with 2% larger channel width, or 3x faster with 1% larger channel width when congestion is considered. On large benchmarks with over 10,000 lookup tables, our approach is 14x faster than VPR with only 7% larger channel width.
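The abstract names the Bound2Bound (B2B) net model without describing it. As commonly described for the Kraftwerk2 placer, B2B replaces each p-pin net, per axis, with two-pin connections: one between the two boundary (leftmost and rightmost) pins, and one from every inner pin to each boundary pin, each weighted by 2/((p-1)·|x_i - x_j|). With this weighting, the sum of w·(x_i - x_j)² equals twice the net's HPWL at the current positions, so the sketch below scales it by 1/2. This is a one-axis illustration with hypothetical function names, not the authors' code.

```python
def b2b_edges(xs, eps=1e-6):
    """Bound2Bound decomposition of one net along one axis.

    xs: pin coordinates of the net on this axis.
    Returns (i, j, w) two-pin connections: boundary pin to boundary pin,
    plus every inner pin to both boundary pins, with
    w = 2 / ((p - 1) * |x_i - x_j|).
    """
    p = len(xs)
    if p < 2:
        return []
    order = sorted(range(p), key=lambda k: xs[k])
    lo, hi = order[0], order[-1]  # boundary pins (leftmost, rightmost)
    pairs = [(lo, hi)] + [(k, b) for k in order[1:-1] for b in (lo, hi)]
    edges = []
    for i, j in pairs:
        # eps keeps the weight finite when two pins coincide
        dist = max(abs(xs[i] - xs[j]), eps)
        edges.append((i, j, 2.0 / ((p - 1) * dist)))
    return edges


def hpwl_1d(xs):
    """Half-perimeter wirelength of one net along one axis."""
    return max(xs) - min(xs)


def b2b_cost(xs, edges):
    """Quadratic B2B cost, scaled so it matches HPWL: 1/2 * sum w (x_i - x_j)^2."""
    return 0.5 * sum(w * (xs[i] - xs[j]) ** 2 for i, j, w in edges)


if __name__ == "__main__":
    pins = [0.0, 3.0, 1.0, 5.0]
    edges = b2b_edges(pins)
    # Both values agree up to floating-point rounding.
    print(hpwl_1d(pins), b2b_cost(pins, edges))
```

A p-pin net yields 2p - 3 connections. Because the weights depend on the current positions, a B2B-based placer typically recomputes them before each quadratic solve.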

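The lower-bound step minimizes the B2B quadratic with the conjugate gradient (CG) method. Below is a minimal one-axis sketch, assuming the two-pin weights have already been computed and modeling the fixed-point anchors from the previous upper bound as pseudo-nets with a hypothetical weight `alpha`. Setting the gradient of the quadratic to zero gives a linear system A x = b. A is symmetric positive-definite as long as every movable cell is connected, directly or indirectly, to a fixed pin or an anchor, and CG solves that system. All names are illustrative. Dense lists are used only for brevity, and a practical placer would likely use sparse storage and a preconditioner.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_system(n, cell_edges, fixed_edges, anchors, alpha):
    """Assemble A x = b for one axis (hypothetical helper).

    cell_edges:  (i, j, w) connections between movable cells i and j.
    fixed_edges: (i, x_fixed, w) connections from cell i to a fixed pin.
    anchors:     per-cell anchor position from the previous upper bound, or None.
    alpha:       assumed weight of each anchor pseudo-net.
    """
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, w in cell_edges:  # Laplacian contribution of a movable-movable edge
        A[i][i] += w
        A[j][j] += w
        A[i][j] -= w
        A[j][i] -= w
    for i, x_fixed, w in fixed_edges:  # fixed pin moves to the right-hand side
        A[i][i] += w
        b[i] += w * x_fixed
    for i, a in enumerate(anchors):  # anchor acts like a fixed pin of weight alpha
        if a is not None:
            A[i][i] += alpha
            b[i] += alpha * a
    return A, b


def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # residual
    p = list(r)  # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        step = rs / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


if __name__ == "__main__":
    # Two movable cells chained between fixed pins at x = 0 and x = 10.
    A, b = build_system(2, [(0, 1, 1.0)], [(0, 0.0, 1.0), (1, 10.0, 1.0)],
                        [None, None], alpha=0.0)
    print(conjugate_gradient(A, b, [0.0, 0.0]))  # approximately [3.333, 6.667]
```

The x and y coordinates can be solved independently in this formulation. Increasing `alpha` across iterations pulls the lower-bound solution closer to the spread positions.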

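The abstract only says that the upper bound spreads cells "linearly" over the placement area, and it does not give the exact procedure. The sketch below is a deliberately simplified one-dimensional stand-in. It assumes unit-width cells and a single region [lo, hi]. Cells keep their relative order from the lower-bound solution and are mapped to evenly spaced slot centers. The function name is hypothetical. Positions like these are the almost-legal targets that would then be fed back as anchors in the next lower-bound solve.

```python
def spread_linear(xs, lo, hi):
    """Spread unit-width cells evenly over [lo, hi] along one axis.

    Cells keep their left-to-right order from the lower-bound solution
    (ties broken by index) and are placed at the centers of n equal slots.
    """
    n = len(xs)
    if n == 0:
        return []
    pitch = (hi - lo) / n
    order = sorted(range(n), key=lambda k: xs[k])  # stable sort: ties keep index order
    out = [0.0] * n
    for rank, k in enumerate(order):
        out[k] = lo + (rank + 0.5) * pitch  # center of slot number `rank`
    return out


if __name__ == "__main__":
    # Four cells clumped near x = 5 after the lower-bound solve.
    print(spread_linear([5.0, 5.1, 4.9, 5.0], 0.0, 8.0))  # [3.0, 7.0, 1.0, 5.0]
```

Preserving order keeps the wirelength-driven relative placement from the lower bound while removing overlap. A real FPGA spreader would also need to handle heterogeneous block types, site capacities, and two-dimensional density, which this sketch ignores.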