A scalable routability-driven analytical placer with global router integration for FPGAs (abstract only)

Ka-Chun Lam, W. Tang, Evangeline F. Y. Young
{"title":"A scalable routability-driven analytical placer with global router integration for FPGAs (abstract only)","authors":"Ka-Chun Lam, W. Tang, Evangeline F. Y. Young","doi":"10.1145/2554688.2554711","DOIUrl":null,"url":null,"abstract":"As the sizes of modern circuits become bigger and bigger, implementing those large circuits into FPGA becomes arduous. The state-of-the-art academic FPGA place-and-route tool, VPR, has good quality but needs around a whole day to complete a placement when the input circuit contains millions of lookup tables, excluding the runtime for routing. To expedite the placement process, we propose a routability-driven placement algorithm for FPGA that adopts techniques used in ASIC global placer. Our placer follows the lower-bound-and-upper-bound iterative optimization process in ASIC placers like Ripple. In the lower-bound computation, the total HPWL, modeled using the Bound2Bound net model, is minimized using the conjugate gradient method. In the upper-bound computation, an almost-legalized result is produced by spreading cells linearly in the placement area. Those positions are then served as fixed-point anchors and fed into the next lower-bound computation. Furthermore, global routing will be performed in the upper-bound computation to estimate the routing segment usage, as a mean to consider congestion in placement. We tested our approach using 20 MCNC benchmarks and 4 large benchmarks for performance and scalability. Experimental results show that based on the island-style architecture which VPR is most optimized for, our approach can obtain a placement result 8x faster than VPR with 2% more in channel width, or 3x faster with 1% more in channel width when congestion is being considered. Our approach is even 14x faster than VPR in placing large benchmarks with over 10,000 lookup tables, with only 7% more in channel width.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2554688.2554711","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

As the sizes of modern circuits become bigger and bigger, implementing those large circuits into FPGA becomes arduous. The state-of-the-art academic FPGA place-and-route tool, VPR, has good quality but needs around a whole day to complete a placement when the input circuit contains millions of lookup tables, excluding the runtime for routing. To expedite the placement process, we propose a routability-driven placement algorithm for FPGA that adopts techniques used in ASIC global placer. Our placer follows the lower-bound-and-upper-bound iterative optimization process in ASIC placers like Ripple. In the lower-bound computation, the total HPWL, modeled using the Bound2Bound net model, is minimized using the conjugate gradient method. In the upper-bound computation, an almost-legalized result is produced by spreading cells linearly in the placement area. Those positions are then served as fixed-point anchors and fed into the next lower-bound computation. Furthermore, global routing will be performed in the upper-bound computation to estimate the routing segment usage, as a mean to consider congestion in placement. We tested our approach using 20 MCNC benchmarks and 4 large benchmarks for performance and scalability. Experimental results show that based on the island-style architecture which VPR is most optimized for, our approach can obtain a placement result 8x faster than VPR with 2% more in channel width, or 3x faster with 1% more in channel width when congestion is being considered. Our approach is even 14x faster than VPR in placing large benchmarks with over 10,000 lookup tables, with only 7% more in channel width.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
可扩展的可达性驱动的分析放置器,具有fpga的全局路由器集成(仅抽象)
随着现代电路的尺寸越来越大,在FPGA中实现这些大型电路变得非常困难。最先进的学术FPGA放置和路由工具VPR具有良好的质量,但当输入电路包含数百万个查找表(不包括路由运行时)时,需要大约一整天才能完成放置。为了加快放置过程,我们提出了一种可达性驱动的FPGA放置算法,该算法采用了ASIC全局放置器中使用的技术。我们的砂矿遵循Ripple等ASIC砂矿的下限和上限迭代优化过程。在下界计算中,使用Bound2Bound网络模型建模的总HPWL使用共轭梯度法最小化。在上界计算中,通过在放置区域内线性扩展单元,得到一个几乎合法化的结果。然后将这些位置作为定点锚点,并输入到下一个下界计算中。此外,全局路由将在上界计算中执行,以估计路由段的使用情况,作为考虑放置中的拥塞的平均值。我们使用20个MCNC基准和4个大型性能和可伸缩性基准测试了我们的方法。实验结果表明,基于最适合VPR的岛式架构,我们的方法可以比VPR快8倍,通道宽度增加2%,考虑拥塞时可以比VPR快3倍,通道宽度增加1%。在放置超过10,000个查找表的大型基准测试时,我们的方法甚至比VPR快14倍,通道宽度仅多7%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Energy-efficient multiplier-less discrete convolver through probabilistic domain transformation Revisiting and-inverter cones Pushing the performance boundary of linear projection designs through device specific optimisations (abstract only) MORP: makespan optimization for processors with an embedded reconfigurable fabric Co-processing with dynamic reconfiguration on heterogeneous MPSoC: practices and design tradeoffs (abstract only)
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1