High-Throughput and Energy-Efficient Graph Processing on FPGA

Shijie Zhou, C. Chelmis, V. Prasanna
{"title":"High-Throughput and Energy-Efficient Graph Processing on FPGA","authors":"Shijie Zhou, C. Chelmis, V. Prasanna","doi":"10.1109/FCCM.2016.35","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a novel design for large-scale graph processing on FPGA. Our design uses large external memory for storing massive graph data and FPGA for acceleration, and leverages edge-centric computing principles. We propose a data layout which optimizes the external memory performance and leads to an efficient memory activation schedule to reduce on-chip memory power consumption. Further, we develop a parallel architecture on FPGA which can saturate the external memory bandwidth and concurrently process multiple input data to increase throughput. We use our design to accelerate several classic graph algorithms, including single-source shortest path, weakly connected component, and minimum spanning tree. Experimental results show that for all the considered graph algorithms, our design achieves high throughput of over 600 million traversed edges per second (MTEPS) and high energy-efficiency of over 30 MTEPS/W. Compared with a baseline design, our optimizations result in over 3.6× throughput and 5.8× energy-efficiency improvements, respectively. Our design achieves 32% throughput improvement when compared with state-of-the-art FPGA designs, and up to 7.8× speedup when compared with state-of-the-art multi-core implementation.","PeriodicalId":113498,"journal":{"name":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"75","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2016.35","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 75

Abstract

In this paper, we propose a novel design for large-scale graph processing on FPGA. Our design uses large external memory for storing massive graph data and FPGA for acceleration, and leverages edge-centric computing principles. We propose a data layout which optimizes the external memory performance and leads to an efficient memory activation schedule to reduce on-chip memory power consumption. Further, we develop a parallel architecture on FPGA which can saturate the external memory bandwidth and concurrently process multiple input data to increase throughput. We use our design to accelerate several classic graph algorithms, including single-source shortest path, weakly connected component, and minimum spanning tree. Experimental results show that for all the considered graph algorithms, our design achieves high throughput of over 600 million traversed edges per second (MTEPS) and high energy-efficiency of over 30 MTEPS/W. Compared with a baseline design, our optimizations result in over 3.6× throughput and 5.8× energy-efficiency improvements, respectively. Our design achieves 32% throughput improvement when compared with state-of-the-art FPGA designs, and up to 7.8× speedup when compared with state-of-the-art multi-core implementation.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于FPGA的高吞吐量和高能效图形处理
本文提出了一种基于FPGA的大规模图形处理的新设计。我们的设计使用大型外部存储器来存储大量图形数据,使用FPGA来加速,并利用边缘中心计算原理。我们提出了一种优化外部存储器性能的数据布局,并导致有效的存储器激活计划,以降低片上存储器功耗。此外,我们在FPGA上开发了一种并行架构,该架构可以饱和外部存储器带宽并并发处理多个输入数据以提高吞吐量。我们使用我们的设计来加速几种经典的图算法,包括单源最短路径、弱连接分量和最小生成树。实验结果表明,对于所有考虑的图算法,我们的设计实现了每秒超过6亿遍历边(MTEPS)的高吞吐量和超过30 MTEPS/W的高能效。与基线设计相比,我们的优化分别使吞吐量提高了3.6倍以上,能效提高了5.8倍。与最先进的FPGA设计相比,我们的设计实现了32%的吞吐量提高,与最先进的多核实现相比,速度提高了7.8倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Spatial Predicates Evaluation in the Geohash Domain Using Reconfigurable Hardware Two-Hit Filter Synthesis for Genomic Database Search Initiation Interval Aware Resource Sharing for FPGA DSP Blocks Finding Space-Time Stream Permutations for Minimum Memory and Latency Runtime Parameterizable Regular Expression Operators for Databases
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1