Accelerating Graph Analytics on CPU-FPGA Heterogeneous Platform

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). Pub Date: 2017-10-01. DOI: 10.1109/SBAC-PAD.2017.25

Shijie Zhou, V. Prasanna


Citations: 52

Abstract

Hardware accelerators for graph analytics have gained increasing interest. Vertex-centric and edge-centric paradigms are widely used to design graph analytics accelerators. However, both have notable drawbacks: the vertex-centric paradigm requires random memory accesses to traverse edges, and the edge-centric paradigm results in redundant edge traversals. In this paper, we explore the tradeoffs between the vertex-centric and edge-centric paradigms and propose a hybrid algorithm that dynamically selects between them during execution. We introduce the notion of active vertex ratio, based on which we develop a simple but efficient paradigm selection approach. We develop a hybrid data structure that supports both the vertex-centric and edge-centric paradigms. Based on the hybrid data structure, we propose a graph partitioning scheme to increase parallelism and enable efficient parallel computation on heterogeneous platforms. In each iteration, we use our paradigm selection approach to select the appropriate paradigm for each partition. Further, we map our hybrid algorithm onto a state-of-the-art heterogeneous platform that integrates a multi-core CPU and a Field-Programmable Gate Array (FPGA) in a cache-coherent fashion. We use our design methodology to accelerate two fundamental graph algorithms, breadth-first search (BFS) and single-source shortest path (SSSP). Experimental results show that our CPU-FPGA co-processing achieves up to 1.5× (1.9×) speedup for BFS (SSSP) compared with optimized baseline designs. Compared with state-of-the-art FPGA-based designs, our design achieves up to 4.0× (4.2×) throughput improvement for BFS (SSSP). Compared with a state-of-the-art multi-core design, our design demonstrates up to 1.5× (1.8×) speedup for BFS (SSSP).
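
The per-partition paradigm selection described in the abstract can be illustrated with a small software sketch. The following is not the authors' implementation: the threshold value, the partitioning by contiguous source-vertex ranges, the data layout, and the name `hybrid_bfs` are all assumptions made for illustration, and the CPU-FPGA mapping is not modeled. Here the active vertex ratio of a partition is taken to be the fraction of its vertices that are in the current BFS frontier.

```python
from collections import defaultdict


def hybrid_bfs(num_vertices, edges, root, num_partitions=2, threshold=0.5):
    """Level-synchronous BFS that picks a paradigm per partition per iteration.

    `threshold` and the partitioning rule are illustrative assumptions.
    Returns (level, choices), where choices records the paradigm used.
    """
    # Assumed partitioning: contiguous ranges of source vertex IDs.
    size = -(-num_vertices // num_partitions)  # ceiling division

    # Assumed hybrid layout: each partition stores its edges twice, as a flat
    # edge list (edge-centric) and as adjacency lists (vertex-centric).
    edge_lists = [[] for _ in range(num_partitions)]
    adjacency = [defaultdict(list) for _ in range(num_partitions)]
    for src, dst in edges:
        p = src // size
        edge_lists[p].append((src, dst))
        adjacency[p][src].append(dst)

    level = [-1] * num_vertices  # -1 marks an unvisited vertex
    level[root] = 0
    frontier = {root}
    choices = []  # (iteration, partition, paradigm)
    depth = 0

    while frontier:
        next_frontier = set()
        for p in range(num_partitions):
            lo, hi = p * size, min((p + 1) * size, num_vertices)
            if hi <= lo:
                continue  # empty partition
            active = [v for v in frontier if lo <= v < hi]
            ratio = len(active) / (hi - lo)  # active vertex ratio

            if ratio > threshold:
                # Edge-centric: stream every edge of the partition in order;
                # edges whose source is inactive are traversed redundantly.
                choices.append((depth, p, "edge"))
                for src, dst in edge_lists[p]:
                    if level[src] == depth and level[dst] == -1:
                        level[dst] = depth + 1
                        next_frontier.add(dst)
            else:
                # Vertex-centric: follow only the edges of active vertices,
                # at the cost of irregular accesses into the adjacency lists.
                choices.append((depth, p, "vertex"))
                for src in active:
                    for dst in adjacency[p].get(src, ()):
                        if level[dst] == -1:
                            level[dst] = depth + 1
                            next_frontier.add(dst)

        frontier = next_frontier
        depth += 1

    return level, choices
```

Calling `hybrid_bfs(6, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)], 0)` returns the BFS level of each vertex together with the paradigm chosen for each partition in each iteration. Both paradigms compute the same levels; the choice only changes how much work is done and how regular the memory accesses are.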


