Domino: An Asynchronous and Energy-efficient Accelerator for Graph Processing (Abstract Only)

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Pub Date: 2018-02-15. DOI: 10.1145/3174243.3174973

Chongchong Xu, Chao Wang, Yiwei Zhang, Lei Gong, Xi Li, Xuehai Zhou


Citations: 0


Domino: An Asynchronous and Energy-efficient Accelerator for Graph Processing: (Abstract Only)

Large-scale graph processing has drawn considerable research attention and applies to a wide range of domains, such as social networks, web graphs, and transport networks. However, processing large-scale graphs on general-purpose processors suffers from difficulties including computation and memory inefficiency. Research on hardware accelerators for graph processing has therefore become a hot topic. Meanwhile, as a power-efficient and reconfigurable resource, the FPGA is a promising platform on which to design and deploy graph processing algorithms. In this paper, we propose Domino, an asynchronous and energy-efficient hardware accelerator for graph processing. Domino adopts the asynchronous model to process graphs, which is efficient for most graph algorithms, such as Breadth-First Search, Depth-First Search, and Single Source Shortest Path. Domino also proposes a dedicated row-vector-based data structure, named Batch Row Vector, to represent graphs. Our work adopts a naive update mechanism and a bisect update mechanism to perform asynchronous control. We implement Domino on a Xilinx Virtex-7 board, and experimental results demonstrate that Domino achieves significant performance and energy improvements, especially for graphs with a large diameter (e.g., roadNet-CA and USA-Road). In our case studies, Domino achieves average speedups of 1.47x-7.84x and 0.47x-2.52x for small-diameter graphs (e.g., com-youtube, WikiTalk, and soc-LiveJournal) over GraphChi on the Intel Core2 and Core i7 processors, respectively. Moreover, compared to the Intel Core i7 processor, Domino achieves energy-efficiency gains of 2.03x-10.08x for the three small-diameter graphs and 27.98x-134.50x for roadNet-CA, a graph with a relatively large diameter.
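To make the asynchronous model concrete, the following is a minimal software sketch of Single Source Shortest Path under synchronous and asynchronous updates. It is not Domino's hardware design: the abstract does not describe the Batch Row Vector layout or the naive and bisect update mechanisms, so the function `sssp_sweeps`, the edge-list representation, and the six-vertex path graph are all illustrative assumptions. The sketch only shows why an asynchronous model may help on large-diameter graphs: a distance updated early in a sweep can be consumed later in the same sweep.

```python
from math import inf


def sssp_sweeps(num_vertices, edges, source, asynchronous):
    """Relax every edge in repeated sweeps until no distance changes.

    `edges` is a list of (u, v, weight) tuples (an assumed representation).
    Returns (distances, number_of_sweeps_performed).
    """
    dist = [inf] * num_vertices
    dist[source] = 0
    sweeps = 0
    while True:
        sweeps += 1
        # Synchronous: read from a frozen snapshot of the previous sweep.
        # Asynchronous: read the live array, so updates made earlier in
        # this sweep are visible immediately.
        read = dist if asynchronous else list(dist)
        changed = False
        for u, v, w in edges:
            if read[u] + w < dist[v]:
                dist[v] = read[u] + w
                changed = True
        if not changed:
            return dist, sweeps


# A path graph 0 -> 1 -> ... -> 5: a tiny stand-in for a large-diameter graph.
path = [(i, i + 1, 1) for i in range(5)]
async_dist, async_sweeps = sssp_sweeps(6, path, 0, asynchronous=True)
sync_dist, sync_sweeps = sssp_sweeps(6, path, 0, asynchronous=False)
print(async_sweeps, sync_sweeps)  # prints "2 6"
```

Both variants reach the same distances, but the synchronous version needs one sweep per hop plus a final sweep to detect convergence, while the asynchronous version finishes in two. This is a best case, since the edges happen to be listed in path order; with a less favorable order the gap narrows.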
