Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform

Soroosh Khoram, Jialiang Zhang, Maxwell Strange, J. Li
{"title":"Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform","authors":"Soroosh Khoram, Jialiang Zhang, Maxwell Strange, J. Li","doi":"10.1145/3174243.3174260","DOIUrl":null,"url":null,"abstract":"Graph analytics, which explores the relationships among interconnected entities, is becoming increasingly important due to its broad applicability, from machine learning to social sciences. However, due to the irregular data access patterns in graph computations, one major challenge for graph processing systems is performance. The algorithms, softwares, and hardwares that have been tailored for mainstream parallel applications are generally not effective for massive, sparse graphs from the real-world problems, due to their complex and irregular structures. To address the performance issues in large-scale graph analytics, we leverage the exceptional random access performance of the emerging Hybrid Memory Cube (HMC) combined with the flexibility and efficiency of modern FPGAs. In particular, we develop a collaborative software/hardware technique to perform a level-synchronized Breadth First Search (BFS) on a FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that exploits the FPGA-HMC platform»s capability to improve data locality and memory access efficiency. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by designing a memory request merging unit to take advantage of the increased data locality resulting from graph clustering. We evaluate the performance of our BFS implementation using the AC-510 development kit from Micron and achieve $2.8 \\times$ average performance improvement compared to the latest FPGA-HMC based graph processing system over a set of benchmarks from a wide range of applications.","PeriodicalId":164936,"journal":{"name":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"149 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3174243.3174260","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 33

Abstract

Graph analytics, which explores the relationships among interconnected entities, is becoming increasingly important due to its broad applicability, from machine learning to social sciences. However, due to the irregular data access patterns in graph computations, one major challenge for graph processing systems is performance. The algorithms, softwares, and hardwares that have been tailored for mainstream parallel applications are generally not effective for massive, sparse graphs from the real-world problems, due to their complex and irregular structures. To address the performance issues in large-scale graph analytics, we leverage the exceptional random access performance of the emerging Hybrid Memory Cube (HMC) combined with the flexibility and efficiency of modern FPGAs. In particular, we develop a collaborative software/hardware technique to perform a level-synchronized Breadth First Search (BFS) on a FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that exploits the FPGA-HMC platform»s capability to improve data locality and memory access efficiency. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by designing a memory request merging unit to take advantage of the increased data locality resulting from graph clustering. We evaluate the performance of our BFS implementation using the AC-510 development kit from Micron and achieve $2.8 \times$ average performance improvement compared to the latest FPGA-HMC based graph processing system over a set of benchmarks from a wide range of applications.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过FPGA-HMC平台上的存储和访问协同优化加速图形分析
图分析,探索相互关联的实体之间的关系,由于其广泛的适用性,从机器学习到社会科学,正变得越来越重要。然而,由于图计算中的数据访问模式不规范,对图处理系统的一个主要挑战是性能。为主流并行应用程序量身定制的算法、软件和硬件,由于其复杂和不规则的结构,通常对来自现实世界问题的大量稀疏图无效。为了解决大规模图形分析中的性能问题,我们利用新兴的混合存储立方体(HMC)的卓越随机访问性能,结合现代fpga的灵活性和效率。特别是,我们开发了一种协作的软件/硬件技术,用于在FPGA-HMC平台上执行电平同步的广度优先搜索(BFS)。从软件的角度来看,我们开发了一种架构感知的图聚类算法,该算法利用FPGA-HMC平台的能力来提高数据局域性和内存访问效率。从硬件角度来看,我们通过设计内存请求合并单元进一步改进FPGA-HMC图处理器架构,以利用图聚类带来的数据局域性增加。我们使用美光的AC-510开发套件评估了BFS实现的性能,与最新的基于FPGA-HMC的图形处理系统相比,通过一系列来自广泛应用的基准测试,实现了2.8倍的平均性能提升。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Architecture and Circuit Design of an All-Spintronic FPGA Session details: Session 6: High Level Synthesis 2 A FPGA Friendly Approximate Computing Framework with Hybrid Neural Networks: (Abstract Only) Software/Hardware Co-design for Multichannel Scheduling in IEEE 802.11p MLME: (Abstract Only) Session details: Special Session: Deep Learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1