GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs

Impact factor (IF): 3.1; Zone 4 (Computer Science); JCR Q2 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); ACM Transactions on Reconfigurable Technology and Systems; Pub Date: 2023-09-13; DOI: 10.1145/3616497
Jonas Dann, Daniel Ritter, Holger Fröning
Citations: 0

Abstract

Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks caused by irregular memory access patterns. Such bottlenecks limit performance in a growing number of important application areas like machine learning and data analytics. While FPGAs offer a promising solution through flexible memory hierarchies and massive parallelism, we argue that current graph processing accelerators either use the off-chip memory bandwidth inefficiently or do not scale well across memory channels. In this work, we propose GraphScale, a scalable graph processing framework for FPGAs. GraphScale combines multi-channel memory with asynchronous graph processing (i.e., for fast convergence on results) and a compressed graph representation (i.e., for efficient usage of memory bandwidth and reduced memory footprint). GraphScale solves common graph problems like breadth-first search, PageRank, and weakly connected components through modular user-defined functions, a novel two-dimensional partitioning scheme, and a high-performance two-level crossbar design. Additionally, we extend GraphScale to scale to modern high-bandwidth memory (HBM) and reduce partitioning overhead of large graphs with binary packing.
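
Two ideas named in the abstract, a compressed graph representation and asynchronous processing, can be illustrated in software. The sketch below is not GraphScale's hardware design. It assumes compressed sparse row (CSR) as the compressed format, which the abstract does not specify, and it uses simple label propagation for weakly connected components. The names `build_csr` and `wcc` are hypothetical. In the asynchronous variant, a vertex reads labels that were already updated earlier in the same pass, which is why it tends to converge in fewer passes than the synchronous variant.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build an undirected CSR graph: neighbors of v are neighbors[offsets[v]:offsets[v + 1]]."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Prefix sum over degrees gives the start offset of each vertex's neighbor list.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]

    neighbors = [0] * offsets[-1]
    cursor = offsets[:-1]  # slicing copies, so offsets itself is not modified
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def wcc(offsets: List[int], neighbors: List[int], asynchronous: bool) -> Tuple[List[int], int]:
    """Label propagation for weakly connected components.

    Returns the component label of every vertex (the smallest vertex id in its
    component) and the number of passes over the graph, including the final
    pass that detects convergence.
    """
    num_vertices = len(offsets) - 1
    labels = list(range(num_vertices))
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        # Asynchronous: read the live label array, so updates made earlier in
        # this pass are visible. Synchronous: read a snapshot of the last pass.
        source = labels if asynchronous else labels.copy()
        for v in range(num_vertices):
            best = source[v]
            for i in range(offsets[v], offsets[v + 1]):
                best = min(best, source[neighbors[i]])
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels, passes


if __name__ == "__main__":
    # A path 0-1-2-3-4 and a separate edge 5-6 (illustrative data only).
    example_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)]
    csr_offsets, csr_neighbors = build_csr(7, example_edges)
    print("async:", wcc(csr_offsets, csr_neighbors, asynchronous=True))
    print("sync: ", wcc(csr_offsets, csr_neighbors, asynchronous=False))
```

Both variants reach the same labels. On graphs with long paths, the asynchronous variant typically needs fewer passes, although the exact count depends on the vertex order.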
Source journal
ACM Transactions on Reconfigurable Technology and Systems
Category: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 4.90
Self-citation rate: 8.70%
Articles published: 79
Review time: >12 weeks
About the journal: TRETS is the top journal focusing on research in, on, and with reconfigurable systems and on their underlying technology. The scope, rationale, and coverage by other journals are often limited to particular aspects of reconfigurable technology or reconfigurable systems. TRETS is a journal that covers reconfigurability in its own right. Topics that would be appropriate for TRETS would include all levels of reconfigurable system abstractions and all aspects of reconfigurable technology including platforms, programming environments and application successes that support these systems for computing or other applications.
-The board and systems architectures of a reconfigurable platform.
-Programming environments of reconfigurable systems, especially those designed for use with reconfigurable systems that will lead to increased programmer productivity.
-Languages and compilers for reconfigurable systems.
-Logic synthesis and related tools, as they relate to reconfigurable systems.
-Applications on which success can be demonstrated.
The underlying technology from which reconfigurable systems are developed. (Currently this technology is that of FPGAs, but research on the nature and use of follow-on technologies is appropriate for TRETS.) In considering whether a paper is suitable for TRETS, the foremost question should be whether reconfigurability has been essential to success. Topics such as architecture, programming languages, compilers, and environments, logic synthesis, and high performance applications are all suitable if the context is appropriate. For example, an architecture for an embedded application that happens to use FPGAs is not necessarily suitable for TRETS, but an architecture using FPGAs for which the reconfigurability of the FPGAs is an inherent part of the specifications (perhaps due to a need for re-use on multiple applications) would be appropriate for TRETS.
Latest articles from this journal
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs
DyRecMul: Fast and Low-Cost Approximate Multiplier for FPGAs using Dynamic Reconfiguration
Dynamic-ACTS - A Dynamic Graph Analytics Accelerator For HBM-Enabled FPGAs
NC-Library: Expanding SystemC Capabilities for Nested reConfigurable Hardware Modelling
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration