A Gather Accelerator for GNNs on FPGA Platform

W. Yuan, Teng Tian, Huawen Liang, Xi Jin
{"title":"基于FPGA平台的gnn聚合加速器","authors":"W. Yuan, Teng Tian, Huawen Liang, Xi Jin","doi":"10.1109/ICPADS53394.2021.00015","DOIUrl":null,"url":null,"abstract":"Graph Neural Networks (GNNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. GNNs mainly include two phases with different execution patterns. The Gather phase, depends on the structure of the graph, presenting a sparse and irregular execution pattern. The Apply phase, acts like other neural networks, showing a dense and regular execution pattern. It is challenging to accelerate GNNs, due to irregular data communication to gather information within the graph. To address this challenge, hardware acceleration for Gather phase is critical. The purpose of this research is to design and implement an FPGA-based accelerator for Gather phase. It achieves excellent performance on acceleration and energy efficiency. Evaluation is performed using a Xilinx VCU128 FPGA with three commonly-used datasets. Compared to the state-of-the-art software framework running on Intel Xeon CPU and NVIDIA P100 GPU, our work achieves on average 101.28× speedup with 75.27× dynamic energy reduction and average 12.27× speedup with 45.56× dynamic energy reduction, respectively.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Gather Accelerator for GNNs on FPGA Platform\",\"authors\":\"W. Yuan, Teng Tian, Huawen Liang, Xi Jin\",\"doi\":\"10.1109/ICPADS53394.2021.00015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph Neural Networks (GNNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. GNNs mainly include two phases with different execution patterns. The Gather phase, depends on the structure of the graph, presenting a sparse and irregular execution pattern. The Apply phase, acts like other neural networks, showing a dense and regular execution pattern. It is challenging to accelerate GNNs, due to irregular data communication to gather information within the graph. To address this challenge, hardware acceleration for Gather phase is critical. The purpose of this research is to design and implement an FPGA-based accelerator for Gather phase. It achieves excellent performance on acceleration and energy efficiency. Evaluation is performed using a Xilinx VCU128 FPGA with three commonly-used datasets. 
Compared to the state-of-the-art software framework running on Intel Xeon CPU and NVIDIA P100 GPU, our work achieves on average 101.28× speedup with 75.27× dynamic energy reduction and average 12.27× speedup with 45.56× dynamic energy reduction, respectively.\",\"PeriodicalId\":309508,\"journal\":{\"name\":\"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPADS53394.2021.00015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS53394.2021.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Graph Neural Networks (GNNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. GNNs mainly comprise two phases with different execution patterns. The Gather phase depends on the structure of the graph and presents a sparse, irregular execution pattern. The Apply phase acts like other neural networks and shows a dense, regular execution pattern. Accelerating GNNs is challenging because of the irregular data communication required to gather information within the graph, so hardware acceleration of the Gather phase is critical. The purpose of this research is to design and implement an FPGA-based accelerator for the Gather phase, which achieves excellent performance in both speed and energy efficiency. Evaluation is performed on a Xilinx VCU128 FPGA with three commonly used datasets. Compared to a state-of-the-art software framework running on an Intel Xeon CPU and an NVIDIA P100 GPU, our work achieves an average 101.28× speedup with 75.27× dynamic energy reduction, and an average 12.27× speedup with 45.56× dynamic energy reduction, respectively.
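The abstract contrasts the sparse, graph-dependent Gather phase with the dense, regular Apply phase. Purely as an illustration of these two execution patterns (the paper itself accelerates the Gather phase in FPGA hardware), a minimal NumPy sketch is given below; the sum aggregation, the single linear-plus-ReLU Apply layer, and the (src, dst) edge-list representation are assumptions made for this sketch, not details taken from the paper.

# Minimal, illustrative sketch of the two-phase GNN execution pattern described
# in the abstract. Assumptions not taken from the paper: sum aggregation in the
# Gather phase, a single linear + ReLU layer in the Apply phase, and an edge
# list of (src, dst) pairs as the graph representation.
import numpy as np

def gather(features, edges):
    # Gather phase: irregular, graph-dependent accesses (sparse pattern).
    aggregated = np.zeros_like(features)
    for src, dst in edges:
        aggregated[dst] += features[src]  # accumulate each in-neighbor's features
    return aggregated

def apply_phase(aggregated, weight, bias):
    # Apply phase: dense, regular matrix multiply, like a standard NN layer.
    return np.maximum(aggregated @ weight + bias, 0.0)  # linear + ReLU

# Toy example: 4 nodes with 8-dimensional features and a few directed edges.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.standard_normal((8, 8)).astype(np.float32)
b = np.zeros(8, dtype=np.float32)
edges = [(0, 1), (2, 1), (3, 2), (1, 0)]

h = apply_phase(gather(x, edges), w, b)
print(h.shape)  # (4, 8)

The loop in gather makes the scattered memory accesses explicit: which rows are read and written depends entirely on the edge list, which is why the paper argues this phase is the one that needs dedicated hardware support.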