A Gather Accelerator for GNNs on FPGA Platform

W. Yuan, Teng Tian, Huawen Liang, Xi Jin
{"title":"基于FPGA平台的gnn聚合加速器","authors":"W. Yuan, Teng Tian, Huawen Liang, Xi Jin","doi":"10.1109/ICPADS53394.2021.00015","DOIUrl":null,"url":null,"abstract":"Graph Neural Networks (GNNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. GNNs mainly include two phases with different execution patterns. The Gather phase, depends on the structure of the graph, presenting a sparse and irregular execution pattern. The Apply phase, acts like other neural networks, showing a dense and regular execution pattern. It is challenging to accelerate GNNs, due to irregular data communication to gather information within the graph. To address this challenge, hardware acceleration for Gather phase is critical. The purpose of this research is to design and implement an FPGA-based accelerator for Gather phase. It achieves excellent performance on acceleration and energy efficiency. Evaluation is performed using a Xilinx VCU128 FPGA with three commonly-used datasets. Compared to the state-of-the-art software framework running on Intel Xeon CPU and NVIDIA P100 GPU, our work achieves on average 101.28× speedup with 75.27× dynamic energy reduction and average 12.27× speedup with 45.56× dynamic energy reduction, respectively.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Gather Accelerator for GNNs on FPGA Platform\",\"authors\":\"W. Yuan, Teng Tian, Huawen Liang, Xi Jin\",\"doi\":\"10.1109/ICPADS53394.2021.00015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph Neural Networks (GNNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. GNNs mainly include two phases with different execution patterns. The Gather phase, depends on the structure of the graph, presenting a sparse and irregular execution pattern. The Apply phase, acts like other neural networks, showing a dense and regular execution pattern. It is challenging to accelerate GNNs, due to irregular data communication to gather information within the graph. To address this challenge, hardware acceleration for Gather phase is critical. The purpose of this research is to design and implement an FPGA-based accelerator for Gather phase. It achieves excellent performance on acceleration and energy efficiency. Evaluation is performed using a Xilinx VCU128 FPGA with three commonly-used datasets. 
Compared to the state-of-the-art software framework running on Intel Xeon CPU and NVIDIA P100 GPU, our work achieves on average 101.28× speedup with 75.27× dynamic energy reduction and average 12.27× speedup with 45.56× dynamic energy reduction, respectively.\",\"PeriodicalId\":309508,\"journal\":{\"name\":\"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPADS53394.2021.00015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS53394.2021.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Graph Neural Networks (GNNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. GNNs mainly comprise two phases with different execution patterns. The Gather phase depends on the structure of the graph and presents a sparse, irregular execution pattern. The Apply phase acts like other neural networks and shows a dense, regular execution pattern. Accelerating GNNs is challenging because of the irregular data communication required to gather information within the graph, so hardware acceleration of the Gather phase is critical. The purpose of this research is to design and implement an FPGA-based accelerator for the Gather phase, which achieves excellent performance in both speed and energy efficiency. Evaluation is performed on a Xilinx VCU128 FPGA with three commonly used datasets. Compared to a state-of-the-art software framework running on an Intel Xeon CPU and an NVIDIA P100 GPU, our work achieves an average 101.28× speedup with 75.27× dynamic energy reduction, and an average 12.27× speedup with 45.56× dynamic energy reduction, respectively.
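The abstract contrasts the sparse, graph-dependent Gather phase with the dense, regular Apply phase. Purely as an illustration of these two execution patterns (the paper itself accelerates the Gather phase in FPGA hardware), a minimal NumPy sketch is given below; the sum aggregation, the single linear-plus-ReLU Apply layer, and the (src, dst) edge-list representation are assumptions made for this sketch, not details taken from the paper.

# Minimal, illustrative sketch of the two-phase GNN execution pattern described
# in the abstract. Assumptions not taken from the paper: sum aggregation in the
# Gather phase, a single linear + ReLU layer in the Apply phase, and an edge
# list of (src, dst) pairs as the graph representation.
import numpy as np

def gather(features, edges):
    # Gather phase: irregular, graph-dependent accesses (sparse pattern).
    aggregated = np.zeros_like(features)
    for src, dst in edges:
        aggregated[dst] += features[src]  # accumulate each in-neighbor's features
    return aggregated

def apply_phase(aggregated, weight, bias):
    # Apply phase: dense, regular matrix multiply, like a standard NN layer.
    return np.maximum(aggregated @ weight + bias, 0.0)  # linear + ReLU

# Toy example: 4 nodes with 8-dimensional features and a few directed edges.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.standard_normal((8, 8)).astype(np.float32)
b = np.zeros(8, dtype=np.float32)
edges = [(0, 1), (2, 1), (3, 2), (1, 0)]

h = apply_phase(gather(x, edges), w, b)
print(h.shape)  # (4, 8)

The loop in gather makes the scattered memory accesses explicit: which rows are read and written depends entirely on the edge list, which is why the paper argues this phase is the one that needs dedicated hardware support.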