Graphite: optimizing graph neural networks on CPUs through cooperative software-hardware techniques

Proceedings of the 49th Annual International Symposium on Computer Architecture. Published: 2022-06-18. DOI: 10.1145/3470496.3527403

Zhangxiaowen Gong, Houxiang Ji, Yao Yao, Christopher W. Fletcher, C. Hughes, J. Torrellas

Citations: 12

Abstract

Graph Neural Networks (GNNs) are becoming popular because they are effective at extracting information from graphs. CPUs are good platforms for executing GNNs because they are widely available and offer terabyte-level memory capacity, which enables full-batch computation on large graphs. However, GNNs on CPUs are heavily memory-bound, which limits their performance. In this paper, we address this problem by reducing the pressure that GNNs place on memory with cooperative software-hardware techniques. Our software techniques include: (i) layer fusion, which overlaps the memory-intensive phase and the compute-intensive phase of a GNN layer; (ii) feature compression, which reduces memory traffic by exploiting the sparsity in the vertex feature vectors; and (iii) an algorithm that changes the processing order of vertices to improve temporal locality. On top of the software techniques, we enhance the CPUs' direct memory access (DMA) engines so that they can execute the GNNs' memory-intensive phase, letting the processor cores focus on the compute-intensive phase. We call the combination of our software and hardware techniques Graphite. We evaluate Graphite with popular GNN models on large graphs. The result is high-performance full-batch GNN training and inference on CPUs. Our software techniques outperform a state-of-the-art GNN layer implementation by 1.7-1.9x in inference and 1.6-2.6x in training. Our combined software and hardware techniques speed up inference by 1.6-2.0x and training by 1.9-3.1x.
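
Layer fusion relies on the split of a GNN layer into a memory-intensive phase and a compute-intensive phase, commonly called aggregation (gathering and summing neighbor feature vectors) and update (a dense transform of each aggregated vector). The sketch below is written in pure Python. It assumes a GCN-style layer with sum aggregation and ReLU, which is only one of many possible layer types. It shows just the loop structure of fusion: vertices are processed in blocks, so each block's aggregated rows feed the update immediately instead of being written out as a full intermediate matrix. It does not model the concurrency Graphite uses to actually overlap the two phases (for example, DMA engines aggregating while cores run the update). All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical GCN-style layer (assumption): out[v] = ReLU(sum_{u in N(v)} h[u] @ W)

def aggregate_rows(adj: List[List[int]], h: Matrix, vertices: range) -> Matrix:
    """Memory-intensive phase: sum the feature vectors of each vertex's neighbors."""
    dim = len(h[0])
    out = []
    for v in vertices:
        acc = [0.0] * dim
        for u in adj[v]:  # irregular gathers of neighbor rows from memory
            row = h[u]
            for k in range(dim):
                acc[k] += row[k]
        out.append(acc)
    return out

def update_rows(agg: Matrix, w: Matrix) -> Matrix:
    """Compute-intensive phase: dense transform (agg @ W) followed by ReLU."""
    in_dim, out_dim = len(w), len(w[0])
    result = []
    for row in agg:
        out = [0.0] * out_dim
        for i in range(in_dim):
            x = row[i]
            if x != 0.0:  # skipping zeros does not change the result
                wi = w[i]
                for j in range(out_dim):
                    out[j] += x * wi[j]
        result.append([max(0.0, y) for y in out])
    return result

def layer_unfused(adj: List[List[int]], h: Matrix, w: Matrix) -> Matrix:
    # Materializes the whole aggregated matrix before the update starts.
    agg = aggregate_rows(adj, h, range(len(adj)))
    return update_rows(agg, w)

def layer_fused(adj: List[List[int]], h: Matrix, w: Matrix, block: int = 2) -> Matrix:
    # Processes vertices in blocks: only `block` aggregated rows exist at a time,
    # and they are consumed by the update while they are still hot in cache.
    out: Matrix = []
    for start in range(0, len(adj), block):
        vs = range(start, min(start + block, len(adj)))
        out.extend(update_rows(aggregate_rows(adj, h, vs), w))
    return out
```

Both functions compute the same per-vertex arithmetic in the same order, so they return identical results for any block size. Only the lifetime of the intermediate aggregated rows differs, and that difference is what matters for memory traffic.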
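
Feature compression reduces memory traffic when many entries of a vertex feature vector are zero. This can happen, for example, when the features are the output of a ReLU in the previous layer. The abstract does not specify the encoding Graphite uses. The sketch below assumes one plausible format: a bitmask of nonzero positions plus the packed nonzero values. The aggregation step can then add a compressed neighbor vector into a dense accumulator while touching only the nonzeros. Function names and the fp32 size model are assumptions for illustration.

```python
from typing import List, Tuple

# Assumed encoding (hypothetical): (bitmask of nonzero positions, packed nonzero values)
Compressed = Tuple[int, List[float]]

def compress(vec: List[float]) -> Compressed:
    mask, values = 0, []
    for i, x in enumerate(vec):
        if x != 0.0:
            mask |= 1 << i
            values.append(x)
    return mask, values

def decompress(c: Compressed, dim: int) -> List[float]:
    mask, values = c
    out = [0.0] * dim
    it = iter(values)
    for i in range(dim):
        if (mask >> i) & 1:
            out[i] = next(it)
    return out

def accumulate(acc: List[float], c: Compressed) -> None:
    """Add a compressed neighbor vector into a dense accumulator, visiting only nonzeros."""
    mask, values = c
    i = k = 0
    while mask:
        if mask & 1:
            acc[i] += values[k]
            k += 1
        mask >>= 1
        i += 1

def compressed_bytes(c: Compressed, dim: int, value_bytes: int = 4) -> int:
    # Size model (assumption): ceil(dim / 8) bytes of bitmask plus fp32 values.
    return (dim + 7) // 8 + value_bytes * len(c[1])
```

For an 8-element fp32 vector with two nonzeros, the dense form occupies 32 bytes. The compressed form occupies 9 bytes: a 1-byte mask plus two 4-byte values. The savings shrink as density rises, which is why this kind of scheme pays off only when the features are actually sparse.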
