Graphite: optimizing graph neural networks on CPUs through cooperative software-hardware techniques

Zhangxiaowen Gong, Houxiang Ji, Yao Yao, Christopher W. Fletcher, C. Hughes, J. Torrellas
{"title":"Graphite: optimizing graph neural networks on CPUs through cooperative software-hardware techniques","authors":"Zhangxiaowen Gong, Houxiang Ji, Yao Yao, Christopher W. Fletcher, C. Hughes, J. Torrellas","doi":"10.1145/3470496.3527403","DOIUrl":null,"url":null,"abstract":"Graph Neural Networks (GNNs) are becoming popular because they are effective at extracting information from graphs. To execute GNNs, CPUs are good platforms because of their high availability and terabyte-level memory capacity, which enables full-batch computation on large graphs. However, GNNs on CPUs are heavily memory bound, which limits their performance. In this paper, we address this problem by alleviating the stress of GNNs on memory with cooperative software-hardware techniques. Our software techniques include: (i) layer fusion that overlaps the memory-intensive phase and the compute-intensive phase in a GNN layer, (ii) feature compression that reduces memory traffic by exploiting the sparsity in the vertex feature vectors, and (iii) an algorithm that changes the processing order of vertices to improve temporal locality. On top of the software techniques, we enhance the CPUs' direct memory access (DMA) engines with the capability to execute the GNNs' memory-intensive phase, so that the processor cores can focus on the compute-intensive phase. We call the combination of our software and hardware techniques Graphite. We evaluate Graphite with popular GNN models on large graphs. The result is high-performance full-batch GNN training and inference on CPUs. Our software techniques outperform a state-of-the-art GNN layer implementation by 1.7--1.9x in inference and 1.6--2.6x in training. Our combined software and hardware techniques speedup inference by 1.6--2.0x and training by 1.9--3.1x.","PeriodicalId":337932,"journal":{"name":"Proceedings of the 49th Annual International Symposium on Computer Architecture","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 49th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3470496.3527403","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

Graph Neural Networks (GNNs) are becoming popular because they are effective at extracting information from graphs. To execute GNNs, CPUs are good platforms because of their high availability and terabyte-level memory capacity, which enables full-batch computation on large graphs. However, GNNs on CPUs are heavily memory bound, which limits their performance. In this paper, we address this problem by alleviating the stress of GNNs on memory with cooperative software-hardware techniques. Our software techniques include: (i) layer fusion that overlaps the memory-intensive phase and the compute-intensive phase in a GNN layer, (ii) feature compression that reduces memory traffic by exploiting the sparsity in the vertex feature vectors, and (iii) an algorithm that changes the processing order of vertices to improve temporal locality. On top of the software techniques, we enhance the CPUs' direct memory access (DMA) engines with the capability to execute the GNNs' memory-intensive phase, so that the processor cores can focus on the compute-intensive phase. We call the combination of our software and hardware techniques Graphite. We evaluate Graphite with popular GNN models on large graphs. The result is high-performance full-batch GNN training and inference on CPUs. Our software techniques outperform a state-of-the-art GNN layer implementation by 1.7--1.9x in inference and 1.6--2.6x in training. Our combined software and hardware techniques speedup inference by 1.6--2.0x and training by 1.9--3.1x.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
石墨:通过协同软硬件技术优化cpu上的图形神经网络
图神经网络(gnn)由于能够有效地从图中提取信息而越来越受欢迎。为了执行gnn, cpu是很好的平台,因为它们具有高可用性和tb级的内存容量,可以在大型图形上进行全批计算。然而,cpu上的gnn是严重的内存绑定,这限制了它们的性能。在本文中,我们通过使用软硬件协作技术减轻gnn对内存的压力来解决这个问题。我们的软件技术包括:(i)在GNN层中重叠内存密集型阶段和计算密集型阶段的层融合,(ii)通过利用顶点特征向量的稀疏性来减少内存流量的特征压缩,以及(iii)改变顶点处理顺序以改善时间局域性的算法。在软件技术的基础上,我们增强了cpu的直接内存访问(DMA)引擎,使其能够执行gnn的内存密集型阶段,从而使处理器核心能够专注于计算密集型阶段。我们把软件和硬件技术的结合称为石墨。我们在大图上用流行的GNN模型评估石墨。结果是在cpu上实现了高性能的全批GNN训练和推理。我们的软件技术在推理方面比最先进的GNN层实现高出1.7- 1.9倍,在训练方面高出1.6- 2.6倍。我们结合了软件和硬件技术,推理速度提高了1.6- 2.0倍,训练速度提高了1.9- 3.1倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
BioHD: an efficient genome sequence search platform using HyperDimensional memorization MeNDA: a near-memory multi-way merge solution for sparse transposition and dataflows Graphite: optimizing graph neural networks on CPUs through cooperative software-hardware techniques INSPIRE: in-storage private information retrieval via protocol and architecture co-design CraterLake: a hardware accelerator for efficient unbounded computation on encrypted data
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1