Input Feature Pruning for Accelerating GNN Inference on Heterogeneous Platforms

Jason Yik, S. Kuppannagari, Hanqing Zeng, V. Prasanna
{"title":"Input Feature Pruning for Accelerating GNN Inference on Heterogeneous Platforms","authors":"Jason Yik, S. Kuppannagari, Hanqing Zeng, V. Prasanna","doi":"10.1109/HiPC56025.2022.00045","DOIUrl":null,"url":null,"abstract":"Graph Neural Networks (GNNs) are an emerging class of machine learning models which utilize structured graph information and node features to reduce high-dimensional input data to low-dimensional embeddings, from which predictions can be made. Due to the compounding effect of aggregating neighbor information, GNN inferences require raw data from many times more nodes than are targeted for prediction. Thus, on heterogeneous compute platforms, inference latency can be largely subject to the inter-device communication cost of transferring input feature data to the GPU/accelerator before computation has even begun. In this paper, we analyze the trade-off effect of pruning input features from GNN models, reducing the volume of raw data that the model works with to lower communication latency at the expense of an expected decrease in the overall model accuracy. We develop greedy and regression-based algorithms to determine which features to retain for optimal prediction accuracy. We evaluate pruned model variants and find that they can reduce inference latency by up to 80% with an accuracy loss of less than 5% compared to non-pruned models. Furthermore, we show that the latency reductions from input feature pruning can be extended under different system variables such as batch size and floating point precision.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC56025.2022.00045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Graph Neural Networks (GNNs) are an emerging class of machine learning models which utilize structured graph information and node features to reduce high-dimensional input data to low-dimensional embeddings, from which predictions can be made. Due to the compounding effect of aggregating neighbor information, GNN inferences require raw data from many times more nodes than are targeted for prediction. Thus, on heterogeneous compute platforms, inference latency can be largely subject to the inter-device communication cost of transferring input feature data to the GPU/accelerator before computation has even begun. In this paper, we analyze the trade-off effect of pruning input features from GNN models, reducing the volume of raw data that the model works with to lower communication latency at the expense of an expected decrease in the overall model accuracy. We develop greedy and regression-based algorithms to determine which features to retain for optimal prediction accuracy. We evaluate pruned model variants and find that they can reduce inference latency by up to 80% with an accuracy loss of less than 5% compared to non-pruned models. Furthermore, we show that the latency reductions from input feature pruning can be extended under different system variables such as batch size and floating point precision.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
异构平台上加速GNN推理的输入特征剪枝
图神经网络(gnn)是一类新兴的机器学习模型,它利用结构化的图信息和节点特征将高维输入数据减少到低维嵌入,从中可以进行预测。由于聚合邻居信息的复合效应,GNN推断需要的原始数据比预测的目标节点多很多倍。因此,在异构计算平台上,推理延迟很大程度上取决于在计算开始之前将输入特征数据传输到GPU/加速器的设备间通信成本。在本文中,我们分析了从GNN模型中修剪输入特征的权衡效应,减少了模型处理的原始数据量,以降低整体模型精度的预期下降为代价降低了通信延迟。我们开发了贪婪和基于回归的算法来确定保留哪些特征以获得最佳预测精度。我们评估了经过修剪的模型变体,发现与未经过修剪的模型相比,它们可以减少高达80%的推理延迟,准确性损失小于5%。此外,我们还证明了输入特征修剪可以在不同的系统变量(如批大小和浮点精度)下扩展延迟减少。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
HiPC 2022 Technical Program Committee A Deep Learning-Based In Situ Analysis Framework for Tropical Cyclogenesis Prediction COMPROF and COMPLACE: Shared-Memory Communication Profiling and Automated Thread Placement via Dynamic Binary Instrumentation Message from the HiPC 2022 General Co-Chairs Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1