基于Minsky架构的三角计数和桁架分解的协同(CPU + GPU)算法:静态图挑战:子图同构

K. Date, Keven Feng, R. Nagi, Jinjun Xiong, N. Kim, Wen-mei W. Hwu
{"title":"基于Minsky架构的三角计数和桁架分解的协同(CPU + GPU)算法:静态图挑战:子图同构","authors":"K. Date, Keven Feng, R. Nagi, Jinjun Xiong, N. Kim, Wen-mei W. Hwu","doi":"10.1109/HPEC.2017.8091042","DOIUrl":null,"url":null,"abstract":"In this paper, we present collaborative CPU + GPU algorithms for triangle counting and truss decomposition, the two fundamental problems in graph analytics. We describe the implementation details and present experimental evaluation on the IBM Minsky platform. The main contribution of this paper is a thorough benchmarking and comparison of the different memory management schemes offered by CUDA 8 and NVLink, which can be harnessed for tackling large problems where the limited GPU memory capacity is the primary bottleneck in traditional computing platform. We find that the collaborative algorithms achieve 28× speedup on average (180× max) for triangle counting, and 165× speedup on average (498× max) for truss decomposition, when compared with the baseline Python implementation provided by the Graph Challenge organizers.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"16 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism\",\"authors\":\"K. Date, Keven Feng, R. Nagi, Jinjun Xiong, N. Kim, Wen-mei W. Hwu\",\"doi\":\"10.1109/HPEC.2017.8091042\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present collaborative CPU + GPU algorithms for triangle counting and truss decomposition, the two fundamental problems in graph analytics. We describe the implementation details and present experimental evaluation on the IBM Minsky platform. The main contribution of this paper is a thorough benchmarking and comparison of the different memory management schemes offered by CUDA 8 and NVLink, which can be harnessed for tackling large problems where the limited GPU memory capacity is the primary bottleneck in traditional computing platform. We find that the collaborative algorithms achieve 28× speedup on average (180× max) for triangle counting, and 165× speedup on average (498× max) for truss decomposition, when compared with the baseline Python implementation provided by the Graph Challenge organizers.\",\"PeriodicalId\":364903,\"journal\":{\"name\":\"2017 IEEE High Performance Extreme Computing Conference (HPEC)\",\"volume\":\"16 3\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE High Performance Extreme Computing Conference (HPEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPEC.2017.8091042\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2017.8091042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

摘要

在本文中,我们提出了三角形计数和桁架分解的协同CPU + GPU算法,这是图分析中的两个基本问题。我们描述了实现细节,并在IBM Minsky平台上进行了实验评估。本文的主要贡献是对CUDA 8和NVLink提供的不同内存管理方案进行了全面的基准测试和比较,这可以用于解决传统计算平台中有限的GPU内存容量是主要瓶颈的大型问题。我们发现,与Graph Challenge组织者提供的基线Python实现相比,协作算法在三角形计数方面平均加速28倍(最大加速180倍),在桁架分解方面平均加速165倍(最大加速498倍)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism
In this paper, we present collaborative CPU + GPU algorithms for triangle counting and truss decomposition, the two fundamental problems in graph analytics. We describe the implementation details and present experimental evaluation on the IBM Minsky platform. The main contribution of this paper is a thorough benchmarking and comparison of the different memory management schemes offered by CUDA 8 and NVLink, which can be harnessed for tackling large problems where the limited GPU memory capacity is the primary bottleneck in traditional computing platform. We find that the collaborative algorithms achieve 28× speedup on average (180× max) for triangle counting, and 165× speedup on average (498× max) for truss decomposition, when compared with the baseline Python implementation provided by the Graph Challenge organizers.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Optimized task graph mapping on a many-core neuromorphic supercomputer Software-defined extreme scale networks for bigdata applications Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi xDCI, a data science cyberinfrastructure for interdisciplinary research Leakage energy reduction for hard real-time caches
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1