快速三角形计数GPU

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI:10.1109/HPEC.2019.8916216

Chuangyi Gui, Long Zheng, Pengcheng Yao, Xiaofei Liao, Hai Jin

{"title":"快速三角形计数GPU","authors":"Chuangyi Gui, Long Zheng, Pengcheng Yao, Xiaofei Liao, Hai Jin","doi":"10.1109/HPEC.2019.8916216","DOIUrl":null,"url":null,"abstract":"Triangle counting is one of the most basic graph applications to solve many real-world problems in a wide variety of domains. Exploring the massive parallelism of the Graphics Processing Unit (GPU) to accelerate the triangle counting is prevail. We identify that the stat-of-the-art GPU-based studies that focus on improving the load balancing still exhibit inherently a large number of random accesses in degrading the performance. In this paper, we design a prefetching scheme that buffers the neighbor list of the processed vertex in advance in the fast shared memory to avoid high latency of random global memory access. Also, we adopt the degree-based graph reordering technique and design a simple heuristic to evenly distribute the workload. Compared to the state-of-the-art HEPC Graph Challenge Champion in the last year, we advance to improve the performance of triangle counting by up to $5.9 \\times $ speedup with $\\gt 10^{9}$ TEPS on a single GPU for many large real graphs from graph challenge datasets.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fast Triangle Counting on GPU\",\"authors\":\"Chuangyi Gui, Long Zheng, Pengcheng Yao, Xiaofei Liao, Hai Jin\",\"doi\":\"10.1109/HPEC.2019.8916216\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Triangle counting is one of the most basic graph applications to solve many real-world problems in a wide variety of domains. Exploring the massive parallelism of the Graphics Processing Unit (GPU) to accelerate the triangle counting is prevail. We identify that the stat-of-the-art GPU-based studies that focus on improving the load balancing still exhibit inherently a large number of random accesses in degrading the performance. In this paper, we design a prefetching scheme that buffers the neighbor list of the processed vertex in advance in the fast shared memory to avoid high latency of random global memory access. Also, we adopt the degree-based graph reordering technique and design a simple heuristic to evenly distribute the workload. Compared to the state-of-the-art HEPC Graph Challenge Champion in the last year, we advance to improve the performance of triangle counting by up to $5.9 \\\\times $ speedup with $\\\\gt 10^{9}$ TEPS on a single GPU for many large real graphs from graph challenge datasets.\",\"PeriodicalId\":184253,\"journal\":{\"name\":\"2019 IEEE High Performance Extreme Computing Conference (HPEC)\",\"volume\":\"114 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE High Performance Extreme Computing Conference (HPEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPEC.2019.8916216\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916216","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

三角形计数是最基本的图形应用程序之一，用于解决各种领域中的许多实际问题。探索图形处理单元(GPU)的大规模并行性来加速三角形计数是流行的。我们发现，专注于改善负载平衡的最先进的基于gpu的研究仍然在降低性能方面表现出大量的随机访问。本文设计了一种预取方案，将处理顶点的邻居列表提前缓冲在快速共享内存中，以避免随机全局内存访问的高延迟。同时，我们采用了基于度的图重排序技术，并设计了一个简单的启发式算法来均匀分配工作负载。与去年最先进的HEPC图形挑战冠军相比，我们在单个GPU上使用$ $ gt 10^{9}$ TEPS将三角形计数的性能提高了5.9倍，用于来自图形挑战数据集的许多大型真实图形。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fast Triangle Counting on GPU

Triangle counting is one of the most basic graph applications to solve many real-world problems in a wide variety of domains. Exploring the massive parallelism of the Graphics Processing Unit (GPU) to accelerate the triangle counting is prevail. We identify that the stat-of-the-art GPU-based studies that focus on improving the load balancing still exhibit inherently a large number of random accesses in degrading the performance. In this paper, we design a prefetching scheme that buffers the neighbor list of the processed vertex in advance in the fast shared memory to avoid high latency of random global memory access. Also, we adopt the degree-based graph reordering technique and design a simple heuristic to evenly distribute the workload. Compared to the state-of-the-art HEPC Graph Challenge Champion in the last year, we advance to improve the performance of triangle counting by up to $5.9 \times $ speedup with $\gt 10^{9}$ TEPS on a single GPU for many large real graphs from graph challenge datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE High Performance Extreme Computing Conference (HPEC)

自引率

0.00%

发文量