HyPR: Hybrid Page Ranking on Evolving Graphs

2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC). Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00020

Hemant Kumar Giri, Mridul Haque, D. Banerjee


Citations: 0



HyPR: Hybrid Page Ranking on Evolving Graphs

PageRank (PR) is the standard metric used by the Google search engine to compute the importance of a web page by modeling the entire web as a first-order Markov chain. The challenge of computing PR efficiently and quickly has already been addressed by several prior works, which have shown innovations both in algorithms and in the use of parallel computing. The standard method of computing PR models the web as a graph. The fast-growing internet adds several new web pages every day, and hence more nodes (representing the web pages) and edges (the hyperlinks) are added to this graph in an incremental fashion. Computing PR on this evolving graph is now an emerging challenge, since recomputing from scratch on the massive graph is time-consuming and does not scale. In this work, we propose Hybrid Page Rank (HyPR), which computes PR on evolving graphs using collaborative execution on multi-core CPUs and massively parallel GPUs. We exploit data parallelism by efficiently partitioning the graph into regions that are affected and unaffected by the new updates. The different partitions are then processed in an overlapped manner for PR updates. The novelty of our technique lies in utilizing the hybrid platform to scale the solution to massive graphs. The technique also delivers high performance by processing every batch of updates with a parallel algorithm. HyPR executes efficiently on an NVIDIA V100 GPU hosted on a 6th Gen Intel Xeon CPU and is able to update a graph with 640M edges with a single batch of 100,000 edges in 12 ms. HyPR outperforms other state-of-the-art techniques for computing PR on evolving graphs [1] by 4.8x. Additionally, HyPR provides a 1.2x speedup over GPU-only executions and a 95x speedup over CPU-only parallel executions.
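The abstract does not say how HyPR identifies the affected region or how it divides work between the CPU and the GPU, so the following is only a minimal sequential sketch of the general idea of partitioning an evolving graph and updating PR on the affected part alone. It rests on assumptions that are not taken from the paper: a batch only inserts edges between existing vertices, every vertex has at least one out-link (no dangling nodes), and a vertex counts as "affected" if it is the source of a new edge or reachable from one. The function names (apply_batch, affected_region, update_ranks, pagerank) are hypothetical.

```python
from collections import deque


def apply_batch(out_edges, batch):
    """Return a copy of the graph with the (source, target) edges in `batch` added."""
    updated = {u: list(succ) for u, succ in out_edges.items()}
    for u, v in batch:
        updated[u].append(v)
    return updated


def affected_region(out_edges, batch):
    """Vertices whose rank may change after the batch (assumed definition).

    `out_edges` is the graph *after* the batch. A breadth-first search from
    the source of every new edge collects everything reachable from it.
    """
    seen = {u for u, _ in batch}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in out_edges[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def update_ranks(out_edges, rank, region, damping=0.85, tol=1e-12, max_iter=500):
    """Jacobi-style PR iteration restricted to the vertices in `region`.

    Ranks of vertices outside `region` are read but never modified.
    Assumes every vertex has at least one out-link.
    """
    n = len(out_edges)
    in_edges = {v: [] for v in out_edges}
    for u, succ in out_edges.items():
        for v in succ:
            in_edges[v].append(u)
    rank = dict(rank)
    for _ in range(max_iter):
        new_vals = {}
        for v in region:
            incoming = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            new_vals[v] = (1.0 - damping) / n + damping * incoming
        delta = sum(abs(new_vals[v] - rank[v]) for v in region)
        rank.update(new_vals)
        if delta < tol:
            break
    return rank


def pagerank(out_edges, **kwargs):
    """From-scratch PR: the same iteration applied to every vertex."""
    n = len(out_edges)
    uniform = {v: 1.0 / n for v in out_edges}
    return update_ranks(out_edges, uniform, set(out_edges), **kwargs)


# Two 3-cycles; the first feeds the second through the edge 2 -> 3.
graph = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [5], 5: [3]}
batch = [(4, 3)]

old_rank = pagerank(graph)
new_graph = apply_batch(graph, batch)
region = affected_region(new_graph, batch)
new_rank = update_ranks(new_graph, old_rank, region)
print(sorted(region))
```

In this toy graph the new edge starts inside the second cycle, so only that cycle is revisited and the ranks of vertices 0, 1 and 2 are carried over unchanged. With dangling nodes or newly added vertices a change would leak into every vertex through the uniform redistribution term, so a real system would need a more careful rule than the reachability test assumed here.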
