HyPR:基于进化图的混合页面排名

Hemant Kumar Giri, Mridul Haque, D. Banerjee
{"title":"HyPR:基于进化图的混合页面排名","authors":"Hemant Kumar Giri, Mridul Haque, D. Banerjee","doi":"10.1109/HiPC50609.2020.00020","DOIUrl":null,"url":null,"abstract":"PageRank (PR) is the standard metric used by the Google search engine to compute the importance of a web page via modeling the entire web as a first order Markov chain. The challenge of computing PR efficiently and quickly has been already addressed by several works previously who have shown innovations in both algorithms and in the use of parallel computing. The standard method of computing PR is handled by modelling the web as a graph. The fast growing internet adds several new web pages everyday and hence more nodes (representing the web pages) and edges (the hyperlinks) are added to this graph in an incremental fashion. Computing PR on this evolving graph is now an emerging challenge since computations from scratch on the massive graph is time consuming and unscalable. In this work, we propose Hybrid Page Rank (HyPR), which computes PR on evolving graphs using collaborative executions on muti-core CPUs and massively parallel GPUs. We exploit data parallelism via efficiently partitioning the graph into different regions that are affected and unaffected by the new updates. The different partitions are then processed in an overlapped manner for PR updates. The novelty of our technique is in utilizing the hybrid platform to scale the solution to massive graphs. The technique also provides high performance through parallel processing of every batch of updates using a parallel algorithm. HyPR efficiently executes on a NVIDIA V100 GPU hosted on a 6th Gen Intel Xeon CPU and is able to update a graph with 640M edges with a single batch of 100,000 edges in 12 ms. HyPR outperforms other state of the art techniques for computing PR on evolving graphs [1] by 4.8x. Additionally HyPR provides 1.2x speedup over GPU only executions, and 95x speedup over CPU only parallel executions.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HyPR: Hybrid Page Ranking on Evolving Graphs\",\"authors\":\"Hemant Kumar Giri, Mridul Haque, D. Banerjee\",\"doi\":\"10.1109/HiPC50609.2020.00020\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"PageRank (PR) is the standard metric used by the Google search engine to compute the importance of a web page via modeling the entire web as a first order Markov chain. The challenge of computing PR efficiently and quickly has been already addressed by several works previously who have shown innovations in both algorithms and in the use of parallel computing. The standard method of computing PR is handled by modelling the web as a graph. The fast growing internet adds several new web pages everyday and hence more nodes (representing the web pages) and edges (the hyperlinks) are added to this graph in an incremental fashion. Computing PR on this evolving graph is now an emerging challenge since computations from scratch on the massive graph is time consuming and unscalable. In this work, we propose Hybrid Page Rank (HyPR), which computes PR on evolving graphs using collaborative executions on muti-core CPUs and massively parallel GPUs. We exploit data parallelism via efficiently partitioning the graph into different regions that are affected and unaffected by the new updates. The different partitions are then processed in an overlapped manner for PR updates. The novelty of our technique is in utilizing the hybrid platform to scale the solution to massive graphs. The technique also provides high performance through parallel processing of every batch of updates using a parallel algorithm. HyPR efficiently executes on a NVIDIA V100 GPU hosted on a 6th Gen Intel Xeon CPU and is able to update a graph with 640M edges with a single batch of 100,000 edges in 12 ms. HyPR outperforms other state of the art techniques for computing PR on evolving graphs [1] by 4.8x. Additionally HyPR provides 1.2x speedup over GPU only executions, and 95x speedup over CPU only parallel executions.\",\"PeriodicalId\":375004,\"journal\":{\"name\":\"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HiPC50609.2020.00020\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC50609.2020.00020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

PageRank (PR)是谷歌搜索引擎通过将整个网页建模为一阶马尔可夫链来计算网页重要性的标准度量。高效和快速计算PR的挑战已经在之前的几项工作中得到了解决,这些工作在算法和并行计算的使用方面都展示了创新。计算PR的标准方法是通过将web建模为图形来处理。快速发展的互联网每天增加几个新的网页,因此更多的节点(代表网页)和边(超链接)以增量的方式添加到这个图中。在这个不断发展的图上计算PR现在是一个新出现的挑战,因为在大规模图上从零开始计算既耗时又不可扩展。在这项工作中,我们提出了混合页面排名(HyPR),它通过在多核cpu和大规模并行gpu上协同执行来计算进化图上的PR。我们通过有效地将图划分为受新更新影响和不受新更新影响的不同区域来利用数据并行性。然后以重叠的方式处理不同的分区以进行PR更新。我们技术的新颖之处在于利用混合平台将解决方案扩展到海量图形。该技术还通过使用并行算法并行处理每批更新来提供高性能。HyPR在搭载第六代Intel至强CPU的NVIDIA V100 GPU上有效执行,并且能够在12毫秒内用单个批次的100,000条边更新具有640M条边的图形。在演化图b[1]上,HyPR比其他最先进的PR计算技术高出4.8倍。此外,HyPR比仅GPU执行提供1.2倍的加速,比仅CPU并行执行提供95倍的加速。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
HyPR: Hybrid Page Ranking on Evolving Graphs
PageRank (PR) is the standard metric used by the Google search engine to compute the importance of a web page via modeling the entire web as a first order Markov chain. The challenge of computing PR efficiently and quickly has been already addressed by several works previously who have shown innovations in both algorithms and in the use of parallel computing. The standard method of computing PR is handled by modelling the web as a graph. The fast growing internet adds several new web pages everyday and hence more nodes (representing the web pages) and edges (the hyperlinks) are added to this graph in an incremental fashion. Computing PR on this evolving graph is now an emerging challenge since computations from scratch on the massive graph is time consuming and unscalable. In this work, we propose Hybrid Page Rank (HyPR), which computes PR on evolving graphs using collaborative executions on muti-core CPUs and massively parallel GPUs. We exploit data parallelism via efficiently partitioning the graph into different regions that are affected and unaffected by the new updates. The different partitions are then processed in an overlapped manner for PR updates. The novelty of our technique is in utilizing the hybrid platform to scale the solution to massive graphs. The technique also provides high performance through parallel processing of every batch of updates using a parallel algorithm. HyPR efficiently executes on a NVIDIA V100 GPU hosted on a 6th Gen Intel Xeon CPU and is able to update a graph with 640M edges with a single batch of 100,000 edges in 12 ms. HyPR outperforms other state of the art techniques for computing PR on evolving graphs [1] by 4.8x. Additionally HyPR provides 1.2x speedup over GPU only executions, and 95x speedup over CPU only parallel executions.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
HiPC 2020 ORGANIZATION HiPC 2020 Industry Sponsors PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks Algorithms for Preemptive Co-scheduling of Kernels on GPUs 27th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2020) Technical program
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1