Distributed pagerank for P2P systems

Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing, 2003
Publication date: 2003-06-22
DOI: 10.1109/HPDC.2003.1210016

K. Sankaralingam, S. Sethumadhavan, J. Browne

Citations: 125

Abstract

This paper defines and describes a fully distributed implementation of Google's highly effective pagerank algorithm for peer-to-peer (P2P) systems. The implementation is based on the chaotic (asynchronous) iterative solution of linear systems. The P2P implementation also enables incremental computation of pageranks as documents are added to or deleted from the network. Incremental updating keeps pageranks continuously accurate, whereas the current centralized web crawl and computation over Internet documents take several days. This suggests that the distributed algorithm could replace the centralized, crawler-based pagerank computation for Internet documents. A complete solution of the distributed pagerank computation for an in-place network converges rapidly for large systems (1% accuracy in 10 iterations), although each iteration may take a long time. The incremental computation triggered by adding a single document converges extremely rapidly, typically requiring update path lengths of fewer than 15 nodes, even for large networks and very accurate solutions. This implementation of pagerank provides a uniform ranking scheme for documents in P2P systems, and integrating it with P2P keyword search offers one solution to the network traffic problems caused by returning document hits. In basic P2P keyword search, all document hits must be returned to the querying node, which generates heavy network traffic. The paper proposes and evaluates an incremental P2P keyword search algorithm in which document hits are sorted by pagerank and returned to the querying node incrementally. Integrating this algorithm into P2P keyword search can produce dramatic benefits, both in effectiveness for users and in reduced network traffic. The incremental search algorithm reduced network traffic roughly ten-fold for two-word and three-word queries.
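The chaotic (asynchronous) iteration and the incremental update described above can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: each peer is assumed to hold exactly one document, ranks are assumed to follow the classic equation PR(p) = (1 - d) + d * sum of PR(q)/C(q) over documents q linking to p (C(q) is q's out-degree) with d = 0.85, and a peer re-broadcasts its rank only when it has moved by more than a relative threshold EPS since its last broadcast. Messages are delivered in random order to mimic asynchrony, and per-sender sequence numbers let a peer discard a message that was overtaken by a newer one. `ChaoticPageRank`, `add_document`, `reference_pagerank`, `D`, and `EPS` are hypothetical names chosen for illustration.

```python
import random

D = 0.85    # damping factor (assumed: the commonly used PageRank value)
EPS = 1e-6  # relative rank change that triggers a re-broadcast (assumed threshold)


class ChaoticPageRank:
    """Toy simulation: every peer owns one document and exchanges rank updates asynchronously."""

    def __init__(self, links, seed=0):
        self.links = {p: list(out) for p, out in links.items()}
        self.rank = {p: 1 - D for p in self.links}
        self.sent = {}                              # rank each peer last broadcast
        self.seq = {}                               # per-peer broadcast counter
        self.contrib = {p: {} for p in self.links}  # contrib[p][q] = (seq, q's rank share)
        self.queue = []                             # in-flight (target, source, seq, value)
        self.rng = random.Random(seed)
        self.messages = 0
        for p in self.links:
            self._notify(p)

    def _notify(self, q):
        """Send q's current rank share to every document q links to."""
        self.sent[q] = self.rank[q]
        self.seq[q] = self.seq.get(q, 0) + 1
        out = self.links[q]
        for p in out:
            self.queue.append((p, q, self.seq[q], self.rank[q] / len(out)))

    def run(self):
        """Deliver messages in random order until no peer's rank moves by more than EPS."""
        while self.queue:
            i = self.rng.randrange(len(self.queue))
            self.queue[i], self.queue[-1] = self.queue[-1], self.queue[i]
            p, q, seq, value = self.queue.pop()
            self.messages += 1
            if seq <= self.contrib[p].get(q, (0, 0.0))[0]:
                continue  # overtaken by a newer message from q: drop it
            self.contrib[p][q] = (seq, value)
            self.rank[p] = (1 - D) + D * sum(v for _, v in self.contrib[p].values())
            if abs(self.rank[p] - self.sent[p]) > EPS * self.sent[p]:
                self._notify(p)

    def add_document(self, doc, out_links):
        """Insert a document linking to existing documents; return the messages it caused."""
        self.links[doc] = list(out_links)
        self.contrib[doc] = {}
        self.rank[doc] = 1 - D
        before = self.messages
        self._notify(doc)
        self.run()
        return self.messages - before


def reference_pagerank(links, iters=200):
    """Centralized synchronous iteration of the same equations, used only for comparison."""
    rank = {p: 1 - D for p in links}
    for _ in range(iters):
        nxt = {p: 1 - D for p in links}
        for q, out in links.items():
            for p in out:
                nxt[p] += D * rank[q] / len(out)
        rank = nxt
    return rank


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    net = ChaoticPageRank(links)
    net.run()
    print("full solve:", net.messages, "messages")
    print("adding e:", net.add_document("e", ["a", "d"]), "messages")
    print({p: round(r, 4) for p, r in sorted(net.rank.items())})
```

After `run()` returns, every peer's rank agrees with the centralized fixed point to within a tolerance set by `EPS`. A new document that only links out to existing documents triggers updates only where its rank share actually travels, which is the intuition behind the short incremental update paths reported above; the `messages` counter is a property of this simulation, not the paper's path-length metric.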
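The incremental keyword search idea, returning hits sorted by pagerank a batch at a time, can be sketched for a conjunctive query. This sketch is an assumption-laden illustration, not the paper's protocol: it assumes that each term's posting list lives at one peer, that every document carries an already computed pagerank, that the first term's peer streams its documents in descending pagerank order in batches, and that the remaining terms act as filters. Because candidates arrive best-first, the search can stop as soon as k hits are found instead of shipping the whole posting list. `ranked_and_query`, `batch`, and the traffic measure (document IDs leaving the first term's peer) are hypothetical.

```python
def ranked_and_query(postings, pagerank, terms, k, batch=10):
    """Return up to k documents containing every term, best pagerank first.

    Hypothetical layout: postings[t] is the document list held by the peer responsible
    for term t, and pagerank[doc] is the document's already computed pagerank.
    The second return value counts document IDs streamed out of the first term's peer.
    """
    candidates = sorted(postings[terms[0]], key=lambda doc: pagerank[doc], reverse=True)
    filters = [set(postings[t]) for t in terms[1:]]  # remaining terms act as membership tests
    hits, shipped = [], 0
    for start in range(0, len(candidates), batch):
        chunk = candidates[start:start + batch]  # next batch in descending pagerank order
        shipped += len(chunk)
        for doc in chunk:
            if all(doc in f for f in filters):
                hits.append(doc)
                if len(hits) == k:
                    return hits, shipped  # stop early: the remaining candidates rank lower
    return hits, shipped


if __name__ == "__main__":
    pagerank = {f"d{i}": 1.0 / (i + 1) for i in range(100)}
    postings = {
        "p2p": [f"d{i}" for i in range(100)],
        "search": [f"d{i}" for i in range(0, 100, 3)],
    }
    hits, shipped = ranked_and_query(postings, pagerank, ["p2p", "search"], k=5)
    print(hits, shipped)  # -> ['d0', 'd3', 'd6', 'd9', 'd12'] 20
    print("naive transfer:", len(postings["p2p"]))  # -> naive transfer: 100
```

The batch size trades round trips against wasted transfer: small batches stop closer to the k-th hit, while large batches need fewer exchanges between peers.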

Latest articles in this journal

Adaptive polling of grid resource monitors using a slacker coherence model
Optimizing GridFTP through dynamic right-sizing
Dynamic virtual clusters in a grid site manager
Distributed pagerank for P2P systems
Using views for customizing reusable components in component-based frameworks