Authors: Matthias Bender, S. Michel, G. Weikum
Venue: 22nd International Conference on Data Engineering Workshops (ICDEW'06)
Published: 2006-04-03
DOI: 10.1109/ICDEW.2006.110 (https://doi.org/10.1109/ICDEW.2006.110)
P2P Directories for Distributed Web Search: From Each According to His Ability, to Each According to His Needs
A compelling application of peer-to-peer (P2P) system technology would be distributed Web search, where each peer autonomously runs a search engine on a personalized local corpus (e.g., built from a thematically focused Web crawl) and peers collaborate by routing queries to remote peers that can contribute many or particularly good results for those queries. Such systems typically rely on a decentralized directory, e.g., built on top of a distributed hash table (DHT), that holds compact, aggregated statistical metadata about the peers; this metadata is used to identify promising peers for a particular query. To support an a priori unlimited number of peers, it is crucial to keep the load on the distributed directory low. Moreover, each peer should ideally tailor its postings to the directory to reflect its particular strengths, such as rich information about specialized topics that few or no other peers cover. This paper addresses this problem by proposing strategies with which peers identify suitable subsets of their most beneficial statistical metadata. We argue that posting a carefully selected subset of metadata can achieve almost the same result quality as a complete metadata directory, because only the most relevant peers are eventually involved in executing a given query. Additionally, asking only relevant peers results in higher precision, as the noise introduced by poor peers is reduced. We have implemented these strategies in Minerva, our fully operational P2P Web search prototype, and present experimental results on real-world Web data that show the viability of the strategies and their gains in terms of high search result quality at low networking costs.
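The idea of posting only a selected subset of per-peer metadata and then routing a query to the most promising peers can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not describe the actual selection strategies or Minerva's interfaces, so the names `select_postings`, `Directory`, `post`, and `route` are hypothetical, local document frequency stands in for "most beneficial" metadata, and a single in-memory map stands in for a DHT.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical per-term statistic: number of local documents containing the term.
PeerStats = Dict[str, int]


def select_postings(local_stats: PeerStats, budget: int) -> PeerStats:
    """Keep only the `budget` terms with the highest local document frequency.

    Assumption: document frequency is a placeholder for "most beneficial";
    the paper's actual strategies are not given in the abstract.
    """
    ranked = sorted(local_stats.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:budget])


class Directory:
    """Toy stand-in for a DHT-based directory: one in-memory map keyed by term."""

    def __init__(self) -> None:
        self._postings: Dict[str, Dict[str, int]] = defaultdict(dict)

    def post(self, peer_id: str, stats: PeerStats) -> None:
        # Each (term, peer) pair is one directory entry.
        for term, doc_freq in stats.items():
            self._postings[term][peer_id] = doc_freq

    def size(self) -> int:
        # Total number of (term, peer) entries, a rough proxy for directory load.
        return sum(len(peers) for peers in self._postings.values())

    def route(self, query_terms: List[str], max_peers: int) -> List[str]:
        # Score each peer by summing its posted statistics over the query terms.
        scores: Dict[str, int] = defaultdict(int)
        for term in query_terms:
            for peer_id, doc_freq in self._postings.get(term, {}).items():
                scores[peer_id] += doc_freq
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [peer_id for peer_id, _ in ranked[:max_peers]]


# Made-up example data: three peers with thematically focused corpora.
peers = {
    "p1": {"jazz": 120, "guitar": 80, "news": 3},
    "p2": {"news": 200, "politics": 150, "jazz": 2},
    "p3": {"guitar": 40, "jazz": 30, "politics": 1},
}

directory = Directory()
for peer_id, stats in peers.items():
    directory.post(peer_id, select_postings(stats, budget=2))

full_size = sum(len(stats) for stats in peers.values())
print(directory.size(), "entries posted instead of", full_size)
print(directory.route(["jazz", "guitar"], max_peers=2))
```

With a budget of two terms per peer, the weak postings (for example `p2`'s two jazz documents) never reach the directory, so a jazz query is routed only to peers whose corpora are strong on that topic. This mirrors the abstract's argument in miniature; it says nothing about how the real strategies weigh terms.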