A Novel URL Assignment Model Based on Multi-objective Decision Making Method
Authors: Qiuyan Huang, Qingzhong Li, Zhongmin Yan
Venue: 2012 Ninth Web Information Systems and Applications Conference
Published: 2012-11-16
DOI: 10.1109/WISA.2012.19 (https://doi.org/10.1109/WISA.2012.19)
Citation count: 0
Abstract
With the tremendous growth of the Web, it has become a major challenge for single-process crawlers to locate precise, topic-relevant resources in a reasonable amount of time, so parallel crawlers are increasingly important. Parallelism, however, raises a difficult problem: how to distribute URLs among the crawlers so that the parallel system works in a coordinated way and the Web pages fetched are of high quality. This paper describes a novel URL assignment model for parallel crawlers that is based on a multi-objective decision-making method and jointly considers multiple factors, including load balance and overlap. Extensive experiments test and validate the technique.
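The abstract does not spell out the model itself, so the following is only a minimal sketch of what a multi-objective URL assignment might look like, not the authors' method. It assumes a simple weighted-sum decision rule, treats overlap as a hard constraint (a URL is never assigned twice), and adds a host-locality objective that the abstract does not mention. The names `Crawler`, `assign`, `w_load`, and `w_locality` are hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Crawler:
    name: str
    queue: list = field(default_factory=list)  # URLs assigned so far
    hosts: set = field(default_factory=set)    # hosts this crawler already covers


def assign(url, crawlers, seen, w_load=0.5, w_locality=0.5):
    """Assign url to one crawler; return None if it was already assigned.

    Illustrative weighted-sum rule only (an assumption, not the paper's model).
    """
    if url in seen:  # overlap: never hand out the same URL twice
        return None
    host = urlsplit(url).netloc
    # Normalise queue lengths by the longest queue (use 1 if all are empty).
    max_len = max(len(c.queue) for c in crawlers) or 1

    def score(c):
        load = 1.0 - len(c.queue) / max_len          # 1.0 = emptiest queue
        locality = 1.0 if host in c.hosts else 0.0   # assumed extra objective
        return w_load * load + w_locality * locality

    best = max(crawlers, key=score)  # ties go to the first crawler in the list
    best.queue.append(url)
    best.hosts.add(host)
    seen.add(url)
    return best
```

Changing the weights shifts the trade-off: a larger `w_load` spreads URLs more evenly across crawlers, while a larger `w_locality` keeps pages from the same host on one crawler. A weighted sum is only one of several multi-objective decision rules and is used here for brevity.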