A Web Crawler Detection Algorithm Based on Web Page Member List
Weigang Guo, Yong Zhong, Jianqin Xie
2012 4th International Conference on Intelligent Human-Machine Systems and Cybernetics
Published: 2012-08-26
DOI: 10.1109/IHMSC.2012.54 (https://doi.org/10.1109/IHMSC.2012.54)
Citations: 12
Abstract
With the widespread use of search engines, the impact of Web crawlers on Web sites should not be ignored. After analyzing the navigational patterns of Web crawlers in Web logs, we propose a new algorithm based on Web page member lists. The algorithm constructs one member list for every Web page and one show table for every visitor. The experiment shows that the new algorithm can detect unknown crawlers and unfriendly crawlers that do not obey the Standard for Robot Exclusion.
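The abstract does not define the two data structures, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that a page's member list is the set of resources embedded in that page (stylesheets, images, scripts), that a visitor's show table records which of those members the visitor actually requested, and that a visitor who fetches pages but few of their members may be a crawler. The names `MEMBER_LISTS`, `build_show_tables`, `looks_like_crawler`, and the `threshold` value are all hypothetical.

```python
from collections import defaultdict

# Hypothetical member lists (assumption): page URL -> embedded resources.
MEMBER_LISTS = {
    "/index.html": {"/style.css", "/logo.png"},
    "/news.html": {"/style.css", "/news.js"},
}


def build_show_tables(log_entries):
    """Build one show table per visitor from (visitor, url) log entries.

    A show table maps each visited page to the subset of that page's
    members the same visitor also requested.
    """
    requested = defaultdict(set)
    for visitor, url in log_entries:
        requested[visitor].add(url)

    tables = {}
    for visitor, urls in requested.items():
        tables[visitor] = {
            page: members & urls
            for page, members in MEMBER_LISTS.items()
            if page in urls
        }
    return tables


def looks_like_crawler(show_table, threshold=0.5):
    """Flag a visitor who fetched under `threshold` of the expected members.

    The threshold is an arbitrary illustrative value, not from the paper.
    """
    expected = sum(len(MEMBER_LISTS[page]) for page in show_table)
    if expected == 0:
        return False  # no pages with members visited; nothing to judge
    fetched = sum(len(members) for members in show_table.values())
    return fetched / expected < threshold


# Toy log: "browser" loads a page with its resources, "bot" loads pages only.
log = [
    ("browser", "/index.html"),
    ("browser", "/style.css"),
    ("browser", "/logo.png"),
    ("bot", "/index.html"),
    ("bot", "/news.html"),
]
tables = build_show_tables(log)
flags = {visitor: looks_like_crawler(table) for visitor, table in tables.items()}
```

A heuristic of this kind does not depend on the User-Agent string or on requests for robots.txt, which is consistent with the abstract's claim of detecting unknown crawlers and crawlers that ignore the Standard for Robot Exclusion.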