Exploring The Role of Web Crawler and Anti-Crawler Technology in Big Data Era
Authors: Fan Zhou, Yang Wang
Published in: 2022 11th International Conference of Information and Communication Technology (ICTech), 2022-02-01
DOI: 10.1109/ICTech55460.2022.00070 (https://doi.org/10.1109/ICTech55460.2022.00070)
Citation count: 0
Abstract
In the era of big data, web crawlers retrieve resources and information from the Internet at lower cost and with higher efficiency, bringing considerable convenience to businesses and individuals. Nevertheless, there are two sides to everything: malicious crawlers bring incalculable threats and losses to websites. To prevent web crawlers from being abused or even developing into malicious crawlers, websites usually apply anti-crawler measures based on techniques such as IP access frequency limits, page browsing speed checks, account login requirements, CAPTCHA input, JavaScript encryption, and AJAX obfuscation. No single anti-crawler technique can block crawlers completely; such measures can only raise the cost of crawling for attackers, forcing the crawling party to make the right choice after weighing costs against benefits.
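
As an illustration of the first technique listed above, limiting by IP access frequency, the following is a minimal sketch and not the authors' implementation. It assumes a sliding-window policy; the class name `IpRateLimiter`, the threshold of 3 requests per 10 seconds, and the sample address are all hypothetical choices made for the example.

```python
from collections import defaultdict, deque


class IpRateLimiter:
    """Hypothetical sliding-window limiter: at most max_requests per window per IP."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP to the timestamps of its recently allowed requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Discard timestamps that have left the current window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too frequent: treat as a suspected crawler
        hits.append(now)
        return True


# Assumed policy for the example: 3 requests per 10 seconds.
limiter = IpRateLimiter(max_requests=3, window_seconds=10.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 12.5)]
print(decisions)  # [True, True, True, False, True]
```

A crawler can get around such a check by slowing down or rotating IP addresses, which is consistent with the abstract's point: the measure does not block crawling outright, it raises the cost of doing so.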