Exploring The Role of Web Crawler and Anti-Crawler Technology in Big Data Era
Fan Zhou, Yang Wang
2022 11th International Conference of Information and Communication Technology (ICTech), 2022-02-01. DOI: 10.1109/ICTech55460.2022.00070
In the era of big data, web crawlers retrieve resources and information from the Internet at low cost and high efficiency, offering considerable convenience to businesses and individuals. Every technology has two sides, however: malicious crawlers pose incalculable threats and cause heavy losses to websites. To prevent crawlers from being abused or turning malicious, websites commonly deploy anti-crawler measures based on IP access frequency, page-browsing speed, account login, CAPTCHA input, JavaScript encryption, Ajax obfuscation, and similar techniques. No single anti-crawler technique can block crawlers completely; such measures can only raise the cost of crawling for attackers, forcing the crawling party to weigh costs against benefits and make the right choice.
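Two of the signals listed above, IP access frequency and browsing speed, can be combined in a simple server-side check. The following is a minimal sketch, not the authors' method: it assumes a per-IP sliding time window kept in memory, and the class name `IpFrequencyLimiter` and its thresholds (`max_requests`, `window_s`, `min_interval_s`) are hypothetical illustrations.

```python
from collections import defaultdict, deque


class IpFrequencyLimiter:
    """Hypothetical anti-crawler check combining two heuristics:

    - IP access frequency: more than `max_requests` requests inside a
      sliding `window_s`-second window;
    - browsing speed: two consecutive requests closer together than
      `min_interval_s` seconds, faster than a human would read a page.
    """

    def __init__(self, max_requests=30, window_s=60.0, min_interval_s=0.5):
        self.max_requests = max_requests
        self.window_s = window_s
        self.min_interval_s = min_interval_s
        # Per-IP timestamps of recent requests, oldest first.
        # Never pruned of idle IPs: acceptable only for a sketch.
        self.history = defaultdict(deque)

    def allow(self, ip, now):
        """Record a request from `ip` at time `now` (seconds).

        Returns False if the request should be blocked. Blocked attempts
        are still recorded, so a client that keeps hammering stays blocked.
        """
        times = self.history[ip]
        # Discard timestamps that have slid out of the window.
        while times and now - times[0] >= self.window_s:
            times.popleft()
        too_fast = bool(times) and now - times[-1] < self.min_interval_s
        times.append(now)
        too_many = len(times) > self.max_requests
        return not (too_fast or too_many)


if __name__ == "__main__":
    limiter = IpFrequencyLimiter(max_requests=3, window_s=10.0, min_interval_s=0.5)
    print(limiter.allow("203.0.113.7", 0.0))   # True
    print(limiter.allow("203.0.113.7", 0.2))   # False: only 0.2 s after the last request
    print(limiter.allow("203.0.113.7", 1.0))   # True
    print(limiter.allow("203.0.113.7", 2.0))   # False: 4 requests in the 10 s window
    print(limiter.allow("203.0.113.7", 12.0))  # True: older requests left the window
```

Consistent with the abstract's point that anti-crawler measures only raise the cost of crawling, a check like this is easily sidestepped by a crawler that slows down or rotates its requests across many proxy IPs; it forces the crawler to spend more resources rather than stopping it outright.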