Real-time web crawler detection
Andoena Balla, A. Stassopoulou, M. Dikaiakos
2011 18th International Conference on Telecommunications, 2011-05-08
DOI: 10.1109/CTS.2011.5898963

In this paper we present a methodology for detecting web crawlers in real time. We use decision trees to classify requests as originating from either a crawler or a human while their session is still ongoing. To build the classifier, we applied machine learning techniques to identify the features that best differentiate humans from crawlers. We evaluated the method in real time with the help of an emulator, using only a small number of requests. Our results demonstrate the effectiveness and applicability of the approach.
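The abstract names decision trees and session-level features but does not list the features or the tree-induction algorithm, so the Python sketch below is an illustration under stated assumptions and not the authors' implementation. The five per-session features (a `/robots.txt` request, the share of `HEAD` requests, the share of requests without a referrer, the share of embedded resources such as images and stylesheets, and the mean gap between requests), the Gini-based CART-style tree builder, the hypothetical `OnlineSessionClassifier` with its `min_requests` decision point, and the toy training vectors are all invented for the example. What it does show is the real-time idea: features are recomputed as each request arrives, and the session receives a label while it is still ongoing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical per-request record; field names are illustrative."""
    timestamp: float  # seconds, in arrival order
    method: str       # e.g. "GET", "HEAD"
    path: str
    referrer: str     # "" when the Referer header is absent


# Assumed extensions for embedded resources a browser fetches automatically.
EMBEDDED_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".ico")


def session_features(reqs):
    """Assumed features over the requests seen so far in one session."""
    n = len(reqs)
    robots = 1.0 if any(r.path == "/robots.txt" for r in reqs) else 0.0
    head_ratio = sum(r.method == "HEAD" for r in reqs) / n
    no_referrer_ratio = sum(not r.referrer for r in reqs) / n
    embedded_ratio = sum(r.path.lower().endswith(EMBEDDED_SUFFIXES) for r in reqs) / n
    mean_gap = (reqs[-1].timestamp - reqs[0].timestamp) / (n - 1) if n > 1 else 0.0
    return [robots, head_ratio, no_referrer_ratio, embedded_ratio, mean_gap]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature_index, threshold) of the best split, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """CART-style tree: leaves are labels, internal nodes are (j, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    split = None if depth == max_depth or len(set(y)) == 1 else best_split(X, y)
    if split is None:
        return majority
    _, j, t = split

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth)

    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t, grow(left_idx), grow(right_idx))


def predict(tree, x):
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


class OnlineSessionClassifier:
    """Hypothetical helper: labels a session once it has min_requests requests."""

    def __init__(self, tree, min_requests=3):
        self.tree = tree
        self.min_requests = min_requests  # assumed decision point
        self.sessions = {}

    def observe(self, session_id, req):
        reqs = self.sessions.setdefault(session_id, [])
        reqs.append(req)
        if len(reqs) < self.min_requests:
            return None  # too little evidence so far
        return predict(self.tree, session_features(reqs))


# Toy, made-up training vectors (not data from the paper), in the order
# [robots, head_ratio, no_referrer_ratio, embedded_ratio, mean_gap].
X_train = [
    [1.0, 0.3, 1.0, 0.0, 0.5], [0.0, 0.0, 0.9, 0.05, 0.2], [1.0, 0.0, 1.0, 0.1, 1.0],
    [0.0, 0.0, 0.2, 0.7, 8.0], [0.0, 0.0, 0.1, 0.6, 15.0], [0.0, 0.0, 0.3, 0.8, 5.0],
]
y_train = ["crawler"] * 3 + ["human"] * 3
tree = build_tree(X_train, y_train)

clf = OnlineSessionClassifier(tree, min_requests=3)
for req in [Request(0.0, "GET", "/robots.txt", ""),
            Request(0.2, "HEAD", "/a.html", ""),
            Request(0.4, "GET", "/b.html", "")]:
    label = clf.observe("session-1", req)
print(label)  # → crawler
```

Recomputing every feature on each request keeps the sketch short; a production detector would more likely update running counts incrementally, expire idle sessions, and train on labeled sessions from real server logs, for example with scikit-learn's `DecisionTreeClassifier` in place of the hand-written builder used here to stay dependency-free.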