Authors: Sameendra Samarawickrama, L. Jayaratne
Published in: 2011 Sixth International Conference on Digital Information Management, 2011-12-01
DOI: 10.1109/ICDIM.2011.6093329 (https://doi.org/10.1109/ICDIM.2011.6093329)
Automatic text classification and focused crawling
A focused crawler is a web crawler that traverses the web to collect only information related to a particular topic of interest. Generic web crawlers, by contrast, try to cover the entire web, which is infeasible given the size and complexity of the WWW. In this paper we survey some of the latest focused web crawling approaches and discuss each along with its experimental results. We categorize them as focused crawling based on content analysis, focused crawling based on link analysis, and focused crawling based on both content and link analysis. We also offer insight into future research and draw overall conclusions.
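
To make the idea of a focused crawler concrete, the following is a minimal sketch of a content-based, best-first crawl. It is not taken from the surveyed approaches: the in-memory `TOY_WEB` graph, the term-overlap `relevance` score, the `threshold` value, and the function names are all assumptions chosen for illustration. A real crawler would fetch pages over HTTP and would likely use a trained text classifier rather than keyword overlap.

```python
import heapq

# Hypothetical toy "web": URL -> (page text, outgoing links). Illustration only.
TOY_WEB = {
    "seed": ("web crawler survey", ["a", "b"]),
    "a": ("focused crawler topic classification", ["c"]),
    "b": ("cooking recipes and travel", ["d"]),
    "c": ("topic crawler link analysis", []),
    "d": ("holiday photos", []),
}


def relevance(text, topic_terms):
    """Fraction of topic terms that occur in the page text (content analysis)."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(seed, topic_terms, threshold=0.3, max_pages=10):
    """Best-first crawl that only follows links found on relevant pages."""
    # heapq is a min-heap, so priorities are negated to pop the best first.
    frontier = [(-1.0, seed)]
    seen = {seed}
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        fetched += 1
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: its links are not followed
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant


if __name__ == "__main__":
    print(focused_crawl("seed", {"crawler", "topic"}))
```

In this sketch the off-topic page `b` is fetched but its outgoing link `d` is never visited, which is the behaviour that distinguishes a focused crawler from a generic one. A link-analysis variant would instead prioritize the frontier using the link structure (for example, how many relevant pages point to a URL).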