Title: Implementation of SmartCrawler for Deep-Web Sites
Authors: Ewit
DOI: 10.30534/ijccn/2018/30722018
Journal: International Journal of Computing, Communications and Networking
Published: 2018-06-15 (Journal Article)

Abstract: The deep web refers to data that is present on the web but not indexed by search engines. Given the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. A smart crawler for hidden-web interfaces consists of two main stages: site locating and in-site exploring. Site locating starts from seed sites, obtains relevant websites through reverse searching, and ranks candidate sites using a feature space built from the URL, the anchor, and the text surrounding the URL. The second stage takes the sites found by the first and searches them for relevant links. An adaptive link learner identifies relevant links with the help of link priority and link rank. To eliminate the bias toward visiting a few highly relevant links in hidden web directories, a link-tree data structure is designed to achieve wider coverage of a website.
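The two-stage idea above can be illustrated with a minimal sketch. This is not the paper's actual implementation: the relevant-term set, the keyword-overlap scoring, and the per-directory penalty below are simplified stand-ins for the learned URL/anchor/context feature space, the adaptive link learner, and the link-tree structure, respectively.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed topic terms; in the paper, relevance would come from learned
# features of the URL, the anchor, and the text around the URL.
RELEVANT_TERMS = {"search", "query", "form", "database"}

def link_priority(url: str, anchor_text: str) -> float:
    """Score a link by how many relevant terms appear in its URL and anchor."""
    text = (url + " " + anchor_text).lower()
    return float(sum(term in text for term in RELEVANT_TERMS))

def top_directory(url: str) -> str:
    """First path segment, used here as the branch key of a simple link tree."""
    parts = urlparse(url).path.strip("/").split("/")
    return parts[0] if parts else ""

class LinkFrontier:
    """Priority frontier that also balances coverage across site directories."""

    def __init__(self, dir_penalty: float = 0.5):
        self._links = []                      # (base score, url) pairs
        self._dir_visits = defaultdict(int)   # pops per directory branch
        self._dir_penalty = dir_penalty

    def add(self, url: str, anchor_text: str = "") -> None:
        self._links.append((link_priority(url, anchor_text), url))

    def pop(self):
        """Return the best link, discounting over-visited directories so a
        few highly relevant directories cannot monopolize the crawl."""
        if not self._links:
            return None
        def adjusted(item):
            base, url = item
            return base - self._dir_penalty * self._dir_visits[top_directory(url)]
        best = max(self._links, key=adjusted)
        self._links.remove(best)
        self._dir_visits[top_directory(best[1])] += 1
        return best[1]

frontier = LinkFrontier()
frontier.add("http://example.org/search/advanced", "advanced search form")
frontier.add("http://example.org/search/basic", "search")
frontier.add("http://example.org/about/team", "about us")
print(frontier.pop())  # -> http://example.org/search/advanced
```

The first pop returns the most relevant link; subsequent pops from the same `/search/` directory are increasingly penalized, which mimics the coverage-widening role the paper assigns to the link tree.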