Design and implementation of competent web crawler and indexer using web services
D. K. Santhosh Kumar, M. Kamath
2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies, 2014-05-08
DOI: 10.1109/ICACCCT.2014.7019393 (https://doi.org/10.1109/ICACCCT.2014.7019393)
The internet has become part of everyday life. Retrieving the information a user requests is the job of a search engine, which in turn relies on a web crawler. Designing and developing a competent web crawler is a challenging task. This paper proposes a web crawler and indexer. The system consists of crawler services and indexer services, both realized as web services, which communicate using XML, SOAP and WSDL. The crawler service fetches web pages and parses them to retrieve all hyperlinks, then repeats the same process recursively on those links using a breadth-first strategy. The pages downloaded by the crawler service are passed as input to the indexer service through web service messages. The indexer service then parses the HTML pages and, as pre-processing steps, removes stop words and stems the keywords. Finally, the result is stored as an inverted index.
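The pipeline described in the abstract (breadth-first crawling, then stop-word removal, stemming and an inverted index) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The in-memory `PAGES` dictionary, the `example.test` URLs, the tiny `STOP_WORDS` set and the crude suffix-stripping `stem` function are all assumptions made for illustration, and ordinary function calls stand in for the XML/SOAP messages that the paper's crawler and indexer services exchange.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import re

# Hypothetical in-memory "web" (URL -> HTML); a real crawler would fetch over HTTP.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> <p>Crawlers fetch pages</p>',
    "http://example.test/a": '<a href="/b">B</a> <p>The indexer parses the fetched page</p>',
    "http://example.test/b": '<a href="/">home</a> <p>Indexing and crawling</p>',
}

# Illustrative stop-word list, not the one used in the paper.
STOP_WORDS = {"the", "and", "a", "of", "is"}


class PageParser(HTMLParser):
    """Collects href targets of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def parse(html):
    """Run the parser over one page and flush any buffered text."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser


def crawl_bfs(seed, fetch, max_pages=100):
    """Visit pages breadth-first from seed; return {url: html} in visit order."""
    queue = deque([seed])
    seen = {seed}          # URLs already queued, so no page is fetched twice
    fetched = {}
    while queue and len(fetched) < max_pages:
        url = queue.popleft()          # FIFO order gives breadth-first traversal
        html = fetch(url)
        if html is None:               # unknown or unreachable page
            continue
        fetched[url] = html
        for href in parse(html).links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_inverted_index(pages):
    """Map each stemmed, non-stop-word term to the set of URLs containing it."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = " ".join(parse(html).text).lower()
        for token in re.findall(r"[a-z]+", text):
            if token not in STOP_WORDS:
                index[stem(token)].add(url)
    return index


pages = crawl_bfs("http://example.test/", PAGES.get)
index = build_inverted_index(pages)
print(list(pages))             # pages in breadth-first visit order
print(sorted(index["fetch"]))  # pages containing "fetch" or "fetched"
```

Passing the fetch function as a parameter keeps the traversal logic separate from the transport, so the same loop could in principle sit behind a web service endpoint. A production indexer would also replace the suffix stripper with an established stemming algorithm.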