Authors: Anuja Lawankar, Nikhil Mangrulkar
Published in: 2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave), 2016-02-01
DOI: 10.1109/STARTUP.2016.7583952 (https://doi.org/10.1109/STARTUP.2016.7583952)
A review on techniques for optimizing web crawler results
The Internet is now widely used to satisfy information needs. With the exponential growth of the web, finding useful information has become more difficult. A web crawler extracts both relevant and irrelevant links from the web, and various algorithms and techniques are used to reduce the irrelevant ones. Discovering information with a web crawler raises several issues: different URLs may carry similar text, which increases the time complexity of the search; crawler resources are wasted fetching duplicate pages; and more storage is required to hold those pages. These are some of the roadblocks to getting optimum results from a crawler. This paper provides an in-depth study of existing information retrieval (IR) techniques that would help researchers retrieve optimum result links and information.
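To illustrate the duplicate-page problem the abstract describes, here is a minimal sketch of one common approach: fingerprinting page text so that identical content reached through different URLs is stored only once. It is not taken from the paper; the function names `fingerprint` and `deduplicate`, the URL-to-text input shape, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Return a hash of the page text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(pages: dict[str, str]) -> dict[str, str]:
    """Keep the first URL seen for each distinct page text.

    `pages` maps URL -> already-fetched page text (a hypothetical input shape).
    """
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for url, text in pages.items():
        digest = fingerprint(text)
        if digest in seen:
            continue  # same content already stored under another URL
        seen.add(digest)
        unique[url] = text
    return unique
```

This only catches pages whose text is identical after normalization. Pages that are merely similar would need a near-duplicate method such as shingling or similarity hashing, which this sketch does not attempt.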