{"title":"A review on techniques for optimizing web crawler results","authors":"Anuja Lawankar, Nikhil Mangrulkar","doi":"10.1109/STARTUP.2016.7583952","DOIUrl":null,"url":null,"abstract":"Nowadays, the Internet is widely used to satisfy users' information needs. With the exponential growth of the web, searching for useful information has become more difficult. A web crawler extracts both relevant and irrelevant links from the web, and various algorithms and techniques are used to filter out the irrelevant ones. Discovering information with a web crawler raises certain issues: different URLs may serve similar text, which increases the time complexity of the search; crawler resources are wasted fetching duplicate pages; and additional storage is required to hold these pages. These are some of the roadblocks to obtaining optimum results from a crawler. This paper provides a deep study of existing information retrieval (IR) techniques that would help researchers retrieve optimum result links and information.","PeriodicalId":355852,"journal":{"name":"2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/STARTUP.2016.7583952","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 3
Abstract
Nowadays, the Internet is widely used to satisfy users' information needs. With the exponential growth of the web, searching for useful information has become more difficult. A web crawler extracts both relevant and irrelevant links from the web, and various algorithms and techniques are used to filter out the irrelevant ones. Discovering information with a web crawler raises certain issues: different URLs may serve similar text, which increases the time complexity of the search; crawler resources are wasted fetching duplicate pages; and additional storage is required to hold these pages. These are some of the roadblocks to obtaining optimum results from a crawler. This paper provides a deep study of existing information retrieval (IR) techniques that would help researchers retrieve optimum result links and information.
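One of the issues the abstract highlights, different URLs serving the same text and wasting crawler resources on duplicate fetches, is commonly addressed by content fingerprinting. The following is a minimal illustrative sketch of that general idea (not the specific techniques surveyed in the paper): page text is normalized and hashed, and the crawler skips any page whose fingerprint has already been seen. The `DedupStore` class and `fingerprint` function are hypothetical names introduced for illustration.

```python
import hashlib

def fingerprint(text: str) -> str:
    # Normalize whitespace and case so trivially different copies
    # of the same text produce the same hash
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class DedupStore:
    """Tracks content fingerprints so a crawler can skip duplicate pages."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, page_text: str) -> bool:
        # Returns True if an equivalent page was already stored,
        # otherwise records the fingerprint and returns False
        fp = fingerprint(page_text)
        if fp in self._seen:
            return True
        self._seen.add(fp)
        return False

store = DedupStore()
print(store.is_duplicate("Hello   World"))  # first fetch: False
print(store.is_duplicate("hello world"))    # same content at another URL: True
```

Exact-hash deduplication only catches byte-identical (after normalization) copies; the techniques surveyed in review papers like this one often go further, e.g. near-duplicate detection, but the storage-and-bandwidth saving shown here is the motivation the abstract describes.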