Web forum crawling using text-based filters
Priyanka Bandagale, Atiya R. Kazi
2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), pp. 1856-1859. Published 2017-09-01.
DOI: 10.1109/ICPCSI.2017.8392037 (https://doi.org/10.1109/ICPCSI.2017.8392037)
Citations: 0
Abstract
Forum websites hold valuable information about frequently occurring problems in almost any subject. These forums are organized in a consistent structure, so users can navigate easily to the discussion page they want. Search engines are becoming increasingly specialized for different types of websites, and they build dedicated crawlers that extract precise information from raw HTML. Research on forum crawlers has grown over the last few years. The main challenge for forum crawlers is finding similarities between URLs from different forum sites. This paper focuses on crawling forum URLs and classifying them as Entry, Index, Thread, and Page-Flipping URLs.
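
To make the four URL classes concrete, here is a minimal Python sketch of a text-based URL filter. It is not the authors' method, which the abstract does not describe in detail. The keyword lists, the `classify_url` function, and the example host `forum.example.com` are all assumptions chosen for illustration, loosely modeled on URL conventions common in forum software.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical keyword filters (assumptions, not taken from the paper).
PAGE_KEYS = {"page", "start"}                        # query keys hinting at pagination
PAGE_PATH = re.compile(r"/page[-_/]?\d+", re.IGNORECASE)
THREAD = re.compile(r"thread|topic", re.IGNORECASE)  # e.g. showthread, viewtopic
INDEX = re.compile(r"forum|board", re.IGNORECASE)    # e.g. forumdisplay, viewforum


def is_page_flip(parsed) -> bool:
    """Return True if the URL carries a numeric pagination marker."""
    query = parse_qs(parsed.query)
    if any(k.lower() in PAGE_KEYS and v[0].isdigit() for k, v in query.items()):
        return True
    return bool(PAGE_PATH.search(parsed.path))


def classify_url(url: str) -> str:
    """Label a URL as entry, index, thread, page_flip, or other."""
    parsed = urlparse(url)
    # The site root with no query string is treated as the entry page.
    if parsed.path in ("", "/") and not parsed.query:
        return "entry"
    # Pagination is checked first because page-flipping URLs usually
    # also contain thread or index keywords.
    if is_page_flip(parsed):
        return "page_flip"
    # Only the path and query are matched, so a host name such as
    # "forum.example.com" does not trigger the index filter.
    target = parsed.path + "?" + parsed.query
    if THREAD.search(target):
        return "thread"
    if INDEX.search(target):
        return "index"
    return "other"


if __name__ == "__main__":
    for u in (
        "http://forum.example.com/",
        "http://forum.example.com/forumdisplay.php?f=3",
        "http://forum.example.com/showthread.php?t=42",
        "http://forum.example.com/showthread.php?t=42&page=2",
    ):
        print(classify_url(u), u)
```

Fixed keyword lists like these break down across forum packages with different URL schemes, which is the cross-site similarity problem the abstract identifies as the main challenge.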