Bhaskar Biswas, K. Jain, Vipul Mittal, K. K. Shukla
Int. J. Knowl. Web Intell., published 2009-08-01. DOI: 10.1504/IJKWI.2009.027926 (https://doi.org/10.1504/IJKWI.2009.027926)
Exploiting tree structure of a web page for clustering
An approach to designing a Universal Web Wrapper has been under implementation for years. An associated problem is the automated selection of web pages and the subsequent extraction of the content of interest. We propose an algorithm that clusters pages on the basis of their structure. Because such pages are highly similar in structure, it is easier to categorise them and to extract any particular section of a page. The algorithm uses only structural factors, which leads to a complexity of O(log n). An evaluation illustrates the precision and efficiency of the algorithm.
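To make the idea of clustering pages by structure concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not describe in detail. It assumes that a page's structure can be summarised as the set of root-to-node tag paths in its HTML tree, that two pages are compared with Jaccard similarity over those sets, and that clusters are formed greedily against a threshold. The names `tag_paths`, `jaccard`, `cluster_pages` and the `threshold` value are hypothetical, and this sketch makes no attempt to reach the O(log n) complexity stated above.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-node tag paths of one page."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, outermost first
        self.paths = set()  # e.g. {"html", "html/body", "html/body/div"}

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the structural signature (set of tag paths) of an HTML string."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.6):
    """Greedy one-pass clustering (an assumption of this sketch).

    Each page joins the first cluster whose representative signature is at
    least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of page indices.
    """
    clusters = []  # list of (representative_paths, member_indices)
    for index, html in enumerate(pages):
        paths = tag_paths(html)
        for representative, members in clusters:
            if jaccard(representative, paths) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((paths, [index]))
    return [members for _, members in clusters]
```

Text content is ignored entirely, so two article pages built from the same template fall into one cluster even when their wording differs, while a page laid out as a table lands in a separate cluster.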