{"title":"网页检索的视觉相似性比较","authors":"Y. Takama, Noriaki Mitsuhashi","doi":"10.1109/WI.2005.157","DOIUrl":null,"url":null,"abstract":"A comparison method for Web pages in terms of visual similarity is proposed. Conventional Web information retrieval/gathering systems, such as search engines, extract keywords from HTML source files, based on which the similarity between pages is calculated. The extracted keywords are considered as semantic features representing the contents of Web pages. On the other hand, visual feature of Web pages is as important as semantic feature, because HTML is designed for visualizing a Web page in understandable manner for humans. The proposed method compares the layouts of Web pages based on image processing and graph matching. The experimental results show that the accuracy of layout analysis is 91.6% in average, and the visual similarity calculated by the proposed method is closer to the visual judgment by test subjects than color-based comparison method.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":"{\"title\":\"Visual similarity comparison for Web page retrieval\",\"authors\":\"Y. Takama, Noriaki Mitsuhashi\",\"doi\":\"10.1109/WI.2005.157\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A comparison method for Web pages in terms of visual similarity is proposed. Conventional Web information retrieval/gathering systems, such as search engines, extract keywords from HTML source files, based on which the similarity between pages is calculated. The extracted keywords are considered as semantic features representing the contents of Web pages. On the other hand, visual feature of Web pages is as important as semantic feature, because HTML is designed for visualizing a Web page in understandable manner for humans. The proposed method compares the layouts of Web pages based on image processing and graph matching. The experimental results show that the accuracy of layout analysis is 91.6% in average, and the visual similarity calculated by the proposed method is closer to the visual judgment by test subjects than color-based comparison method.\",\"PeriodicalId\":213856,\"journal\":{\"name\":\"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)\",\"volume\":\"41 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"28\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WI.2005.157\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI.2005.157","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Visual similarity comparison for Web page retrieval
A comparison method for Web pages in terms of visual similarity is proposed. Conventional Web information retrieval/gathering systems, such as search engines, extract keywords from HTML source files, based on which the similarity between pages is calculated. The extracted keywords are considered as semantic features representing the contents of Web pages. On the other hand, visual feature of Web pages is as important as semantic feature, because HTML is designed for visualizing a Web page in understandable manner for humans. The proposed method compares the layouts of Web pages based on image processing and graph matching. The experimental results show that the accuracy of layout analysis is 91.6% in average, and the visual similarity calculated by the proposed method is closer to the visual judgment by test subjects than color-based comparison method.