Classification of Web documents using a graph model
A. Schenker, Mark Last, H. Bunke, A. Kandel
Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. Published 2003-08-03.
DOI: 10.1109/ICDAR.2003.1227666 (https://doi.org/10.1109/ICDAR.2003.1227666)
In this paper we describe work relating to classification of Web documents using a graph-based model instead of the traditional vector-based model for document representation. We compare the classification accuracy of the vector model approach using the k-nearest neighbor (k-NN) algorithm to a novel approach which allows the use of graphs for document representation in the k-NN algorithm. The proposed method is evaluated on three different Web document collections using the leave-one-out approach for measuring classification accuracy. The results show that the graph-based k-NN approach can outperform traditional vector-based k-NN methods in terms of both accuracy and execution time.
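To make the idea of running k-NN over graphs concrete, here is a minimal sketch of the evaluation loop. The abstract does not say how graphs are built or compared, so the details below are assumptions made for illustration: each document becomes a graph whose nodes are its unique terms and whose directed edges link adjacent words, and the distance between two graphs is one minus the size of their maximum common subgraph divided by the size of the larger graph. All function names and the toy data are hypothetical.

```python
from collections import Counter


def build_graph(text):
    """Assumed representation: nodes are unique terms, directed edges link adjacent words."""
    words = text.lower().split()
    nodes = frozenset(words)
    edges = frozenset((a, b) for a, b in zip(words, words[1:]) if a != b)
    return nodes, edges


def graph_size(graph):
    """Size of a graph, counted here as number of nodes plus number of edges."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_distance(g1, g2):
    """Assumed distance: 1 - |mcs(g1, g2)| / max(|g1|, |g2|).

    Because node labels are unique within a graph, the maximum common
    subgraph is simply the shared nodes plus the shared edges.
    """
    common = len(g1[0] & g2[0]) + len(g1[1] & g2[1])
    largest = max(graph_size(g1), graph_size(g2))
    return 1.0 - common / largest if largest else 0.0


def knn_predict(query, training, k):
    """Majority vote among the k training graphs closest to the query."""
    ranked = sorted(training, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(dataset, k):
    """Classify each graph using all the others as training data."""
    correct = 0
    for i, (graph, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        if knn_predict(graph, rest, k) == label:
            correct += 1
    return correct / len(dataset)


# Toy data, invented for this example only.
toy_docs = [
    ("stock market prices rise", "finance"),
    ("stock market prices fall", "finance"),
    ("football match final score", "sports"),
    ("football match final result", "sports"),
]
toy_dataset = [(build_graph(text), label) for text, label in toy_docs]
print(leave_one_out_accuracy(toy_dataset, k=1))
```

Under the unique-label assumption the common subgraph is found with two set intersections, so each distance computation is cheap. A vector-based baseline could reuse the same `knn_predict` and `leave_one_out_accuracy` functions by swapping in a different distance function.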