A New Approach to Email Classification Using Concept Vector Space Model

C. Zeng, Zhao Lu, J. Gu
2008 Second International Conference on Future Generation Communication and Networking Symposia, 2008-12-13
DOI: 10.1109/FGCNS.2008.7

Content-based email classification methods generally use the vector space model (VSM), which is constructed from the frequency of each individual word appearing in the email content. A frequency-based VSM does not take the context of a word into account, so its feature vectors cannot accurately represent the email content, which leads to inaccurate classification. This paper presents a new approach to email classification based on a concept vector space model built with WordNet. During training, our approach uses WordNet to extract high-level category information by replacing the terms in the feature vector with synonym sets (synsets) and by taking the hypernymy-hyponymy relations between synsets into account. We design an email classification system based on the concept VSM and carry out a series of experiments. The results show that our approach can improve email classification accuracy, especially when the training set is small.
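The abstract contrasts a word-frequency VSM with a concept VSM built from WordNet synsets and hypernym relations. The Python sketch below illustrates that contrast under assumptions that are not taken from the paper: a tiny hand-written lexicon (`WORD_TO_SYNSET`, `HYPERNYM`, with made-up concept IDs) stands in for WordNet, each word is assumed to have a single sense, the hypernym is added with a hypothetical weight of 0.5, and classification uses cosine similarity to a per-category centroid. It only shows how replacing terms with synsets lets emails that share no words still share dimensions.

```python
from collections import Counter
import math

# HYPOTHETICAL toy lexicon standing in for WordNet: each word maps to one
# made-up concept (synset) ID, assuming a single sense per word.
WORD_TO_SYNSET = {
    "car": "CAR", "automobile": "CAR", "auto": "CAR",
    "truck": "TRUCK",
    "money": "MONEY", "cash": "MONEY",
    "loan": "LOAN", "mortgage": "LOAN",
}
# HYPOTHETICAL hypernym links (one level up the is-a hierarchy).
HYPERNYM = {"CAR": "MOTOR_VEHICLE", "TRUCK": "MOTOR_VEHICLE", "LOAN": "CREDIT"}

# HYPOTHETICAL labelled training emails, already tokenised and lower-cased.
TRAINING_EMAILS = {
    "autos": [["car", "dealer"], ["automobile", "engine"]],
    "finance": [["money", "loan"], ["cash", "bank"]],
}


def term_vector(tokens):
    """Frequency-based VSM: one dimension per distinct word."""
    return Counter(tokens)


def concept_vector(tokens, hypernym_weight=0.5):
    """Concept VSM sketch: replace each known word by its synset ID and add a
    down-weighted count for that synset's hypernym (the weight is an assumption)."""
    vec = Counter()
    for tok in tokens:
        synset = WORD_TO_SYNSET.get(tok, tok)  # words not in the lexicon stay as terms
        vec[synset] += 1.0
        parent = HYPERNYM.get(synset)
        if parent is not None:
            vec[parent] += hypernym_weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(labelled_emails, vectorize):
    """One centroid per category. Summing instead of averaging is fine because
    cosine similarity ignores vector length."""
    centroids = {}
    for label, emails in labelled_emails.items():
        total = Counter()
        for tokens in emails:
            total.update(vectorize(tokens))  # Counter.update adds the counts
        centroids[label] = total
    return centroids


def scores(tokens, centroids, vectorize):
    """Cosine similarity of one email to every category centroid."""
    vec = vectorize(tokens)
    return {label: cosine(vec, c) for label, c in centroids.items()}


def classify(tokens, centroids, vectorize):
    """Pick the category whose centroid is most similar to the email."""
    s = scores(tokens, centroids, vectorize)
    return max(s, key=s.get)


if __name__ == "__main__":
    new_email = ["truck", "for", "sale"]
    # Term VSM: no word of the new email occurs in training, so every score is 0.0.
    print(scores(new_email, train(TRAINING_EMAILS, term_vector), term_vector))
    # Concept VSM: "truck" shares the MOTOR_VEHICLE hypernym with "car".
    concept_centroids = train(TRAINING_EMAILS, concept_vector)
    print(classify(new_email, concept_centroids, concept_vector))  # prints "autos"
```

With term vectors, "car repair" and "automobile repair" overlap only on "repair"; with concept vectors they become identical, and the unseen word "truck" still reaches the autos category through its shared hypernym. This suggests why a concept VSM can help when the training set is small. A real system would look synsets and hypernyms up in WordNet itself (for example through NLTK's WordNet corpus reader) and would need to choose among several synsets when a word is ambiguous.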