Research and Improvement of a Spam Filter Based on Naive Bayes
Lin Li, Chi Li
2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, pp. 361-364, 2015-11-23
DOI: 10.1109/IHMSC.2015.208 (https://doi.org/10.1109/IHMSC.2015.208)
A spam filter based on the Naive Bayes algorithm achieves good classification accuracy, but training on mail sample sets consumes considerable resources and lowers the overall efficiency of the system. In practical applications, features of the message text should therefore be selected in order to reduce the dimension of the feature vector space. TF-IDF is a simple and commonly used method for text feature selection. This paper improves the IDF weighting in TF-IDF feature selection so that words occurring with high frequency in their corresponding class receive a larger weight, uses the improved TF-IDF algorithm to select features, and builds a Naive Bayes spam filter with improved TF-IDF feature weighting.
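
The pipeline described above (select features by TF-IDF, then train a Naive Bayes classifier on the reduced vocabulary) can be illustrated with a minimal sketch. It uses the standard TF-IDF weighting as a baseline, because the abstract does not give the formula for the improved IDF, so this is not the authors' method. The function names (`select_features`, `train_nb`, `classify`), the whitespace tokenizer, the Laplace smoothing, and the four toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a toy example.
    return text.lower().split()


def select_features(docs, k):
    """Rank terms by their standard TF-IDF score summed over all documents
    and keep the k highest-scoring terms (baseline, not the improved IDF)."""
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency of each term
    score = Counter()
    for toks in tokenized:
        for term, count in Counter(toks).items():
            score[term] += (count / len(toks)) * math.log(n / df[term])
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(score, key=lambda t: (-score[t], t))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to vocab."""
    class_docs = Counter(labels)
    term_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        term_counts[label].update(t for t in tokenize(doc) if t in vocab)
    model = {}
    for c in class_docs:
        total = sum(term_counts[c].values()) + len(vocab)
        model[c] = {
            "prior": math.log(class_docs[c] / len(docs)),
            "loglik": {t: math.log((term_counts[c][t] + 1) / total) for t in vocab},
        }
    return model


def classify(model, vocab, text):
    """Return the class with the highest log posterior; unselected terms are ignored."""
    tokens = [t for t in tokenize(text) if t in vocab]
    scores = {
        c: p["prior"] + sum(p["loglik"][t] for t in tokens)
        for c, p in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus, invented for this sketch.
docs = [
    "win free prize now",
    "free cash prize claim now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
labels = ["spam", "spam", "ham", "ham"]

vocab = select_features(docs, k=10)
model = train_nb(docs, labels, vocab)
print(classify(model, vocab, "free prize now"))
print(classify(model, vocab, "monday project meeting"))
```

On this toy corpus the first message is classified as spam and the second as ham. Note that the baseline ranks terms without looking at class labels, so with a small k it can discard words that are frequent within one class; that limitation is the kind of behavior a class-aware IDF weighting, as proposed in the paper, aims to address.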