Training anti-spam models with smaller training set via SVM way
Lili Diao, Chengzhong Yang
2010 International Conference on Electronics and Information Engineering, 2010-09-02
DOI: 10.1109/ICEIE.2010.5559725
Citations: 0
Abstract
In the Internet era, email has become one of the most popular means of communication, but spam seriously inconveniences its users. As a result, email filtering has become an active research topic that has attracted considerable effort. In real-world applications, however, training email sets are far larger than is typically assumed in experiments, which challenges both the efficiency and the effectiveness of filtering. A new approach to email filtering is therefore needed. In this paper, we propose an SVM-based machine learning method that compresses the training set with minimal information loss. The key step is to reduce the large-scale training email set according to the distribution of the support vectors produced by SVM training. The resulting compressed training set greatly reduces the time needed to generate anti-spam models while preserving their precision. Experiments show that anti-spam classifiers trained with our compression approach achieve better performance.
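The abstract does not spell out how the support-vector distribution is used to shrink the training set. The Python sketch below assumes one plausible reading, which is not necessarily the authors' procedure. It splits the training set into disjoint random chunks, trains a linear SVM on each chunk, keeps only the samples that end up as support vectors (dual coefficient alpha > 0), and trains the final model on the union of the kept samples, similar to the first layer of a cascade SVM. The linear SVM is solved by dual coordinate descent on the hinge loss. The function names, the parameters `C`, `passes` and `n_chunks`, and the synthetic 2-D data standing in for email feature vectors are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(
    X: Sequence[Vector], y: Sequence[int], C: float = 1.0, passes: int = 200
) -> Tuple[Vector, List[float]]:
    """Dual coordinate descent for a hinge-loss linear SVM (labels +1 / -1).

    A constant 1.0 feature is appended, so the last weight acts as the bias.
    Returns the weights and the dual coefficients alpha; alpha_i > 0 marks
    sample i as a support vector.
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    alpha = [0.0] * len(Xa)
    for _ in range(passes):
        for i, xi in enumerate(Xa):
            q = dot(xi, xi)                            # diagonal entry Q_ii (> 0)
            g = y[i] * dot(w, xi) - 1.0                # partial derivative of the dual
            new = min(max(alpha[i] - g / q, 0.0), C)   # coordinate Newton step, clipped to [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, xi)]
                alpha[i] = new
    return w, alpha


def predict(w: Vector, x: Sequence[float]) -> int:
    return 1 if dot(w, list(x) + [1.0]) >= 0.0 else -1


def compress_training_set(
    X: Sequence[Vector], y: Sequence[int], n_chunks: int = 4, seed: int = 0
) -> List[int]:
    """Hypothetical SV-based reduction: keep only the support vectors of SVMs
    trained on disjoint random chunks. Returns sorted indices into X."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    kept: List[int] = []
    for c in range(n_chunks):
        chunk = idx[c::n_chunks]
        _, alpha = train_linear_svm([X[i] for i in chunk], [y[i] for i in chunk])
        kept.extend(i for i, a in zip(chunk, alpha) if a > 1e-8)
    return sorted(kept)


def make_toy_data(n: int = 100, seed: int = 1) -> Tuple[List[Vector], List[int]]:
    """Synthetic stand-in for email features: +1 ("spam") near (2, 2),
    -1 ("ham") near (-2, -2)."""
    rng = random.Random(seed)
    X: List[Vector] = []
    y: List[int] = []
    for k in range(n):
        label = 1 if k % 2 == 0 else -1
        X.append([rng.gauss(2.0 * label, 0.5), rng.gauss(2.0 * label, 0.5)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    kept = compress_training_set(X, y)
    w, _ = train_linear_svm([X[i] for i in kept], [y[i] for i in kept])
    acc = sum(predict(w, x) == t for x, t in zip(X, y)) / len(X)
    print(f"kept {len(kept)} of {len(X)} samples, accuracy on full set = {acc:.2f}")
```

The reasoning behind this choice is that a linear SVM's weight vector is a weighted sum of its support vectors only, so discarding samples with alpha = 0 does not change the hyperplane learned on their chunk. A real filter would apply the same idea to high-dimensional term vectors rather than 2-D points.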