Feature Based Performance Evaluation of Support Vector Machine on Binary Classification
Shivani Sharma, S. Srivastava
2016 Second International Conference on Computational Intelligence & Communication Technology (CICT), February 2016
DOI: 10.1109/CICT.2016.41
Classification is a challenging problem. Text classification uses terms as features, which collectively vote for a document's membership in a class. This paper examines how the performance of a Support Vector Machine (SVM) varies with the number of text features. The empirical results show a significant degradation in SVM classifier performance as the number of features is reduced from 100 to 50 and then to 25. The data set consists of short text messages (tweets) in two balanced classes of 841 tweets each. A radial basis function (RBF) kernel is used. TP rate, FP rate, precision, recall and F-measure serve as the performance measures; a confusion matrix gives a quick overview of the classifier's behavior, and 10-fold cross-validation is used to estimate the performance of the prediction model.
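The abstract names its evaluation measures without defining them. The following stdlib-only Python sketch shows the standard formulas for those measures computed from a binary confusion matrix, together with the RBF kernel and a stratified 10-fold split over two balanced classes of 841 items. It is illustrative only: the function names, the choice of 1 as the positive class, gamma = 0.5 and the round-robin stratified fold assignment are assumptions, not details reported by the authors, and no SVM is trained here.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels, assuming 1 is the positive class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def _ratio(num: float, den: float) -> float:
    # Report 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def evaluate(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard binary measures derived from the confusion-matrix cells."""
    tp_rate = _ratio(tp, tp + fn)    # identical to recall (sensitivity)
    fp_rate = _ratio(fp, fp + tn)    # fraction of negatives wrongly flagged as positive
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * tp_rate, precision + tp_rate)  # harmonic mean
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "recall": tp_rate, "f_measure": f_measure}


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2); gamma = 0.5 is an arbitrary illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds, dealing each class out round-robin so that
    every fold keeps the class balance (stratification is an assumption here)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds


if __name__ == "__main__":
    # Toy labels and predictions, not data from the paper.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    counts = confusion_counts(y_true, y_pred)
    print(counts)  # (3, 1, 1, 3)
    print(evaluate(*counts))  # every measure is 0.75 except fp_rate, which is 0.25

    # Two balanced classes of 841 samples each, split into 10 folds.
    labels = [0] * 841 + [1] * 841
    print([len(f) for f in stratified_folds(labels)])  # [170, 168, 168, 168, 168, 168, 168, 168, 168, 168]
```

With 1,682 tweets and k = 10, one fold receives 170 items and the other nine receive 168, each split evenly between the two classes. In a full experiment an SVM would be trained on nine folds and scored on the held-out tenth; one common convention is to sum the confusion counts over all ten rounds before computing the measures.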