Sentiment analysis based on Support Vector Machine and Big Data
Lukas Povoda, Radim Burget, M. Dutta
2016 39th International Conference on Telecommunications and Signal Processing (TSP), published 2016-06-27
DOI: 10.1109/TSP.2016.7760939
Citation count: 23
Abstract
This paper deals with sentiment analysis in text documents, especially text valence detection. The proposed solution is based on a Support Vector Machine (SVM) classifier. The classifier was trained on a large amount of data, and complex word combinations were analysed. For this purpose, distributed learning on 112 processors was used. The datasets used for training and testing were obtained automatically from real user feedback on products from different web pages (and different product segments). The proposed solution has been evaluated in four languages: English, German, Czech and Spanish. Using the Big Data approach, this paper improves accuracy by about 11%. The best accuracy achieved in this work was 95.31% for recognition of positive and negative text valence. The described learning is fully automatic, can be applied to any language, and requires no complicated preprocessing.
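The abstract names an SVM classifier over word combinations but does not specify the kernel, solver, or feature extraction. The following is a minimal single-process sketch under stated assumptions. It uses a linear SVM trained with the Pegasos stochastic sub-gradient method, which is a swapped-in technique and not necessarily the authors' solver. Features are bags of lowercased whitespace-token n-grams up to length 2, standing in for "word combinations". The function names `ngrams`, `dot`, `train_pegasos` and `predict`, and the parameters `lam` and `epochs`, are hypothetical. The 112-processor distributed training is not reproduced.

```python
import random
from collections import Counter


def ngrams(text, n_max=2):
    """Bag of word n-grams (n = 1..n_max) from lowercased whitespace tokens.

    Hypothetical feature extractor: no stemming or stop-word lists, so it
    needs no language-specific preprocessing.
    """
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def dot(w, x):
    """Sparse dot product between weight dict w and feature Counter x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_pegasos(docs, labels, lam=0.01, epochs=20, seed=0):
    """Linear SVM (no bias term) trained with Pegasos sub-gradient steps.

    Minimizes lam/2 * ||w||^2 + mean hinge loss; labels must be +1 or -1.
    """
    rng = random.Random(seed)
    data = [(ngrams(d), y) for d, y in zip(docs, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            # Check the margin with the current weights, before updating.
            violated = y * dot(w, x) < 1
            # Regularization step: shrink every weight by (1 - eta * lam).
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss sub-gradient step, only when the margin is violated.
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 for positive valence, -1 for negative valence."""
    return 1 if dot(w, ngrams(text)) >= 0 else -1
```

Usage would be `w = train_pegasos(reviews, labels)`, where `labels` holds ±1 valences (for example, derived from review ratings, which is an assumption here), followed by `predict(w, new_text)`. Shrinking every weight at each step costs time proportional to the vocabulary size. A larger-scale version would keep a single scalar multiplier for `w` instead.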