A Comparative Study of Machine Learning Techniques for Real-time Multi-tier Sentiment Analysis
Authors: Wint Nyein Chan, T. Thein
Published in: 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), 2018-07-01
DOI: https://doi.org/10.1109/ICKII.2018.8569169
Citations: 5
Abstract
Social media now generate Big Data, both structured and unstructured. Social media are powerful marketing tools, and social big data require real-time tracking and analytics because speed may well be the most important source of competitive business advantage. Compared with batch processing of sentiment analysis on a Big Data analytics platform, real-time analytics is data-intensive by nature and must efficiently collect and process large volumes of high-velocity data. Real-time multiclass sentiment analysis classifies text into more detailed sentiment labels as the data arrive. However, multiclass sentiment analysis with a single-tier architecture, in which a single classification model is trained on the entire labeled dataset, may reduce classification accuracy. In this paper, a Real-time Multi-tier Sentiment Analysis (RMSA) system is proposed to achieve high-performance multiclass classification in real time. The proposed system combines lexicon-based and learning-based classification schemes with a multi-tier architecture. Real-time Twitter stream data are collected by Apache Flume, and the large volumes of high-velocity social data are analyzed efficiently by Spark. To improve classification accuracy, a suitable classifier is selected by comparing the accuracy of three learning-based multiclass classification techniques: Naïve Bayes, Linear SVC, and Logistic Regression. The evaluation results show that RMSA achieves promising accuracy and that Linear SVC outperforms the other techniques for real-time multi-tier sentiment analysis.
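The abstract does not specify how the tiers are arranged, so the following is only a minimal sketch of one plausible reading: a lexicon-based first tier assigns a coarse polarity, a second tier refines it into a finer label, and the second-tier candidate with the highest accuracy on labeled data is kept. Everything named here is an assumption made for illustration: the toy `LEXICON`, the five labels, the helper names, and the threshold rules that stand in for trained models such as Naïve Bayes, Linear SVC, or Logistic Regression. It uses plain Python rather than the Spark and Flume pipeline the paper describes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon (assumption; the abstract does not list one).
LEXICON: Dict[str, int] = {
    "love": 2, "great": 2, "good": 1, "nice": 1,
    "bad": -1, "poor": -1, "awful": -2, "hate": -2,
}

Classifier = Callable[[str], str]
Tier2 = Dict[str, Classifier]  # one fine-grained classifier per coarse polarity


def lexicon_score(text: str) -> int:
    """Sum the lexicon weights of the whitespace-separated tokens."""
    return sum(LEXICON.get(token, 0) for token in text.lower().split())


def tier1_polarity(text: str) -> str:
    """Tier 1 (assumed): coarse polarity from the sign of the lexicon score."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def make_tier2(threshold: int) -> Tier2:
    """Build stand-in tier-2 classifiers; a real system would use trained models."""
    def positive_branch(text: str) -> str:
        return "strongly positive" if lexicon_score(text) >= threshold else "positive"

    def negative_branch(text: str) -> str:
        return "strongly negative" if lexicon_score(text) <= -threshold else "negative"

    return {"positive": positive_branch, "negative": negative_branch}


def multi_tier_predict(text: str, tier2: Tier2) -> str:
    """Route the text through tier 1, then through the matching tier-2 branch."""
    coarse = tier1_polarity(text)
    if coarse == "neutral":
        return coarse
    return tier2[coarse](text)


def accuracy(tier2: Tier2, labeled: List[Tuple[str, str]]) -> float:
    """Fraction of labeled examples whose predicted label matches the gold label."""
    correct = sum(multi_tier_predict(text, tier2) == gold for text, gold in labeled)
    return correct / len(labeled)


def select_best(candidates: Dict[str, Tier2],
                labeled: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Return the name and accuracy of the most accurate candidate."""
    scores = {name: accuracy(tier2, labeled) for name, tier2 in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up labeled examples, for illustration only.
LABELED: List[Tuple[str, str]] = [
    ("love this great phone", "strongly positive"),
    ("good battery", "positive"),
    ("good and nice screen", "positive"),
    ("hate it awful service", "strongly negative"),
    ("bad packaging", "negative"),
    ("arrived on monday", "neutral"),
]

CANDIDATES: Dict[str, Tier2] = {
    "threshold_2": make_tier2(2),
    "threshold_3": make_tier2(3),
}

if __name__ == "__main__":
    print(select_best(CANDIDATES, LABELED))
```

In a setup like the one the abstract describes, the entries of `CANDIDATES` would be trained classifiers evaluated on held-out tweets, and `select_best` would play the role of the accuracy comparison that led the authors to prefer Linear SVC.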