A Comparative Study of Machine Learning Techniques for Real-time Multi-tier Sentiment Analysis

Wint Nyein Chan, T. Thein
Published in: 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), 2018-07-01
DOI: 10.1109/ICKII.2018.8569169 (https://doi.org/10.1109/ICKII.2018.8569169)
Citations: 5

Abstract

Social media now generate Big Data, both structured and unstructured. Social media are powerful marketing tools, and social big data require real-time tracking and analytics because speed may be the most important competitive factor for business profit. Compared with batch processing of sentiment analysis on a Big Data analytics platform, real-time analytics is data-intensive by nature and must efficiently collect and process large volumes of high-velocity data. Real-time multiclass sentiment analysis classifies text into more detailed sentiment labels in real time. However, multiclass sentiment analysis with a single-tier architecture, in which a single classification model is trained on the entire labeled data set, may reduce classification accuracy. This paper proposes a Real-time Multi-tier Sentiment Analysis (RMSA) system to achieve high-performance multiclass classification in real time. The proposed system combines a lexicon-based and learning-based classification scheme with a multi-tier architecture. Real-time Twitter stream data is collected by Apache Flume, and the large volume of high-velocity social data is analyzed efficiently by Spark. To improve classification accuracy, a suitable classifier is selected by comparing the accuracy of three learning-based multiclass classification techniques: Naïve Bayes, Linear SVC, and Logistic Regression. The evaluation results show that Real-time Multi-tier Sentiment Analysis achieves promising accuracy and that Linear SVC outperforms the other two techniques for this task.
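
The abstract does not spell out how the tiers are arranged, so the following is only a minimal sketch of one plausible reading: a lexicon-based first tier assigns a coarse polarity, and a separate learned model per polarity then refines it into a more detailed label. Everything here is an assumption made for illustration: the toy `LEXICON`, the label names, the `TRAIN` samples, and the helper names (`tier1_polarity`, `train_tier2`, `classify`). A small pure-Python Naïve Bayes stands in for the Spark-based classifiers the paper compares; it is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; the paper's actual lexicon is not given in the abstract.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def tokenize(text):
    return text.lower().split()

def tier1_polarity(text):
    """Tier 1 (lexicon-based): sum word scores and map the sign to a coarse label."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            toks = tokenize(text)
            self.word_counts[label].update(toks)
            self.vocab.update(toks)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            logp = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

def train_tier2(samples):
    """Tier 2 (learning-based): train one fine-grained model per coarse polarity."""
    grouped = defaultdict(lambda: ([], []))
    for text, fine_label in samples:
        texts, labels = grouped[tier1_polarity(text)]
        texts.append(text)
        labels.append(fine_label)
    return {pol: NaiveBayes().fit(t, l) for pol, (t, l) in grouped.items()}

def classify(text, tier2_models):
    """Route the text through tier 1, then refine with the matching tier-2 model."""
    polarity = tier1_polarity(text)
    model = tier2_models.get(polarity)
    return model.predict(text) if model else polarity

# Made-up training samples, for illustration only.
TRAIN = [
    ("good phone", "positive"),
    ("great phone love it", "strongly positive"),
    ("bad battery", "negative"),
    ("awful battery hate it", "strongly negative"),
]

models = train_tier2(TRAIN)
print(classify("love this great camera", models))
print(classify("the phone arrived", models))
```

In this arrangement each tier-2 model only has to separate labels within one polarity, which is one way a multi-tier design could avoid training a single model on the entire labeled data set. Swapping `NaiveBayes` for another learner and comparing accuracy on held-out data would mirror the classifier comparison the abstract describes.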