Implementation of Support Vector Machine with Lexicon Based for Sentiment Analysis on Twitter

Nida Hasanati, Qurrotul Aini, Arndini Nuri
Published in: 2022 10th International Conference on Cyber and IT Service Management (CITSM), 2022-09-20
DOI: 10.1109/CITSM56380.2022.9935887 (https://doi.org/10.1109/CITSM56380.2022.9935887)
Citations: 0

Abstract

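Twitter is one of the most widely used social media platforms, and Indonesia has the sixth-largest number of Twitter users in the world. This quantitative study performs fine-grained sentiment analysis on tweets about the Covid-19 vaccine, with the aim of implementing the Support Vector Machine algorithm. The research flow follows the SEMMA method (Sample, Explore, Modify, Model, and Assess). In the Sample stage, tweets are crawled through the Twitter API to form the data set; in the Explore stage, the attributes of that data set are examined. The Modify stage applies text preprocessing so that the data set is more structured. The Model stage applies a lexicon-based method to assign a sentiment class to each item in the data set, and the labeled data are then classified with the Naïve Bayes method and the Support Vector Machine. The final stage, Assess, evaluates the applied methods using a confusion matrix and k-fold cross-validation. For the Support Vector Machine, the best parameters found by grid search with cross-validation are the rbf kernel with $\boldsymbol{C=100}$ and degree = 0.01, giving an accuracy of 85%. The authors conclude that the Support Vector Machine scores well on the Covid-19 vaccine topic and can therefore be applied to classify the sentiment of new data.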
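The Model and Assess stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy lexicon, the example tweets, the three class names, the TF-IDF features, and the parameter grid are all assumptions made for the example, and scikit-learn is assumed as the library. One further assumption: the abstract reports "degree = 0.01" for the rbf kernel, but in scikit-learn `degree` is only used by the polynomial kernel, so the sketch tunes `gamma` instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy lexicon; the lexicon used in the paper is not given in the abstract.
LEXICON = {"good": 1, "safe": 1, "effective": 1, "thanks": 1,
           "bad": -1, "scared": -1, "pain": -1, "hoax": -1}
CLASSES = ["negative", "neutral", "positive"]  # assumed class set


def lexicon_label(text):
    """Sum the polarity of each known word and map the score to a class."""
    score = sum(LEXICON.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up, already preprocessed tweets standing in for the crawled data set.
tweets = [
    "the vaccine is safe and effective",
    "feeling good after my second dose",
    "thanks to the nurses the vaccine was good",
    "safe and effective vaccine thanks",
    "the vaccine is a hoax",
    "scared of the pain after the shot",
    "bad side effects and pain",
    "this hoax is bad",
    "vaccine schedule opens on monday",
    "second dose at the clinic today",
    "where is the vaccination center",
    "registration for the vaccine is online",
]

# Model stage, part 1: the lexicon assigns a sentiment class to every tweet.
labels = [lexicon_label(t) for t in tweets]

# Model stage, part 2: TF-IDF features feed an SVM with an rbf kernel.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svc", SVC(kernel="rbf")),
])

# Grid search with 3-fold cross-validation over an assumed parameter grid.
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": [0.01, 0.1]}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(tweets, labels)

# Assess stage: out-of-fold predictions summarized in a confusion matrix.
predicted = cross_val_predict(search.best_estimator_, tweets, labels, cv=3)
matrix = confusion_matrix(labels, predicted, labels=CLASSES)

print("best parameters:", search.best_params_)
print("mean CV accuracy:", search.best_score_)
print(matrix)
```

On a data set this small the scores carry no meaning; the point is only the order of the steps: lexicon labels first, then a cross-validated parameter search, then a confusion matrix built from out-of-fold predictions.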