Implementation of Support Vector Machine with Lexicon Based for Sentiment Analysis on Twitter

2022 10th International Conference on Cyber and IT Service Management (CITSM) Pub Date : 2022-09-20 DOI:10.1109/CITSM56380.2022.9935887

Nida Hasanati, Qurrotul Aini, Arndini Nuri


Abstract

Twitter is one of the most widely used social media platforms, and Indonesia has the sixth-largest number of Twitter users in the world. This research is a quantitative study of fine-grained sentiment analysis that extracts sentiment about the Covid-19 vaccine from Twitter, with the aim of implementing the Support Vector Machine algorithm. The research flow follows the SEMMA method (Sample, Explore, Modify, Model, and Assess). In the Sample stage, a data set of tweets is crawled from Twitter using the Twitter API; the attributes of the data set are then examined in the Explore stage. The Modify stage applies text preprocessing so that the data set is more structured. The Model stage applies a lexicon-based method to assign sentiment classes to the data set, and the labeled data are then classified using the Naïve Bayes method and the Support Vector Machine. The final stage, Assess, evaluates the applied methods using a confusion matrix and k-fold cross-validation. For the Support Vector Machine, the best parameters found by grid search with cross-validation are the rbf kernel with $C = 100$ and degree = 0.01, resulting in an accuracy of 85%. The Support Vector Machine achieves good accuracy on the Covid-19 vaccine topic, so the algorithm can be applied to classify the sentiment of new data.
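The labeling and assessment steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the toy `LEXICON`, the three class names, the sample tweets, and the helper names `lexicon_label`, `k_fold_indices`, `confusion_matrix`, and `accuracy` are hypothetical and are not taken from the paper. The SVM and Naïve Bayes classifiers themselves are not shown.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> polarity score (NOT the lexicon used in the paper).
LEXICON = {"good": 1, "safe": 1, "effective": 1, "bad": -1, "dangerous": -1, "fear": -1}


def lexicon_label(text, lexicon=LEXICON):
    """Sum the polarity of known words and map the total to a sentiment class."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def confusion_matrix(y_true, y_pred):
    """Count occurrences of each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Assumed, already-preprocessed example tweets (illustrative only).
tweets = [
    "the vaccine is safe and effective",
    "vaccine side effects are dangerous",
    "got my vaccine appointment today",
]
labels = [lexicon_label(t) for t in tweets]
print(labels)  # ['positive', 'negative', 'neutral']
```

In a fuller pipeline, the lexicon-derived labels would serve as training targets for the classifiers, each fold from `k_fold_indices` would be used once as the test split, and the confusion matrix and accuracy would be computed per fold. If a library such as scikit-learn were used for the grid search, note that its `SVC` applies `degree` only to the polynomial kernel, while the tunable coefficient for the rbf kernel is `gamma`; the abstract does not say which library was used.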
