Implementation of Support Vector Machine with Lexicon Based for Sentiment Analysis on Twitter
Authors: Nida Hasanati, Qurrotul Aini, Arndini Nuri
Published in: 2022 10th International Conference on Cyber and IT Service Management (CITSM), 2022-09-20
DOI: 10.1109/CITSM56380.2022.9935887 (https://doi.org/10.1109/CITSM56380.2022.9935887)
Twitter is a widely used social media platform, and Indonesia has the sixth-largest number of Twitter users in the world. This research is a quantitative study of fine-grained sentiment analysis that extracts sentiment on the topic of the Covid-19 vaccine from Twitter, with the aim of implementing the Support Vector Machine algorithm. The research flow follows the SEMMA method (Sample, Explore, Modify, Model, and Assess). In the sample stage, the data set is collected as tweets crawled from Twitter through the Twitter API; in the explore stage, the attributes of that data set are examined. The modify stage applies text preprocessing so that the data set is more structured. The model stage then applies a lexicon-based method to assign sentiment classes to the data set, and the labeled data are classified using the Naïve Bayes method and the Support Vector Machine. The final stage of SEMMA assesses the applied methods using a confusion matrix and k-fold cross-validation. For the Support Vector Machine, the best parameters found by grid search with cross-validation are the rbf kernel with $C = 100$ and degree = 0.01, resulting in an accuracy of 85%. The Support Vector Machine algorithm therefore produces good scores on the Covid-19 vaccine topic, so the algorithm can be applied to sentiment classification of new data.
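The lexicon-based labeling and the assessment step described above can be illustrated with a minimal, standard-library-only sketch. Everything named here is an assumption for illustration: the toy English `LEXICON`, the sign-of-score labeling rule, and the helper names `lexicon_label`, `k_fold_indices`, `confusion_matrix`, and `accuracy` do not come from the paper, which does not state its lexicon or scoring rule in the abstract.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> polarity weight (not the paper's lexicon).
LEXICON = {"good": 1, "safe": 1, "effective": 2,
           "bad": -1, "fear": -1, "dangerous": -2}


def lexicon_label(text, lexicon=LEXICON):
    """Sum the polarity of known words and map the score to a class label."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def confusion_matrix(y_true, y_pred):
    """Count each (true_label, predicted_label) pair."""
    return Counter(zip(y_true, y_pred))


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(matrix.values())
    correct = sum(n for (true, pred), n in matrix.items() if true == pred)
    return correct / total if total else 0.0


# Assumed example tweets, already lowercased and cleaned.
tweets = [
    "the vaccine is safe and effective",
    "i fear the vaccine is dangerous",
    "got my vaccine appointment today",
]
labels = [lexicon_label(t) for t in tweets]
```

In a fuller pipeline, the labels produced this way would serve as training targets for the classifiers, and each train/test split from `k_fold_indices` would feed one round of fitting and scoring. With scikit-learn available, the parameter search would more likely be expressed with `GridSearchCV` over an `SVC` estimator; that is a suggestion, not a description of the authors' code.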