Comparison of Scenario Pre-processing Performance on Support Vector Machine and Naïve Bayes Algorithms for Sentiment Analysis

Nabila Valinka Pusean, N. Charibaldi, B. Santosa
Journal: Inform Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi
Published: 2023-01-28
DOI: https://doi.org/10.25139/inform.v8i1.5667

Abstract

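Television shows are assessed by ratings, but public opinion is also needed to complete the assessment, and sentiment analysis is a way to capture it. Pre-processing is an essential step in sentiment analysis because public comments still contain a great deal of irregular writing. This study compares different pre-processing scenarios to find the one that performs best with Support Vector Machine (SVM) and Naïve Bayes (NB) for sentiment analysis of the television show X Factor Indonesia. The stages are literature study, problem analysis, design, data collection, pre-processing under two scenarios, word weighting with TF-IDF, classification with SVM and NB, and evaluation of accuracy from a confusion matrix. The best performance is achieved with the comprehensive pre-processing scenario, which consists of case-folding, emoji removal, cleansing, repeated-character removal, word normalization, negation handling, stopword removal, stemming, and tokenization; it reaches an accuracy of 79.44% with the SVM algorithm. SVM with complete pre-processing comes out ahead in accuracy, precision, recall, and F1-score.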
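The comprehensive pre-processing scenario named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the word lists (`SLANG`, `NEGATIONS`, `STOPWORDS`), the suffix-stripping `stem` helper, the emoji ranges, and the choice to join a negation word to the following word are all assumptions made for the example. The sketch also splits text into tokens early so that the word-level steps are easy to write, whereas the abstract lists tokenization last.

```python
import re

# Hypothetical toy resources; a real study would use full Indonesian lexicons.
SLANG = {"gk": "tidak", "bgt": "banget", "yg": "yang"}
NEGATIONS = {"tidak", "bukan", "kurang"}
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}
SUFFIXES = ("nya", "kan", "lah")  # crude stand-in for a real stemmer

# Rough emoji ranges (an assumption; not exhaustive).
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def stem(word):
    """Strip one known suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply the listed steps to one comment and return its tokens."""
    text = text.lower()                                  # case-folding
    text = EMOJI.sub(" ", text)                          # removing emoji
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)    # cleansing: URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)                # cleansing: digits, punctuation
    text = re.sub(r"(.)\1{2,}", r"\1", text)             # removing repeated characters
    tokens = text.split()                                # tokenization
    tokens = [SLANG.get(t, t) for t in tokens]           # word normalization
    merged, i = [], 0
    while i < len(tokens):                               # negation handling
        if tokens[i] in NEGATIONS and i + 1 < len(tokens):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    tokens = [t for t in merged if t not in STOPWORDS]   # stopword removal
    return [t if "_" in t else stem(t) for t in tokens]  # stemming


print(preprocess("Baguuuus bgt acaranya!!! gk bosan 😀"))
# ['bagus', 'banget', 'acara', 'tidak_bosan']
```

Joining the negation word to its neighbour keeps "tidak bosan" (not bored) as one feature, so a later weighting step such as TF-IDF does not treat it as the plain word "bosan".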
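The four reported measures (accuracy, precision, recall, and F1-score) all derive from confusion-matrix counts. The sketch below shows that derivation for one class treated as positive. The function names and the toy labels are hypothetical and used only for illustration; they are not data or results from the study.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


def scores(y_true, y_pred, positive):
    """Accuracy, plus precision, recall and F1 for the class `positive`."""
    cm = confusion_matrix(y_true, y_pred)
    tp = cm[(positive, positive)]
    fp = sum(n for (actual, pred), n in cm.items()
             if pred == positive and actual != positive)
    fn = sum(n for (actual, pred), n in cm.items()
             if actual == positive and pred != positive)
    correct = sum(n for (actual, pred), n in cm.items() if actual == pred)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not from the paper).
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "pos"]
result = scores(y_true, y_pred, positive="pos")
```

With these toy labels there are 2 true positives, 1 false positive, and 1 false negative, so accuracy is 3/5 and precision, recall, and F1 are each 2/3.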