Sentiment Analysis on a Large Indonesian Product Review Dataset

A. Romadhony, Said Al Faraby, Rita Rismala, U. N. Wisesty, Anditya Arifianto
Journal: Journal of Information Systems Engineering and Business Intelligence
Published: 2024-02-28
DOI: 10.20473/jisebi.10.1.167-178 (https://doi.org/10.20473/jisebi.10.1.167-178)
Citations: 0

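Abstract

Background: Publicly available large datasets play an important role in advancing natural language processing and computational linguistics research. To date, however, only a few large Indonesian-language datasets are accessible for research, and this includes datasets for sentiment analysis, which is considered the most popular task.

Objective: This work presents sentiment analysis on a large Indonesian product review dataset using various features and methods. Two tasks were implemented: classifying reviews into three classes (positive, negative, neutral) and predicting ratings.

Methods: Sentiment analysis was conducted on the FDReview dataset, which comprises over 700,000 reviews. Sentiment was treated as a classification problem using four methods: Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), LSTM, and BiLSTM.

Results: Among the conventional methods, MNB outperformed SVM in rating prediction, whereas SVM performed better in review classification. BiLSTM outperformed all other methods in both tasks. The study also includes experiments on balanced and unbalanced small-sized sample datasets.

Conclusion: The deep learning-based methods performed better only in the large dataset setting. On the small balanced dataset, conventional machine learning methods were competitive with deep learning approaches.

Keywords: Indonesian review dataset, Large dataset, Rating prediction, Sentiment analysis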
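To make the conventional baseline concrete, the following is a minimal sketch of a Multinomial Naïve Bayes classifier with Laplace smoothing applied to three-class review sentiment. It is not the authors' implementation: the abstract does not describe their features or preprocessing, so the whitespace tokenizer, the smoothing value `alpha=1.0`, the function names `train_mnb` and `predict_mnb`, and the six toy Indonesian reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on whitespace-tokenized documents."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"alpha": alpha, "vocab": vocab, "classes": {}}
    for label, n_docs in class_docs.items():
        n_tokens = sum(word_counts[label].values())
        model["classes"][label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "counts": word_counts[label],
            # Laplace-smoothed denominator: class tokens + alpha * |vocab|
            "denom": n_tokens + alpha * len(vocab),
        }
    return model


def predict_mnb(model, text):
    """Return the class with the highest log posterior for one review."""
    alpha = model["alpha"]
    # Tokens never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, params in model["classes"].items():
        score = params["log_prior"]
        for tok in tokens:
            score += math.log((params["counts"][tok] + alpha) / params["denom"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, not taken from the FDReview dataset.
docs = [
    "produk bagus sekali",
    "sangat bagus dan cocok",
    "jelek dan mengecewakan",
    "tidak cocok jelek",
    "biasa saja",
    "lumayan biasa",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = train_mnb(docs, labels)
print(predict_mnb(model, "bagus sekali"))
print(predict_mnb(model, "jelek mengecewakan"))
print(predict_mnb(model, "biasa saja"))
```

The same two functions could be reused for rating prediction by passing rating values as the labels, since the abstract treats both tasks as classification. A toy corpus this small says nothing about the relative performance reported in the paper.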