Rate Insight: A Comparative Study on Different Machine Learning and Deep Learning Approaches for Product Review Rating Prediction in Bengali Language

R. Chowdhury, Farhad Uz Zaman, Arman Sharker, Mashfiq Rahman, F. Shah
Published in: 2022 25th International Conference on Computer and Information Technology (ICCIT)
Publication date: 2022-12-17
DOI: 10.1109/ICCIT57492.2022.10055515 (https://doi.org/10.1109/ICCIT57492.2022.10055515)

Abstract

In the contemporary era of digital marketing, e-commerce has emerged as one of the most preferred methods for day-to-day shopping. Since the COVID-19 pandemic, online shopping behavior has shifted permanently toward little or no human-to-human interaction. As a result, it is becoming more difficult for e-commerce enterprises to observe and evaluate market trends, particularly through consumer behavior analysis. Extensive analysis of product reviews, aimed at identifying behavioral patterns and discrepancies between customer reviews and ratings, is a substantial research field. Owing to the lack of benchmark corpora and language processing techniques, predicting review ratings in Bengali has become increasingly problematic. This paper analyzes approaches to product review rating prediction for Bengali text reviews using our own dataset, collected from the e-commerce website DarazBD. The product reviews are labeled with ratings in five sentiment classes, from "1" to "5". Notably, we built a well-balanced dataset using our automated scraping system, and spent a significant amount of time and effort maintaining quality standards through human annotation. Among the machine learning models explored (logistic regression, random forest, multinomial naïve Bayes, and support vector machine), SVM achieved the best classification accuracy of 78.63%. Subsequently, we combined Word2Vec, FastText, and GloVe embeddings with three deep neural network (DNN) architectures: CNN, Bi-LSTM, and a combination of CNN and Bi-LSTM. Among the DNN architectures, CNN+Bi-LSTM gave the highest accuracy, 75.25%.
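
As a rough illustration of one of the classical baselines named in the abstract, the sketch below implements a multinomial naive Bayes rating classifier in plain Python, together with the accuracy metric. It is not the authors' pipeline: the abstract does not state how reviews were tokenized or vectorized, so the whitespace tokenizer, the add-one smoothing, the class and function names, and the four toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. The abstract does not
    # describe the preprocessing applied to the Bengali reviews.
    return text.split()


class MultinomialNB:
    """Bag-of-words multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, ratings):
        self.class_counts = Counter(ratings)     # number of reviews per rating
        self.word_counts = defaultdict(Counter)  # rating -> token frequencies
        self.vocab = set()
        for text, rating in zip(texts, ratings):
            tokens = tokenize(text)
            self.word_counts[rating].update(tokens)
            self.vocab.update(tokens)
        self.total = len(ratings)
        return self

    def predict_one(self, text):
        best_rating, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for rating, n_docs in self.class_counts.items():
            counts = self.word_counts[rating]
            denom = sum(counts.values()) + vocab_size
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_rating, best_score = rating, score
        return best_rating

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    # Fraction of reviews whose predicted rating equals the true rating.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy reviews (not from the paper's dataset), rated 1 or 5.
TRAIN_TEXTS = ["খুব ভালো পণ্য", "ভালো মান", "খুব খারাপ পণ্য", "খারাপ মান"]
TRAIN_RATINGS = [5, 5, 1, 1]

model = MultinomialNB().fit(TRAIN_TEXTS, TRAIN_RATINGS)
predictions = model.predict(["ভালো পণ্য", "খারাপ পণ্য"])
print(predictions, accuracy([5, 1], predictions))
```

The same `fit`/`predict` shape would apply to the other classifiers listed in the abstract; only the scoring rule changes. With five rating classes, accuracy is the share of reviews assigned exactly the correct rating, which is how the reported percentages are read here.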