An Efficient Deep Learning for Thai Sentiment Analysis

IF 2.7 | Zone 3, Physics and Astrophysics | Q2 PHYSICS, ATOMIC, MOLECULAR & CHEMICAL | Atomic Data and Nuclear Data Tables | Pub Date: 2023-05-13 | DOI: 10.3390/data8050090
Nattawat Khamphakdee, Pusadee Seresangtakul
Citations: 3

Abstract

The number of customer reviews on travel websites and platforms is increasing quickly. These platforms allow people to write reviews about their experience with respect to service quality, location, room, and cleanliness, thereby helping others before they book hotels. Many people fail to consider hotel bookings because the numerous reviews take a long time to read, and many are in a non-native language. Thus, hotel businesses need an efficient process to analyze and categorize the polarity of reviews as positive, negative, or neutral. Low-resource languages such as Thai in particular have fewer resources available for classifying sentiment polarity. In this paper, a sentiment analysis method is proposed for Thai sentiment classification in the hotel domain. Firstly, the Word2Vec technique (the continuous bag-of-words (CBOW) and skip-gram approaches) was applied to create word embeddings of different vector dimensions. Secondly, each word embedding model was combined with deep learning (DL) models to observe the effect of each word vector dimension on the results. We compared nine DL models (CNN, LSTM, Bi-LSTM, GRU, Bi-GRU, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-BiGRU) with different numbers of layers to evaluate their performance in polarity classification. The dataset was also classified using the FastText and BERT pre-trained models. Finally, our experimental results show that the WangchanBERTa model achieved a slightly higher accuracy of 0.9225, while the skip-gram and CNN model combination outperformed the other DL models, reaching an accuracy of 0.9170. From the experiments, we found that the word vector dimensions, hyperparameter values, and the number of layers of the DL models affected the performance of sentiment classification. Our research provides guidance for setting suitable hyperparameter values to improve the accuracy of sentiment classification for the Thai language in the hotel domain.
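
The difference between the two Word2Vec approaches named in the abstract can be illustrated with a minimal sketch of how each one frames its training examples. This is not the authors' code: the function names, the window size, and the sample tokens are assumptions chosen for illustration, and the sketch only builds training examples rather than learning any vectors. Skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs: skip-gram predicts a context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context_words, center) examples: CBOW predicts the center word from its context."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # a one-word sentence has no context to learn from
            examples.append((context, center))
    return examples


# Hypothetical, already-segmented Thai review ("the room is very clean").
# Thai is written without spaces between words, so a word segmenter is
# assumed to have run before this step.
tokens = ["ห้อง", "สะอาด", "มาก"]

for center, context in skipgram_pairs(tokens, window=1):
    print("skip-gram:", center, "->", context)

for context, center in cbow_examples(tokens, window=1):
    print("CBOW:", context, "->", center)
```

In a full pipeline, examples like these would feed a shallow network whose hidden-layer size is the word vector dimension that the abstract reports varying.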