Improvement of Moroccan Dialect Sentiment Analysis Using Arabic BERT-Based Models

Ghizlane Bourahouat, Manar Abourezq, N. Daoudi
Journal of Computer Science. Published 2024-02-01. DOI: 10.3844/jcssp.2024.157.167

Abstract

This study addresses sentiment analysis, a key task in natural language processing, with a particular focus on Arabic and especially on dialectal Arabic, which has received relatively little attention because of its inherent challenges. Our approach targets sentiment analysis in Moroccan Arabic and leverages BERT models pre-trained on Arabic, namely AraBERT, QARIB, ALBERT, AraELECTRA, and CAMeLBERT. These models are combined with machine learning and deep learning algorithms, including SVM and CNN, in addition to fine-tuning the pre-trained models themselves. We further examine the impact of data imbalance by evaluating the models on three datasets: an unbalanced set, a balanced set obtained through under-sampling, and a balanced set created by combining the initial dataset with another unbalanced one. Our approach reaches an accuracy of 96% with the QARIB model, even on the imbalanced data. The novelty of this research lies in applying pre-trained Arabic BERT models to Moroccan sentiment analysis and in exploring their combined use with CNN and SVM algorithms. Our findings show that using the BERT-based models on their own yields better results than combining them with CNN or SVM, a significant advance for sentiment analysis in Moroccan Arabic. A comparative analysis with state-of-the-art approaches highlights the method's effectiveness and provides insights that contribute to the advancement of sentiment analysis in Arabic dialects.
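The abstract does not specify how the under-sampled balanced set is built. As a minimal sketch, assuming plain random under-sampling of every class down to the size of the smallest class, and assuming the data is stored as `(text, label)` pairs, the balancing step could look like the following. The function names `class_counts` and `undersample` and the toy data are hypothetical illustrations, not the authors' code.

```python
import random
from collections import Counter, defaultdict

# Assumption: each example is a (text, label) pair. The labels and the toy
# data below are hypothetical placeholders, not the paper's dataset.


def class_counts(samples):
    """Return how many examples carry each label."""
    return Counter(label for _, label in samples)


def undersample(samples, seed=0):
    """Random under-sampling: keep a random subset of every class whose
    size equals that of the smallest class, then shuffle the result."""
    if not samples:
        return []
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for a reproducible split
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced toy dataset: 8 positive and 3 negative examples.
data = [(f"pos_{i}", "positive") for i in range(8)] + \
       [(f"neg_{i}", "negative") for i in range(3)]
balanced = undersample(data)
print(sorted(class_counts(data).items()))      # [('negative', 3), ('positive', 8)]
print(sorted(class_counts(balanced).items()))  # [('negative', 3), ('positive', 3)]
```

Because the target size is the minority-class count, random under-sampling discards part of the majority class. The third strategy in the abstract instead adds examples from a second dataset, so no data has to be thrown away.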
Source journal: Journal of Computer Science
Subject area: Computer Science (Computer Networks and Communications)
CiteScore: 1.70
Self-citation rate: 0.00%
Articles published: 92
About the journal: Journal of Computer Science aims to publish research articles on the theoretical foundations of information and computation, and on practical techniques for their implementation and application in computer systems. JCS is a peer-reviewed journal published twelve times a year that covers the latest and most compelling research of the time.