ArabBert-LSTM: improving Arabic sentiment analysis based on transformer model and Long Short-Term Memory.

Frontiers in Artificial Intelligence (IF 3, JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-07-02. eCollection Date: 2024-01-01. DOI: 10.3389/frai.2024.1408845
Wael Alosaimi, Hager Saleh, Ali A Hamzah, Nora El-Rashidy, Abdullah Alharb, Ahmed Elaraby, Sherif Mostafa
Volume 7, article 1408845. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11250580/pdf/

Abstract

Sentiment analysis, also referred to as opinion mining, plays a significant role in automating the identification of negative, positive, or neutral sentiments expressed in textual data. The proliferation of social networks, review sites, and blogs has made these platforms valuable resources for mining opinions. Sentiment analysis is applied across many domains and languages, including English and Arabic. However, Arabic presents unique challenges because of its complex morphology, which is characterized by inflectional and derivational patterns. To analyze sentiment in Arabic text effectively, sentiment analysis techniques must account for this complexity. This paper proposes a model that combines a transformer model with deep learning (DL) techniques. Word embeddings are produced by the Transformer-based Model for Arabic Language Understanding (AraBERT, also written ArabBert) and passed through the AraBERT model. The output of AraBERT is then fed into a Long Short-Term Memory (LSTM) model, followed by feedforward neural networks and an output layer. AraBERT captures rich contextual information, while the LSTM enhances sequence modeling and retains long-term dependencies within the text. We compared the proposed model with machine learning (ML) and DL algorithms, as well as with different vectorization techniques, namely term frequency-inverse document frequency (TF-IDF), AraBERT, Continuous Bag-of-Words (CBOW), and skip-gram, on four Arabic benchmark datasets. Through extensive experiments on these datasets, we demonstrate the effectiveness of our approach. The results show significant improvements in sentiment analysis accuracy, highlighting the potential of transformer models for Arabic sentiment analysis and contributing to more accurate and reliable sentiment analysis of Arabic text. The proposed framework achieves an accuracy of over 97% in sentiment classification.
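
The data flow described in the abstract (contextual token vectors, then an LSTM, then a feedforward layer, then an output layer) can be illustrated with a small standard-library sketch. This is not the authors' implementation: the random vectors merely stand in for AraBERT output, and the dimensions, the ReLU activation, the untrained random weights, and every name in the code (`LSTMCell`, `classify`, `EMB`, `HID`, `FF`, `CLASSES`) are assumptions chosen for illustration. Only the three sentiment classes and the order of the stages come from the abstract.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) and candidate (g) parts."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x, h].
        self.W = {k: rand_matrix(hidden_size, input_size + hidden_size, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of the input and the previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # The cell state carries information across time steps (long-term memory).
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def classify(token_vectors, cell, W_ff, W_out):
    """Run the LSTM over the token vectors, then feedforward + softmax output."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in token_vectors:
        h, c = cell.step(x, h, c)
    hidden = [max(0.0, v) for v in matvec(W_ff, h)]  # ReLU (assumed activation)
    return softmax(matvec(W_out, hidden))


rng = random.Random(0)
EMB, HID, FF, CLASSES = 8, 4, 6, 3  # toy sizes; 3 = negative / positive / neutral
cell = LSTMCell(EMB, HID, rng)
W_ff = rand_matrix(FF, HID, rng)
W_out = rand_matrix(CLASSES, FF, rng)

# Stand-in for AraBERT output: one random vector per token of a 5-token sentence.
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(EMB)] for _ in range(5)]
probs = classify(tokens, cell, W_ff, W_out)
print(probs)  # three class probabilities (meaningless here, since weights are untrained)
```

In a real system the stand-in vectors would be replaced by the hidden states of a pretrained AraBERT encoder, and all weights would be learned from labeled data rather than drawn at random.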
