Improving Arabic sentiment analysis across context-aware attention deep model based on natural language processing

IF 1.7 3区 计算机科学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Language Resources and Evaluation Pub Date : 2024-04-27 DOI:10.1007/s10579-024-09741-z
Abubakr H. Ombabi, Wael Ouarda, Adel M. Alimi
{"title":"Improving Arabic sentiment analysis across context-aware attention deep model based on natural language processing","authors":"Abubakr H. Ombabi, Wael Ouarda, Adel M. Alimi","doi":"10.1007/s10579-024-09741-z","DOIUrl":null,"url":null,"abstract":"<p>With the enormous growth of social data in recent years, sentiment analysis has gained increasing research attention and has been widely explored in various languages. Arabic language nature imposes several challenges, such as the complicated morphological structure and the limited resources, Thereby, the current state-of-the-art methods for sentiment analysis remain to be enhanced. This inspired us to explore the application of the emerging deep-learning architecture to Arabic text classification. In this paper, we present an ensemble model which integrates a convolutional neural network, bidirectional long short-term memory (Bi-LSTM), and attention mechanism, to predict the sentiment orientation of Arabic sentences. The convolutional layer is used for feature extraction from the higher-level sentence representations layer, the BiLSTM is integrated to further capture the contextual information from the produced set of features. Two attention mechanism units are incorporated to highlight the critical information from the contextual feature vectors produced by the Bi-LSTM hidden layers. The context-related vectors generated by the attention mechanism layers are then concatenated and passed into a classifier to predict the final label. To disentangle the influence of these components, the proposed model is validated as three variant architectures on a multi-domains corpus, as well as four benchmarks. Experimental results show that incorporating Bi-LSTM and attention mechanism improves the model’s performance while yielding 96.08% in accuracy. Consequently, this architecture consistently outperforms the other State-of-The-Art approaches with up to + 14.47%, + 20.38%, and + 18.45% improvements in accuracy, precision, and recall respectively. These results demonstrated the strengths of this model in addressing the challenges of text classification tasks.</p>","PeriodicalId":49927,"journal":{"name":"Language Resources and Evaluation","volume":"8 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Language Resources and Evaluation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10579-024-09741-z","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

With the enormous growth of social data in recent years, sentiment analysis has gained increasing research attention and has been widely explored in various languages. Arabic language nature imposes several challenges, such as the complicated morphological structure and the limited resources, Thereby, the current state-of-the-art methods for sentiment analysis remain to be enhanced. This inspired us to explore the application of the emerging deep-learning architecture to Arabic text classification. In this paper, we present an ensemble model which integrates a convolutional neural network, bidirectional long short-term memory (Bi-LSTM), and attention mechanism, to predict the sentiment orientation of Arabic sentences. The convolutional layer is used for feature extraction from the higher-level sentence representations layer, the BiLSTM is integrated to further capture the contextual information from the produced set of features. Two attention mechanism units are incorporated to highlight the critical information from the contextual feature vectors produced by the Bi-LSTM hidden layers. The context-related vectors generated by the attention mechanism layers are then concatenated and passed into a classifier to predict the final label. To disentangle the influence of these components, the proposed model is validated as three variant architectures on a multi-domains corpus, as well as four benchmarks. Experimental results show that incorporating Bi-LSTM and attention mechanism improves the model’s performance while yielding 96.08% in accuracy. Consequently, this architecture consistently outperforms the other State-of-The-Art approaches with up to + 14.47%, + 20.38%, and + 18.45% improvements in accuracy, precision, and recall respectively. These results demonstrated the strengths of this model in addressing the challenges of text classification tasks.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过基于自然语言处理的情境感知注意力深度模型改进阿拉伯语情感分析
近年来,随着社会数据的巨大增长,情感分析越来越受到研究人员的关注,并在各种语言中得到了广泛的探索。阿拉伯语的性质带来了一些挑战,如复杂的形态结构和有限的资源,因此,目前最先进的情感分析方法仍有待改进。这启发我们探索将新兴的深度学习架构应用于阿拉伯语文本分类。在本文中,我们提出了一种整合了卷积神经网络、双向长短期记忆(Bi-LSTM)和注意力机制的集合模型,用于预测阿拉伯语句子的情感取向。卷积层用于从高层句子表征层提取特征,双向长短期记忆(Bi-LSTM)用于从生成的特征集中进一步捕捉上下文信息。Bi-LSTM 隐藏层生成的上下文特征向量中包含两个注意力机制单元,用于突出关键信息。然后,由注意力机制层生成的上下文相关向量会被串联起来并传入分类器,以预测最终标签。为了区分这些成分的影响,我们在多领域语料库和四个基准上验证了所提出模型的三种不同架构。实验结果表明,结合 Bi-LSTM 和注意力机制提高了模型的性能,准确率达到 96.08%。因此,该架构的准确率、精确度和召回率分别提高了 + 14.47%、+ 20.38% 和 + 18.45%,始终优于其他最新方法。这些结果证明了该模型在应对文本分类任务挑战方面的优势。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Language Resources and Evaluation
Language Resources and Evaluation 工程技术-计算机:跨学科应用
CiteScore
6.50
自引率
3.70%
发文量
55
审稿时长
>12 weeks
期刊介绍: Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications. Language resources include language data and descriptions in machine readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use. Evaluation of language resources concerns assessing the state-of-the-art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.
期刊最新文献
Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect Studying word meaning evolution through incremental semantic shift detection PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines Normalized dataset for Sanskrit word segmentation and morphological parsing Conversion of the Spanish WordNet databases into a Prolog-readable format
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1