Automatic Expandable Large-Scale Sentiment Lexicon of Modern Standard Arabic and Colloquial

Hossam S. Ibrahim, Sherif M. Abdou, M. Gheith
Published in: 2015 First International Conference on Arabic Computational Linguistics (ACLing)
Publication date: 2015-04-17
DOI: 10.1109/ACLING.2015.20 (https://doi.org/10.1109/ACLING.2015.20)
Citations: 14

Abstract

In subjectivity and sentiment analysis (SSA), two resources are needed to improve sentiment analysis effectively in any language or genre: first, a high-coverage sentiment lexicon, in which entries are tagged with semantic orientation (positive, negative, or neutral); second, tagged corpora to train the sentiment classifier. Much research has been conducted in this area during the last decade, but the need to build these resources remains, especially for morphologically rich languages (MRLs) such as Arabic. In this paper, we present an automatically expandable, wide-coverage polarity lexicon of Arabic sentiment words, a lexical resource explicitly devised to support Arabic sentiment classification and opinion mining applications. The lexicon is built from a seed of gold-standard Arabic sentiment words that are manually collected and annotated with semantic orientation (positive or negative). It is then expanded automatically, detecting the sentiment orientation of new sentiment words by exploiting lexical information such as part-of-speech (POS) tags and by applying synset aggregation techniques to free online Arabic lexicons and thesauruses. We report efforts to expand our manually built polarity lexicon using different types of data. Finally, we use various tagged data to evaluate the coverage and quality of the polarity lexicon, as well as the lexicon expansion and its effect on sentiment analysis accuracy. Our data focus on Modern Standard Arabic (MSA) and Egyptian dialectal Arabic tweets and Arabic microblogs (hotel reservations, product reviews, and TV program comments).
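The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of seed-based lexicon expansion, not the authors' method. It assumes a plain breadth-first propagation of polarity to same-POS synonyms as a stand-in for the synset aggregation the abstract mentions. The seed entries, the toy thesaurus, the transliterated words, and the names `expand_lexicon`, `coverage`, and `max_depth` are all hypothetical.

```python
from collections import deque

# Hypothetical seed lexicon: (word, POS) -> polarity.
# Transliterated toy entries, not taken from the paper's resource.
SEED = {
    ("jamil", "ADJ"): "positive",  # "beautiful"
    ("sayyi", "ADJ"): "negative",  # "bad"
}

# Hypothetical thesaurus: (word, POS) -> list of synonym (word, POS) pairs.
THESAURUS = {
    ("jamil", "ADJ"): [("hasan", "ADJ"), ("ra'i", "ADJ")],
    ("hasan", "ADJ"): [("jayyid", "ADJ"), ("hasan", "NOUN")],
    ("sayyi", "ADJ"): [("radi", "ADJ")],
    ("radi", "ADJ"): [("sayyi", "ADJ")],
}


def expand_lexicon(seed, thesaurus, max_depth=2):
    """Propagate seed polarity breadth-first to same-POS synonyms."""
    lexicon = dict(seed)
    queue = deque((entry, 0) for entry in seed)
    while queue:
        entry, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for synonym in thesaurus.get(entry, []):
            # Skip synonyms whose POS tag differs from the source entry.
            if synonym[1] != entry[1]:
                continue
            # First label wins; existing entries (including seeds) are kept.
            if synonym not in lexicon:
                lexicon[synonym] = lexicon[entry]
                queue.append((synonym, depth + 1))
    return lexicon


def coverage(lexicon, tokens):
    """Fraction of tokens whose surface form appears in the lexicon."""
    if not tokens:
        return 0.0
    words = {word for word, _pos in lexicon}
    return sum(token in words for token in tokens) / len(tokens)


expanded = expand_lexicon(SEED, THESAURUS)
sample_tokens = ["jamil", "jayyid", "kitab", "radi"]
print(len(SEED), len(expanded))
print(coverage(SEED, sample_tokens), coverage(expanded, sample_tokens))
```

Comparing `coverage` before and after expansion on the same token list mirrors, in miniature, the kind of coverage evaluation the abstract describes. A real system would also need to handle Arabic morphology and conflicting polarity labels, which this sketch ignores.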