2015 First International Conference on Arabic Computational Linguistics (ACLing): Latest Papers

Increasing the Accuracy of Opinion Mining in Arabic
Sasi Atia, K. Shaalan
Opinion mining is a rising research field. Its applications are driven by market needs, such as analyzing product reviews, and by political ones, such as assessing public opinion during presidential campaigns. In this paper, we present an approach for improving the accuracy of opinion mining in Arabic. Our study requires Arabic linguistic resources for opinion mining. After surveying the available resources, we found the OCA corpus to be available and sufficient to validate our approach. Experimental results show that tuning the parameters of the machine learning classifiers on the OCA corpus increases the accuracy of Arabic opinion mining.
DOI: 10.1109/ACLING.2015.22 (https://doi.org/10.1109/ACLING.2015.22), published 2015-04-17
Citations: 30
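The abstract does not name the classifiers or the parameters that were varied. The sketch below is a minimal illustration under hypothetical assumptions: a multinomial Naive Bayes classifier stands in for the classifier, and its additive smoothing constant `alpha` is the tuned parameter. It shows a parameter sweep scored on held-out data. The functions `train_nb`, `predict` and `sweep` and the toy reviews are illustrative only; they are not the authors' setup or the OCA corpus.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha):
    """Train multinomial Naive Bayes with additive (Lidstone) smoothing alpha > 0."""
    vocab = {w for d in docs for w in d.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    model = {}
    for c, n_docs in class_counts.items():
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_lik = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
        model[c] = (log_prior, log_lik)
    return model

def predict(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[w] for w in doc.split() if w in log_lik)
    return max(model, key=score)

def sweep(train_docs, train_labels, test_docs, test_labels, alphas):
    """Map each candidate alpha to its accuracy on the held-out set."""
    results = {}
    for alpha in alphas:
        model = train_nb(train_docs, train_labels, alpha)
        correct = sum(predict(model, d) == y for d, y in zip(test_docs, test_labels))
        results[alpha] = correct / len(test_labels)
    return results

# Toy Arabic reviews (hypothetical data, not the OCA corpus).
TRAIN_DOCS = ["الخدمة ممتازة", "المنتج جيد جدا", "تجربة سيئة", "المنتج رديء جدا"]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]
TEST_DOCS = ["ممتازة جيد", "سيئة رديء"]
TEST_LABELS = ["pos", "neg"]

print(sweep(TRAIN_DOCS, TRAIN_LABELS, TEST_DOCS, TEST_LABELS, [0.1, 0.5, 1.0]))
```

On this toy data every `alpha` reaches an accuracy of 1.0. On a real corpus the scores would differ. The winning setting would normally be picked on validation data rather than on the final test set.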
Towards Analyzing Saudi Tweets
Nora Al-Twairesh, H. Al-Khalifa, A. Al-Salman
Recently, Arabic dialects have been receiving attention from the NLP research community because they are heavily used in social media. One of the challenges of sentiment analysis on social media is the use of dialects. Our ongoing research concerns sentiment analysis of Saudi tweets, so we conducted a pilot study to determine what percentage of tweets by Saudi users are written in Modern Standard Arabic (MSA). The preliminary results show that 80% of the tweets in the study are in MSA. We also highlight some phenomena observed in the use of dialect in Saudi tweets.
DOI: 10.1109/ACLING.2015.23 (https://doi.org/10.1109/ACLING.2015.23), published 2015-04-17
Citations: 9
Building a Corpus for Arabic Dialects Using Games with a Purpose
Maya Osman, Caroline Sabty, Nada Sharaf, Slim Abdennadher
There is a large gap between the written form of Arabic, Modern Standard Arabic (MSA), and the numerous spoken Arabic dialects. In addition, most Arabic datasets are built for MSA content, and traditional ways of identifying the dialect of a text are costly in time and money. Moreover, because of the morphological complexity of Arabic, the gender of the speaker may change the structure of an Arabic sentence. Dialects therefore hold rich information, such as the origin of the speaker and the gender of the addressee. We implemented a Game With A Purpose (GWAP) called "3ammeya" to identify the dialects of Arabic sentences along with their MSA translations. Through the game, players also classify the genders of the speaker and the addressee. The collected data will help construct an expandable, low-cost corpus for dialect identification and translation to MSA.
DOI: 10.1109/ACLING.2015.10 (https://doi.org/10.1109/ACLING.2015.10), published 2015-04-17
Citations: 1
Automatic Expandable Large-Scale Sentiment Lexicon of Modern Standard Arabic and Colloquial
Hossam S. Ibrahim, Sherif M. Abdou, M. Gheith
Effective sentiment analysis in any language or genre depends on two main resources in subjectivity and sentiment analysis (SSA):

- a high-coverage sentiment lexicon, whose entries are tagged with semantic orientation (positive, negative, or neutral);
- tagged corpora to train the sentiment classifier.

Much research has been conducted in this area during the last decade. The need to build these resources remains, however, especially for morphologically rich languages (MRLs) such as Arabic.

In this paper, we present an automatically expandable, wide-coverage polarity lexicon of Arabic sentiment words. This lexical resource is explicitly designed to support Arabic sentiment classification and opinion mining applications. The lexicon starts from a seed of gold-standard Arabic sentiment words, manually collected and annotated with semantic orientation (positive or negative). It is then expanded automatically by detecting the sentiment orientation of new words. The expansion exploits lexical information such as part-of-speech (POS) tags and applies synset aggregation techniques to free online Arabic lexicons and thesauri. We report efforts to expand our manually built polarity lexicon using different types of data. Finally, we use various tagged data to evaluate the coverage and quality of the lexicon, as well as the expansion and its effect on sentiment analysis accuracy. Our data focus on Modern Standard Arabic (MSA) and Egyptian dialectal Arabic tweets and on Arabic microblogs (hotel reservations, product reviews, and TV program comments).
DOI: 10.1109/ACLING.2015.20 (https://doi.org/10.1109/ACLING.2015.20), published 2015-04-17
Citations: 14
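The abstract describes expanding a manually labeled seed through synsets, using POS tags, but it does not give the aggregation rule. The sketch below assumes one hypothetical rule: within each synset, the majority polarity of the already-labeled members is propagated to the unlabeled members that share the POS tag. Synsets with tied votes are skipped, and the process repeats until no new words are added. The function `expand_lexicon`, the toy seed and the toy synsets are illustrative; they are not the authors' resources.

```python
from collections import Counter

def expand_lexicon(seed, synsets, max_rounds=5):
    """Bootstrap a polarity lexicon by propagating labels through synsets.

    seed: dict mapping (word, pos_tag) -> "pos" or "neg" (the manual seed)
    synsets: list of (pos_tag, set_of_words) from a hypothetical thesaurus
    """
    lexicon = dict(seed)
    for _ in range(max_rounds):
        added = {}
        for pos_tag, words in synsets:
            # Aggregate the polarities already known for members with this POS tag.
            votes = Counter(lexicon[(w, pos_tag)] for w in words if (w, pos_tag) in lexicon)
            if not votes:
                continue
            ranked = votes.most_common()
            if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
                continue  # tie between polarities: leave the synset unlabeled
            label = ranked[0][0]
            for w in words:
                if (w, pos_tag) not in lexicon:
                    added.setdefault((w, pos_tag), label)  # first synset wins this round
        if not added:
            break  # nothing new: the expansion has converged
        lexicon.update(added)
    return lexicon

SEED = {("جميل", "adj"): "pos", ("قبيح", "adj"): "neg"}
SYNSETS = [
    ("adj", {"جميل", "رائع", "حسن"}),
    ("adj", {"رائع", "مذهل"}),
    ("adj", {"قبيح", "بشع"}),
]
LEXICON = expand_lexicon(SEED, SYNSETS)
```

Keying entries by `(word, pos_tag)` keeps an adjective reading from inheriting the polarity of a noun with the same spelling. The `max_rounds` cap limits how far polarity can drift from the seed.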
Arabic Natural Language Processing from Software Engineering to Complex Pipeline
Younes Jaafar, Karim Bouzoubaa
Arabic Natural Language Processing (ANLP) has developed significantly during the last decade, and many ANLP tools now exist, such as morphological analyzers and syntactic parsers. These tools differ widely in several respects:

- the development languages used;
- the inputs and outputs they handle;
- the internal and external representations of their results.

This diversity is mainly due to the lack of models and standards governing their implementation. It hinders interoperability between the tools and their reuse in new advanced projects. In this article, we use the SAFAR platform to propose APIs and models for three types of tools: stemmers, morphological analyzers, and syntactic parsers. Our proposal is a step toward standardizing all aspects shared by tools of the same type. We also review the issue of interoperability between these tools. Finally, we discuss pipeline processes.
DOI: 10.1109/ACLING.2015.11 (https://doi.org/10.1109/ACLING.2015.11), published 2015-04-17
Citations: 14
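The abstract argues that tools of the same type should share an API and a common result representation so that they can be chained. The sketch below illustrates that idea only. It does not reproduce SAFAR's actual API, which the abstract does not show. Every class name is hypothetical: `Token`, `Tool`, `LightStemmer`, `DefinitenessAnalyzer` and `Pipeline`. The stemmer is a toy that strips one definite-article prefix.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class Token:
    """Shared representation that every tool reads and enriches."""
    surface: str
    annotations: dict = field(default_factory=dict)

class Tool(ABC):
    """Hypothetical common contract for stemmers, analyzers, parsers, etc."""
    @abstractmethod
    def process(self, tokens: list[Token]) -> list[Token]:
        ...

class LightStemmer(Tool):
    """Toy stemmer: strips one definite-article prefix (illustrative only)."""
    PREFIXES = ("وال", "بال", "ال")

    def process(self, tokens: list[Token]) -> list[Token]:
        for tok in tokens:
            stem = tok.surface
            for prefix in self.PREFIXES:
                # Keep at least two letters so short words are not destroyed.
                if stem.startswith(prefix) and len(stem) - len(prefix) >= 2:
                    stem = stem[len(prefix):]
                    break
            tok.annotations["stem"] = stem
        return tokens

class DefinitenessAnalyzer(Tool):
    """Toy analyzer that consumes the stemmer's output through the shared Token."""
    def process(self, tokens: list[Token]) -> list[Token]:
        for tok in tokens:
            tok.annotations["definite"] = tok.annotations["stem"] != tok.surface
        return tokens

class Pipeline:
    """Chains tools; works for any tools that honor the Tool contract."""
    def __init__(self, *tools: Tool):
        self.tools = tools

    def run(self, text: str) -> list[Token]:
        tokens = [Token(word) for word in text.split()]
        for tool in self.tools:
            tokens = tool.process(tokens)
        return tokens
```

For example, `Pipeline(LightStemmer(), DefinitenessAnalyzer()).run("والكتاب في البيت")` produces the stems `كتاب`, `في` and `بيت`. Any other stemmer implementing `Tool` could be swapped in without changing the analyzer, which is the interoperability point the abstract makes.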
Semantic Based Query Expansion for Arabic Question Answering Systems
Hani Al-Chalabi, S. Ray, K. Shaalan
Question answering systems have emerged as a good alternative to search engines because they return the desired information precisely and in real time. One serious concern with these systems, however, is a vocabulary mismatch. Even when the knowledge base contains the answer to a question, the system may fail to retrieve it because users and content creators use different words. Much research has addressed this issue for English and some European languages. Arabic question answering systems have not kept pace, because of inherent difficulties of the language itself and a lack of tools to assist researchers. In this paper, we present a method that uses semantic resources to add semantically equivalent keywords to questions. The experiments suggest that the proposed method can deliver highly accurate answers to Arabic questions.
DOI: 10.1109/ACLING.2015.25 (https://doi.org/10.1109/ACLING.2015.25), published 2015-04-17
Citations: 21
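The abstract does not name the semantic resources used, so the sketch below uses small hypothetical dictionaries: `STOPWORDS` stands in for an Arabic stopword list and `SYNONYMS` for a semantic resource. It shows how adding semantically equivalent keywords can help when a passage words the answer differently from the question. Matching is plain term overlap, a simplification rather than the paper's retrieval model.

```python
from __future__ import annotations

# Hypothetical stand-ins for an Arabic stopword list and a semantic resource.
STOPWORDS = {"ما", "هي", "في", "من", "هو"}
SYNONYMS = {
    "عاصمة": {"حاضرة"},
    "دولة": {"بلد", "بلاد"},
}

def keywords(question: str) -> set[str]:
    """Content words of the question (stopwords removed)."""
    return {w for w in question.split() if w not in STOPWORDS}

def expand_query(question: str) -> set[str]:
    """Add semantically equivalent terms for each keyword."""
    terms = keywords(question)
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def overlap(query_terms: set[str], passage: str) -> int:
    """Number of query terms that appear in the passage."""
    return len(query_terms & set(passage.split()))

QUESTION = "ما هي عاصمة المغرب"
PASSAGE = "الرباط حاضرة المغرب"
print(overlap(keywords(QUESTION), PASSAGE), overlap(expand_query(QUESTION), PASSAGE))  # → 1 2
```

Without expansion the passage shares only `المغرب` with the question. With expansion it also matches `حاضرة`, a synonym of the question's `عاصمة`.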
Toward the Resolution of Arabic Lexical Ambiguities with Transduction on Text Automaton
Nadia Ghezaiel, K. Haddar
Lexical analysis is one way to remove ambiguities in Arabic, and resolving these ambiguities is an important task in several Natural Language Processing (NLP) applications. This paper is set in that context. Our proposed resolution method is based essentially on applying transducers to text automata. These transducers specify lexical and contextual rules for Arabic, which allow lexical ambiguities to be resolved. To derive these rules, we identify and study different types of lexical ambiguity and extract an appropriate rule set. We then express the lexical rules in the ELAG [19] system (Elimination of Lexical Ambiguities by Grammars), which can delete paths representing morphosyntactic ambiguities. In addition, we present an experiment implemented on the Unitex platform that uses various linguistic resources to obtain disambiguated syntactic structures suitable for syntactic analysis. The results obtained are promising and can be improved by adding further rules and heuristics.
DOI: 10.1109/ACLING.2015.12 (https://doi.org/10.1109/ACLING.2015.12), published 2015-04-17
Citations: 3
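ELAG rules are written as grammars inside Unitex, and the abstract does not give their syntax. The sketch below illustrates only the underlying idea of deleting ambiguous paths. It assumes a hypothetical simplification: the sentence is a lattice of alternative `(word, tag)` analyses, and contextual rules are forbidden adjacent tag pairs. The functions `paths` and `eliminate` and the tag names are illustrative.

```python
from itertools import product

def paths(lattice):
    """Enumerate every path: one (word, tag) analysis chosen per token."""
    return [list(path) for path in product(*lattice)]

def eliminate(lattice, forbidden):
    """Keep only paths with no forbidden adjacent (tag, next_tag) pair."""
    kept = []
    for path in paths(lattice):
        tags = [tag for _, tag in path]
        if any(pair in forbidden for pair in zip(tags, tags[1:])):
            continue  # this path violates a contextual rule: delete it
        kept.append(path)
    return kept

# "في كتب": after the preposition "في", "كتب" reads as the noun (books), not the verb (wrote).
LATTICE = [
    [("في", "PREP")],
    [("كتب", "VERB"), ("كتب", "NOUN")],
]
FORBIDDEN = {("PREP", "VERB")}
```

Enumerating every path grows exponentially with sentence length. A practical implementation would prune the text automaton directly rather than list its paths, so this sketch serves only to make the rule semantics concrete.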
Unsupervised Data Driven Taxonomy Learning
Mahmoud M. Hosny, S. El-Beltagy, M.E. Allam
Organizing textual information effectively is a major challenge in intelligent text processing, and the task becomes more essential as the volume of generated text grows. In this paper, we present an unsupervised computer-aided tool that automatically builds classification schemes and taxonomies to enhance automated text classification. The tool uses the Wikipedia knowledge base and its categorization system to achieve this goal. We validated the tool on a subset of a large language dataset obtained from the Google Moderator series (Egypt 2.0) idea bank. To evaluate the output, we measured the similarity between the results the tool produced automatically and those annotated manually by three different human evaluators. The tool achieved a precision of 88.6% and a recall of 81.2%.
DOI: 10.1109/ACLING.2015.8 (https://doi.org/10.1109/ACLING.2015.8), published 2015-04-17
Citations: 2
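The abstract reports precision $P = 0.886$ and recall $R = 0.812$ but no combined score. For reference, the standard F1 score (their harmonic mean) follows directly from those two figures. This is our own arithmetic, not a number reported by the authors:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.886 \times 0.812}{0.886 + 0.812} = \frac{1.438864}{1.698} \approx 0.847
$$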
Which Configuration Works Best? An Experimental Study on Supervised Arabic Twitter Sentiment Analysis
Talaat Khalil, Amal Halaby, Muhammad Hammad, S. El-Beltagy
Arabic Twitter sentiment analysis has been gaining a lot of attention lately, and supervised approaches are widely used. To date, however, no experimental study has examined how different configurations of the Bag of Words text representation scheme affect various supervised machine learning methods. The presented work sets out to do exactly that. Specifically, it examines which configurations work best for each of three machine learning approaches that have shown good results in sentiment analysis: Support Vector Machines, Complement Naïve Bayes, and Multinomial Naïve Bayes. Experiments on different datasets show that each classifier has a Bag of Words configuration with which it consistently performs best. They also show that some features are dataset-dependent.
DOI: 10.1109/ACLING.2015.19 (https://doi.org/10.1109/ACLING.2015.19), published 2015-04-17
Citations: 17
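The abstract does not list the Bag of Words configurations that were compared. The sketch below assumes two commonly varied settings as hypothetical stand-ins for the paper's actual grid: the maximum n-gram order, and binary presence versus raw counts. It shows how each configuration turns the same tweet into a different feature vector. Each vector would then be fed to SVM, Complement NB, or Multinomial NB. The function `bag_of_words` and the `CONFIGS` grid are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(text, max_n=1, binary=False):
    """Features for one BoW configuration: n-grams up to max_n, counts or presence."""
    tokens = text.split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    if binary:
        return {f: 1 for f in features}
    return dict(features)

# Hypothetical grid of configurations; each would be paired with SVM, CNB, and MNB.
CONFIGS = [
    {"max_n": 1, "binary": False},
    {"max_n": 1, "binary": True},
    {"max_n": 2, "binary": False},
    {"max_n": 2, "binary": True},
]

TWEET = "جميل جدا جدا"
for cfg in CONFIGS:
    print(cfg, bag_of_words(TWEET, **cfg))
```

The repeated intensifier `جدا` illustrates the difference. Raw counts weight it twice, binary presence once, and bigrams additionally capture `جميل جدا` as a single feature.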
Transducers Cascades for an Automatic Recognition of Arabic Named Entities in Order to Establish Links to Free Resources
Fatma Ben Mesmia, Nathalie Friburger, K. Haddar, D. Maurel
Arabic named entities (ANEs) are often sources of information, which is why several natural language processing (NLP) applications use them, mainly in information retrieval. Links to free resources can be established to improve the relevance of the information obtained. Recognizing these entities, however, requires adequate formalisms. In this paper, we propose an approach based on transducer cascades for recognizing ANEs, more precisely dates. Dates can be an integral part of events and place names. The developed transducer cascades were built with the CasSys tool, and their implementation is available on the Unitex platform. The results are encouraging.
DOI: 10.1109/ACLING.2015.16 (https://doi.org/10.1109/ACLING.2015.16), published 2015-04-17
Citations: 3
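CasSys applies a sequence of Unitex transducers, each one to the output of the previous one. The abstract does not show the authors' graphs, so the following is only a rough analogue built on two hypothetical choices: regular-expression rewrites replace transducers, and XML-style tags are the output convention. Earlier stages tag months, years and days, and a final stage combines an adjacent day, month and year into a date. The names `CASCADE` and `apply_cascade` and the month list are illustrative.

```python
import re

# Hypothetical cascade: each stage is (pattern, replacement), applied in order,
# and every stage sees the output of the previous one, as in a transducer cascade.
MONTHS = "يناير|فبراير|مارس|أبريل|مايو|يونيو|يوليو|أغسطس|سبتمبر|أكتوبر|نوفمبر|ديسمبر"
CASCADE = [
    # Stage 1: month names as whole words (so "يمارس" is not tagged).
    (re.compile(rf"\b({MONTHS})\b"), r"<month>\1</month>"),
    # Stage 2: four-digit years 1800-2099.
    (re.compile(r"\b(1[89]\d\d|20\d\d)\b"), r"<year>\1</year>"),
    # Stage 3: day numbers 1-31 (a real cascade would also check context).
    (re.compile(r"\b([1-9]|[12]\d|3[01])\b"), r"<day>\1</day>"),
    # Stage 4: combine adjacent day, month, and year tags into one date entity.
    (re.compile(r"<day>([^<]+)</day> <month>([^<]+)</month> <year>([^<]+)</year>"),
     r"<date>\1 \2 \3</date>"),
]

def apply_cascade(text):
    """Run every stage in order over the running output."""
    for pattern, replacement in CASCADE:
        text = pattern.sub(replacement, text)
    return text

print(apply_cascade("وقعت الحادثة يوم 17 أبريل 2015 في تونس"))
```

Stage order matters. Years are tagged before days, so a bare number such as 17 is only combined into a date when it directly precedes a tagged month and year. Partial matches, such as a lone year, keep their intermediate tag.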