
2015 First International Conference on Arabic Computational Linguistics (ACLing): Latest Articles

Islamic Fatwa Request Routing via Hierarchical Multi-label Arabic Text Categorization
Reda A. Zayed, Mohamed Farouk Abdel Hady, H. Hefny
Multi-label classification (MLC) is concerned with learning from examples where each example is associated with a set of labels, in contrast to traditional single-label classification, where an example is typically assigned a single label. MLC problems appear in many areas, including text categorization, protein function classification, and semantic annotation of multimedia. The religious domain has become an interesting and challenging area for machine learning and natural language processing. A "fatwa" in the Islamic religion is the legal opinion or interpretation that a qualified scholar (mufti) can give on issues related to Islamic law. It is similar to the issuing of legal opinions by courts in common-law systems. In this paper, a hierarchical classification system is introduced to automatically route incoming fatwa requests to the most relevant mufti. Each fatwa is associated with multiple categories by a mufti, and the categories can be organized in a hierarchy. The results on fatwa request routing confirm the effective and efficient predictive performance of hierarchical ensembles of multi-label classifiers trained using the HOMER method and its variations, compared to binary relevance, which simply trains a classifier for each label independently.
DOI: https://doi.org/10.1109/ACLING.2015.28 (published 2015-04-17)
Citations: 3
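The binary relevance baseline named in the fatwa routing abstract above trains one independent binary classifier per label. The sketch below illustrates only that decomposition. The token-voting classifier, the category names, and the toy requests are assumptions made for illustration; they are not the classifiers, categories, or data used in the paper, and HOMER itself is not shown.

```python
from collections import Counter


def binary_relevance_split(examples, labels):
    """Turn one multi-label dataset into one binary dataset per label."""
    return {
        label: [(text, label in tags) for text, tags in examples]
        for label in labels
    }


class TokenVoteClassifier:
    """Toy binary classifier (assumption): each token votes +1/-1 by class."""

    def fit(self, data):
        self.weights = Counter()
        for text, is_positive in data:
            for token in set(text.split()):
                self.weights[token] += 1 if is_positive else -1
        return self

    def predict(self, text):
        # Unseen tokens contribute 0 because Counter returns 0 for missing keys.
        return sum(self.weights[t] for t in set(text.split())) > 0


def train_binary_relevance(examples, labels):
    """Train one classifier per label, each ignoring all other labels."""
    splits = binary_relevance_split(examples, labels)
    return {label: TokenVoteClassifier().fit(data) for label, data in splits.items()}


def predict_labels(models, text):
    """The predicted label set is the union of all positive binary decisions."""
    return {label for label, model in models.items() if model.predict(text)}


# Hypothetical toy requests with hypothetical category names.
examples = [
    ("fasting during travel", {"fasting", "travel"}),
    ("prayer during travel", {"prayer", "travel"}),
    ("fasting and illness", {"fasting"}),
    ("prayer times", {"prayer"}),
]
labels = ["fasting", "prayer", "travel"]
models = train_binary_relevance(examples, labels)
print(predict_labels(models, "fasting while on travel"))
```

Because every label gets its own model, the cost grows linearly with the number of labels and label co-occurrence is ignored, which is the limitation that hierarchical methods such as HOMER are meant to address.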
A System for Extracting Sentiment from Large-Scale Arabic Social Data
Hao Wang, Vijay R. Bommireddipalli, Ayman Hanafy, Mohamed Bahgat, Sara Noeman, O. Emam
Social media data in the Arabic language is becoming more and more abundant. It is widely agreed that valuable information lies in social media data. Mining this data and making the process easier are gaining momentum in industry. This paper describes an enterprise system we developed for extracting sentiment from large volumes of social data in Arabic dialects. First, we give an overview of the Big Data system for information extraction from multilingual social data from a variety of sources. Then, we focus on the Arabic sentiment analysis capability that was built on top of the system, including normalizing written Arabic dialects, building sentiment lexicons, sentiment classification, and performance evaluation. Lastly, we demonstrate the value of enriching sentiment results with user profiles in understanding the sentiments of a specific user group.
DOI: https://doi.org/10.1109/ACLING.2015.17 (published 2015-04-17)
Citations: 12
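The abstract above lists normalization of written Arabic dialects and sentiment lexicons as building blocks. A minimal sketch of how those two pieces can fit together follows. The normalization rules (stripping diacritics and tatweel, unifying alef variants, collapsing elongated letters) are common preprocessing steps assumed here, and the two-word lexicon is a hypothetical placeholder; neither is taken from the paper's system.

```python
import re

# Assumed normalization rules, not the paper's actual pipeline.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # tashkeel marks and tatweel
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda/hamza
ELONGATION = re.compile(r"(.)\1{2,}")  # a character repeated 3+ times


def normalize(token):
    """Map spelling variants of a token to one canonical form."""
    token = DIACRITICS.sub("", token)
    token = ALEF_VARIANTS.sub("\u0627", token)  # unify to bare alef
    token = token.replace("\u0649", "\u064A")  # alef maqsura -> ya
    return ELONGATION.sub(r"\1", token)  # collapse letter elongation


# Hypothetical toy lexicon: "beautiful" is positive, "bad" is negative.
LEXICON = {normalize(word): score for word, score in {"جميل": 1, "سيء": -1}.items()}


def classify(text):
    """Sum lexicon scores over normalized tokens and map the sign to a label."""
    score = sum(LEXICON.get(normalize(tok), 0) for tok in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Normalizing the lexicon keys with the same function as the input keeps the two sides comparable, so an elongated social-media spelling of a word still matches its lexicon entry.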
Identifying the Topic-Specific Influential Users Using SLM
M. Shalaby, Ahmed Rafea
Social influence can be described as the ability to have an effect on the thoughts or actions of others. The objective of this research is to investigate the use of language in detecting the influential users in a specific topic on Twitter. From a collection of tweets matching a specified query, we want to detect the influential users from the tweets' text. The study investigates the Egyptian Arabic dialect and whether it can be used for detecting the author's influence. Using a Statistical Language Model (SLM), we found a correlation between the users' average retweet counts and their tweets' perplexity, supporting the hypothesis that an SLM can be trained to detect highly retweeted tweets. However, using perplexity to identify influential users resulted in low precision values; the simplistic approach carried out did not produce good results. There is still work to be done before the SLM can be used for identifying influential users.
DOI: https://doi.org/10.1109/ACLING.2015.24 (published 2015-04-17)
Citations: 3
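Perplexity, the quantity the abstract above correlates with retweet counts, measures how surprised a language model is by a text. The sketch below computes it for a unigram model with add-one smoothing. The model order, the smoothing scheme, and the toy tokens are assumptions; the abstract does not say which kind of SLM the authors trained.

```python
import math
from collections import Counter


class UnigramModel:
    """Unigram language model with add-one smoothing (an assumed setup)."""

    def __init__(self, training_tokens):
        self.counts = Counter(training_tokens)
        self.total = len(training_tokens)
        # One extra vocabulary slot stands in for all unseen tokens.
        self.vocab_size = len(self.counts) + 1

    def prob(self, token):
        return (self.counts[token] + 1) / (self.total + self.vocab_size)

    def perplexity(self, tokens):
        """exp of the average negative log-probability per token."""
        if not tokens:
            raise ValueError("perplexity is undefined for an empty text")
        log_prob = sum(math.log(self.prob(t)) for t in tokens)
        return math.exp(-log_prob / len(tokens))


# Toy training text; real use would train on tokenized tweets.
model = UnigramModel("a b a b a c".split())
print(model.perplexity("a b".split()))  # text that resembles the training data
print(model.perplexity("x y".split()))  # text made only of unseen tokens
```

Text that resembles the training data gets a lower perplexity than text made of unseen tokens, which is the signal a ranking of tweets or users would be built on.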
Combined Classification for Extracting Named Entities from Arabic Texts
Fériel Ben Fraj Trabelsi, C. Ben Othmane Zribi, Wiem Kouki
In this paper, we describe an approach for extracting named entities (NEs) from Arabic texts. The Arabic language is hard to process because of characteristics that influence even NE extraction. In our case, we consider that named entity extraction can be treated as a typical classification problem. Indeed, this extraction consists of searching for text portions that can be classified into an NE class (Person, Locality or Organization). Thus, we choose a supervised learning approach and employ the BIO tagging format, which can solve the twin problems of segmentation and categorization. In addition, a single classifier cannot give good results for all types of contexts. Thus, we adopt a set of weighted classifiers, which we combine through a voting procedure. In order to properly assess the performance of our system, we perform two types of tests: with and without morphological attributes. We consider the results highly satisfactory, especially with an accuracy that exceeds 89% for both the Person and Locality classes.
DOI: https://doi.org/10.1109/ACLING.2015.15 (published 2015-04-17)
Citations: 0
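Two ideas from the abstract above can be shown together: combining several taggers by weighted voting, and reading entities back out of BIO tags (B- opens an entity, I- continues it, O is outside). This is a sketch under assumptions: the three tag sequences, the weights, and the transliterated tokens are invented for illustration and do not come from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick, for each token, the tag with the largest total classifier weight."""
    voted = []
    for token_tags in zip(*predictions):
        scores = defaultdict(float)
        for tag, weight in zip(token_tags, weights):
            scores[tag] += weight
        voted.append(max(scores, key=scores.get))
    return voted


def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity text, class) pairs."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            # "O" or an I- tag with no matching open entity closes the span.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


# Hypothetical outputs of three classifiers for the same four tokens.
tokens = ["Ahmed", "Ali", "visited", "Tunis"]
predictions = [
    ["B-PER", "I-PER", "O", "B-LOC"],
    ["B-PER", "O", "O", "B-LOC"],
    ["O", "I-PER", "O", "O"],
]
weights = [0.5, 0.3, 0.2]  # assumed weights, e.g. from validation accuracy
voted = weighted_vote(predictions, weights)
print(bio_to_spans(tokens, voted))
```

The B-/I- distinction is what lets one tag sequence carry both segmentation (where an entity starts and ends) and categorization (which class it belongs to).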
Enrichment of the Arabic Treebank ATB with Syntactic Properties
R. Bahloul, K. Haddar, P. Blache
Enriching the Arabic treebank with syntactic properties increases its use in different applications, supports the acquisition of new linguistic resources, and eases the probabilistic parsing process by using statistics to limit the properties to satisfied ones. This method of enrichment requires two steps: first inducing a Property Grammar from a source treebank, and finally generating the new syntactic property-based representation.
DOI: https://doi.org/10.1109/ACLING.2015.9 (published 2015-04-17)
Citations: 0
A Hybrid Approach for Sentiment Classification of Egyptian Dialect Tweets
A. Shoukry, Ahmed Rafea
Sentiment analysis has recently become one of the growing areas of research related to text mining and natural language processing. The main task of sentiment classification is to classify a sentence (e.g. a tweet, review, blog, comment, or news item) as holding an overall positive, negative or neutral sentiment. Most current studies on this topic focus mainly on English texts, with very limited resources available for other languages like Arabic, especially for the Egyptian dialect. In this research work, we aim to improve the performance measures of Egyptian dialect sentence-level sentiment analysis by proposing a hybrid approach that combines the machine learning approach using support vector machines with the semantic orientation approach. Two methodologies were proposed, one for each approach, which were then joined to create the proposed hybrid approach. The results show significant improvements in accuracy, precision, recall and F-measure, indicating that our proposed hybrid approach is effective in sentence-level sentiment classification. The results are also very promising, which encourages continuing this line of research.
DOI: https://doi.org/10.1109/ACLING.2015.18 (published 2015-04-17)
Citations: 37
A Proposed Approach for Arabic Language Segmentation
Adnan Souri, Mohammed Al Achhab, Badr Eddine El Mouhajir
This paper presents research on natural language processing (NLP). Our area of interest is the process of Arabic text segmentation. Text segmentation is an important step in any NLP task. In this paper, we discuss several methods dealing mainly with cases of ambiguity in Arabic text segmentation. Several conclusions are drawn, which lead to a proposal for text segmentation. An approach based on connectors is developed.
DOI: https://doi.org/10.1109/ACLING.2015.13 (published 2015-04-17)
Citations: 3
Tunisian Arabic aeb Wordnet: Current State and Future Extensions
Nadia Karmani Ben Moussa, Hsan Soussou, A. Alimi
Nowadays, Internet communication, and especially informal Internet communication such as social networks and blogs, is shaping political, economic, financial and social environments all over the world. Consequently, Internet monitoring is growing in scale, particularly in Tunisia, which has suffered from instability since the political revolution in 2011. In the Tunisian context, Internet communication is characterized by the increasing use of the aeb language (i.e. an Arabic dialect called Tunisian Arabic). Therefore, Tunisian Internet monitoring primarily needs aeb language processing tools, especially an aeb lexicon. However, few aeb lexicons have been developed, given the lack of written resources. Some of these lexicons are created from Arabic lexicons; they cover the aeb lexicon of Arabic origin and ignore the large borrowed aeb lexicon. Others are built using the informal Web and need rigorous linguistic verification, correction and validation. We therefore suggest building a standard, large and robust Wordnet that takes phonetics into account. Our Wordnet is created by the expand approach used for building EuroWordnet, as in [12], based on the bilingual English-Tunisian Arabic Peace Corps dictionary prepared by the linguists R. Ben abdelkader, A. Ayed and A. Naouar [13], and the latest version of Princeton Wordnet, PWN 3.1. Moreover, it is modeled according to ISO-LMF by a suitable Wordnet-LMF model for the aeb language. In this paper, we present the aeb Wordnet building approach, describe its current state and propose extensions.
DOI: https://doi.org/10.1109/ACLING.2015.7 (published 2015-04-17)
Citations: 6
A Named Entities Recognition System for Modern Standard Arabic using Rule-Based Approach
Hala Elsayed, T. Elghazaly
Named Entity Recognition (NER) is a task in Information Extraction (IE) and has become very important for Natural Language Processing (NLP). In this paper, we designed a system that enhances named entity recognition for the Arabic language; the system was developed for extracting Arabic nouns and entities. The noun extraction system is based on Arabic morphology and Arabic grammar rules, many of which have not been used before. Noun extraction uses no gazetteers, and it is combined with an entity extraction system that depends on gazetteers. The system extracts nouns according to Arabic morphology and classifies them into proper noun, title, currency, percentage, country, city, nationality, number, place, date and time entities. The system applies algorithms to generate nationality entities from country entities, and it applies regular expressions (RE) to extract numbers in digit format. The system does not need the text to be normalized before the extraction process. The system was tested on Modern Standard Arabic (MSA) text from an open-text corpus. The system achieves an average recall of 85%.
DOI: https://doi.org/10.1109/ACLING.2015.14 (published 2015-04-17)
Citations: 6
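The abstract above mentions regular expressions for extracting numbers in digit format. A small sketch of that idea is given below. The two patterns and the function names are assumptions written for illustration, not the rules used in the paper. In Python, `\d` on a `str` pattern matches any Unicode decimal digit, so the same pattern covers both Western digits and Arabic-Indic digits.

```python
import re

# Digits with optional "." or "," groups, e.g. 25, 7.5, 1,250 (assumed pattern).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
# The same number followed by "%" or the Arabic percent sign U+066A.
PERCENT = re.compile(r"\d+(?:[.,]\d+)*\s?[%\u066A]")


def extract_numbers(text):
    """Return every digit-format number found in the text, in order."""
    return NUMBER.findall(text)


def extract_percentages(text):
    """Return every number that is immediately marked as a percentage."""
    return PERCENT.findall(text)


# "\u0661\u0662\u0663" is 123 written in Arabic-Indic digits.
sample = "price 25 and \u0661\u0662\u0663"
print(extract_numbers(sample))
print(extract_percentages("growth of 7.5% in 2014"))
```

Matching on digit shape alone needs no gazetteer and no prior normalization of the text, which fits the abstract's description of the number entities.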
Lexicon Based and Multi-Criteria Decision Making (MCDM) Approach for Detecting Emotions from Arabic Microblog Text
Ahmad M. Abd Al-Aziz, M. Gheith, A. Eldin
Emotions serve a communicative function both within the brain and within the social group. Most previous opinion mining studies on Arabic microblog text identify positive, negative or neutral polarity. This paper studies the problem of detecting multiple emotion classes in Arabic microblog text (e.g. Twitter). Incoming Arabic microblog text is classified into one of the fine-grained emotion classes {happiness, sadness, fear, anger, disgust or none}, or as a mixed emotion if the text contains multiple emotions, e.g. {Happiness/Fear} or {Anger/Disgust}. We applied a combination of a lexicon approach and a Multi-Criteria Decision Making approach. We use a conditioned plot to classify and analyze the text by generating a two-dimensional graphic analysis space: one dimension represents observations (tweets) and the other represents our variables (5 emotional scores). The experimental results show that our proposed approach, using the conditioned plot, is able to classify text into different fine-grained emotions and is also able to classify Arabic text with mixed emotions.
情绪在大脑和社会群体中都是一种交流功能。以往的意见挖掘研究大多是对阿拉伯语微博文本进行正面、负面或中性极性的识别。本文研究了阿拉伯语微博文本(如Twitter)中多情感类的检测问题。传入的阿拉伯语微博文本如果存在,则分为细粒度情绪类{快乐、悲伤、恐惧、愤怒、厌恶或无情绪};如果文本包含多种情绪,例如{快乐/恐惧}或{愤怒/厌恶},则分为混合情绪。我们采用了词典法和多标准决策法相结合的方法。我们使用条件图通过生成二维图形分析空间来对文本进行分类和分析,一维表示观察(tweet),另一维表示我们的变量(5个情感得分)。实验结果表明,我们提出的基于条件图的方法能够将文本分类为不同细粒度的情感,也能够对混合情感的阿拉伯文本进行分类。
{"title":"Lexicon Based and Multi-Criteria Decision Making (MCDM) Approach for Detecting Emotions from Arabic Microblog Text","authors":"Ahmad M. Abd Al-Aziz, M. Gheith, A. Eldin","doi":"10.1109/ACLING.2015.21","DOIUrl":"https://doi.org/10.1109/ACLING.2015.21","journal":{"name":"2015 First International Conference on Arabic Computational Linguistics (ACLing)"},"publicationDate":"2015-04-17"}
Citations: 17