Tunisian Arabic aeb Wordnet: Current State and Future Extensions
Nadia Karmani Ben Moussa, Hsan Soussou, A. Alimi
2015 First International Conference on Arabic Computational Linguistics (ACLing), published 2015-04-17
DOI: 10.1109/ACLING.2015.7 (https://doi.org/10.1109/ACLING.2015.7)
Citations: 6
Abstract
Internet communication, especially informal communication through social networks, blogs, and similar channels, now shapes political, economic, financial, and social environments all over the world. Consequently, Internet monitoring is growing in scale, particularly in Tunisia, which has suffered from instability since the political revolution of 2011. In the Tunisian context, Internet communication is characterized by the increasing use of the aeb language (i.e., the Arabic dialect known as Tunisian Arabic). Tunisian Internet monitoring therefore needs aeb language processing tools above all, and especially an aeb lexicon. However, few aeb lexicons have been developed, given the lack of written resources. Some of these lexicons are derived from Arabic lexicons; they cover the part of the aeb vocabulary that is Arabic in origin and ignore the large borrowed vocabulary. Others are built from the informal Web and still need rigorous linguistic verification, correction, and validation. We therefore propose building a standard, large, and robust wordnet that accounts for phonetics. Our wordnet is created with the expand approach used to build EuroWordNet, as in [12], based on the bilingual English-Tunisian Arabic Peace Corps dictionary prepared by the linguists R. Ben Abdelkader, A. Ayed, and A. Naouar [13] and on the latest version of Princeton WordNet, PWN 3.1. Moreover, it is modeled according to ISO-LMF using a Wordnet-LMF model adapted to the aeb language. In this paper, we present the approach used to build the aeb wordnet, describe its current state, and propose extensions.
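The expand approach mentioned in the abstract can be illustrated with a minimal sketch. It assumes that source-language synsets are translated lemma by lemma through a bilingual dictionary and that the target wordnet inherits the source synset identifiers and hypernym links. The names Synset, expand, PWN_TOY, and EN_AEB_TOY are hypothetical, the synset identifiers are invented for the example, and the aeb entries are rough transliterations chosen for illustration, not entries taken from the Peace Corps dictionary or from the authors' resource.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    synset_id: str                 # identifier shared by source and target wordnets
    lemmas: List[str]              # words expressing the concept
    hypernyms: List[str] = field(default_factory=list)  # ids of more general synsets


def expand(source: List[Synset], bilingual: Dict[str, List[str]]) -> Dict[str, Synset]:
    """Build target-language synsets by translating source lemmas (toy expand approach)."""
    target: Dict[str, Synset] = {}
    for syn in source:
        translations: List[str] = []
        for lemma in syn.lemmas:
            for word in bilingual.get(lemma, []):
                if word not in translations:   # avoid duplicate target lemmas
                    translations.append(word)
        if translations:                       # skip concepts with no known translation
            target[syn.synset_id] = Synset(syn.synset_id, translations, list(syn.hypernyms))
    # Keep only the hypernym links whose endpoint was also translated.
    for syn in target.values():
        syn.hypernyms = [h for h in syn.hypernyms if h in target]
    return target


# Toy source synsets (hypothetical ids, not real PWN 3.1 offsets).
PWN_TOY = [
    Synset("animal.n.01", ["animal"]),
    Synset("dog.n.01", ["dog", "domestic dog"], ["animal.n.01"]),
    Synset("car.n.01", ["car", "automobile"], ["vehicle.n.01"]),
]

# Toy English-to-aeb dictionary (illustrative transliterations only).
EN_AEB_TOY = {
    "animal": ["hayawen"],
    "dog": ["kelb"],
    "car": ["karhba"],
    "automobile": ["karhba"],
}

if __name__ == "__main__":
    for syn in expand(PWN_TOY, EN_AEB_TOY).values():
        print(syn.synset_id, syn.lemmas, syn.hypernyms)
```

In this sketch the target wordnet reuses the source structure, which is the main appeal of the expand approach; a real build would still require the linguistic verification the abstract calls for, since dictionary lookups alone do not disambiguate word senses.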