An hybrid approach to improve part of speech tagging system

2018 International Conference on Intelligent Systems and Computer Vision (ISCV). Pub Date: 2018-04-01. DOI: 10.1109/ISACV.2018.8354032

S. Farrah, Hanane El Manssouri, E. Ziyati, M. Ouzzif


Citations: 3

Abstract


Platforms that handle data in text format, such as social networks and search engines, face major challenges in storing, searching, and processing this flow of text. Disciplines such as natural language processing (NLP) have emerged to address all aspects of language, spoken or written. In this work, we focus on part-of-speech (POS) tagging applied to the Arabic language, which consists of marking each word in a text with its correct tag. Ambiguity, which we consider the most important problem in natural language processing, is also one of the most difficult problems affecting POS tagging. We propose a hybrid approach that combines grammatical rules with an artificial neural network classifier to determine the appropriate tags for an Arabic text. The first phase extracts all the affixes of a word to identify its nature and its tags according to grammatical rules. The second phase begins by transliterating the Arabic text into Roman letters; the transliterated text is then transformed into numeric vectors that form the input of the neural network classifier. The two phases are combined to identify the tag of each word.
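The two phases described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a transliteration scheme, so a partial Buckwalter table is assumed here; the three affix rules, the tag names, the index-based encoding, `MAX_LEN`, and the "rules first, classifier as fallback" combination are all hypothetical; and the `classifier` argument stands in for a trained neural network.

```python
# Illustrative sketch only: rules, tag names, and encoding are assumptions,
# not the rule set or features used in the paper.

# Partial Buckwalter transliteration table (Arabic letter -> Roman character).
BUCKWALTER = {
    "ا": "A", "ب": "b", "ت": "t", "ث": "v", "ج": "j", "ح": "H", "خ": "x",
    "د": "d", "ذ": "*", "ر": "r", "ز": "z", "س": "s", "ش": "$", "ص": "S",
    "ض": "D", "ط": "T", "ظ": "Z", "ع": "E", "غ": "g", "ف": "f", "ق": "q",
    "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "y",
    "ة": "p", "ى": "Y", "ء": "'", "أ": ">", "إ": "<",
}
ALPHABET = sorted(set(BUCKWALTER.values()))
MAX_LEN = 10  # hypothetical fixed input width for the classifier


def rule_based_tag(word):
    """Phase 1: guess a tag from affixes; return None when no rule fires."""
    if word.startswith("ال"):  # definite article prefix
        return "NOUN"
    # Future marker followed by an imperfective verb prefix.
    if len(word) > 2 and word[0] == "س" and word[1] in "يتنأ":
        return "VERB"
    if word.endswith("ة"):  # ta marbuta suffix, typically nominal
        return "NOUN"
    return None


def transliterate(word):
    """Phase 2a: map Arabic letters to Roman characters ('?' if unknown)."""
    return "".join(BUCKWALTER.get(ch, "?") for ch in word)


def vectorize(roman):
    """Phase 2b: encode characters as 1-based indices, padded with zeros."""
    codes = [ALPHABET.index(c) + 1 if c in ALPHABET else 0
             for c in roman[:MAX_LEN]]
    return codes + [0] * (MAX_LEN - len(codes))


def tag_word(word, classifier):
    """Combine both phases: use the rule tag if any, else ask the classifier."""
    rule_tag = rule_based_tag(word)
    if rule_tag is not None:
        return rule_tag
    return classifier(vectorize(transliterate(word)))


if __name__ == "__main__":
    # Stand-in for a trained neural network: always answers "VERB".
    dummy_classifier = lambda vector: "VERB"
    for w in ["الكتاب", "سيكتب", "كتب"]:
        print(transliterate(w), tag_word(w, dummy_classifier))
```

In this sketch the rule phase decides unambiguous cases cheaply, and only words without a recognizable affix reach the classifier. A real system would more likely let both phases vote or pass the rule output to the network as an extra feature; the abstract does not say which combination is used.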
