An hybrid approach to improve part of speech tagging system

S. Farrah, Hanane El Manssouri, E. Ziyati, M. Ouzzif
Published in: 2018 International Conference on Intelligent Systems and Computer Vision (ISCV)
Publication date: 2018-04-01
DOI: 10.1109/ISACV.2018.8354032 (https://doi.org/10.1109/ISACV.2018.8354032)
Citations: 3

Abstract

Platforms that interact with data in text format, such as social networks and search engines, face major challenges in storing, searching, and processing this flow of text. Natural language processing has emerged as a discipline concerned with identifying all aspects of language, spoken or written. Within this field, we focus on part-of-speech (POS) tagging applied to the Arabic language, which consists of marking each word in a text with its correct tag. One of the most difficult problems affecting POS tagging is the ambiguity of the text, and ambiguity is the most important problem in natural language processing. We propose a hybrid approach that combines a rule-based component with an artificial neural network classifier to determine the appropriate tags for an Arabic text. The first phase extracts all the affixes of a word to identify its nature and its tags according to grammatical rules. The second phase begins by transliterating the Arabic text into Roman letters; the transliterated text is then transformed into numeric vectors that form the input of the neural network classifier. The two phases are then combined to identify the tag of each word.
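The two phases described in the abstract can be pictured with a small sketch. Everything specific in it is an assumption made for illustration, since the abstract gives no details: the affix lists and tag names, the use of a partial Buckwalter-style transliteration table, the character-code vector encoding, the rule for combining the phases, and the names rule_tags, transliterate, vectorize, tag_word, and dummy_classifier are all hypothetical. The classifier is a stand-in function, not a trained neural network.

```python
# Illustrative sketch only: the rules, transliteration table, vector
# encoding, and combination strategy are assumptions, not the paper's.

# Partial Buckwalter-style table (Arabic letter -> ASCII symbol).
BUCKWALTER = {
    "ا": "A", "ب": "b", "ت": "t", "ث": "v", "ج": "j", "ح": "H", "خ": "x",
    "د": "d", "ذ": "*", "ر": "r", "ز": "z", "س": "s", "ش": "$", "ص": "S",
    "ض": "D", "ط": "T", "ظ": "Z", "ع": "E", "غ": "g", "ف": "f", "ق": "q",
    "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "y",
    "ة": "p", "ى": "Y",
}

# Toy affix rules (hypothetical): each affix suggests one tag.
PREFIX_TAGS = {"ال": "NOUN", "سي": "VERB"}
SUFFIX_TAGS = {"ة": "NOUN"}

MAX_LEN = 12  # assumed fixed input width for the classifier


def rule_tags(word):
    """Phase 1: collect the tags suggested by the word's affixes."""
    tags = set()
    for prefix, tag in PREFIX_TAGS.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            tags.add(tag)
    for suffix, tag in SUFFIX_TAGS.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            tags.add(tag)
    return tags


def transliterate(word):
    """Phase 2a: map Arabic letters to Roman symbols.

    Characters missing from the table pass through unchanged.
    """
    return "".join(BUCKWALTER.get(ch, ch) for ch in word)


def vectorize(roman):
    """Phase 2b: encode the transliteration as a fixed-length numeric vector.

    The text is truncated to MAX_LEN characters and zero-padded on the right.
    """
    codes = [ord(ch) / 127.0 for ch in roman[:MAX_LEN]]
    return codes + [0.0] * (MAX_LEN - len(codes))


def tag_word(word, classifier):
    """Combine both phases (assumed strategy).

    If the affix rules suggest exactly one tag, use it; otherwise defer
    to the classifier, which receives the numeric vector.
    """
    candidates = rule_tags(word)
    if len(candidates) == 1:
        return next(iter(candidates))
    return classifier(vectorize(transliterate(word)))


def dummy_classifier(vector):
    """Stand-in for a trained neural network; always answers NOUN."""
    return "NOUN"
```

With these toy rules, tag_word("سيكتب", dummy_classifier) is decided by the prefix rule alone, while a word such as "سيارة" matches both a verb-like prefix and a noun-like suffix, so the rules disagree and the decision falls to the classifier. This is the kind of ambiguity the abstract identifies as the main difficulty of POS tagging.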