UniBA @ KIPoS:词性标注的混合方法(短文)

Giovanni Luca Izzi, S. Ferilli
{"title":"UniBA @ KIPoS:词性标注的混合方法(短文)","authors":"Giovanni Luca Izzi, S. Ferilli","doi":"10.4000/BOOKS.AACCADEMIA.7773","DOIUrl":null,"url":null,"abstract":"English. The Part of Speech tagging operation is becoming increasingly important as it represents the starting point for other high-level operations such as Speech Recognition, Machine Translation, Parsing and Information Retrieval. Although the accuracy of state-of-the-art POS-taggers reach a high level of accuracy (around 96-97%) it cannot yet be considered a solved problem because there are many variables to take into account. For example, most of these systems use lexical knowledge to assign a tag to unknown words. The task solution proposed in this work is based on a hybrid tagger, which doesn’t use any prior lexical knowledge, consisting of two different types of POS-taggers used sequentially: HMM tagger and RDRPOSTagger [ (Nguyen et al., 2014), (Nguyen et al., 2016)]. We trained the hybrid model using the Development set and the combination of Development and Silver sets. The results have shown an accuracy of 0,8114 and 0,8100 respectively for the main task. Italiano. L’operazione di Part of Speech tagging sta diventando sempre più importante in quanto rappresenta il punto di partenza per altre operazioni di alto livello come Speech Recognition, Machine Translation, Parsing e Information Retrieval. Sebbene l’accuratezza dei POS tagger allo stato dell’arte raggiunga un alto livello di accuratezza (intorno al 9697%), esso non può ancora essere considerato un problema risolto perché ci Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). sono molte variabili da tenere in considerazione. Ad esempio, la maggior parte di questi sistemi utilizza della conoscenza linguistica per assegnare un tag alle parole sconosciute. La soluzione proposta in questo lavoro si basa su un tagger ibrido, che non utilizza alcuna conoscenza linguistica pregressa, costituito da due diversi tipi di POS-tagger usati in sequenza: HMM tagger e RDRPOSTagger [ (Nguyen et al., 2014), (Nguyen et al., 2016)]. Abbiamo addestrato il modello ibrido utilizzando il Development Set e la combinazione di Silver e Development Sets. I risultati hanno mostrato un’accuratezza pari a 0,8114 e 0,8100 rispettivamente per","PeriodicalId":184564,"journal":{"name":"EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UniBA @ KIPoS: A Hybrid Approach for Part-of-Speech Tagging (short paper)\",\"authors\":\"Giovanni Luca Izzi, S. Ferilli\",\"doi\":\"10.4000/BOOKS.AACCADEMIA.7773\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"English. The Part of Speech tagging operation is becoming increasingly important as it represents the starting point for other high-level operations such as Speech Recognition, Machine Translation, Parsing and Information Retrieval. Although the accuracy of state-of-the-art POS-taggers reach a high level of accuracy (around 96-97%) it cannot yet be considered a solved problem because there are many variables to take into account. For example, most of these systems use lexical knowledge to assign a tag to unknown words. The task solution proposed in this work is based on a hybrid tagger, which doesn’t use any prior lexical knowledge, consisting of two different types of POS-taggers used sequentially: HMM tagger and RDRPOSTagger [ (Nguyen et al., 2014), (Nguyen et al., 2016)]. We trained the hybrid model using the Development set and the combination of Development and Silver sets. The results have shown an accuracy of 0,8114 and 0,8100 respectively for the main task. Italiano. L’operazione di Part of Speech tagging sta diventando sempre più importante in quanto rappresenta il punto di partenza per altre operazioni di alto livello come Speech Recognition, Machine Translation, Parsing e Information Retrieval. Sebbene l’accuratezza dei POS tagger allo stato dell’arte raggiunga un alto livello di accuratezza (intorno al 9697%), esso non può ancora essere considerato un problema risolto perché ci Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). sono molte variabili da tenere in considerazione. Ad esempio, la maggior parte di questi sistemi utilizza della conoscenza linguistica per assegnare un tag alle parole sconosciute. La soluzione proposta in questo lavoro si basa su un tagger ibrido, che non utilizza alcuna conoscenza linguistica pregressa, costituito da due diversi tipi di POS-tagger usati in sequenza: HMM tagger e RDRPOSTagger [ (Nguyen et al., 2014), (Nguyen et al., 2016)]. Abbiamo addestrato il modello ibrido utilizzando il Development Set e la combinazione di Silver e Development Sets. I risultati hanno mostrato un’accuratezza pari a 0,8114 e 0,8100 rispettivamente per\",\"PeriodicalId\":184564,\"journal\":{\"name\":\"EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020\",\"volume\":\"83 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4000/BOOKS.AACCADEMIA.7773\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4000/BOOKS.AACCADEMIA.7773","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

英语。词性标注操作作为语音识别、机器翻译、句法分析和信息检索等高级操作的起点,正变得越来越重要。虽然最先进的pos标记器的准确性达到了很高的准确性水平(约96-97%),但它还不能被认为是一个解决的问题,因为有许多变量需要考虑。例如,大多数这些系统使用词汇知识为未知单词分配标签。本工作提出的任务解决方案基于混合标注器,它不使用任何先前的词汇知识,由顺序使用的两种不同类型的pos标注器组成:HMM标注器和RDRPOSTagger [(Nguyen et al., 2014), (Nguyen et al., 2016)]。我们使用Development集以及Development集和Silver集的组合来训练混合模型。结果表明,主要任务的准确率分别为0.8114和0.8100。意大利语。词性标注技术在语音识别、机器翻译、句法分析和信息检索等领域的重要研究进展più。Sebbene l 'accuratezza dei POS tagger允许statto dell 'arte raggiunga un alto livello di accuratezza (intorno 9697%), essso non può ancora essere考虑到unproblema risolto perchchci版权所有©2020本文由其作者提供。在知识共享许可国际署名4.0 (CC BY 4.0)下允许使用。Sono molte变异性在考虑范围内是不存在的。与此同时,语言学家也在研究如何利用语言学家的语言能力。[[Nguyen et al., 2014], [Nguyen et al., 2016]] [font =宋体][font =宋体],[font =宋体],[font =宋体],[font =宋体],[font =宋体]。]Abbiamo adstrastrat将模型结合使用,并将开发集与开发集相结合。我认为,这是最不准确的数据来源,每年有8,814万至8,8100万人次访问
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
UniBA @ KIPoS: A Hybrid Approach for Part-of-Speech Tagging (short paper)
English. The Part of Speech tagging operation is becoming increasingly important as it represents the starting point for other high-level operations such as Speech Recognition, Machine Translation, Parsing and Information Retrieval. Although the accuracy of state-of-the-art POS-taggers reach a high level of accuracy (around 96-97%) it cannot yet be considered a solved problem because there are many variables to take into account. For example, most of these systems use lexical knowledge to assign a tag to unknown words. The task solution proposed in this work is based on a hybrid tagger, which doesn’t use any prior lexical knowledge, consisting of two different types of POS-taggers used sequentially: HMM tagger and RDRPOSTagger [ (Nguyen et al., 2014), (Nguyen et al., 2016)]. We trained the hybrid model using the Development set and the combination of Development and Silver sets. The results have shown an accuracy of 0,8114 and 0,8100 respectively for the main task. Italiano. L’operazione di Part of Speech tagging sta diventando sempre più importante in quanto rappresenta il punto di partenza per altre operazioni di alto livello come Speech Recognition, Machine Translation, Parsing e Information Retrieval. Sebbene l’accuratezza dei POS tagger allo stato dell’arte raggiunga un alto livello di accuratezza (intorno al 9697%), esso non può ancora essere considerato un problema risolto perché ci Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). sono molte variabili da tenere in considerazione. Ad esempio, la maggior parte di questi sistemi utilizza della conoscenza linguistica per assegnare un tag alle parole sconosciute. La soluzione proposta in questo lavoro si basa su un tagger ibrido, che non utilizza alcuna conoscenza linguistica pregressa, costituito da due diversi tipi di POS-tagger usati in sequenza: HMM tagger e RDRPOSTagger [ (Nguyen et al., 2014), (Nguyen et al., 2016)]. Abbiamo addestrato il modello ibrido utilizzando il Development Set e la combinazione di Silver e Development Sets. I risultati hanno mostrato un’accuratezza pari a 0,8114 e 0,8100 rispettivamente per
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
DIACR-Ita @ EVALITA2020: Overview of the EVALITA2020 Diachronic Lexical Semantics (DIACR-Ita) Task QMUL-SDS @ DIACR-Ita: Evaluating Unsupervised Diachronic Lexical Semantics Classification in Italian (short paper) By1510 @ HaSpeeDe 2: Identification of Hate Speech for Italian Language in Social Media Data (short paper) HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task KIPoS @ EVALITA2020: Overview of the Task on KIParla Part of Speech Tagging
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1