Using transitivity information for morphological and syntactic disambiguation of pronouns in Ukrainian

N. Kotsyba, Bohdan Moskalevskyi
Journal: Vìsnik Nacìonalʹnogo unìversitetu "Lʹvìvsʹka polìtehnìka". Serìâ Ìnformacìjnì sistemi ta merežì, volume 33 1
Publication date: 2019-06-10
Publication type: Journal Article
DOI: 10.23939/sisn2019.01.101 (https://doi.org/10.23939/sisn2019.01.101)

Abstract

The paper presents a short introduction to several electronic resources for the Ukrainian language, namely two treebanks: the Gold standard (about 130 thousand tokens), manually annotated in the Universal Dependencies flavour (https://universaldependencies.org/), which serves as the training data for a machine-trained syntactic parser, and a large (nearly 3 billion tokens), automatically annotated General Treebank (also known as Zvidusil), as well as a valency dictionary, developed by the Institute for Ukrainian, NGO (Kyiv) in 2015-2019 (https://mova.institute/). We also describe an experimental use of the valency dictionary information to boost the performance of the syntactic parser. As a proof of concept, we discuss the syntactic and morphological ambiguity of the frequently used Ukrainian pronouns його, її, їх ‘his, her, their’ and ways of improving the syntactic parser’s performance using supervised machine learning techniques with theoretical linguistic support. Apart from the multiple morphological ambiguity (24+ possible tags for each of these forms), one of the challenges connected with this linguistic phenomenon is that its correct disambiguation involves anaphora resolution and semantic role identification. On the one hand, this makes the disambiguation process much more complicated, given the annotation design that is followed; on the other hand, by resolving a seemingly low-level (morphological) problem we gain a bonus in the form of significant textual analysis hints, which can later be used in various NLP applications for Ukrainian. The present article is a practical follow-up to its more theoretical predecessor (Kotsyba, Moskalevskyi 2018 [11]), where the linguistic underpinnings of the syntactic and morphological interpretation of the pronouns його, її, їх, in comparison with other Slavic languages, are presented in greater detail.
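To make the general idea concrete, the following Python sketch shows one way transitivity information could feed a guess about an ambiguous form. It is an illustration under stated assumptions, not the authors' method: the abstract does not describe the valency dictionary's format or the features passed to the parser, so the lexicon `TOY_TRANSITIVITY`, the `Context` record, and the function `guess_reading` are all hypothetical. The rule is deliberately crude and ignores genitive uses, negation, prepositions, and anaphora.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon: verb lemma -> whether the verb is transitive.
# The real valency dictionary's format is not described in the abstract.
TOY_TRANSITIVITY = {
    "бачити": True,   # 'to see' takes a direct object
    "спати": False,   # 'to sleep' does not
}

# The three ambiguous forms discussed in the abstract.
AMBIGUOUS_FORMS = {"його", "її", "їх"}


@dataclass
class Context:
    form: str                   # surface form of the pronoun
    verb_lemma: Optional[str]   # lemma of the candidate governing verb, if any
    object_slot_filled: bool    # True if another nominal already serves as the direct object


def guess_reading(ctx: Context) -> str:
    """Return 'object', 'possessive', or 'unknown' for an ambiguous form."""
    if ctx.form.lower() not in AMBIGUOUS_FORMS:
        raise ValueError(f"not an ambiguous form: {ctx.form}")
    transitive = TOY_TRANSITIVITY.get(ctx.verb_lemma) if ctx.verb_lemma else None
    if transitive is None:
        # Verb missing from the toy lexicon: no evidence either way.
        return "unknown"
    if transitive and not ctx.object_slot_filled:
        # A transitive verb with an empty object slot suggests a personal-pronoun object.
        return "object"
    # Otherwise assume the form modifies a noun (possessive reading).
    return "possessive"


if __name__ == "__main__":
    # "Я бачу його." 'I see him.'
    print(guess_reading(Context("його", "бачити", False)))
    # "Я бачу його брата." 'I see his brother.'
    print(guess_reading(Context("його", "бачити", True)))
    # "Його брат спить." 'His brother is sleeping.'
    print(guess_reading(Context("його", "спати", False)))
```

In a supervised setting such as the one the abstract mentions, a signal like this would more plausibly be supplied to the model as one feature among many rather than applied as a hard rule.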