Using transitivity information for morphological and syntactic disambiguation of pronouns in Ukrainian
N. Kotsyba, Bohdan Moskalevskyi
Vìsnik Nacìonalʹnogo unìversitetu "Lʹvìvsʹka polìtehnìka". Serìâ Ìnformacìjnì sistemi ta merežì, volume 33 1, published 2019-06-10
DOI: https://doi.org/10.23939/sisn2019.01.101
Abstract
The paper gives a short introduction to several electronic resources for the Ukrainian language, namely two treebanks: the Gold standard (about 130 thousand tokens), manually annotated in the Universal Dependencies flavour (https://universaldependencies.org/), which constitutes the training data for a machine-trained syntactic parser, and a large (nearly 3 billion tokens), automatically annotated General Treebank (also known as Zvidusil), as well as a valency dictionary, developed by the Institute for Ukrainian, NGO (Kyiv) in 2015-2019 (https://mova.institute/). We also describe an experimental use of the valency dictionary information to boost the performance of the syntactic parser. As a proof of concept, we discuss the syntactic and morphological ambiguity of the frequently used Ukrainian pronouns його, її, їх ‘his, her, their’ and ways of improving the syntactic parser’s performance using supervised machine learning techniques with theoretical linguistic support. Apart from the multiple morphological ambiguity (24 or more possible tags for each of these forms), one of the challenges posed by this linguistic phenomenon is that its correct disambiguation involves anaphora resolution and semantic role identification. On the one hand, this makes the disambiguation process much more complicated, given the annotation design we follow. On the other hand, by resolving a seemingly low-level (morphological) problem we gain a bonus in the form of significant textual analysis hints, which can later be used in various NLP applications for Ukrainian.
The present article is a practical follow-up to its more theoretical predecessor (Kotsyba, Moskalevskyi 2018 [11]), where the linguistic underpinnings of the syntactic and morphological interpretation of the pronouns його, її, їх, in comparison with other Slavic languages, are presented in greater detail.
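To make the idea of using transitivity information more concrete, the following is a minimal sketch and not the authors' parser or feature set. It assumes a tiny hand-made lexicon (`TOY_TRANSITIVITY`) standing in for the valency dictionary and a single contextual cue (`next_is_noun`); both names and all entries are hypothetical. The heuristic only commits to a reading when the two cues agree and otherwise reports the case as unclear, which is where a trained model and richer context (case agreement, anaphora) would be needed.

```python
# Illustrative sketch only: a toy stand-in for a valency dictionary lookup.
# The lexicon entries and the heuristic are assumptions, not the paper's data.

# Verb lemma -> True if the verb takes a direct object (hypothetical entries).
TOY_TRANSITIVITY = {
    "бачити": True,   # 'to see'
    "читати": True,   # 'to read'
    "спати": False,   # 'to sleep'
    "йти": False,     # 'to go'
}

AMBIGUOUS_PRONOUNS = {"його", "її", "їх"}


def guess_reading(verb_lemma, pronoun, next_is_noun):
    """Guess whether the pronoun is a verb object or a possessive modifier.

    Returns "object", "possessive", or "unclear".
    """
    if pronoun not in AMBIGUOUS_PRONOUNS:
        raise ValueError(f"not one of the ambiguous forms: {pronoun}")

    transitive = TOY_TRANSITIVITY.get(verb_lemma)  # None if verb is unknown

    if transitive is True and not next_is_noun:
        # The verb needs a direct object and no noun follows to fill the slot.
        return "object"
    if transitive is False and next_is_noun:
        # The verb takes no direct object, so the pronoun likely modifies the noun.
        return "possessive"
    # Cues conflict or are missing; more context would be required.
    return "unclear"


if __name__ == "__main__":
    print(guess_reading("бачити", "його", next_is_noun=False))
    print(guess_reading("спати", "її", next_is_noun=True))
    print(guess_reading("бачити", "їх", next_is_noun=True))
```

In a supervised setting, the transitivity flag would more plausibly be passed to the model as one feature among many rather than applied as a hard rule as it is here.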