An Algorithm for Constructing a Tree of Syntactic Units of a Russian-Language Sentence from a Tree of Syntactic Links

Informatika i avtomatizaciâ. Pub Date: 2023-11-10. DOI: 10.15622/ia.22.6.3

Anatoliy Poletaev, Ilya Paramonov, Elena Boychuk


Abstract

Automatic syntactic analysis of a sentence is an important task in computational linguistics. At present, no syntactic structure parsers for Russian are both publicly available and suitable for practical applications. Creating such a parser from the ground up requires building a treebank annotated according to a given formal grammar, which is a cumbersome task. However, since several syntactic dependency parsers for Russian exist, it seems reasonable to use dependency parsing results for syntactic structure analysis. The article introduces an algorithm that constructs the constituency tree of a Russian sentence from its syntactic dependency tree. The formal grammar used by the algorithm is based on D.E. Rosenthal's classic reference. The algorithm was evaluated on 300 Russian-language sentences: 200 were selected from the aforementioned reference and 100 from OpenCorpora, an open corpus of sentences extracted from Russian news and periodicals. During the evaluation, the sentences were passed to the syntactic dependency parsers from the Stanza, spaCy, and Natasha packages, and the resulting dependency trees were then processed by the proposed algorithm. The constituency trees obtained were compared with trees manually annotated by experts in linguistics. The best performance was achieved with the Stanza parser: the constituency parsing F1 score was 0.85 and the accuracy of sentence part tagging was 0.93, which would be sufficient for many practical applications, such as event extraction, information retrieval, and sentiment analysis.
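The general idea of deriving constituents from a dependency tree, and of scoring them against a gold tree, can be illustrated with a minimal sketch. It is not the authors' algorithm and does not use the Rosenthal-based grammar: it simply projects each head together with its dependents into one constituent, assumes a projective tree with a single root, and computes F1 over labelled spans. The names `Token`, `Constituent`, `build_constituents`, `spans` and `bracket_f1` are hypothetical, and the relation labels follow Universal Dependencies conventions only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    text: str
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency relation label


@dataclass
class Constituent:
    label: str
    start: int   # index of the first covered token
    end: int     # index of the last covered token
    children: list = field(default_factory=list)


def build_constituents(tokens):
    """Project every head and its dependents into one constituent.

    Assumes a projective dependency tree with exactly one root.
    """
    dependents = {}
    for tok in tokens:
        dependents.setdefault(tok.head, []).append(tok)

    def project(tok):
        leaf = Constituent(tok.deprel, tok.idx, tok.idx)
        kids = dependents.get(tok.idx, [])
        if not kids:
            return leaf
        parts = sorted([leaf] + [project(k) for k in kids],
                       key=lambda c: c.start)
        return Constituent(f"{tok.deprel}-group",
                           parts[0].start, parts[-1].end, parts)

    (root,) = dependents[0]  # fails if there is not exactly one root
    return project(root)


def spans(node):
    """Collect (label, start, end) for every multi-token constituent."""
    found = set()
    if node.children:
        found.add((node.label, node.start, node.end))
        for child in node.children:
            found |= spans(child)
    return found


def bracket_f1(predicted, gold):
    """F1 over the sets of labelled spans of two constituency trees."""
    p, g = spans(predicted), spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


# "Маленькая девочка читала книгу" (A little girl was reading a book)
sentence = [
    Token(1, "Маленькая", 2, "amod"),
    Token(2, "девочка", 3, "nsubj"),
    Token(3, "читала", 0, "root"),
    Token(4, "книгу", 3, "obj"),
]
tree = build_constituents(sentence)
```

Here `tree` covers tokens 1 to 4 and contains a nested group for "Маленькая девочка". A real converter would additionally need grammar-specific rules for labelling sentence parts and for handling non-projective links, which this sketch leaves out.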
