An Algorithm for Constructing the Constituency Tree of a Russian Sentence from a Syntactic Dependency Tree

Anatoliy Poletaev, Ilya Paramonov, Elena Boychuk
Journal: Informatika i avtomatizaciâ
Published: 2023-11-10
DOI: https://doi.org/10.15622/ia.22.6.3

Abstract

Automatic syntactic analysis of sentences is an important task in computational linguistics. At present, there are no syntactic structure parsers for Russian that are both publicly available and suitable for practical applications. Building such a parser from scratch requires a treebank annotated according to a given formal grammar, which is laborious to create. However, since several syntactic dependency parsers exist for Russian, it seems reasonable to reuse dependency parsing results for syntactic structure analysis. The article introduces an algorithm that constructs the constituency tree of a Russian sentence from its syntactic dependency tree. The formal grammar used by the algorithm is based on D.E. Rosenthal's classic reference. The algorithm was evaluated on 300 Russian sentences: 200 selected from that reference and 100 from OpenCorpora, an open corpus of sentences extracted from Russian news and periodicals. During the evaluation, the sentences were passed to the syntactic dependency parsers from the Stanza, SpaCy, and Natasha packages, and the resulting dependency trees were processed by the proposed algorithm. The constituency trees obtained were compared with trees manually annotated by experts in linguistics. The best performance was achieved with the Stanza parser: a constituency parsing F1 score of 0.85 and a sentence-part tagging accuracy of 0.93, which would be sufficient for many practical applications, such as event extraction, information retrieval, and sentiment analysis.
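The abstract does not spell out the conversion rules or the exact scoring procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a projective dependency tree given as a list of head indices, derives an unlabeled bracket for every multi-token subtree, and scores the brackets against a gold set with a PARSEVAL-style F1. The function names `subtree_spans` and `bracket_f1`, the input encoding, and the example sentence are all hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)


def subtree_spans(heads: List[int]) -> Set[Span]:
    """Return the span covered by each multi-token dependency subtree.

    heads[i] is the 0-based index of the head of token i, or -1 for the root.
    Assumption: the tree is projective, so every subtree is contiguous.
    """
    n = len(heads)
    left = list(range(n))
    right = list(range(n))
    for i in range(n):
        # Walk from token i up to the root, widening each ancestor's span.
        j = heads[i]
        while j != -1:
            left[j] = min(left[j], i)
            right[j] = max(right[j], i)
            j = heads[j]
    # Single-token spans are dropped: they carry no bracketing information.
    return {(left[i], right[i]) for i in range(n) if left[i] != right[i]}


def bracket_f1(predicted: Set[Span], gold: Set[Span]) -> float:
    """Unlabeled bracket F1 between predicted and gold span sets."""
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "Маленькая девочка читала книгу"
# Маленькая -> девочка, девочка -> читала, читала = root, книгу -> читала
heads = [1, 2, -1, 2]
predicted = subtree_spans(heads)

# Hypothetical gold brackets that also contain a predicate group (2, 3).
gold = {(0, 1), (2, 3), (0, 3)}
score = bracket_f1(predicted, gold)
```

In this toy case the dependency tree yields the brackets (0, 1) and (0, 3) but not the predicate group (2, 3), because the verb's subtree spans the whole sentence. A gap of this kind is one reason a conversion needs grammar-specific rules rather than subtree projection alone.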