Towards the Data-driven System for Rhetorical Parsing of Russian Texts

Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019. Pub Date: 2019-06-01. DOI: 10.18653/v1/W19-2711

Artem Shelmanov, D. Pisarevskaya, Elena Chistova, S. Toldova, M. Kobozeva, I. Smirnov

Citations: 4

Abstract

We present the results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank, the first Russian corpus annotated within the RST framework. Various lexical, quantitative, morphological, and semantic features were used. In rhetorical relation classification, an ensemble of a CatBoost model with selected features and a linear SVM model achieves the best score (macro F1 = 54.67 ± 0.38). We find that most of the important features for rhetorical relation classification are related to discourse connectives derived from the connectives lexicon for Russian and from other sources.
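The reported metric, macro F1, is the unweighted mean of per-class F1 scores, so rare rhetorical relations count as much as frequent ones. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `macro_f1` and the relation labels in the usage example are illustrative assumptions, and the function returns a fraction in [0, 1], whereas the abstract reports the score as a percentage.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over every label seen in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed

    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); the guard covers an empty denominator.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relation labels, used only to illustrate the call.
gold = ["Elaboration", "Elaboration", "Contrast", "Cause"]
pred = ["Elaboration", "Contrast", "Contrast", "Cause"]
score = macro_f1(gold, pred)
```

In this toy example, "Elaboration" and "Contrast" each score 2/3 and "Cause" scores 1, so the macro average is 7/9.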
