
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019: Latest Papers

RST-Tace A tool for automatic comparison and evaluation of RST trees
Shujun Wan, Tino Kutschbach, Anke Lüdeling, Manfred Stede
This paper presents RST-Tace, a tool for automatic comparison and evaluation of RST trees. RST-Tace serves as an implementation of Iruskieta’s comparison method, which allows trees to be compared and evaluated without the influence of decisions at lower levels in a tree in terms of four factors: constituent, attachment point, nuclearity as well as relation. RST-Tace can be used regardless of the language or the size of rhetorical trees. This tool aims to measure the agreement between two annotators. The result is reflected by F-measure and inter-annotator agreement. Both the comparison table and the result of the evaluation can be obtained automatically.
DOI: https://doi.org/10.18653/v1/W19-2712 (published 2019-06-01)
Citations: 6
Toward Cross-theory Discourse Relation Annotation
Peter Bourgonje, Olha Zolotarenko
In this exploratory study, we attempt to automatically induce PDTB-style relations from RST trees. We work with a German corpus of news commentary articles, annotated for RST trees and explicit PDTB-style relations and we focus on inducing the implicit relations in an automated way. Preliminary results look promising as a high-precision (but low-recall) way of finding implicit relations where there is no shallow structure annotated at all, but mapping proves more difficult in cases where EDUs and relation arguments overlap, yet do not seem to signal the same relation.
DOI: https://doi.org/10.18653/v1/W19-2702 (published 2019-06-01)
Citations: 3
Towards the Data-driven System for Rhetorical Parsing of Russian Texts
Artem Shelmanov, D. Pisarevskaya, Elena Chistova, S. Toldova, M. Kobozeva, I. Smirnov
Results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank – first Russian corpus annotated within RST framework – are presented. Various lexical, quantitative, morphological, and semantic features were used. In rhetorical relation classification, ensemble of CatBoost model with selected features and a linear SVM model provides the best score (macro F1 = 54.67 ± 0.38). We discover that most of the important features for rhetorical relation classification are related to discourse connectives derived from the connectives lexicon for Russian and from other sources.
DOI: https://doi.org/10.18653/v1/W19-2711 (published 2019-06-01)
Citations: 4
The Rhetorical Structure of Attribution
Andrew Potter
The relational status of Attribution in Rhetorical Structure Theory has been a matter of ongoing debate. Although several researchers have weighed in on the topic, and although numerous studies have relied upon attributional structures for their analyses, nothing approaching consensus has emerged. This paper identifies three basic issues that must be resolved to determine the relational status of attributions. These are identified as the Discourse Units Issue, the Nuclearity Issue, and the Relation Identification Issue. These three issues are analyzed from the perspective of classical RST. A finding of this analysis is that the nuclearity and the relational identification of attribution structures are shown to depend on the writer’s intended effect, such that attributional relations cannot be considered as a single relation, but rather as attributional instances of other RST relations.
DOI: https://doi.org/10.18653/v1/W19-2706 (published 2019-06-01)
Citations: 4
Nuclearity in RST and signals of coherence relations
Debopam Das
We investigate the relationship between the notion of nuclearity as proposed in Rhetorical Structure Theory (RST) and the signalling of coherence relations. RST relations are categorized as either mononuclear (comprising a nucleus and a satellite span) or multinuclear (comprising two or more nuclei spans). We examine how mononuclear relations (e.g., Antithesis, Condition) and multinuclear relations (e.g., Contrast, List) are indicated by relational signals, more particularly by discourse markers (e.g., because, however, if, therefore). We conduct a corpus study, examining the distribution of either type of relations in the RST Discourse Treebank (Carlson et al., 2002) and the distribution of discourse markers for those relations in the RST Signalling Corpus (Das et al., 2015). Our results show that discourse markers are used more often to signal multinuclear relations than mononuclear relations. The findings also suggest a complex relationship between the relation types and syntactic categories of discourse markers (subordinating and coordinating conjunctions).
DOI: https://doi.org/10.18653/v1/W19-2705 (published 2019-06-01)
Citations: 3
ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents
Philippe Muller, Chloé Braud, Mathieu Morey
Segmentation is the first step in building practical discourse parsers, and is often neglected in discourse parsing studies. The goal is to identify the minimal spans of text to be linked by discourse relations, or to isolate explicit marking of discourse relations. Existing systems on English report F1 scores as high as 95%, but they generally assume gold sentence boundaries and are restricted to English newswire texts annotated within the RST framework. This article presents a generic approach and a system, ToNy, a discourse segmenter developed for the DisRPT shared task where multiple discourse representation schemes, languages and domains are represented. In our experiments, we found that a straightforward sequence prediction architecture with pretrained contextual embeddings is sufficient to reach performance levels comparable to existing systems, when separately trained on each corpus. We report performance between 81% and 96% in F1 score. We also observed that discourse segmentation models only display a moderate generalization capability, even within the same language and discourse representation scheme.
DOI: https://doi.org/10.18653/v1/W19-2715 (published 2019-06-01)
Citations: 30
Towards discourse annotation and sentiment analysis of the Basque Opinion Corpus
J. Alkorta, Koldo Gojenola, Mikel Iruskieta
Discourse information is crucial for a better understanding of the text structure and it is also necessary to describe which part of an opinionated text is more relevant or to decide how a text span can change the polarity (strengthen or weaken) of other span by means of coherence relations. This work presents the first results on the annotation of the Basque Opinion Corpus using Rhetorical Structure Theory (RST). Our evaluation results and analysis show us the main avenues to improve on a future annotation process. We have also extracted the subjectivity of several rhetorical relations and the results show the effect of sentiment words in relations and the influence of each relation in the semantic orientation value.
DOI: https://doi.org/10.18653/v1/W19-2718 (published 2019-06-01)
Citations: 1
Using Rhetorical Structure Theory to Assess Discourse Coherence for Non-native Spontaneous Speech
Xinhao Wang, Binod Gyawali, James V. Bruno, Hillary R. Molloy, Keelan Evanini, K. Zechner
This study aims to model the discourse structure of spontaneous spoken responses within the context of an assessment of English speaking proficiency for non-native speakers. Rhetorical Structure Theory (RST) has been commonly used in the analysis of discourse organization of written texts; however, limited research has been conducted to date on RST annotation and parsing of spoken language, in particular, non-native spontaneous speech. Due to the fact that the measurement of discourse coherence is typically a key metric in human scoring rubrics for assessments of spoken language, we conducted research to obtain RST annotations on non-native spoken responses from a standardized assessment of academic English proficiency. Subsequently, automatic parsers were trained on these annotations to process non-native spontaneous speech. Finally, a set of features were extracted from automatically generated RST trees to evaluate the discourse structure of non-native spontaneous speech, which were then employed to further improve the validity of an automated speech scoring system.
DOI: https://doi.org/10.18653/v1/W19-2719
Citations: 9
EusDisParser: improving an under-resourced discourse parser with cross-lingual data
Mikel Iruskieta, Chloé Braud
Development of discourse parsers to annotate the relational discourse structure of a text is crucial for many downstream tasks. However, most of the existing work focuses on English, assuming a quite large dataset. Discourse data have been annotated for Basque, but training a system on these data is challenging since the corpus is very small. In this paper, we create the first demonstrator based on RST for Basque, and we investigate the use of data in another language to improve the performance of a Basque discourse parser. More precisely, we build a monolingual system using the small set of data available and investigate the use of multilingual word embeddings to train a system for Basque using data annotated for another language. We found that our approach to building a system limited to the small set of data available for Basque allowed us to get an improvement over previous approaches making use of many data annotated in other languages. At best, we get 34.78 in F1 for the full discourse structure. More data annotation is necessary in order to improve the results obtained with these techniques. We also describe which relations match with the gold standard, in order to understand these results.
DOI: https://doi.org/10.18653/v1/w19-2709
Citations: 4
A Discourse Signal Annotation System for RST Trees
Luke Gessler, Yang Janet Liu, Amir Zeldes
This paper presents a new system for open-ended discourse relation signal annotation in the framework of Rhetorical Structure Theory (RST), implemented on top of an online tool for RST annotation. We discuss existing projects annotating textual signals of discourse relations, which have so far not allowed simultaneously structuring and annotating words signaling hierarchical discourse trees, and demonstrate the design and applications of our interface by extending existing RST annotations in the freely available GUM corpus.
DOI: https://doi.org/10.18653/v1/W19-2708
Citations: 7