How compatible are our discourse annotation frameworks? Insights from mapping RST-DT and PDTB annotations

Dialogue and Discourse (Q1, Arts and Humanities). Pub Date: 2019-06-14. DOI: 10.5087/dad.2019.104

Vera Demberg, Merel C. J. Scholman, Fatemeh Torabi Asr


Citations: 16

Abstract

Discourse-annotated corpora are an important resource for the community, but they are often annotated according to different frameworks. This makes joint usage of the annotations difficult, preventing researchers from searching the corpora in a unified way, or using all annotated data jointly to train computational systems. Several theoretical proposals have recently been made for mapping the relational labels of different frameworks to each other, but these proposals have so far not been validated against existing annotations. The two largest discourse relation annotated resources, the Penn Discourse Treebank and the Rhetorical Structure Theory Discourse Treebank, have however been annotated on the same texts, allowing for a direct comparison of the annotation layers. We propose a method for automatically aligning the discourse segments, and then evaluate existing mapping proposals by comparing the empirically observed against the proposed mappings. Our analysis highlights the influence of segmentation on subsequent discourse relation labelling, and shows that while agreement between frameworks is reasonable for explicit relations, agreement on implicit relations is low. We identify several sources of systematic discrepancies between the two annotation schemes and discuss consequences for future annotation and for usage of the existing resources.
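The abstract does not spell out the alignment procedure, so the following is only a minimal sketch of the general idea under stated assumptions: each relation is represented as two character-offset argument spans plus a label, relations from the two layers are paired by greatest argument overlap, and the observed label pairs are then checked against a proposed mapping. The function names, the overlap criterion, the toy spans, and the toy mapping are all hypothetical illustrations, not the authors' method or any published mapping proposal.

```python
from collections import Counter


def overlap(a, b):
    """Characters shared by two (start, end) spans; end is exclusive."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def align(pdtb, rst):
    """Pair each PDTB relation with the RST relation whose arguments overlap it most.

    Each relation is (arg1_span, arg2_span, label). Relations with no
    overlapping counterpart are left unaligned. This criterion is an assumption.
    """
    pairs = []
    for p_arg1, p_arg2, p_label in pdtb:
        best_label, best_score = None, 0
        for r_arg1, r_arg2, r_label in rst:
            score = overlap(p_arg1, r_arg1) + overlap(p_arg2, r_arg2)
            if score > best_score:
                best_label, best_score = r_label, score
        if best_label is not None:
            pairs.append((p_label, best_label))
    return pairs


def mapping_agreement(pairs, proposed):
    """Share of aligned pairs whose RST label is allowed by the proposed mapping."""
    if not pairs:
        return 0.0
    hits = sum(1 for p, r in pairs if r in proposed.get(p, set()))
    return hits / len(pairs)


# Toy data: spans and labels are invented for illustration only.
pdtb = [
    ((0, 20), (21, 45), "Contingency.Cause"),
    ((46, 70), (71, 100), "Comparison.Contrast"),
]
rst = [
    ((0, 20), (20, 45), "Explanation"),
    ((46, 72), (72, 100), "Elaboration"),
]
# Hypothetical mapping from PDTB labels to acceptable RST labels.
proposed = {
    "Contingency.Cause": {"Explanation", "Cause"},
    "Comparison.Contrast": {"Contrast"},
}

pairs = align(pdtb, rst)
observed = Counter(pairs)  # empirical label co-occurrence counts
agreement = mapping_agreement(pairs, proposed)
print(observed)
print(agreement)
```

In this toy case the second pair (Comparison.Contrast aligned with Elaboration) falls outside the hypothetical mapping. That is the kind of discrepancy an empirical comparison would surface, and it also shows how differing segment boundaries feed into which labels end up being compared.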


Source journal

Dialogue and Discourse (Arts and Humanities, Language and Linguistics)

CiteScore

1.90

Self-citation rate

0.00%

Articles published

Review time

12 weeks

About the journal: D&D seeks previously unpublished, high-quality articles on the analysis of discourse and dialogue that contain
- experimental and/or theoretical studies related to the construction, representation, and maintenance of (linguistic) context
- linguistic analysis of phenomena characteristic of discourse and/or dialogue (including, but not limited to: reference and anaphora, presupposition and accommodation, topicality and salience, implicature, discourse structure and rhetorical relations, discourse markers and particles, the semantics and pragmatics of dialogue acts, questions, imperatives, non-sentential utterances, intonation, and meta-communicative phenomena such as repair and grounding)
- experimental and/or theoretical studies of agents' information states and their dynamics in conversational interaction
- new analytical frameworks that advance theoretical studies of discourse and dialogue
- research on systems performing coreference resolution, discourse structure parsing, event and temporal structure, and reference resolution in multimodal communication
- experimental and/or theoretical results yielding new insight into non-linguistic interaction in communication
- work on natural language understanding (including spoken language understanding), dialogue management, reasoning, and natural language generation (including text-to-speech) in dialogue systems
- work related to the design and engineering of dialogue systems (including, but not limited to: evaluation, usability design and testing, rapid application deployment, embodied agents, affect detection, mixed-initiative, adaptation, and user modeling)
- extremely well-written surveys of existing work.

Highest priority is given to research reports that are specifically written for a multidisciplinary audience. The audience is primarily researchers on discourse and dialogue and its associated fields, including computer scientists, linguists, psychologists, philosophers, roboticists, and sociologists.

Latest articles from this journal

The Conversational Discourse Unit: Identification and Its Role in Conversational Turn-taking Management
Exploring the Sensitivity to Alternative Signals of Coherence Relations
Scoring Coreference Chains with Split-Antecedent Anaphors
Form and Function of Connectives in Chinese Conversational Speech
Bullshit, Pragmatic Deception, and Natural Language Processing