A corpus-driven approach to discourse organisation: from cues to complex markers

Dialogue and Discourse (JCR Q1, Arts and Humanities) | Pub Date: 2017-01-31 | DOI: 10.5087/dad.2017.103

Marie-Paule Péry-Woodley, L. Ho-Dac, Josette Rebeyrolle, Ludovic Tanguy, Cécile Fabre

Publication type: Journal Article | Pages: 66-105 | Indexed via: Semantic Scholar

Citations: 5

Abstract




This paper reports on an experiment implementing a data-intensive approach to discourse organisation. Its focus is on enumerative structures, envisaged as a type of textual pattern in a sequentiality-oriented approach to discourse. On the basis of a large-scale annotation exercise that combines automatic feature markup with manual annotation, we explore a method to identify complex discourse markers seen as configurations of cues. The presentation of the background to what is termed "multi-level annotation" is organised around four issues: linearity; the complexity of discourse markers; top-down processing; and granularity and the multi-level nature of discourse structures. In this context, enumerative structures deserve scrutiny for a number of reasons: they are frequent structures appearing at different granularity levels, they are signalled by a variety of devices that appear to work together in complex ways, and they combine a textual role (discourse organisation) with an ideational role (categorisation). We describe the annotation procedure and experimental framework, which resulted in nearly 1,000 enumerative structures being annotated in a diversified corpus of over 600,000 words. The results of two approaches to the rich data produced are then presented. Firstly, a descriptive survey highlights considerable variation in length and composition, shows enumerative structure to be a basic strategy resorted to in all three sub-corpora, and leads to a granularity-based typology of the annotated structures. Secondly, recurrent cue configurations (our "complex markers") are identified by the application of data mining methods. The paper ends with perspectives for further exploitation of the data, in particular with respect to the semantic characterisation of enumerative structures.


Source journal

Dialogue and Discourse (Arts and Humanities: Language and Linguistics)

CiteScore: 1.90

Self-citation rate: 0.00%

Review time: 12 weeks

About the journal: D&D seeks previously unpublished, high-quality articles on the analysis of discourse and dialogue that contain:

- experimental and/or theoretical studies related to the construction, representation, and maintenance of (linguistic) context
- linguistic analysis of phenomena characteristic of discourse and/or dialogue (including, but not limited to: reference and anaphora, presupposition and accommodation, topicality and salience, implicature, discourse structure and rhetorical relations, discourse markers and particles, the semantics and pragmatics of dialogue acts, questions, imperatives, non-sentential utterances, intonation, and meta-communicative phenomena such as repair and grounding)
- experimental and/or theoretical studies of agents' information states and their dynamics in conversational interaction
- new analytical frameworks that advance theoretical studies of discourse and dialogue
- research on systems performing coreference resolution, discourse structure parsing, event and temporal structure, and reference resolution in multimodal communication
- experimental and/or theoretical results yielding new insight into non-linguistic interaction in communication
- work on natural language understanding (including spoken language understanding), dialogue management, reasoning, and natural language generation (including text-to-speech) in dialogue systems
- work related to the design and engineering of dialogue systems (including, but not limited to: evaluation, usability design and testing, rapid application deployment, embodied agents, affect detection, mixed-initiative, adaptation, and user modeling)
- extremely well-written surveys of existing work

Highest priority is given to research reports that are specifically written for a multidisciplinary audience. The audience is primarily researchers on discourse and dialogue and its associated fields, including computer scientists, linguists, psychologists, philosophers, roboticists, and sociologists.