A corpus-driven approach to discourse organisation: from cues to complex markers

Q1 Arts and Humanities | Dialogue and Discourse | Pub Date: 2017-01-31 | DOI: 10.5087/dad.2017.103
Marie-Paule Péry-Woodley, L. Ho-Dac, Josette Rebeyrolle, Ludovic Tanguy, Cécile Fabre
Pages: 66-105. Publication type: Journal Article.
Citations: 5

Abstract

A corpus-driven approach to discourse organisation: from cues to complex markers
This paper reports on an experiment implementing a data-intensive approach to discourse organisation. Its focus is on enumerative structures envisaged as a type of textual pattern in a sequentiality-oriented approach to discourse. On the basis of a large-scale annotation exercise calling upon automatic feature markup alongside manual annotation, we explore a method to identify complex discourse markers seen as configurations of cues. The presentation of the background to what is termed "multi-level annotation" is organised around four issues: linearity, complexity of discourse markers, top-down processing, granularity and the multi-level nature of discourse structures. In this context, enumerative structures seem to deserve scrutiny for a number of reasons: they are frequent structures appearing at different granularity levels, they are signalled by a variety of devices appearing to work together in complex ways, and they combine a textual role (discourse organisation) with an ideational role (categorisation). We describe the annotation procedure and experimental framework which resulted in nearly 1,000 enumerative structures being annotated in a diversified corpus of over 600,000 words. The results of two approaches to the rich data produced are then presented: firstly, a descriptive survey highlights considerable variation in length and composition, while showing enumerative structure to be a basic strategy resorted to in all three sub-corpora, and leads to a granularity-based typology of the annotated structures; secondly, recurrent cue configurations (our "complex markers") are identified by the application of data mining methods. The paper ends with perspectives for further exploitation of the data, in particular with respect to the semantic characterisation of enumerative structures.
Source journal
Dialogue and Discourse
Subject area: Arts and Humanities - Language and Linguistics
CiteScore: 1.90
Self-citation rate: 0.00%
Articles published: 7
Review time: 12 weeks
About the journal: D&D seeks previously unpublished, high quality articles on the analysis of discourse and dialogue that contain
- experimental and/or theoretical studies related to the construction, representation, and maintenance of (linguistic) context
- linguistic analysis of phenomena characteristic of discourse and/or dialogue (including, but not limited to: reference and anaphora, presupposition and accommodation, topicality and salience, implicature, discourse structure and rhetorical relations, discourse markers and particles, the semantics and pragmatics of dialogue acts, questions, imperatives, non-sentential utterances, intonation, and meta-communicative phenomena such as repair and grounding)
- experimental and/or theoretical studies of agents' information states and their dynamics in conversational interaction
- new analytical frameworks that advance theoretical studies of discourse and dialogue
- research on systems performing coreference resolution, discourse structure parsing, event and temporal structure, and reference resolution in multimodal communication
- experimental and/or theoretical results yielding new insight into non-linguistic interaction in communication
- work on natural language understanding (including spoken language understanding), dialogue management, reasoning, and natural language generation (including text-to-speech) in dialogue systems
- work related to the design and engineering of dialogue systems (including, but not limited to: evaluation, usability design and testing, rapid application deployment, embodied agents, affect detection, mixed-initiative, adaptation, and user modeling)
- extremely well-written surveys of existing work
Highest priority is given to research reports that are specifically written for a multidisciplinary audience. The audience is primarily researchers on discourse and dialogue and its associated fields, including computer scientists, linguists, psychologists, philosophers, roboticists, sociologists.