Dialogue and Discourse: Latest Articles

Reasoning Between the Lines: a Logic of Relational Propositions

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2019-01-04 DOI: 10.5087/dad.2018.203

Andrew Potter

This paper describes how Rhetorical Structure Theory (RST) and relational propositions can be used to define a method for rendering and analyzing texts as expressions in propositional logic. Relational propositions, the implicit assertions that correspond to RST relations, are defined using standard logical operators and rules of inference. The resulting logical forms are used to construct logical expressions that map to RST tree structures. The resulting expressions show that inference is pervasive within coherent texts. To support reasoning over these expressions, a set of rules for negation is defined. The logical forms and their negation rules can be used to examine the flow of reasoning and the effects of incoherence. Because there is a correspondence between logical coherence and the functional relationships of RST, an RST analysis that cannot pass the test of logic is indicative either of a problematic analysis or of an incoherent text. The result is a method for analyzing the logic implicit within discursive reasoning.

Citations: 5

Asymmetries between interpretation and production in Catalan pronouns

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2018-12-14 DOI: 10.5087/DAD.2018.201

Laia Mayol

The literature on Romance null-subject languages has often postulated a division of labor between Null and Overt pronouns: Nulls prefer to retrieve an antecedent in subject position, whereas Overts prefer an antecedent in a lower syntactic position (Carminati, 2002). However, recent research on English pronouns (Rohde and Kehler, 2014) has shown grammatical function alone cannot explain pronoun interpretation. According to these models, pronoun interpretation and production are sensitive to different sets of factors and, instead of being mirror images of each other, are related probabilistically in a Bayesian fashion. This paper tests this model with Catalan data from two discourse-completion experiments to study the grammatical and pragmatic factors that affect the interpretation and production of Null and Overt pronouns. Our main result is that both Null and Overt pronouns present asymmetries regarding their interpretation and production: (1) the production of Null pronouns is affected mainly by grammatical factors (they are subject-biased), but their interpretation is also influenced by pragmatic factors (in particular, rhetorical relations), and (2) while Overt pronouns have a strong interpretation bias towards the object, the data indicates that they are not the preferred form to refer to the object.
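
The Bayesian relation between interpretation and production mentioned in the abstract can be written out as a worked equation. This is a sketch of the general form usually associated with this line of work, not necessarily the exact formulation tested in the paper; the symbol names are illustrative assumptions.

```latex
% Interpretation (left-hand side) is derived from production and a prior
% expectation about which referent will be mentioned next.
% "referent" ranges over candidate antecedents (e.g. subject, object);
% "pronoun" is the observed form (Null or Overt).
\[
P(\text{referent} \mid \text{pronoun})
  = \frac{P(\text{pronoun} \mid \text{referent})\, P(\text{referent})}
         {\sum_{r} P(\text{pronoun} \mid r)\, P(r)}
\]
```

On this reading, the production term P(pronoun | referent) can be driven mainly by grammatical factors, while the prior P(referent) can carry pragmatic factors such as rhetorical relations, so interpretation and production need not mirror each other.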

Citations: 6

Subjectivity in Spanish Discourse: Explicit and Implicit Causal Relations in Different Text Types

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2018-09-14 DOI: 10.5087/dad.2018.106

Andrea Santana, W. Spooren, Dorien Nieuwenhuijsen, T. Sanders

Corpus-based studies in various languages have demonstrated that some connectives are used preferentially to express subjective versus objective meanings, for example, omdat vs. want in Dutch. However, Spanish connectives have been understudied from this perspective. Moreover, most of the studies of subjectivity have focused on explicit relations and little is known about the subjectivity of implicit coherence relations. In addition, the role that text type plays in the meaning and use of causal relations and their connectives is still under discussion. This study aims to analyze the local contexts of Spanish causal explicit and implicit relations in different text types by carrying out manual analyses of subjectivity. 360 relations marked by three prototypical causal connectives and 120 implicit relations were extracted from academic and journalistic texts. The analytical model applied is based on an integrative approach to subjectivity. Statistical analyses indicate a particular behavior of Spanish connectives and implicit relations and a three-way interaction between subjectivity, text type, and linguistic marking in journalistic texts. Therefore, this study reveals new insights into subjectivity in Spanish discourse.

Citations: 8

Source vs. Stance: On the Relationship between Evidential and Modal Expressions

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2018-08-10 DOI: 10.5087/dad.2018.105

Sumeyra Tosun, Jyotsna Vaid

Languages vary in how they encode and interpret attested information. The present research examined how users of Turkish and English construe utterances containing evidential information, in particular, whether evidential information is interpreted strictly as conveying source information (firsthand or non-firsthand), or whether it is also perceived as signaling the reliability of particular sources. Participants read sentences in their respective language presented in various source and modal forms and were asked to judge the source of information of the proposition and their confidence in whether the asserted event actually happened. It was found that there was sufficient information from evidential and modal expressions to make both source and probability of occurrence judgments, although the groups differed somewhat in their judgment patterns. The findings are taken to suggest that, for both Turkish and English speakers, evidentiality and epistemic modality overlap to some extent but the two do not function in exactly the same way.

Citations: 3

Cross-domain analysis of discourse markers in European Portuguese

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2018-06-08 DOI: 10.5087/dad.2018.103

Vera Cabarrão, Helena Moniz, Fernando Batista, Jaime Ferreira, I. Trancoso, Ana Isabel Mata

This paper presents an analysis of discourse markers in two spontaneous speech corpora for European Portuguese - university lectures and map-task dialogues - and also in a collection of tweets, aiming at contributing to their categorization, scarcely existent for European Portuguese. Our results show that the selection of discourse markers is domain and speaker dependent. We also found that the most frequent discourse markers are similar in all three corpora, despite tweets containing discourse markers not found in the other two corpora. In this multidisciplinary study, comprising both a linguistic perspective and a computational approach, discourse markers are also automatically discriminated from other structural metadata events, namely sentence-like units and disfluencies. Our results show that discourse markers and disfluencies tend to co-occur in the dialogue corpus, but have a complementary distribution in the university lectures. We used three acoustic-prosodic feature sets and machine learning to automatically distinguish between discourse markers, disfluencies and sentence-like units. Our in-domain experiments achieved an accuracy of about 87% in university lectures and 84% in dialogues, in line with our previous results. The eGeMAPS features, commonly used for other paralinguistic tasks, achieved a considerable performance on our data, especially considering the small size of the feature set. Our results suggest that turn-initial discourse markers are usually easier to classify than disfluencies, a result also previously reported in the literature. We conducted a cross-domain evaluation in order to evaluate the robustness of the models across domains. The results achieved are about 11%-12% lower, but we conclude that data from one domain can still be used to classify the same events in the other. Overall, despite the complexity of this task, these are very encouraging state-of-the-art results. Ultimately, using exclusively acoustic-prosodic cues, discourse markers can be fairly discriminated from disfluencies and SUs. In order to better understand the contribution of each feature, we have also reported the impact of the features in both the dialogues and the university lectures. Pitch features are the most relevant ones for the distinction between discourse markers and disfluencies, namely pitch slopes. These features are in line with the wide pitch range of discourse markers, in a continuum from a very compressed pitch range to a very wide one, expressed by total deaccented material or H+L* L* contours, with upstep H tones.

Citations: 5

Primary and secondary discourse connectives: definitions and lexicons

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2018-06-01 DOI: 10.5087/DAD.2018.102

L. Danlos, Katerina Rysova, Magdaléna Rysová, Manfred Stede

Starting from the perspective that discourse structure arises from the presence of coherence relations, we provide a map of linguistic discourse structuring devices (DRDs), and focus on those for written text. We propose to structure these items by differentiating between primary and secondary connectives on the one hand, and free connecting phrases on the other. For the former, we propose that their behavior can be described by lexicons, and we show one concrete proposal that by now has been applied to three languages, with others being added in ongoing work. The lexical representations can be useful both for humans (theoretical investigations, transfer to other languages) and for machines (automatic discourse parsing and generation).

Citations: 15

Signalling Implicit Relations: A PDTB - RST Comparison

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2017-12-30 DOI: 10.5087/DAD.2017.210

Lucie Poláková, Jirí Mírovský, Pavlína Synková

Describing implicit phenomena in discourse is known to be a problematic task, from both theoretical and empirical perspectives. The present article contributes to this topic by a novel comparative analysis of two prominent annotation approaches to discourse relations (coherence relations) that were carried out on the same texts. We compare the annotation of implicit relations in the Penn Discourse Treebank 2.0, i.e. discourse relations not signalled by an explicit discourse connective, to the recently released analysis of signals of rhetorical relations in the RST Signalling Corpus (RST-SC). The intersection of corresponding pairs of relations is rather a small one, but it shows a clear tendency: unlike the overall signal distribution in the RST-SC, more than half of the signals in the studied intersection are of semantic type, formed mostly by loosely defined lexical chains. Our data transformation allows for a simultaneous depiction and detailed study of the two resources.

Citations: 8

Discovering Rhetoric Agreement between a Request and Response

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2017-12-21 DOI: 10.5087/dad.2017.208

Boris A. Galitsky

To support a natural flow of a conversation between humans and automated agents, the rhetoric structures of each message have to be analyzed. We classify a pair of paragraphs of text as appropriate for one to follow another, or inappropriate, based on both topic and communicative discourse considerations. To represent a multi-sentence message with respect to how it should follow a previous message in a conversation or dialogue, we build an extension of a discourse tree for it. The extended discourse tree is based on a discourse tree for RST relations with labels for communicative actions, and also additional arcs for anaphora and ontology-based relations for entities. We refer to such trees as Communicative Discourse Trees (CDTs). We explore syntactic and discourse features that are indicative of correct vs incorrect request-response or question-answer pairs. Two learning frameworks are used to recognize such correct pairs: deterministic, nearest-neighbor learning of CDTs as graphs, and a tree kernel learning of CDTs, where a feature space of all CDT sub-trees is subject to SVM learning. We form the positive training set from the correct pairs obtained from Yahoo Answers, social network, corporate conversations including Enron emails, customer complaints and interviews by journalists. The corresponding negative training set is artificially created by attaching responses for different, inappropriate requests that include relevant keywords. The evaluation showed that it is possible to recognize valid pairs in 70% of cases in the domains of weak request-response agreement and 80% of cases in the domains of strong agreement, which is essential to support automated conversations. These accuracies are comparable with the benchmark task of classification of discourse trees themselves as valid or invalid, and also with classification of multi-sentence answers in factoid question-answering systems. The applicability of proposed machinery to the problem of chatbots, social chats and programming via NL is demonstrated. We conclude that learning rhetoric structures in the form of CDTs is the key source of data to support answering complex questions, chatbots and dialogue management.

Citations: 14

User-Adaptive A Posteriori Restoration for Incorrectly Segmented Utterances in Spoken Dialogue Systems

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2017-12-15 DOI: 10.5087/DAD.2017.209

Kazunori Komatani, Naoki Hotta, Satoshi Sato, Mikio Nakano

Ideally, the users of spoken dialogue systems should be able to speak at their own tempo. Thus, the systems need to interpret utterances from various users correctly, even when the utterances contain pauses. In response to this issue, we propose an approach based on a posteriori restoration for incorrectly segmented utterances. A crucial part of this approach is to determine whether restoration is required. We use a classification-based approach, adapted to each user. We focus on each user's dialogue tempo, which can be obtained during the dialogue, and determine the correlation between each user's tempo and the appropriate thresholds for classification. A linear regression function used to convert the tempos into thresholds is also derived. Experimental results show that the proposed user adaptation approach applied to two restoration classification methods, thresholding and decision trees, improves classification accuracies by 3.0% and 7.4%, respectively, in cross validation.
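
The idea of converting a user's tempo into a classification threshold with a linear regression can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the use of a single tempo value per user, and the rule that two fragments are restored (merged) when the gap between them is shorter than the user's threshold.

```python
def fit_line(tempos, thresholds):
    """Ordinary least-squares fit of threshold = slope * tempo + intercept.

    `tempos` and `thresholds` are hypothetical per-user training values.
    """
    n = len(tempos)
    mean_x = sum(tempos) / n
    mean_y = sum(thresholds) / n
    sxx = sum((x - mean_x) ** 2 for x in tempos)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tempos, thresholds))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def user_threshold(tempo, slope, intercept):
    """Map a user's observed tempo to a user-specific threshold."""
    return slope * tempo + intercept


def needs_restoration(gap_seconds, tempo, slope, intercept):
    """Assumed rule: restore when the gap between two fragments is
    shorter than the threshold predicted for this user's tempo."""
    return gap_seconds < user_threshold(tempo, slope, intercept)
```

In such a setup the regression would be fitted once on users whose appropriate thresholds are known, and then applied to a new user as soon as a tempo estimate becomes available during the dialogue.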

Citations: 1

Discourse Markers in Speech: Distinctive Features and Corpus Annotation

Q1 Arts and Humanities

Dialogue and Discourse

Pub Date : 2017-12-01 DOI: 10.5087/dad.2017.207

Ludivine Crible, Maria-Josep Cuenca

It is generally acknowledged that discourse markers are used differently in speech and writing, yet many general descriptions and most annotation frameworks are written-based, thus partially unfit to be applied in spoken corpora. This paper identifies the major distinctive features of discourse markers in spoken language, which can be associated with problems related to their scope and structure, their meaning and their tendency to co-occur. The description is based on authentic examples and is followed by methodological recommendations on how to deal with these phenomena in more exhaustive, speech-friendly annotation models.

Citations: 32
