
Dialogue and Discourse: Latest Articles

An Empirical Analysis of Subjectivity and Narrative Levels in Weblog Storytelling Across Cultures
Q1 Arts and Humanities Pub Date : 2017-11-20 DOI: 10.5087/DAD.2017.205
Reid Swanson, A. Gordon, P. Khooshabeh, Kenji Sagae, Richard Huskey, Michael Mangus, Ori Amir, R. Weber
Storytelling is a universal activity, but the way in which discourse structure is used to persuasively convey ideas and emotions may depend on cultural factors. Because first-person accounts of life experiences can have a powerful impact on how a person is perceived, the storyteller may instinctively employ specific strategies to shape the audience's perception. Hypothesizing that some of the differences in storytelling can be captured by the use of narrative levels and subjectivity, we analyzed over one thousand narratives taken from personal weblogs. First, we compared stories from three different cultures written in their native languages: English, Chinese and Farsi. Second, we examined the impact of these two discourse properties on a reader's attitude and behavior toward the narrator. We found surprising similarities and differences in how stories are structured along these two dimensions across cultures. These discourse properties have a small but significant impact on a reader's behavioral response toward the narrator.
Citations: 5
Discourse coherence and the interpretation of accented pronouns
Q1 Arts and Humanities Pub Date : 2017-10-25 DOI: 10.5087/dad.2017.204
Mindaugas Mozuraitis, Daphna Heller
It has long been argued that accenting or stressing a pronoun (i.e., making it prosodically prominent) changes its interpretation as compared to its unaccented counterpart. However, recent experimental work demonstrated that this generalization does not apply when the alternative interpretation of the pronoun is not plausible (Taylor et al., 2013). In a series of three experiments that use an offline comprehension task, we show, first, that the lack of reversal is observed when plausibility is controlled for. We furthermore show that a new generalization cannot be formed by excluding cases where the bias towards the unmarked interpretation is strong or cases where the character in the alternative interpretation is low in salience. Instead, we conclude that what constrains the interpretation of accented pronouns is coherence relations, with parallel discourses exhibiting reversal and result discourses not exhibiting reversal. We propose that the difference between coherence relations should be viewed in what would be the minimal change in order to create a ‘surprising’ or ‘expected’ event, which is the characteristic of accenting more generally.
Citations: 1
On Temporality in Discourse Annotation: Theoretical and Practical Considerations
Q1 Arts and Humanities Pub Date : 2017-07-19 DOI: 10.5087/DAD.2017.201
J. Evers-Vermeul, J. Hoek, Merel C. J. Scholman
Temporal information is one of the prominent features that determine the coherence in a discourse. That is why we need an adequate way to deal with this type of information during discourse annotation. In this paper, we will argue that temporal order is a relational rather than a segment-specific property, and that it is a cognitively plausible notion: temporal order is expressed in the system of linguistic markers and is relevant in both acquisition and language processing. This means that temporal relations meet the requirements set by the Cognitive approach of Coherence Relations (CCR) to be considered coherence relations, and that CCR would need a way to distinguish temporal relations within its annotation system. We will present merits and drawbacks of different options of reaching this objective and argue in favor of adding temporal order as a new dimension to CCR.
Citations: 17
Examples and Specifications that Prove a Point: Identifying Elaborative and Argumentative Discourse Relations
Q1 Arts and Humanities Pub Date : 2017-07-19 DOI: 10.5087/dad.2017.203
Merel C. J. Scholman, Vera Demberg
Examples and specifications occur frequently in text, but not much is known about how they function in discourse and how readers interpret them. Looking at how they are annotated in existing discourse corpora, we find that annotators often disagree on these types of relations; specifically, there is disagreement about whether these relations are elaborative (additive) or argumentative (pragmatic causal). To investigate how readers interpret examples and specifications, we conducted a crowdsourced discourse annotation study. The results show that these relations can indeed have two functions: they can be used both to illustrate or specify a situation and to serve as an argument for a claim. These findings suggest that examples and specifications can have multiple simultaneous readings. We discuss the implications of these results for discourse annotation.
Citations: 21
Dialog Structure Through the Lens of Gender, Gender Environment, and Power
Q1 Arts and Humanities Pub Date : 2017-06-12 DOI: 10.5087/dad.2017.202
Vinodkumar Prabhakaran, Owen Rambow
Understanding how the social context of an interaction affects our dialog behavior is of great interest to social scientists who study human behavior, as well as to computer scientists who build automatic methods to infer those social contexts. In this paper, we study the interaction of power, gender, and dialog behavior in organizational interactions. In order to perform this study, we first construct the Gender Identified Enron Corpus of emails, in which we semi-automatically assign the gender of around 23,000 individuals who authored around 97,000 email messages in the Enron corpus. This corpus, which is made freely available, is orders of magnitude larger than previously existing gender-identified corpora in the email domain. Next, we use this corpus to perform a large-scale data-oriented study of the interplay of gender and manifestations of power. We argue that, in addition to one's own gender, the "gender environment" of an interaction, i.e., the gender makeup of one's interlocutors, also affects the way power is manifested in dialog. We focus especially on manifestations of power in the dialog structure, both in a shallow sense that disregards the textual content of messages (e.g., how often the participants contribute, how often they get replies, etc.) and in the structure that is expressed within the textual content (e.g., who issues requests and how they are made, whose requests get responses, etc.). We find that both gender and gender environment affect the ways power is manifested in dialog, resulting in patterns that reveal the underlying factors. Finally, we show the utility of gender information in the problem of automatically predicting the direction of power between pairs of participants in email interactions.
Citations: 9
Just because: In search of objective criteria of subjectivity expressed by causal connectives
Q1 Arts and Humanities Pub Date : 2017-02-08 DOI: 10.5087/dad.2017.105
N. Levshina, Liesbeth Degand
The connective because can express both highly objective and highly subjective causal relations. In this, it differs from its counterparts in other languages, e.g. Dutch, where two conjunctions, omdat and want, express more objective and more subjective causal relations, respectively. The present study investigates whether it is possible to anchor the different uses of because in context, examining a large number of syntactic, morphological and semantic cues with a minimal cost of manual annotation. We propose an innovative method of distinguishing between subjective and objective uses of because with the help of information available from an English/Dutch segment of a parallel corpus, which is accompanied by a distributional analysis of contextual features. On the basis of automatic syntactic and morphological annotation of approximately 1500 examples of because, every English sentence is coded semi-automatically for more than twenty contextual variables, such as the part of speech, number, person, semantic class of the subject, modality, etc. We employ logistic regression to determine whether these contextual variables help predict which of the two causal connectives is used in the corresponding Dutch sentences. Our results indicate that a set of semantic and syntactic features that include modality, semantics of referents (subjects), semantic class of the verbal predicate, tense (past vs. non-past) and the presence of evaluative adjectives are reliable predictors of the more subjective and objective uses of because, demonstrating that this distinction can indeed be anchored in the immediate linguistic context. The proposed method and relevant contextual cues can be used for identification of objective and subjective relationships in discourse.
Citations: 12
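The logistic-regression step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the three binary features (`has_modal`, `first_person_subject`, `past_tense`), the eight toy rows, and the training settings are all invented for illustration, and the label encoding (1 for the more subjective Dutch `want`, 0 for the more objective `omdat`) is an assumption. The model is fitted with plain batch gradient descent using only the standard library.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this example under the current model.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_prob(weights, bias, x):
    """Estimated probability that the Dutch translation uses 'want'."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy data: (has_modal, first_person_subject, past_tense).
ROWS = [
    (1, 1, 0), (1, 0, 0), (1, 0, 1), (0, 1, 0),
    (0, 0, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0),
]
# Assumed encoding: 1 = 'want' (more subjective), 0 = 'omdat' (more objective).
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logreg(ROWS, LABELS)
p_subjective = predict_prob(weights, bias, (1, 1, 0))
p_objective = predict_prob(weights, bias, (0, 0, 1))
```

In this toy setup a modal, first-person, non-past sentence receives a high probability of `want`, while a past-tense sentence without those cues receives a low one. A real replication would use the study's twenty-plus annotated variables and an established statistics package rather than this hand-rolled fit.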
Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus
Q1 Arts and Humanities Pub Date : 2017-01-31 DOI: 10.5087/dad.2017.102
R. Lowe, Nissan Pow, Iulian Serban, Laurent Charlin, Chia-Wei Liu, Joelle Pineau
In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, which can be both time consuming and expensive. We provide baselines in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and for end-to-end dialogue systems in general.
Citations: 152
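The recall-based evaluation of Next Utterance Classification mentioned in the abstract above can be sketched as follows. The abstract does not define the metric, so this assumes a Recall@k formulation: a model scores every candidate response, and an example counts as a hit when the true response is among the k highest-scored candidates. The function names and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def recall_at_k(scores: Sequence[float], true_index: int, k: int) -> bool:
    """True if the correct response is among the k highest-scored candidates."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_index in ranked[:k]


def mean_recall_at_k(examples: List[Tuple[Sequence[float], int]], k: int) -> float:
    """Fraction of examples whose correct response lands in the top k."""
    hits = [recall_at_k(scores, true_index, k) for scores, true_index in examples]
    return sum(hits) / len(hits)


# Invented model scores for three contexts with three candidates each,
# paired with the index of the true next utterance.
examples = [
    ([0.9, 0.1, 0.3], 0),  # true response ranked first
    ([0.2, 0.8, 0.5], 2),  # true response ranked second
    ([0.4, 0.7, 0.1], 2),  # true response ranked last
]

r1 = mean_recall_at_k(examples, k=1)
r2 = mean_recall_at_k(examples, k=2)
r3 = mean_recall_at_k(examples, k=3)
```

With these toy scores one of three examples is a hit at k=1, two at k=2, and all three at k=3, which shows why recall at a small k is the more demanding figure to report.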
Non-Native Differences in Prosodic-Construction Use
Q1 Arts and Humanities Pub Date : 2017-01-31 DOI: 10.5087/dad.2017.101
Nigel G. Ward, Paola Gallardo
Many language learners never acquire truly native-sounding prosody. Previous work has suggested that this involves skill deficits in the dialog-related uses of prosody, and may be attributable to weaknesses with specific prosodic constructions. Using semi-automated methods, we identified 32 of the most common prosodic constructions in English dialog. Examining 90 minutes of six advanced native-Spanish learners conversing in English, we found differences, notably regarding swift turn-taking, alignment, and empathy, but overall their uses of prosodic constructions were largely similar to those of native speakers.
Citations: 21
A Psycholinguistic Model for the Marking of Discourse Relations
Q1 Arts and Humanities Pub Date : 2017-01-31 DOI: 10.5087/DAD.2017.104
Frances Yung, Kevin Duh, T. Komura, Yuji Matsumoto
Discourse relations can either be explicitly marked by discourse connectives (DCs), such as therefore and but, or implicitly conveyed in natural language utterances. How speakers choose between the two options is a question that is not well understood. In this study, we propose a psycholinguistic model that predicts whether or not speakers will produce an explicit marker given the discourse relation they wish to express. Our model is based on two information-theoretic frameworks: (1) the Rational Speech Acts model, which models the pragmatic interaction between language production and interpretation by Bayesian inference, and (2) the Uniform Information Density theory, which advocates that speakers adjust linguistic redundancy to maintain a uniform rate of information transmission. Specifically, our model quantifies the utility of using or omitting a DC based on the expected surprisal of comprehension, cost of production, and availability of other signals in the rest of the utterance. Experiments based on the Penn Discourse Treebank show that our approach outperforms the state of the art at predicting the presence of DCs (Patterson and Kehler, 2013), in addition to giving an explanatory account of the speaker’s choice.
Citations: 10
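As a rough illustration of the kind of utility trade-off the abstract on the marking of discourse relations describes, the sketch below implements a toy Rational Speech Acts style speaker that chooses between an overt connective and no connective. It is not the authors' model: the listener probabilities, the costs, the `alpha` parameter, and all function names are hypothetical, and the Uniform Information Density component and the "other signals in the utterance" term are left out entirely.

```python
import math

# Hypothetical literal-listener beliefs P(relation | marker).
# The empty string stands for "no connective" (implicit marking).
LISTENER = {
    "but": {"contrast": 0.9, "cause": 0.1},
    "therefore": {"contrast": 0.05, "cause": 0.95},
    "": {"contrast": 0.3, "cause": 0.7},
}

# Hypothetical production costs: an overt connective costs more than silence.
COST = {"but": 1.0, "therefore": 1.0, "": 0.0}


def utility(marker, relation):
    """Negative listener surprisal for the intended relation, minus cost."""
    surprisal = -math.log(LISTENER[marker][relation])
    return -surprisal - COST[marker]


def speaker(relation, alpha=1.0):
    """Softmax speaker: P(marker | relation) proportional to exp(alpha * utility)."""
    scores = {m: math.exp(alpha * utility(m, relation)) for m in LISTENER}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}


def p_explicit(relation, alpha=1.0):
    """Probability that the speaker produces any overt connective."""
    return 1.0 - speaker(relation, alpha)[""]


if __name__ == "__main__":
    for rel in ("contrast", "cause"):
        print(rel, round(p_explicit(rel), 3))
```

With these toy numbers the contrast relation, which the listener is less likely to infer without a marker, ends up explicitly marked more often than the cause relation. That is the qualitative pattern such a model is meant to capture; the specific values carry no empirical weight.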
A corpus-driven approach to discourse organisation: from cues to complex markers
Q1 Arts and Humanities Pub Date : 2017-01-31 DOI: 10.5087/dad.2017.103
Marie-Paule Péry-Woodley, L. Ho-Dac, Josette Rebeyrolle, Ludovic Tanguy, Cécile Fabre
This paper reports on an experiment implementing a data-intensive approach to discourse organisation. Its focus is on enumerative structures envisaged as a type of textual pattern in a sequentiality-oriented approach to discourse. On the basis of a large-scale annotation exercise calling upon automatic feature markup alongside manual annotation, we explore a method to identify complex discourse markers seen as configurations of cues. The presentation of the background to what is termed "multi-level annotation" is organised around four issues: linearity, complexity of discourse markers, top-down processing, granularity and the multi-level nature of discourse structures. In this context, enumerative structures seem to deserve scrutiny for a number of reasons: they are frequent structures appearing at different granularity levels, they are signalled by a variety of devices appearing to work together in complex ways, and they combine a textual role (discourse organisation) with an ideational role (categorisation). We describe the annotation procedure and experimental framework which resulted in nearly 1,000 enumerative structures being annotated in a diversified corpus of over 600,000 words. The results of two approaches to the rich data produced are then presented: firstly, a descriptive survey highlights considerable variation in length and composition, while showing enumerative structure to be a basic strategy resorted to in all three sub-corpora, and leads to a granularity-based typology of the annotated structures; secondly, recurrent cue configurations (our "complex markers") are identified by the application of data mining methods. The paper ends with perspectives for further exploitation of the data, in particular with respect to the semantic characterisation of enumerative structures.
Citations: 5