Reid Swanson, A. Gordon, P. Khooshabeh, Kenji Sagae, Richard Huskey, Michael Mangus, Ori Amir, R. Weber
Storytelling is a universal activity, but the way in which discourse structure is used to persuasively convey ideas and emotions may depend on cultural factors. Because first-person accounts of life experiences can have a powerful impact on how a person is perceived, the storyteller may instinctively employ specific strategies to shape the audience's perception. Hypothesizing that some of the differences in storytelling can be captured by the use of narrative levels and subjectivity, we analyzed over one thousand narratives taken from personal weblogs. First, we compared stories from three different cultures written in their native languages: English, Chinese and Farsi. Second, we examined the impact of these two discourse properties on a reader's attitude and behavior toward the narrator. We found surprising similarities and differences in how stories are structured along these two dimensions across cultures. These discourse properties have a small but significant impact on a reader's behavioral response toward the narrator.
{"title":"An Empirical Analysis of Subjectivity and Narrative Levels in Weblog Storytelling Across Cultures","authors":"Reid Swanson, A. Gordon, P. Khooshabeh, Kenji Sagae, Richard Huskey, Michael Mangus, Ori Amir, R. Weber","doi":"10.5087/DAD.2017.205","DOIUrl":"https://doi.org/10.5087/DAD.2017.205","url":null,"abstract":"Storytelling is a universal activity, but the way in which discourse structure is used to persuasively convey ideas and emotions may depend on cultural factors. Because first-person accounts of life experiences can have a powerful impact in how a person is perceived, the storyteller may instinctively employ specific strategies to shape the audience's perception. Hypothesizing that some of the differences in storytelling can be captured by the use of narrative levels and subjectivity, we analyzed over one thousand narratives taken from personal weblogs. First, we compared stories from three different cultures written in their native languages: English, Chinese and Farsi. Second, we examined the impact of these two discourse properties on a reader's attitude and behavior toward the narrator. We found surprising similarities and differences in how stories are structured along these two dimensions across cultures. These discourse properties have a small but significant impact on a reader's behavioral response toward the narrator.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"54 1","pages":"105-128"},"PeriodicalIF":0.0,"publicationDate":"2017-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83338060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It has long been argued that accenting or stressing a pronoun (i.e., making it prosodically prominent) changes its interpretation compared to its unaccented counterpart. However, recent experimental work demonstrated that this generalization does not apply when the alternative interpretation of the pronoun is not plausible (Taylor et al., 2013). In a series of three experiments using an offline comprehension task, we first show that the lack of reversal is still observed when plausibility is controlled for. We further show that a new generalization cannot be formed by excluding cases where the bias towards the unmarked interpretation is strong, or cases where the character in the alternative interpretation is low in salience. Instead, we conclude that the interpretation of accented pronouns is constrained by coherence relations: parallel discourses exhibit reversal, whereas result discourses do not. We propose that the difference between coherence relations should be viewed in terms of the minimal change needed to create a ‘surprising’ or ‘expected’ event, which is characteristic of accenting more generally.
{"title":"Discourse coherence and the interpretation of accented pronouns","authors":"Mindaugas Mozuraitis, Daphna Heller","doi":"10.5087/dad.2017.204","DOIUrl":"https://doi.org/10.5087/dad.2017.204","url":null,"abstract":"It has long been argued that accenting or stressing a pronoun (i.e., making it prosodically prominent) changes its interpretation as compared to its unaccented counterpart. However, recent experimental work demonstrated that this generalization does not apply when the alternative interpretation of the pronoun is not plausible (Taylor et al., 2013). In a series of three experiments that use an offline comprehension task, we show, first, that the lack of reversal is observed when plausibility is controlled for. We furthermore show that a new generalization cannot be formed by excluding cases where the bias towards the unmarked interpretation is strong or cases where the character in the alternative interpretation is low in salience. Instead, we conclude that what constrains the interpretation of accented pronouns is coherence relations, with parallel discourses exhibiting reversal and result discourses not exhibiting reversal. We propose that the difference between coherence relations should be viewed in what would be the minimal change in order to create a ‘surprising’ or expected’ event, which is the characteristic of accenting more generally.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"35 1","pages":"84-104"},"PeriodicalIF":0.0,"publicationDate":"2017-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87081088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal information is one of the prominent features that determine coherence in discourse, which is why discourse annotation needs an adequate way to deal with this type of information. In this paper, we argue that temporal order is a relational rather than a segment-specific property, and that it is a cognitively plausible notion: temporal order is expressed in the system of linguistic markers and is relevant in both language acquisition and language processing. This means that temporal relations meet the requirements set by the Cognitive approach of Coherence Relations (CCR) for being considered coherence relations, and that CCR would need a way to distinguish temporal relations within its annotation system. We present the merits and drawbacks of different options for reaching this objective and argue in favor of adding temporal order as a new dimension to CCR.
{"title":"On Temporality in Discourse Annotation: Theoretical and Practical Considerations","authors":"J. Evers-Vermeul, J. Hoek, Merel C. J. Scholman","doi":"10.5087/DAD.2017.201","DOIUrl":"https://doi.org/10.5087/DAD.2017.201","url":null,"abstract":"Temporal information is one of the prominent features that determine the coherence in a discourse. That is why we need an adequate way to deal with this type of information during discourse annotation. In this paper, we will argue that temporal order is a relational rather than a segment-specific property, and that it is a cognitively plausible notion: temporal order is expressed in the system of linguistic markers and is relevant in both acquisition and language processing. This means that temporal relations meet the requirements set by the Cognitive approach of Coherence Relations (CCR) to be considered coherence relations, and that CCR would need a way to distinguish temporal relations within its annotation system. We will present merits and drawbacks of different options of reaching this objective and argue in favor of adding temporal order as a new dimension to CCR.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"99 1","pages":"1-20"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85794651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Examples and specifications occur frequently in text, but not much is known about how they function in discourse and how readers interpret them. Looking at how they are annotated in existing discourse corpora, we find that annotators often disagree on these types of relations; specifically, they disagree about whether these relations are elaborative (additive) or argumentative (pragmatic causal). To investigate how readers interpret examples and specifications, we conducted a crowdsourced discourse annotation study. The results show that these relations can indeed have two functions: they can both illustrate or specify a situation and serve as an argument for a claim. These findings suggest that examples and specifications can have multiple simultaneous readings. We discuss the implications of these results for discourse annotation.
{"title":"Examples and Specifications that Prove a Point: Identifying Elaborative and Argumentative Discourse Relations","authors":"Merel C. J. Scholman, Vera Demberg","doi":"10.5087/dad.2017.203","DOIUrl":"https://doi.org/10.5087/dad.2017.203","url":null,"abstract":"Examples and specifications occur frequently in text, but not much is known about how they function in discourse and how readers interpret them. Looking at how they’re annotated in existing discourse corpora, we find that annotators often disagree on these types of relations; specifically, there is disagreement about whether these relations are elaborative (additive) or argumentative (pragmatic causal). To investigate how readers interpret examples and specifications, we conducted a crowdsourced discourse annotation study. The results show that these relations can indeed have two functions: they can be used to both illustrate / specify a situation and serve as an argument for a claim. These findings suggest that examples and specifications can have multiple simultaneous readings. We discuss the implications of these results for discourse annotation.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"324 1","pages":"56-83"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80316228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding how the social context of an interaction affects our dialog behavior is of great interest both to social scientists who study human behavior and to computer scientists who build automatic methods to infer those social contexts. In this paper, we study the interaction of power, gender, and dialog behavior in organizational interactions. To perform this study, we first construct the Gender Identified Enron Corpus of emails, in which we semi-automatically assign the gender of around 23,000 individuals who authored around 97,000 email messages in the Enron corpus. This corpus, which is freely available, is orders of magnitude larger than previously existing gender-identified corpora in the email domain. Next, we use this corpus to perform a large-scale, data-oriented study of the interplay of gender and manifestations of power. We argue that, in addition to one's own gender, the "gender environment" of an interaction, i.e., the gender makeup of one's interlocutors, also affects the way power is manifested in dialog. We focus especially on manifestations of power in the dialog structure, both in a shallow sense that disregards the textual content of messages (e.g., how often participants contribute and how often they get replies) and in the structure expressed within the textual content (e.g., who issues requests, how those requests are made, and whose requests get responses). We find that both gender and gender environment affect the ways power is manifested in dialog, resulting in patterns that reveal the underlying factors. Finally, we show the utility of gender information for automatically predicting the direction of power between pairs of participants in email interactions.
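The shallow, content-independent structural features mentioned above (how often participants contribute, how often they get replies) can be illustrated with a small Python sketch. It is not the authors' feature extractor: the `Message` record, its fields, and the `shallow_features` helper are hypothetical, and the sketch assumes each email carries the ID of the message it replies to.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    """Hypothetical email record: its own ID, its sender, and the ID it replies to (if any)."""
    msg_id: str
    sender: str
    in_reply_to: Optional[str] = None


def shallow_features(thread: List[Message]) -> Dict[str, Dict[str, float]]:
    """Per-sender features that ignore message text:
    share_of_messages = fraction of the thread's messages sent by this person;
    reply_rate = fraction of this person's messages that received at least one reply."""
    replied_to = {m.in_reply_to for m in thread if m.in_reply_to is not None}
    sent: Dict[str, int] = defaultdict(int)
    answered: Dict[str, int] = defaultdict(int)
    for m in thread:
        sent[m.sender] += 1
        if m.msg_id in replied_to:
            answered[m.sender] += 1
    total = len(thread)
    return {
        s: {"share_of_messages": sent[s] / total, "reply_rate": answered[s] / sent[s]}
        for s in sent
    }


# Toy thread with made-up participants, for illustration only.
example_thread = [
    Message("m1", "alice"),
    Message("m2", "bob", in_reply_to="m1"),
    Message("m3", "alice", in_reply_to="m2"),
    Message("m4", "carol"),
]

for sender, feats in shallow_features(example_thread).items():
    print(sender, feats)
```

For the toy thread, alice gets a reply rate of 0.5 (m1 was answered, m3 was not), bob 1.0 and carol 0.0. Features of this kind could then be compared across senders' gender and gender environment, though how the authors normalised them is not stated in the abstract.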
{"title":"Dialog Structure Through the Lens of Gender, Gender Environment, and Power","authors":"Vinodkumar Prabhakaran, Owen Rambow","doi":"10.5087/dad.2017.202","DOIUrl":"https://doi.org/10.5087/dad.2017.202","url":null,"abstract":"Understanding how the social context of an interaction affects our dialog behavior is of great interest to social scientists who study human behavior, as well as to computer scientists who build automatic methods to infer those social contexts. In this paper, we study the interaction of power, gender, and dialog behavior in organizational interactions. In order to perform this study, we first construct the Gender Identified Enron Corpus of emails, in which we semi-automatically assign the gender of around 23,000 individuals who authored around 97,000 email messages in the Enron corpus. This corpus, which is made freely available, is orders of magnitude larger than previously existing gender identified corpora in the email domain. Next, we use this corpus to perform a large-scale data-oriented study of the interplay of gender and manifestations of power. We argue that, in addition to one's own gender, the \"gender environment\" of an interaction, i.e., the gender makeup of one's interlocutors, also affects the way power is manifested in dialog. We focus especially on manifestations of power in the dialog structure --- both, in a shallow sense that disregards the textual content of messages (e.g., how often do the participants contribute, how often do they get replies etc.), as well as the structure that is expressed within the textual content (e.g., who issues requests and how are they made, whose requests get responses etc.). We find that both gender and gender environment affect the ways power is manifested in dialog, resulting in patterns that reveal the underlying factors. Finally, we show the utility of gender information in the problem of automatically predicting the direction of power between pairs of participants in email interactions.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"86 1","pages":"21-55"},"PeriodicalIF":0.0,"publicationDate":"2017-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86582750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The connective because can express both highly objective and highly subjective causal relations. In this, it differs from its counterparts in other languages, e.g. Dutch, where two conjunctions omdat and want express more objective and more subjective causal relations, respectively. The present study investigates whether it is possible to anchor the different uses of because in context, examining a large number of syntactic, morphological and semantic cues with a minimal cost of manual annotation. We propose an innovative method of distinguishing between subjective and objective uses of because with the help of information available from an English/Dutch segment of a parallel corpus, which is accompanied by a distributional analysis of contextual features. On the basis of automatic syntactic and morphological annotation of approximately 1500 examples of because, every English sentence is coded semi-automatically for more than twenty contextual variables, such as the part of speech, number, person, semantic class of the subject, modality, etc. We employ logistic regression to determine whether these contextual variables help predict which of the two causal connectives is used in the corresponding Dutch sentences. Our results indicate that a set of semantic and syntactic features that include modality, semantics of referents (subjects), semantic class of the verbal predicate, tense (past vs. non-past) and the presence of evaluative adjectives, are reliable predictors of the more subjective and objective uses of because, demonstrating that this distinction can indeed be anchored in the immediate linguistic context. The proposed method and relevant contextual cues can be used for identification of objective and subjective relationships in discourse.
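The modelling step described above (logistic regression predicting whether the aligned Dutch sentence uses omdat or want) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' model: the four binary features, the toy training rows, and the choice of batch gradient descent are all hypothetical, and the study itself codes more than twenty variables.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical binary contextual features coded for each English "because" sentence.
FEATURES = ["epistemic_modal", "first_person_subject", "past_tense", "evaluative_adjective"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss.
    Label 1 = the aligned Dutch sentence uses 'want' (more subjective),
    label 0 = it uses 'omdat' (more objective)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that the Dutch translation uses 'want'."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy rows (made up for illustration), one per sentence, columns in FEATURES order.
X = [
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
for name, weight in zip(FEATURES, w):
    print(f"{name:>22}: {weight:+.2f}")
```

A positive weight pushes a sentence towards want (the more subjective reading), a negative one towards omdat. In the toy data, epistemic_modal and evaluative_adjective occur only with want and past_tense only with omdat, so their weights come out positive and negative respectively; the numbers themselves carry no empirical weight.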
{"title":"Just because: In search of objective criteria of subjectivity expressed by causal connectives","authors":"N. Levshina, Liesbeth Degand","doi":"10.5087/dad.2017.105","DOIUrl":"https://doi.org/10.5087/dad.2017.105","url":null,"abstract":"The connective because can express both highly objective and highly subjective causal relations. In this, it differs from its counterparts in other languages, e.g. Dutch, where two conjunctions omdat and want express more objective and more subjective causal relations, respectively. The present study investigates whether it is possible to anchor the different uses of because in context, examining a large number of syntactic, morphological and semantic cues with a minimal cost of manual annotation. We propose an innovative method of distinguishing between subjective and objective uses of because with the help of information available from an English/Dutch segment of a parallel corpus, which is accompanied by a distributional analysis of contextual features. On the basis of automatic syntactic and morphological annotation of approximately 1500 examples of because, every English sentence is coded semi-automatically for more than twenty contextual variables, such as the part of speech, number, person, semantic class of the subject, modality, etc. We employ logistic regression to determine whether these contextual variables help predict which of the two causal connectives is used in the corresponding Dutch sentences. Our results indicate that a set of semantic and syntactic features that include modality, semantics of referents (subjects), semantic class of the verbal predicate, tense (past vs. non-past) and the presence of evaluative adjectives, are reliable predictors of the more subjective and objective uses of because, demonstrating that this distinction can indeed be anchored in the immediate linguistic context. The proposed method and relevant contextual cues can be used for identification of objective and subjective relationships in discourse.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"10 1","pages":"132-150"},"PeriodicalIF":0.0,"publicationDate":"2017-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81947881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; it can therefore be used to train large models directly from data with minimal feature engineering, which can be both time-consuming and expensive. We provide baselines in two different settings: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. Both are evaluated on a recall task that we call Next Utterance Classification (NUC), as well as on other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and on end-to-end dialogue systems in general.
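Next Utterance Classification is scored as a recall task. Assuming a Recall@k formulation (each test context paired with a fixed list of candidate responses, exactly one of which is the true next utterance, and a model that assigns each candidate a score), the metric can be computed as below. The function names and the toy scores are illustrative only.

```python
from typing import List, Sequence


def recall_at_k(scores: Sequence[float], true_index: int, k: int) -> bool:
    """True if the correct response is among the k highest-scoring candidates.
    sorted() is stable (also with reverse=True), so tied candidates keep list order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_index in ranked[:k]


def mean_recall_at_k(all_scores: List[Sequence[float]], true_indices: List[int], k: int) -> float:
    """Fraction of test contexts whose true next utterance is ranked in the top k."""
    hits = sum(recall_at_k(s, t, k) for s, t in zip(all_scores, true_indices))
    return hits / len(all_scores)


# Two toy contexts with four candidate responses each (scores are made up).
scores = [
    [0.9, 0.1, 0.3, 0.2],  # true response (index 0) is ranked first
    [0.2, 0.8, 0.5, 0.1],  # true response (index 2) is ranked second
]
truth = [0, 2]

print(mean_recall_at_k(scores, truth, k=1))  # 0.5
print(mean_recall_at_k(scores, truth, k=2))  # 1.0
```

Because the metric only needs one score per candidate, it can in principle be applied both to response-selection models and to generative models that score candidates by their likelihood; exactly how the authors scored generative models is not stated in the abstract.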
{"title":"Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus","authors":"R. Lowe, Nissan Pow, Iulian Serban, Laurent Charlin, Chia-Wei Liu, Joelle Pineau","doi":"10.5087/dad.2017.102","DOIUrl":"https://doi.org/10.5087/dad.2017.102","url":null,"abstract":"In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, which can be both time consuming and expensive. We provide baselines in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and for end-to-end dialogue systems in general.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"455 ","pages":"31-65"},"PeriodicalIF":0.0,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72494917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many language learners never acquire truly native-sounding prosody. Previous work has suggested that this involves skill deficits in the dialog-related uses of prosody, and may be attributable to weaknesses with specific prosodic constructions. Using semi-automated methods, we identified 32 of the most common prosodic constructions in English dialog. Examining 90 minutes of six advanced native-Spanish learners conversing in English, we found differences, notably regarding swift turn-taking, alignment, and empathy, but overall the learners' use of prosodic constructions was largely similar to that of native speakers.
{"title":"Non-Native Differences in Prosodic-Construction Use","authors":"Nigel G. Ward, Paola Gallardo","doi":"10.5087/dad.2017.101","DOIUrl":"https://doi.org/10.5087/dad.2017.101","url":null,"abstract":"Many language learners never acquire truly native-sounding prosody. Previous work has suggested that this involves skill deficits in the dialog-related uses of prosody, and may be attributable to weaknesses with specific prosodic constructions. Using semi-automated methods, we identified 32 of the most common prosodic constructions in English dialog. Examining 90 minutes of six advanced native-Spanish learners conversing in English, there were differences, notably regarding swift turn-taking, alignment, and empathy, but overall their uses of prosodic constructions were largely similar to those of native speakers.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"170 1","pages":"1-30"},"PeriodicalIF":0.0,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79377451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frances Yung, Kevin Duh, T. Komura, Yuji Matsumoto
Discourse relations can either be explicitly marked by discourse connectives (DCs), such as therefore and but, or implicitly conveyed in natural language utterances. How speakers choose between the two options is not well understood. In this study, we propose a psycholinguistic model that predicts whether speakers will produce an explicit marker given the discourse relation they wish to express. Our model is based on two information-theoretic frameworks: (1) the Rational Speech Acts model, which models the pragmatic interaction between language production and interpretation by Bayesian inference, and (2) the Uniform Information Density theory, which holds that speakers adjust linguistic redundancy to maintain a uniform rate of information transmission. Specifically, our model quantifies the utility of using or omitting a DC based on the expected surprisal of comprehension, the cost of production, and the availability of other signals in the rest of the utterance. Experiments on the Penn Discourse Treebank show that our approach outperforms the previous state of the art at predicting the presence of DCs (Patterson and Kehler, 2013), in addition to giving an explanatory account of the speaker’s choice.
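The utility computation described above can be illustrated with a toy, RSA-style speaker. This is a simplified sketch under stated assumptions, not the authors' model: a literal listener infers the relation from the available signals, the speaker's utility for each option is the negative surprisal of the intended relation minus a production cost (zero for omitting the DC), and the speaker chooses by a softmax over the two utilities. The likelihood of the explicit option is held fixed for simplicity, and all relation labels, probabilities and costs are hypothetical.

```python
import math
from typing import Dict

Dist = Dict[str, float]


def listener_posterior(likelihood: Dist, prior: Dist) -> Dist:
    """Literal listener: P(relation | signals) is proportional to P(signals | relation) * P(relation)."""
    joint = {r: likelihood[r] * prior[r] for r in prior}
    z = sum(joint.values())
    return {r: p / z for r, p in joint.items()}


def p_explicit(intended: str, prior: Dist, lik_implicit: Dist, lik_explicit: Dist,
               dc_cost: float, alpha: float = 1.0) -> float:
    """Probability that the speaker produces the DC.
    utility(option) = log P_listener(intended | option) - cost(option),
    i.e. negative surprisal of the intended relation minus production cost."""
    u_implicit = math.log(listener_posterior(lik_implicit, prior)[intended])  # omitting costs 0
    u_explicit = math.log(listener_posterior(lik_explicit, prior)[intended]) - dc_cost
    # Softmax over the two options, written as a logistic of the utility difference.
    return 1.0 / (1.0 + math.exp(-alpha * (u_explicit - u_implicit)))


prior = {"contrast": 0.5, "cause": 0.5}
with_but = {"contrast": 0.99, "cause": 0.01}  # hypothetical: "but" strongly signals contrast

# Case 1: the rest of the utterance gives no hint about the relation.
weak_cues = {"contrast": 0.5, "cause": 0.5}
# Case 2: other signals in the utterance already point to contrast.
strong_cues = {"contrast": 0.9, "cause": 0.1}

p_weak = p_explicit("contrast", prior, weak_cues, with_but, dc_cost=0.3)
p_strong = p_explicit("contrast", prior, strong_cues, with_but, dc_cost=0.3)
print(f"P(use DC | weak cues)   = {p_weak:.2f}")
print(f"P(use DC | strong cues) = {p_strong:.2f}")
```

With these toy numbers, explicit marking comes out more likely than not when nothing else signals the relation, and less likely than not when other signals already make it predictable, which mirrors the UID intuition that a costly, redundant marker tends to be dropped.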
{"title":"A Psycholinguistic Model for the Marking of Discourse Relations","authors":"Frances Yung, Kevin Duh, T. Komura, Yuji Matsumoto","doi":"10.5087/DAD.2017.104","DOIUrl":"https://doi.org/10.5087/DAD.2017.104","url":null,"abstract":"Discourse relations can either be explicitly marked by discourse connectives (DCs), such as therefore and but, or implicitly conveyed in natural language utterances. How speakers choose between the two options is a question that is not well understood. In this study, we propose a psycholinguistic model that predicts whether or not speakers will produce an explicit marker given the discourse relation they wish to express. Our model is based on two information-theoretic frameworks: (1) the Rational Speech Acts model, which models the pragmatic interaction between language production and interpretation by Bayesian inference, and (2) the Uniform Information Density theory, which advocates that speakers adjust linguistic redundancy to maintain a uniform rate of information transmission. Specifically, our model quantifies the utility of using or omitting a DC based on the expected surprisal of comprehension, cost of production, and availability of other signals in the rest of the utterance. Experiments based on the Penn Discourse Treebank show that our approach outperforms the state-of-the-art performance at predicting the presence of DCs (Patterson and Kehler, 2013), in addition to giving an explanatory account of the speaker’s choice.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"5 1","pages":"106-131"},"PeriodicalIF":0.0,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73721433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marie-Paule Péry-Woodley, L. Ho-Dac, Josette Rebeyrolle, Ludovic Tanguy, Cécile Fabre
This paper reports on an experiment implementing a data-intensive approach to discourse organisation. Its focus is on enumerative structures, envisaged as a type of textual pattern in a sequentiality-oriented approach to discourse. On the basis of a large-scale annotation exercise combining automatic feature markup with manual annotation, we explore a method for identifying complex discourse markers, seen as configurations of cues. The background to what is termed "multi-level annotation" is presented around four issues: linearity, complexity of discourse markers, top-down processing, granularity and the multi-level nature of discourse structures. In this context, enumerative structures seem to deserve scrutiny for several reasons: they are frequent structures appearing at different granularity levels, they are signalled by a variety of devices that appear to work together in complex ways, and they combine a textual role (discourse organisation) with an ideational role (categorisation). We describe the annotation procedure and experimental framework, which resulted in nearly 1,000 enumerative structures being annotated in a diversified corpus of over 600,000 words. We then present the results of two approaches to the resulting rich data: first, a descriptive survey highlights considerable variation in length and composition, shows enumerative structure to be a basic strategy used in all three sub-corpora, and leads to a granularity-based typology of the annotated structures; second, recurrent cue configurations (our "complex markers") are identified by applying data mining methods. The paper ends with perspectives for further exploitation of the data, in particular with respect to the semantic characterisation of enumerative structures.
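The idea of recurrent cue configurations can be illustrated by counting which cues co-occur across annotated enumerative structures and keeping the combinations that recur. The sketch below uses exhaustive counting of fixed-size combinations with a minimum support threshold, a basic form of frequent itemset mining; the abstract does not say which data mining methods were actually used, and the cue labels and toy structures are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_cue_sets(structures: List[Set[str]], size: int,
                      min_support: int) -> Dict[FrozenSet[str], int]:
    """Count each combination of `size` cues co-occurring in one annotated structure
    and keep those found in at least `min_support` structures."""
    counts: Counter = Counter()
    for cues in structures:
        # sorted() gives a deterministic order; frozenset makes the key order-free.
        for combo in combinations(sorted(cues), size):
            counts[frozenset(combo)] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical cue annotations for four enumerative structures.
structures = [
    {"introducer_colon", "bullet_items", "parallel_syntax"},
    {"introducer_colon", "bullet_items"},
    {"ordinal_adverb", "parallel_syntax"},
    {"introducer_colon", "bullet_items", "ordinal_adverb"},
]

for combo, n in frequent_cue_sets(structures, size=2, min_support=3).items():
    print(sorted(combo), n)
```

Here only the pair introducer_colon + bullet_items reaches the threshold of three structures. On a large corpus, Apriori-style pruning would avoid enumerating every combination, but plain counting keeps the idea visible.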
{"title":"A corpus-driven approach to discourse organisation: from cues to complex markers","authors":"Marie-Paule Péry-Woodley, L. Ho-Dac, Josette Rebeyrolle, Ludovic Tanguy, Cécile Fabre","doi":"10.5087/dad.2017.103","DOIUrl":"https://doi.org/10.5087/dad.2017.103","url":null,"abstract":"This paper reports on an experiment implementing a data-intensive approach to discourse organisation. Its focus is on enumerative structures envisaged as a type of textual pattern in a sequentiality-oriented approach to discourse. On the basis of a large-scale annotation exercise calling upon automatic feature markup alongside manual annotation, we explore a method to identify complex discourse markers seen as configurations of cues. The presentation of the background to what is termed \"multi-level annotation\" is organised around four issues: linearity, complexity of discourse markers, top-down processing, granularity and the multi-level nature of discourse structures. In this context, enumerative structures seem to deserve scrutiny for a number of reasons: they are frequent structures appearing at different granularity levels, they are signalled by a variety of devices appearing to work together in complex ways, and they combine a textual role (discourse organisation) with an ideational role (categorisation). We describe the annotation procedure and experimental framework which resulted in nearly 1,000 enumerative structures being annotated in a diversified corpus of over 600,000 words. The results of two approaches to the rich data produced are then presented: firstly, a descriptive survey highlights considerable variation in length and composition, while showing enumerative structure to be a basic strategy resorted to in all three sub-corpora, and leads to a granularity-based typology of the annotated structures; secondly, recurrent cue configurations (our \"complex markers\") are identified by the application of data mining methods. The paper ends with perspectives for further exploitation of the data, in particular with respect to the semantic characterisation of enumerative structures.","PeriodicalId":37604,"journal":{"name":"Dialogue and Discourse","volume":"55 3 1","pages":"66-105"},"PeriodicalIF":0.0,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90934928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}