In corpus pragmatics, most of the research into speech acts still tends to be limited to working with the original, highly abstract, speech-act taxonomies devised by ordinary language philosophers like Austin and Searle. The aim of this article is to illustrate how the use of such restricted taxonomies may lead to oversimplified or potentially misleading impressions regarding the communicative functions expressed in spoken interaction, and to demonstrate how a more elaborate taxonomy, the DART taxonomy (Weisser, 2018), may help us gain better insights into the pragmatic strategies that occur in dialogues. To this end, I will draw on a small sample of dialogues, both from a task-oriented domain and unconstrained interaction, and contrast selected speech-act categorisations on the basis of Searle’s and the DART taxonomy, demonstrating the advantages that arise from using a more fine-grained taxonomy to describe complex verbal exchanges.
M. Weisser. "Speech acts in corpus pragmatics". International Journal of Corpus Linguistics, published 2020-11-23. DOI: https://doi.org/10.1075/IJCL.19023.WEI
This article presents a corpus-driven sociolinguistic study of Redfern Now – the first major television drama series commissioned, written, acted, directed and produced by Indigenous industry professionals in Australia. The study examines whether corpus linguistic keyword analysis can identify evidence for type indexicality (social demographics, personae) and trait indexicality (stance, personality), with particular attention paid to the potential indexing of Aboriginal and Torres Strait Islander identity. More specifically, the study’s goal is to retrieve and analyse words that are associated with varieties of English in Australia, and with Australian Aboriginal Englishes in particular. To this end, a corpus with dialogue from Redfern Now is compared to a reference corpus of US television dialogue. Results show that Redfern Now features the use of easily recognisable and familiar words (e.g. blackfella[s], deadly; kinship terms), but also shows clear variation among characters. The case study concludes by evaluating the use of keyword analysis for identifying indexicality in telecinematic discourse.
M. Bednarek. "Keyword analysis and the indexing of Aboriginal and Torres Strait Islander identity". International Journal of Corpus Linguistics, published 2020-11-23. DOI: https://doi.org/10.1075/ijcl.00031.bed
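The keyword comparison described in the Redfern Now abstract above (a target corpus set against a reference corpus) can be illustrated with a minimal sketch. It assumes Dunning's log-likelihood (G2) as the keyness statistic, which is a common choice in corpus linguistics but is not named in the abstract; the function names and the toy token lists are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def log_likelihood(a, b, n_target, n_ref):
    """Log-likelihood (G2) keyness for one word.

    a, b: frequency of the word in the target and reference corpus.
    n_target, n_ref: total token counts of the two corpora.
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = n_target * (a + b) / (n_target + n_ref)
    e2 = n_ref * (a + b) / (n_target + n_ref)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target_tokens, ref_tokens, top=10):
    """Return the `top` positive keywords of the target corpus with scores."""
    t, r = Counter(target_tokens), Counter(ref_tokens)
    nt, nr = sum(t.values()), sum(r.values())
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        # Keep only words that are relatively more frequent in the target.
        if a / nt > b / nr:
            scored.append((word, log_likelihood(a, b, nt, nr)))
    return sorted(scored, key=lambda pair: -pair[1])[:top]


# Invented toy data, for illustration only.
target = "deadly deadly deadly the the mob".split()
reference = "the the the the cool cool".split()
for word, score in keywords(target, reference):
    print(word, round(score, 3))
```

Only positive keywords are returned here because the abstract is concerned with words characteristic of the target dialogue; a real analysis would also apply a frequency floor and a significance threshold.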
In this paper, we investigate how deep learning techniques can be applied to discourse pragmatics. As a test case, we analyse heuristic textual practices, defined as linguistic implementations of decision routines in research processes in academic discourse. We develop a complex annotation scheme of pragmalinguistic categories on different levels of granularity and manually annotate a corpus of texts across various scientific disciplines. This is the basis for training recurrent neural networks to classify heuristic textual practices. Our experiments show that the annotation categories are robust enough to be recognised by our models, which learn similarities of the sentence surfaces represented as word embeddings. Our study aims at an iterative human-in-the-loop process in which manual-hermeneutic and algorithmic procedures mutually advance the insight process. It underlines the fact that the interaction between manual and automated methods opens up a promising field for further research, allowing interpretative analyses of complex pragmatic phenomena in large corpora.
Maria Becker, M. Bender, Marcus Müller. "Classifying heuristic textual practices in academic discourse". International Journal of Corpus Linguistics, published 2020-11-23. DOI: https://doi.org/10.1075/ijcl.19097.bec
This article reviews Love, R. (2020), Overcoming Challenges in Corpus Construction: The Spoken British National Corpus 2014.
Jiawei Wang. "Love, R. (2020). Overcoming Challenges in Corpus Construction: The spoken British National Corpus 2014". International Journal of Corpus Linguistics, published 2020-11-23. DOI: https://doi.org/10.1075/ijcl.00032.wan
This paper outlines the construction of the corpus Alpenwort, a large, genre-based corpus of German texts on alpinism. We report on issues related to building the corpus from the Austrian Alpine Club Journal (1869–2010). First, a general description of our data and the project phases from digitization and annotation to publication is given. We focus on the most interesting challenges that the diverse layouts and the extensive use of the Fraktur typeface posed for optical layout recognition and optical character recognition (OCR) as well as post-correction. The corrected data was lemmatized and annotated with part-of-speech information including named entities as well as TEI-conformant metadata. The resulting 19.9-million-word corpus is designed to be queried using CQPweb and Hyperbase and can be accessed freely online. Lastly, we give a short roadmap of current and future expansions and improvements as corpus data has been and is being enhanced in follow-up projects.
C. Posch, Gerhard Rampl. "Lima or cima?". International Journal of Corpus Linguistics, published 2020-11-23. DOI: https://doi.org/10.1075/IJCL.19094.POS
This paper presents the first entirely linguistic typology of contemporary American television, derived from a multi-dimensional (MD) analysis of the USTV corpus. The USTV corpus comprises 930 texts from 191 different TV programs, classified into 31 different registers (including nine telecinematic ones: drama series, miniseries, movies, sitcoms, soap operas, general animation, children’s animation, short-feature animation, and children’s and teens’ shows). The linguistic typology we present in this study is based on the linguistic characteristics present in the individual programs, with no a priori textual categorizations. A cluster analysis grouped the individual programs into clusters that shared similar dimensional profiles. The resulting typology comprises nine different text types – namely Presentation of information, Opinion and discussion, Analysis and debate, Description, Interactive recount, Engaging demonstration, Playful discourse, Simplified interaction, and Simulated conversation. The paper discusses and illustrates each text type and considers how telecinematic discourse relates to each of them.
Tony Berber Sardinha, M. Pinto. "A linguistic typology of American television". International Journal of Corpus Linguistics, published 2020-11-12. DOI: https://doi.org/10.1075/IJCL.00039.BER
Previous corpus-based studies, which have mostly focused on a particular film or series, have identified various key characteristics of telecinematic language. However, it remains unclear how stable those findings are across time and across individual productions. To address this gap, and following calls for more nuanced perspectives on telecinematic language as a whole, this study re-assesses a number of claims pertaining to lexical and lexicogrammatical aspects through a diachronic lens. To this end, it uses the Northern American sections of the new Movie and TV Corpora, multi-million-word corpora compiled from subtitles of a wide range of film and series genres in the English-speaking world from the 20th and 21st centuries. Overall, the diachronic view of the data suggests that telecinematic language is highly complex, with levels of emotionality and informality increasing over time for most items tested.
Valentin Werner. "A diachronic perspective on telecinematic language". International Journal of Corpus Linguistics, published 2020-11-02. DOI: https://doi.org/10.1075/IJCL.00036.WER
Analyzing variation in language features in literature and telecinematic discourse provides valuable insights into society’s shifting values and perspectives. In this study, we carry out a keyword analysis on the language of three series of Star Trek television dialogues, broadcast in the 1960s, 1980s, and 1990s, from two perspectives: (i) keywords across the three series, highlighting words that are unique to one series in contrast to the other two and providing insights about changes of focus across time; (ii) keywords in relation to gender, depicting potential differences in gender roles and how these may change through time across the series.
Enikó Csomay, Ryan Young. "Language use in pop culture over three decades". International Journal of Corpus Linguistics, published 2020-11-02. DOI: https://doi.org/10.1075/IJCL.00037.CSO
M. Bednarek, M. Pinto, Valentin Werner. "Corpus approaches to telecinematic language". International Journal of Corpus Linguistics, published 2020-10-30. DOI: https://doi.org/10.1075/IJCL.00034.INT
V. Cvrček, Zuzana Laubeová, D. Lukeš, Petra Poukarová, Anna Řehořková, A. Zasina
This paper investigates the contribution of author/idiolect vs. register/type-of-text – as the most salient factors influencing the final shape of a text – towards explaining the variation observed in Czech texts. Since it is almost impossible to explore the effect of these factors on authentic data, we used elicited letters collected in a fully crossed experimental design (representative sample of 200 authors × four elicitation scenarios serving as a proxy for register variation). The variation encompassed by the elicited texts is analyzed through the lens of a general-purpose multi-dimensional model of Czech. Using triangulation via three established statistical methods and one devised for the purpose of this study, we find that register matters a great deal, explaining 1.5 times as much variation overall as idiolect. This should be taken into account when designing research in sociolinguistics or variation studies in general.
V. Cvrček, Zuzana Laubeová, D. Lukeš, Petra Poukarová, Anna Řehořková, A. Zasina. "Author and register as sources of variation". International Journal of Corpus Linguistics, published 2020-10-27. DOI: https://doi.org/10.1075/IJCL.19020.CVR
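To make concrete what "explaining 1.5 times as much variation" can mean in a fully crossed authors × scenarios design, here is a minimal sketch of a two-way sum-of-squares decomposition with one observation per cell. This is an assumption chosen for illustration: the abstract does not say which statistical methods the authors used, and the function name `variance_shares` and the toy scores are hypothetical.

```python
def variance_shares(scores):
    """Split total variation into author, register and residual shares.

    scores[i][j] is a (hypothetical) dimension score for the text written
    by author i under elicitation scenario j; every cell must be filled.
    """
    n_auth, n_reg = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_auth * n_reg)
    # Mean score per author (row) and per scenario (column).
    auth_means = [sum(row) / n_reg for row in scores]
    reg_means = [sum(row[j] for row in scores) / n_auth for j in range(n_reg)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_auth = n_reg * sum((m - grand) ** 2 for m in auth_means)
    ss_reg = n_auth * sum((m - grand) ** 2 for m in reg_means)
    return {
        "author": ss_auth / ss_total,
        "register": ss_reg / ss_total,
        "residual": (ss_total - ss_auth - ss_reg) / ss_total,
    }


# Invented toy data: 2 authors x 2 scenarios.
shares = variance_shares([[1.0, 3.0], [2.0, 4.0]])
print(shares, shares["register"] / shares["author"])
```

The ratio of the register share to the author share is the kind of quantity the abstract summarises; with 200 authors and four scenarios the input would be a 200 × 4 grid of scores per dimension.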