The present study offers new insights into how the cognitive-semantic analysis of adjectival deontic modality in the mediatised register of the fatwa can be methodologically enhanced at both quantitative and qualitative levels. Drawing on the force-dynamics model originated by Talmy (1981, 1988) and developed by Sweetser (1990), the study investigates adjectivally modal expressions of obligation and permission in an electronic corpus of fatwas (353,293 words across 1,440 texts). The research data were processed with the corpus tool Wmatrix (Rayson, 2003) in order to calculate the relevant modal keywords and generate their concordances; further, an interactive register analysis of tenor in fatwa discourse is provided in a way that (i) facilitates the concordance reading of the adjectival keywords of deontic modality and (ii) examines the force dynamics underlying these keywords in terms of their modally interactive meanings. The study reaches three main findings. First, the specialised corpus of fatwas contains five keywords of adjectival deontic modality: obligatory, obliged, permissible, impermissible and forbidden. Second, the force dynamics of obligatory, obliged and permissible reveals an enacting, positive-compulsion force with attitudinal variations of objective and subjective meanings towards real-world content (themes) and participants (questioner and questionee) in the mediatised register of the fatwa. Third, complementary to the second finding, the force dynamics of impermissible and forbidden reveals a set of debarring, negative-restriction barriers of various forms, viz. personal, collective, generic and topical, in the same fatwa register.
Youssef, A. ‘The force dynamics of adjectival deontic modality in the mediatised register of the fatwa: a corpus cognitive–semantic analysis’, Corpora, 2021-04-01. doi:10.3366/COR.2021.0207
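Wmatrix calculates keyness with the log-likelihood statistic (Rayson and Garside, 2000). As a rough illustration of how a modal keyword would surface from the fatwa corpus against a reference corpus, here is a minimal sketch; the frequencies and reference-corpus size are invented for the example:

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness (Rayson and Garside, 2000): how strongly a
    word's frequency in the study corpus diverges from a reference corpus."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected frequencies under equal relative frequency in both corpora
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented figures: a word occurring 120 times in the 353,293-word fatwa
# corpus versus 5 times in a 1,000,000-word reference corpus
ll = log_likelihood(120, 353_293, 5, 1_000_000)
print(round(ll, 2))  # far above the 15.13 threshold for p < 0.0001
```

Words whose score clears the chosen significance threshold are reported as keywords; their concordances are then read qualitatively, as in the study.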
Goulart, L. Review of Römer, Cortes and Friginal (eds). 2020. Advances in Corpus-Based Research on Academic Writing: Effects of Discipline, Register, and Writing Expertise. Amsterdam and Philadelphia: John Benjamins. Corpora, 2021-04-01. doi:10.3366/COR.2021.0212
With the increasing availability of large corpora, quantitative corpus analysis is becoming more and more popular as a method for doing linguistic research. This paper uses CESAR, a new research tool that makes it possible to search syntactically annotated corpora without extensive programming knowledge, to study the subjectivity patterns of four Dutch causal connectives. Analysing a large set of causal relations marked by four of the most frequent Dutch causal connectives (daarom, dus, omdat and want), the case study aims to corroborate the subjectivity hypothesis established on the basis of smaller-scale studies that used manual annotation. The automatic analysis of the subjectivity patterns of Dutch causal connectives illustrates the usability of CESAR in particular and the feasibility of automatic coherence analysis in general. In addition, it generates new insights into the subjectivity patterns of daarom, dus, omdat and want.
Hoek, J., T. Sanders and W. Spooren. ‘Automatic coherence analysis of Dutch: testing the subjectivity hypothesis on a larger scale’, Corpora, 2021-04-01. doi:10.3366/COR.2021.0211
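CESAR itself queries syntactically annotated corpora; as a much cruder stand-in, the initial retrieval step of pulling every instance of the four connectives, before any subjectivity coding, can be sketched with a plain surface search. The mini-corpus below is invented:

```python
import re

# Hypothetical Dutch mini-corpus; a surface search like this does not
# replicate CESAR's syntactic queries, only the retrieval of candidates.
sentences = [
    "Het regende, dus de wedstrijd werd afgelast.",
    "Ze bleef thuis omdat ze ziek was.",
    "Hij is vast moe, want hij geeuwt de hele tijd.",
    "Daarom nam ze de trein.",
]

CONNECTIVES = ("daarom", "dus", "omdat", "want")

def find_connectives(sents, connectives=CONNECTIVES):
    """Return (connective, sentence) pairs for every match, as a first
    retrieval step before subjectivity annotation."""
    pattern = re.compile(r"\b(" + "|".join(connectives) + r")\b",
                         re.IGNORECASE)
    hits = []
    for sent in sents:
        for match in pattern.finditer(sent):
            hits.append((match.group(1).lower(), sent))
    return hits

for connective, sent in find_connectives(sentences):
    print(f"{connective:>7}: {sent}")
```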
Contrary to the idea, widespread for at least a hundred years, that women differ substantially from men when they express themselves in English-speaking contexts (e.g., Jespersen, 1922; and Steadman, 1935), empirical studies have shown that these differences are often minimal and are not due to gender alone (e.g., Eckert, 2008; and Baker, 2014). This also frequently applies to the way they swear, despite certain preferences which have been documented in empirical studies. With the growing impact that social media now have on our everyday lives, these platforms represent a unique opportunity to study vast quantities of written data. This paper, based on a corpus of about one million tweets, attempts to delve deeper into the analysis of gendered swearword habits. First, the goal is to show that, even if there are certain gendered preferences in the choice of swearwords, women and men frequently display similar patterns in using them, thus reinforcing the idea that they are not so linguistically different. Secondly, the paper provides insights into how collocational networks can be used to achieve this, and thus how focussing on differences can be one way to spot similarities across two sub-corpora.
Gauthier, M. ‘“Eww wtf, what a dumb bitch”: a case study of similitudes inside gender-specific swearing patterns on Twitter’, Corpora, 2021-04-01. doi:10.3366/COR.2021.0208
Recent lexical approaches to the identification of language ideologies focus on the application of quantitative corpus-linguistic techniques to large data sets as a way to minimise researcher inference and ensure more objective sampling methods, replicability of analytical procedures, and a higher degree of generalisability (Fitzsimmons-Doolan, 2014; Subtirelu, 2015; Vessey, 2017; Wright and Brooks, 2019; and McEntee-Atalianis and Vessey, 2020). Based on two comprehensive, specialised research (11.6 million words) and comparator (22.4 million words) newspaper corpora, this study examines the effectiveness of multivariate and univariate statistical techniques, and proposes a three-step approach whereby corpus linguistics and critical discourse analysis are combined to identify (1) thematic and (2) ideological discourses (cf. ‘d’/‘D’ discourses; Gee, 2010), and (3) language ideologies. In contrast to recent contributions, it is argued that item frequency is not necessarily a reliable or effective indicator of language ideologies but, rather, of language-related discourses which can be examined for implicit and explicit language-ideological content. The combination of multivariate and univariate statistical techniques and the three-step approach is shown to be a highly effective methodological solution for synchronic and diachronic language ideology and discourse research based on topically/discursively heterogeneous corpora.
Ajšić, A. ‘Capturing Herder: a three-step approach to the identification of language ideologies using corpus linguistics and critical discourse analysis’, Corpora, 2021-04-01. doi:10.3366/COR.2021.0209
Research on lexical bundles has shed much light on disciplinary influences on the employment of these multi-word expressions in academic discourse, particularly in research articles. Little work, however, has been done on how research paradigms may impact on lexical bundles in academic discourse. This study aims to investigate the extent to which lexical bundles vary in quantitative, qualitative and mixed-methods research articles across two disciplines. All four-word lexical bundles were extracted from a specially built corpus of research articles and were analysed for their linguistic structures and discourse functions. The data analyses revealed marked structural and functional variation between different research paradigms and disciplines. Across paradigms, the quantitative articles differed from the qualitative articles by employing significantly more verb phrase bundles and participant-orientated functions, whereas the qualitative articles employed significantly more prepositional phrase bundles and text-orientated functions. Across disciplines, the mixed-methods articles in education employed significantly more noun phrase bundles and research-orientated functions, whereas the mixed-methods articles in psychology used more prepositional bundles and text-orientated functions. These paradigmatic and disciplinary differences in lexical bundles are explained by examining the underlying perceptions of knowledge and knowledge-making practices in different research paradigms and disciplines.
Cao, F. ‘A comparative study of lexical bundles across paradigms and disciplines’, Corpora, 2021-04-01. doi:10.3366/COR.2021.0210
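The extraction step described above, pulling every contiguous four-word sequence that recurs above frequency and dispersion thresholds, can be sketched as follows. The thresholds and two-text corpus are invented for illustration; published bundle studies typically use normalised cut-offs such as twenty occurrences per million words:

```python
from collections import Counter

def four_word_bundles(texts, min_freq=2, min_texts=2):
    """Contiguous four-word sequences kept only if they recur at least
    min_freq times overall and appear in at least min_texts texts
    (the dispersion condition screens out author-specific phrases)."""
    freq = Counter()
    dispersion = Counter()
    for text in texts:
        tokens = text.lower().split()
        seen = set()
        for i in range(len(tokens) - 3):
            gram = tuple(tokens[i:i + 4])
            freq[gram] += 1
            seen.add(gram)
        dispersion.update(seen)  # one count per text, however often it recurs
    return {gram: count for gram, count in freq.items()
            if count >= min_freq and dispersion[gram] >= min_texts}

# Invented two-text corpus for illustration
texts = [
    "on the other hand the results of the study are clear",
    "the results of the study suggest on the other hand caution",
]
for gram, count in four_word_bundles(texts).items():
    print(" ".join(gram), count)
```

The surviving bundles would then be hand-coded for structure (noun phrase, verb phrase, prepositional phrase) and discourse function, as in the study.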
Jiang, Z. Review of Tao. 2018. Russian–Chinese Parallel Corpus-based Research on Translational Texts about Humanities and Social Sciences. Beijing: Science Press. Corpora, 2021-04-01. doi:10.3366/COR.2021.0213
The paper presents a two-part forensic linguistic analysis of an historic collection of abuse letters, sent to individuals in the public eye and individuals’ private homes between 2007 and 2009. We employ the technique of structural topic modelling (stm) to identify distinctions in the core topics of the letters, gauging the value of this relatively under-used methodology in forensic linguistics. Four key topics were identified in the letters, ‘Politics A’ and ‘B’, ‘Healthcare’ and ‘Immigration’, and their coherence, correlation and shifts in topic were evaluated. Following the stm, a qualitative corpus linguistic analysis was undertaken, coding concordance lines according to topic, with the reliability between coders tested. This coding demonstrated that various connected statements within the same topic tend to gain or lose prevalence over time, and ultimately confirmed the consistency of content within the four topics identified through stm throughout the letter series. The discussion and conclusions to the paper reflect on the findings and also consider the utility of these methodologies for linguistics and forensic linguistics in particular. The study demonstrates real value in revisiting a forensic linguistic dataset such as this to test and develop methodologies for the field.
Busso, L., M. Petykó, S. Atkins and T. D. Grant. ‘Operation Heron: latent topic changes in an abusive letter series’, Corpora, 2021-03-26. doi:10.3366/cor.2022.0255
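The abstract reports testing reliability between the coders of the concordance lines without naming a statistic; Cohen's kappa is one standard choice for two coders, sketched here on invented topic codes for ten lines:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for the
    agreement expected by chance given each coder's label distribution.
    (Shown as an illustration; the article may use a different measure.)"""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented topic codes assigned to ten concordance lines by two coders
a = ["politics", "politics", "healthcare", "immigration", "politics",
     "healthcare", "politics", "immigration", "healthcare", "politics"]
b = ["politics", "politics", "healthcare", "immigration", "healthcare",
     "healthcare", "politics", "immigration", "healthcare", "politics"]
print(cohens_kappa(a, b))
```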
This paper describes the construction and annotation of the Late Latin Charter Treebank, a set of three dependency treebanks (llct1, llct2 and llct3) which together contain 1,261 Early Medieval Latin documentary texts (i.e., original charters) written in Italy between AD 714 and 1000 (about 594,000 tokens). The paper focusses on matters which a linguistically or philologically inclined user of llct needs to know: the criteria on which the charters were selected, the special characteristics of the annotation types utilised, and the geographical and chronological distribution of the data. In addition to normal queries on forms, lemmas, morphology and syntax, complex philological research settings are enabled by the textual annotation layer of llct, which indicates abbreviated and damaged words, as well as the formulaic and non-formulaic passages of each charter.
Korkiakangas, T. ‘Late Latin Charter Treebank: contents and annotation’, Corpora, 2021-01-01. doi:10.3366/cor.2021.0217
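LLCT is available through Universal Dependencies in the CoNLL-U format, so the "normal queries on forms, lemmas, morphology and syntax" reduce to filtering the ten standard columns. A minimal reader, applied to an invented Latin snippet (not taken from LLCT itself):

```python
def read_conllu(text):
    """Parse CoNLL-U token lines into dicts keyed by the ten standard
    fields, skipping comment lines and blank sentence separators."""
    fields = ("id", "form", "lemma", "upos", "xpos",
              "feats", "head", "deprel", "deps", "misc")
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens.append(dict(zip(fields, line.split("\t"))))
    return tokens

# Invented three-token snippet in CoNLL-U layout (illustration only)
sample = (
    "# text = manifestus sum ego\n"
    "1\tmanifestus\tmanifestus\tADJ\t_\tCase=Nom|Gender=Masc|Number=Sing\t0\troot\t_\t_\n"
    "2\tsum\tsum\tAUX\t_\tMood=Ind|Number=Sing|Person=1|Tense=Pres\t1\tcop\t_\t_\n"
    "3\tego\tego\tPRON\t_\tCase=Nom|Number=Sing|Person=1\t1\tnsubj\t_\t_\n"
)

tokens = read_conllu(sample)
# Morphological query: every nominative form in the snippet
print([t["form"] for t in tokens if "Case=Nom" in t["feats"]])
```

The textual annotation layer the paper describes (abbreviations, damage, formulaic passages) would appear in additional attributes alongside these columns.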