MorphInd (Larasati et al., 2011) is a state-of-the-art morphological analyser for Indonesian. To date, however, the morphological annotation scheme that MorphInd implements has not been comprehensively evaluated. My evaluation of this scheme reveals a number of significant drawbacks. Some analytical features encoded in MorphInd's tagset do not appear to reflect features actually present in Indonesian morphology, while certain features common in analyses of Indonesian are absent. Likewise, the part-of-speech (POS) hierarchy in the MorphInd tagset does not match the POS hierarchy usually adopted in Indonesian reference grammars. Moreover, MorphInd's output does not link morphological tags to the morphemes they describe. Finally, the layout of the annotation raises several issues that may complicate text and corpus querying, particularly with respect to affixes, reduplication, and the affix–reduplication interface.
Prihantoro. 2021. An evaluation of MorphInd's morphological annotation scheme for Indonesian. Corpora (published 19 August 2021). https://doi.org/10.3366/cor.2021.0221
This study presents a register analysis of pop lyrics. It applies multi-dimensional register analysis to test empirically the claim that pop lyrics are conversational in nature. In doing so, it responds to broader calls for the linguistic exploration of performed language as represented in non-canonical pop culture registers. This text-linguistic investigation is based on a corpus of contemporary pop lyrics. It uses the Multidimensional Analysis Tagger (Nini, 2018), software that replicates Biber's (1988) tagger, to identify register features and to contrast lyrics with other varieties of text. In addition, the n-gram and keyword functions of a concordancer are used to establish the register markers and style features that characterise pop lyrics. In line with earlier claims, the results show that pop lyrics do carry some conversational force, even though situational factors point to planned and performed production. Furthermore, the analysis identifies additional features that are highly distinctive of pop lyrics (as opposed to general conversation). This suggests that the register holds a special position on the speech-writing continuum.
Valentin Werner. 2021. Catchy and conversational? A register analysis of pop lyrics. Corpora (published 19 August 2021). https://doi.org/10.3366/cor.2021.0219
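Multi-dimensional analysis of the kind the Multidimensional Analysis Tagger automates rests on a simple computation, following Biber (1988):

1. Feature counts are normalised to a rate per 1,000 words.
2. Each rate is standardised as a z-score against reference means and standard deviations.
3. The z-scores are summed into a dimension score: positive-loading features minus negative-loading ones.

The sketch below assumes that procedure. It is not MAT's code. The four features are a small illustrative subset, and all counts, reference means and standard deviations are hypothetical placeholders rather than published values.

```python
# Hypothetical illustration of Biber-style dimension scoring; not MAT's implementation.


def normalise(count: int, n_words: int, per: int = 1000) -> float:
    """Rate of a feature per `per` words of running text."""
    return count * per / n_words


def z_scores(rates: dict[str, float],
             ref_mean: dict[str, float],
             ref_sd: dict[str, float]) -> dict[str, float]:
    """Standardise each rate against (hypothetical) reference means and SDs."""
    return {f: (r - ref_mean[f]) / ref_sd[f] for f, r in rates.items()}


def dimension_score(z: dict[str, float],
                    positive: list[str],
                    negative: list[str]) -> float:
    """Sum of positive-loading z-scores minus sum of negative-loading ones."""
    return sum(z[f] for f in positive) - sum(z[f] for f in negative)


# Toy 500-word text; counts, means and SDs are invented for illustration only.
n_words = 500
counts = {"private_verbs": 9, "first_person_pronouns": 30,
          "nouns": 100, "attributive_adjectives": 20}
REF_MEAN = {"private_verbs": 18.0, "first_person_pronouns": 27.0,
            "nouns": 180.0, "attributive_adjectives": 60.0}
REF_SD = {"private_verbs": 4.0, "first_person_pronouns": 11.0,
          "nouns": 40.0, "attributive_adjectives": 20.0}

rates = {f: normalise(c, n_words) for f, c in counts.items()}
z = z_scores(rates, REF_MEAN, REF_SD)
score = dimension_score(z,
                        positive=["private_verbs", "first_person_pronouns"],
                        negative=["nouns", "attributive_adjectives"])
print(f"Dimension score: {score:.2f}")  # Dimension score: 3.50
```

The features are split this way because of Biber's (1988) Dimension 1 ("involved versus informational production"). There, private verbs and first-person pronouns load positively, while nouns and attributive adjectives load negatively. A high positive score therefore points towards the involved, conversation-like end of that dimension.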
The present study offers new insights into how the cognitive-semantic analysis of adjectival deontic modality in the mediatized register of the fatwa can be methodologically enhanced at both quantitative and qualitative levels. It draws on the force-dynamics model originated by Talmy (1981, 1988) and developed by Sweetser (1990). On this basis, it investigates adjectival modal expressions of obligation and permission in an electronic corpus of fatwas (353,293 words in 1,440 texts). The data were processed with the corpus tool Wmatrix (Rayson, 2003) to calculate the relevant modal keywords and generate their concordances. In addition, the study provides an interactive register analysis of tenor in fatwa discourse. This analysis (i) facilitates the concordance reading of the adjectival keywords of deontic modality and (ii) examines the force dynamics underlying these keywords in terms of their modally interactive meanings. The study yields three main findings. First, the specialized fatwa corpus contains five keywords of adjectival deontic modality: obligatory, obliged, permissible, impermissible, and forbidden. Second, the force dynamics of obligatory, obliged and permissible reveal an enacting positive-compulsion force. This force shows attitudinal variation between objective and subjective meanings towards real-world content (themes) and participants (questioner and questionee) in the mediatized register of the fatwa. Third, complementing the second finding, the force dynamics of impermissible and forbidden reveal a set of debarring negative-restriction barriers of various kinds (personal, collective, generic, and topical) in the same fatwa register.
A. Youssef. 2021. The force dynamics of adjectival deontic modality in the mediatised register of the fatwa: a corpus cognitive–semantic analysis. Corpora 16(1): 1–30. https://doi.org/10.3366/COR.2021.0207
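Wmatrix is widely described as ranking keywords by the log-likelihood (G²) statistic. Assuming that is the measure behind keywords such as those reported above, the two-corpus calculation can be sketched as follows.

The sketch makes the following assumptions:
- The word frequencies and the reference-corpus size in the example are hypothetical. Only the study-corpus size (353,293 words) comes from the abstract.
- It applies the usual critical value of 3.84 (p < 0.05, 1 d.f.).

This is not Wmatrix's own code.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-corpus log-likelihood (G2) for one word.

    a: frequency in the study corpus, b: frequency in the reference corpus,
    c: size of the study corpus, d: size of the reference corpus (in tokens).
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # the term 0 * log(0) is taken as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def is_positive_keyword(a: int, b: int, c: int, d: int,
                        critical: float = 3.84) -> bool:
    """Relatively more frequent in the study corpus and significant at p < 0.05."""
    return a / c > b / d and log_likelihood(a, b, c, d) >= critical


# Hypothetical: a word occurring 120 times in a 353,293-word study corpus
# and 40 times in an assumed 1,000,000-word reference corpus.
print(is_positive_keyword(120, 40, 353_293, 1_000_000))  # True
```

The relative-frequency check matters because G² is symmetric. A word that is unusually rare in the study corpus would also score highly, but it would be a negative keyword rather than a positive one. A stricter cut-off such as 6.63 (p < 0.01) can be passed as `critical`.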
Larissa Goulart. 2021. Review: Römer, Cortes and Friginal (eds). 2020. Advances in Corpus-Based Research on Academic Writing: Effects of Discipline, Register, and Writing Expertise. Amsterdam and Philadelphia: John Benjamins. Corpora 16(1): 157–159. https://doi.org/10.3366/COR.2021.0212
With the increasing availability of large corpora, quantitative corpus analysis is becoming an increasingly popular method of linguistic research. This paper uses CESAR, a new research tool that makes it possible to search syntactically annotated corpora without extensive programming knowledge. With it, the paper studies the subjectivity patterns of four of the most frequent Dutch causal connectives: daarom, dus, omdat, and want. By analyzing a large set of causal relations marked by these connectives, the case study aims to corroborate the subjectivity hypothesis. That hypothesis was established on the basis of smaller-scale studies that used manual annotation. The automatic analysis of these subjectivity patterns illustrates the usability of CESAR in particular and the feasibility of automatic coherence analysis in general. In addition, it generates new insights into the subjectivity patterns of daarom, dus, omdat, and want.
J. Hoek, T. Sanders and W. Spooren. 2021. Automatic coherence analysis of Dutch: testing the subjectivity hypothesis on a larger scale. Corpora 16(1): 129–155. https://doi.org/10.3366/COR.2021.0211
Recent lexical approaches to the identification of language ideologies apply quantitative corpus-linguistic techniques to large data sets. Their aims are to minimise researcher inference and to ensure more objective sampling, replicable analytical procedures, and a higher degree of generalisability (Fitzsimmons-Doolan, 2014; Subtirelu, 2015; Vessey, 2017; Wright and Brooks, 2019; and McEntee-Atalianis and Vessey, 2020). This study draws on two comprehensive, specialised newspaper corpora: a research corpus of 11.6 million words and a comparator corpus of 22.4 million words. It examines the effectiveness of multivariate and univariate statistical techniques. It also proposes a three-step approach that combines corpus linguistics and critical discourse analysis to identify (1) thematic and (2) ideological discourses (cf. ‘d’/‘D’ discourses; Gee, 2010), and (3) language ideologies. In contrast to recent contributions, it is argued that item frequency is not necessarily a reliable or effective indicator of language ideologies. Rather, it indicates language-related discourses, which can then be examined for implicit and explicit language-ideological content.
The combination of multivariate and univariate statistical techniques with the three-step approach is shown to be a highly effective methodological solution. It suits both synchronic and diachronic research on language ideologies and discourse based on topically and discursively heterogeneous corpora.
Adnan Ajšić. 2021. Capturing Herder: a three-step approach to the identification of language ideologies using corpus linguistics and critical discourse analysis. Corpora 16(1): 63–95. https://doi.org/10.3366/COR.2021.0209
For at least a hundred years, it has been widely held that women differ substantially from men in how they express themselves in English-speaking contexts (e.g., Jespersen, 1922; and Steadman, 1935). Contrary to this idea, empirical studies have shown that these differences are often minimal and are not attributable to gender alone (e.g., Eckert, 2008; and Baker, 2014). The same often holds for the way women and men swear, despite certain preferences documented in empirical studies. Given the growing role of social media in everyday life, these platforms offer a unique opportunity to study vast quantities of written data. This paper is based on a corpus of about one million tweets and attempts to analyse gendered swearing habits in greater depth. First, it aims to show that women and men frequently display similar patterns in using swearwords, even if there are certain gendered preferences in the choice of swearwords. This reinforces the idea that they are not so linguistically different.
Second, the paper provides insights into how collocational networks can be used to achieve this, and thus how focussing on differences can be one way to spot similarities across two sub-corpora.
Michael Gauthier. 2021. ‘Eww wtf, what a dumb bitch’: a case study of similitudes inside gender-specific swearing patterns on Twitter. Corpora 16(1): 31–61. https://doi.org/10.3366/COR.2021.0208
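A collocational network links a node word to its collocates, and then each collocate to its own collocates. The abstract does not state which association measure, window or tool was used, so the sketch below makes these assumptions:
- Co-occurrence is counted in a window around the node and scored with mutual information (MI), one common measure.
- The window size, thresholds and toy token list are hypothetical.

```python
import math
from collections import Counter


def collocates(tokens, node, window=3, min_freq=2, min_mi=3.0):
    """Collocates of `node` within +/- `window` tokens, scored by MI.

    MI = log2(O / E), where E = f(node) * f(collocate) * span / N.
    Thresholds are illustrative defaults, not values from the study.
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                observed[tokens[j]] += 1
    span = 2 * window
    scores = {}
    for coll, o in observed.items():
        if o < min_freq:
            continue
        expected = freq[node] * freq[coll] * span / n
        mi = math.log2(o / expected)
        if mi >= min_mi:
            scores[coll] = mi
    return scores


def collocation_network(tokens, seed, depth=2, **params):
    """Undirected edges found by expanding collocates `depth` times from `seed`."""
    edges, seen, frontier = set(), {seed}, [seed]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for coll in collocates(tokens, node, **params):
                edges.add(tuple(sorted((node, coll))))
                if coll not in seen:
                    seen.add(coll)
                    next_frontier.append(coll)
        frontier = next_frontier
    return edges


# Hypothetical toy data standing in for tokenised tweets.
tokens = ("what a dumb idea " * 3 + "the cat sat on the mat " * 5).split()
net = collocation_network(tokens, "dumb", depth=2, window=1, min_freq=2, min_mi=2.0)
print(sorted(net))
# [('a', 'dumb'), ('a', 'what'), ('dumb', 'idea'), ('idea', 'what')]
```

One possible way to look for the overlaps the abstract describes would be to build one such network per sub-corpus (women's and men's tweets) and compare their edge sets.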
Research on lexical bundles has shed much light on how disciplines influence the use of these multi-word expressions in academic discourse, particularly in research articles. Little work, however, has examined how research paradigms may affect lexical bundles in academic discourse. This study investigates the extent to which lexical bundles vary across quantitative, qualitative and mixed methods research articles in two disciplines. All four-word lexical bundles were extracted from a purpose-built corpus of research articles and analysed for their linguistic structures and discourse functions. The analyses revealed marked structural and functional variation across research paradigms and disciplines.

Across paradigms, the quantitative articles used significantly more verb phrase bundles and participant-orientated functions than the qualitative articles. The qualitative articles, in turn, used significantly more prepositional phrase bundles and text-orientated functions. Across disciplines, the mixed methods articles in education used significantly more noun phrase bundles and research-orientated functions. The mixed methods articles in psychology used more prepositional phrase bundles and text-orientated functions.
These paradigmatic and disciplinary differences in lexical bundles are explained with reference to the perceptions of knowledge and the knowledge-making practices that underlie different research paradigms and disciplines.
Fenglong Cao. 2021. A comparative study of lexical bundles across paradigms and disciplines. Corpora 16(1): 97–128. https://doi.org/10.3366/COR.2021.0210
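The extraction step described above collects all four-word sequences from a purpose-built corpus. Such extraction is usually constrained by a minimum normalised frequency and by dispersion, meaning a minimum number of texts a bundle must occur in.

The sketch below has these limitations:
- The cut-offs are hypothetical placeholders, since the abstract does not report the study's values.
- The toy "articles" are invented.
- The structural and functional classification of the bundles is not shown.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_per_million=20.0, min_texts=3):
    """Return n-word sequences that meet frequency and dispersion cut-offs.

    texts: list of token lists, one per research article.
    Returns {bundle: normalised frequency per million words}.
    The default cut-offs are placeholders, not the study's values.
    """
    counts = Counter()
    found_in = defaultdict(set)  # bundle -> ids of the texts containing it
    total_words = 0
    for text_id, tokens in enumerate(texts):
        total_words += len(tokens)
        # Sequences are collected within each text, never across text boundaries.
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            found_in[gram].add(text_id)
    bundles = {}
    for gram, c in counts.items():
        per_million = c / total_words * 1_000_000
        if per_million >= min_per_million and len(found_in[gram]) >= min_texts:
            bundles[" ".join(gram)] = per_million
    return bundles


# Invented toy "articles"; only the first bundle occurs in three texts.
texts = [
    "on the other hand the results".split(),
    "on the other hand we argue".split(),
    "on the other hand it seems".split(),
    "in the present study".split(),
]
print(list(lexical_bundles(texts)))  # ['on the other hand']
```

To compare paradigms, the function could be run separately on quantitative, qualitative and mixed methods sub-corpora. The resulting bundle lists would then be the input to structural and functional comparison.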