Corpora are usually made up not only of words, sentences and plain texts; they also have metadata, background information and structural features which can be used to filter searches or pro...
Stephen Jeaco. "Calculating and displaying key labels: the texts, sections, authors and neighbourhoods where words and collocations are likely to be prominent." Corpora, published 2020-08-27. https://doi.org/10.3366/cor.2020.0193
Yukiko Ohashi, N. Katagiri, K. Oka, Michiko Hanada
This paper reports on two research results: (1) designing an English for Specific Purposes (ESP) corpus architecture complete with annotations structured by regular expressions; and (2) a case stud...
Yukiko Ohashi, N. Katagiri, K. Oka, Michiko Hanada. "ESP corpus design: compilation of the Veterinary Nursing Medical Chart Corpus and the Veterinary Nursing Wordlist." Corpora, published 2020-08-27. https://doi.org/10.3366/cor.2020.0191
Jordan Smith. "Review: McIntyre and Walker. 2019. Corpus Stylistics." Corpora, published 2020-08-27. https://doi.org/10.3366/cor.2020.0196
S. Adolphs, Dawn Knight, Catherine Smith, Dominic T. Price
Spoken corpora have traditionally been assembled through careful recording and transcription of discourse events, a process which is both labour intensive and often restrictive in terms of the breadth of recording contexts available. To overcome these potential challenges in spoken corpus compilation, we explore the use of crowdsourcing of language samples that are reported by participants. We investigate the level of precision and recall of the ‘crowd’ when it comes to reporting language they have heard in certain contexts, alongside the use of a crowdsourcing toolkit to facilitate this task. As a focussing device for the selection of reported language samples, we draw on the use of formulaic phrases as an area that has received considerable attention from corpus linguists and applied linguists over the years. We argue that while studying reported language usage instead of actual language-in-use is problematic for several reasons, many of which have been highlighted in the literature on Discourse Completion Tasks (Schauer and Adolphs, 2006), our suggested approach presents several advantages and opportunities for spoken corpus linguistics.
S. Adolphs, Dawn Knight, Catherine Smith, Dominic T. Price. "Crowdsourcing formulaic phrases: towards a new type of spoken corpus." Corpora, published 2020-08-01. https://doi.org/10.3366/COR.2020.0192
This study explores the alternation between the mandative subjunctive and its modal alternative with should across native and non-native Englishes. Methodologically, we try to improve on existing standards by investigating over 3,300 occurrences of the alternation, drawn from the Corpus of Web-based Global English and annotated for a range of linguistic factors, which we analyse with a forest of conditional inference trees; we also exemplify a new strategy for the use of random or conditional inference forests in corpus-based alternation studies. We obtain a forest with significant prediction accuracies and a good C-score and discuss the strongest predictors of the subjunctive versus should alternation across Englishes. In contrast with existing research, our multi-factorial results suggest that: (i) in British English the mandative subjunctive may not be dying out as much as we thought; and (ii) individual suasive verbs influence speakers' use of the two variants more than their variety of English does.
Sandra C. Deshors, S. Gries. "Mandative subjunctive versus should in world Englishes: a new take on an old alternation." Corpora, published 2020-08-01. https://doi.org/10.3366/cor.2020.0195
In this paper, I use methods from corpus linguistics to examine patterns pertaining to the representation of women in online Arabic- and English-language political corpora. I highlight the discursi...
K. Karimullah. "Sketching women: a corpus-based approach to representations of women's agency in political Internet corpora in Arabic and English." Corpora, published 2020-03-31. https://doi.org/10.3366/COR.2020.0184