This paper demonstrates the value of studying co-occurrence ‘quads’ – constellations of four non-adjacent lemmas that consistently co-occur across spans of up to 100 tokens – for understanding discursive change. We map meaning onto quads as ‘discursive concepts’, which encompass encyclopaedic semantics, pragmatics, and context. We investigate a high-frequency quad with high co-occurrence strength in EEBO-TCP: world-heaven-earth-power. We conduct semantic and pragmatic analysis to generate hypotheses regarding discursive change. The quad’s components are semantically underspecified; thus, although the quad indicates a discursive concept, each instantiation of the quad is variable, contingent, and dependent upon context and pragmatic processes for interpretation. We observe how the vague lexemes that constitute building blocks of religious discourse are employed to generate new, timely secular discourses; and we argue that semantic underspecification is the site and source of discursive change. Indeed, the volatile, unstable nature of the component lexical meanings renders them indispensable to early modern debate.
{"title":"Volatile concepts","authors":"S. Fitzmaurice, Seth Mehl","doi":"10.1075/ijcl.22005.fit","DOIUrl":"https://doi.org/10.1075/ijcl.22005.fit","url":null,"abstract":"\u0000This paper demonstrates the value of studying co-occurrence ‘quads’ – constellations of four non-adjacent lemmas that consistently co-occur across spans of up to 100 tokens – for understanding discursive change. We map meaning onto quads as ‘discursive concepts’, which encompass encyclopaedic semantics, pragmatics, and context. We investigate a high-frequency quad with high co-occurrence strength in EEBO-TCP: world-heaven-earth-power. We conduct semantic and pragmatic analysis to generate hypotheses regarding discursive change. The quad’s components are semantically underspecified; thus, although the quad indicates a discursive concept, each instantiation of the quad is variable, contingent, and dependent upon context and pragmatic processes for interpretation. We observe how the vague lexemes that constitute building blocks of religious discourse are employed to generate new, timely secular discourses; and we argue that semantic underspecification is the site and source of discursive change. Indeed, the volatile, unstable nature of the component lexical meanings renders them indispensable to early modern debate.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46980749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study examines usage changes of English-based loanwords and Korean replacement words promoted by the National Institute of Korean Language in a six-year span, using two corpora. It focuses on 18 Korean and anglicized word pairs appearing on the National Institute of Korean Language’s website that purportedly showcase the Institute’s successful efforts to curtail the usage of English words by promoting Korean replacement words. The results indicate that promoting Korean does not necessarily decrease the usage of English, and that the usage of English-based words seems to increase in conjunction with the Korean words. Several Korean words promoted by the National Institute of Korean Language have extremely low frequencies, and some loanwords are being used with various meanings. Commentaries are provided to explain various patterns of observed usage change.
{"title":"A corpus-based study of anglicized neologisms in Korea","authors":"E. Kim","doi":"10.1075/ijcl.20055.kim","DOIUrl":"https://doi.org/10.1075/ijcl.20055.kim","url":null,"abstract":"\u0000 This study examines usage changes of English-based loanwords and Korean replacement words promoted by the National\u0000 Institute of Korean Language in a six-year span, using two corpora. It focuses on 18 Korean and anglicized word pairs appearing on\u0000 the National Institute of Korean Language’s website that purportedly showcase the Institute’s successful efforts to curtail the\u0000 usage of English words by promoting Korean replacement words. The results indicate that promoting Korean does not necessarily\u0000 decrease the usage of English, and that the usage of English-based words seems to increase in conjunction with the Korean words.\u0000 Several Korean words promoted by the National Institute of Korean Language have extremely low frequencies, and some loanwords are\u0000 being used with various meanings. Commentaries are provided to explain various patterns of observed usage change.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43175169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper applies a new approach to the identification of discourses, based on Multiple Correspondence Analysis (MCA), to the study of discourse variation over time. The MCA approach to keywords deals with a major issue with the use of keywords to identify discourses: the allocation of individual keywords to multiple discourses. Yet, as this paper demonstrates, the approach also allows us to observe variation in the prevalence of discourses over time. The MCA approach to keywords allows the allocation of individual texts to multiple discourses based on patterns of keyword co-occurrence. Metadata in the corpus data analysed (here, UK newspaper articles about Islam) can then be used to map those discourses over time, resulting in a clear view of how the discourses vary relative to one another as time progresses. The paper argues that the drivers for these fluctuations are language-external: the real-world events reported on in the newspapers.
{"title":"Keywords through time","authors":"Isobelle Clarke, Gavin Brookes, Tony McEnery","doi":"10.1075/ijcl.22011.cla","DOIUrl":"https://doi.org/10.1075/ijcl.22011.cla","url":null,"abstract":"\u0000This paper applies a new approach to the identification of discourses, based on Multiple Correspondence Analysis (MCA), to the study of discourse variation over time. The MCA approach to keywords deals with a major issue with the use of keywords to identify discourses: the allocation of individual keywords to multiple discourses. Yet, as this paper demonstrates, the approach also allows us to observe variation in the prevalence of discourses over time. The MCA approach to keywords allows the allocation of individual texts to multiple discourses based on patterns of keyword co-occurrence. Metadata in the corpus data analysed (here, UK newspaper articles about Islam) can then be used to map those discourses over time, resulting in a clear view of how the discourses vary relative to one another as time progresses. The paper argues that the drivers for these fluctuations are language external; the real-world events reported on in the newspapers.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47439187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper explores variation in lexico-grammatical register features across text lengths in a large-scale sample of Reddit comments. Very short texts are known to be problematic for many statistical methods, so understanding their nature is important for the corpus-linguistic study of social media, where most contributions are short. I show that the frequencies of linguistic features change with comment length, even between longer comments, although longer texts are often considered similar in statistical terms. Moreover, I classify the variation found between short comments of different lengths into two main patterns, although other patterns can also be found, and there is variation even within these patterns. Furthermore, I interpret the observed differences in terms of register variation. For example, shorter comments appear to be more casual and less edited in terms of their feature makeup, whereas narrative and informational registers seem to favor longer comments.
{"title":"Register variation across text lengths","authors":"A. Liimatta","doi":"10.1075/ijcl.20177.lii","DOIUrl":"https://doi.org/10.1075/ijcl.20177.lii","url":null,"abstract":"\u0000This paper explores variation in lexico-grammatical register features across text lengths in a large-scale sample of Reddit comments. Very short texts are known to be problematic for many statistical methods, so understanding their nature is important for the corpus-linguistic study of social media, where most contributions are short. I show that the frequencies of linguistic features change with comment length, even between longer comments, although longer texts are often considered similar in statistical terms. Moreover, I classify the variation found between short comments of different lengths into two main patterns, although other patterns can also be found, and there is variation even within these patterns. Furthermore, I interpret the observed differences in terms of register variation. For example, shorter comments appear to be more casual and less edited in terms of their feature makeup, whereas narrative and informational registers seem to favor longer comments.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44427278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper tracks stylistic variation in the use of two roughly synonymous suffixes, the Romance -ity and the native -ness, during the Early Modern English period. We seek to verify from a statistical viewpoint the claims of Rodríguez-Puente (2020), who reports on a decrease of -ness in favour of -ity in registers representative of the speech-written and formal-informal continua at that time. To this end, we develop new methods of statistical and visual analysis that enable diachronic comparisons of competing processes across subcorpora, building upon an earlier method by Säily and Suomela (2009). Our results confirm that -ity gained ground first in written registers and then spread towards speech-related registers, and we are able to time this change more accurately thanks to a novel periodisation. We also provide strong statistical support indicating that the proportion of -ity was significantly higher in legal registers than in other registers.
{"title":"New methods for analysing diachronic suffix competition across registers","authors":"Paula Rodríguez-Puente, Tanja Säily, J. Suomela","doi":"10.1075/ijcl.22014.rod","DOIUrl":"https://doi.org/10.1075/ijcl.22014.rod","url":null,"abstract":"\u0000This paper tracks stylistic variation in the use of two roughly synonymous suffixes, the Romance -ity and the native -ness, during the Early Modern English period. We seek to verify from a statistical viewpoint the claims of Rodríguez-Puente (2020), who reports on a decrease of -ness in favour of -ity in registers representative of the speech-written and formal-informal continua at that time. To this end, we develop new methods of statistical and visual analysis that enable diachronic comparisons of competing processes across subcorpora, building upon an earlier method by Säily and Suomela (2009). Our results confirm that -ity gained ground first in written registers and then spread towards speech-related registers, and we are able to time this change more accurately thanks to a novel periodisation. We also provide strong statistical support indicating that the proportion of -ity was significantly higher in legal registers than in other registers.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47926633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The aims of this paper are to detect the most problematic issues related to dialogue act annotation in speech corpora and to define basic categories of dialogue acts. I critically examine and test generic schemes that represent different lines of dialogue act annotation: AMI, DART, ISO 24617-2 and SWBD-DAMSL. It is found that the most problematic issues regarding dialogue act annotation are related to the distinction between the semantic and pragmatic meanings of utterances, the annotation of metadiscourse, and the adequacy and informativeness of the tagset. The identified basic dialogue act categories are information providing, information seeking, actions, social acts and metadiscourse. The findings help improve dialogue act annotation.
{"title":"Annotating dialogue acts in speech data","authors":"D. Verdonik","doi":"10.1075/ijcl.20165.ver","DOIUrl":"https://doi.org/10.1075/ijcl.20165.ver","url":null,"abstract":"\u0000 The aims of this paper are to detect the most problematic issues related to dialogue act annotation in speech\u0000 corpora and to define basic categories of dialogue acts. I critically examine and test generic schemes that represent different\u0000 lines of dialogue act annotation: AMI, DART, ISO 24617–2 and SWBD-DAMSL. It is found that the most problematic issues regarding\u0000 dialogue act annotation are related to the distinction between the semantic and pragmatic meanings of utterances, the annotation\u0000 of metadiscourse, and the adequacy and informativeness of the tagset. The identified basic dialogue act categories are information\u0000 providing, information seeking, actions, social acts and metadiscourse. The findings help improve dialogue act annotation.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47202283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The article focuses on the polysemy and usage patterns of the Polish lexeme głowa "head" and its diminutive główka. Based on corpus methodology and cognitive linguistic analysis, it is argued that the two lexemes are more autonomous in their meanings than their morphological relatedness would predict. As the two words cover different semantic domains, we observe that the diminutive suffix has developed a new function which signals lexicalization of meaning toward a non-human semantic domain, for example, material objects, plants, etc. Our research contributes to studies on Polish morphology and lexical semantics and to theoretical research on the polysemy of body part terms.
{"title":"Derivation and semantic autonomy","authors":"Iwona Kraska-Szlenk, Beata Wójtowicz","doi":"10.1075/ijcl.20074.kra","DOIUrl":"https://doi.org/10.1075/ijcl.20074.kra","url":null,"abstract":"\u0000 The article focuses on the polysemy and usage patterns of the Polish lexeme głowa “head” and its\u0000 diminutive główka. Based on corpus methodology and cognitive linguistics analysis, it is argued that the two\u0000 lexemes are too autonomous in their meanings than predicted by their morphological relatedness. As the two words cover different\u0000 semantic domains, we observe that the diminutive suffix has developed a new function which signals lexicalization of meaning\u0000 toward a non-human semantic domain, for example, material objects, plants, etc. Our research contributes to studies on Polish\u0000 morphology and lexical semantics and to theoretical research on the polysemy of body part terms.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46948133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpus research on questions as reader engagement markers in academic writing typically focuses on direct questions. Such questions are signalled by question marks and are relatively easily searchable in a corpus. However, indirect questions can be more challenging to identify, as they can be introduced by a range of forms. Based on a contrastive analysis of a corpus of English, French, and Spanish economics research articles, this paper provides pertinent evidence on direct and indirect questions as reader engagement markers. Firstly, it shows that direct and indirect questions as reader engagement markers are a rhetorical and generic feature of academic writing in the economics research article and, secondly, it presents a comprehensive list of indirect question illocutionary force indicating devices, valuable for future studies of indirect questions. Methodologically, this paper illustrates a replicable process for functional analysis and discusses the value of theoretically merging corpus and contrastive linguistic approaches.
{"title":"Question illocutionary force indicating devices in academic writing","authors":"Niall Curry","doi":"10.1075/ijcl.20065.cur","DOIUrl":"https://doi.org/10.1075/ijcl.20065.cur","url":null,"abstract":"\u0000Corpus research on questions as reader engagement markers in academic writing typically focuses on direct questions. Such questions are signalled by question marks and are relatively easily searchable in a corpus. However, indirect questions can be more challenging to identify, as they can be introduced by a range of forms. Based on a contrastive analysis of a corpus of English, French, and Spanish economics research articles, this paper provides pertinent evidence on direct and indirect questions as reader engagement markers. Firstly, it shows that direct and indirect questions as reader engagement markers are a rhetorical and generic feature of academic writing in the economics research article and, secondly, it presents a comprehensive list of indirect question illocutionary force indicating devices, valuable for future studies of indirect questions. Methodologically, this paper illustrates a replicable process for functional analysis and discusses the value of theoretically merging corpus and contrastive linguistic approaches.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49168052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article examines the lexically parallel English and German constructions can’t stand somebody/something and jemanden/etwas nicht ausstehen können “not tolerate (someone or something)”, from synchronic, diachronic, and quantitative perspectives. Syntactic and semantic restrictions suggest that the usage of stand and ausstehen in the relevant sense is older than other semantically similar verbs (e.g. English tolerate, German leiden), while quantitative evidence from corpora shows that the can’t stand and nicht ausstehen können constructions are both colligationally stronger than lexical competitors. Evidence from the history of stand indicates that the lexeme stand in the Germanic and other Indo-European languages has a long history of being employed in the relevant sense. The restrictions on usage and the colligational strength of the respective English and German constructions are thus argued to result from the antiquity of the construction and functional competition from other lexemes.
{"title":"The rise of colligations","authors":"Olav Hackstein, Ryan Sandell","doi":"10.1075/ijcl.20022.hac","DOIUrl":"https://doi.org/10.1075/ijcl.20022.hac","url":null,"abstract":"\u0000This article examines the lexically parallel English and German constructions can’t stand somebody/something and jemanden/etwas nicht ausstehen können “not tolerate (someone or something)”, from synchronic, diachronic, and quantitative perspectives. Syntactic and semantic restrictions suggest that the usage of stand and ausstehen in the relevant sense is older than other semantically similar verbs (e.g. English tolerate, German leiden), while quantitative evidence from corpora shows that the can’t stand and nicht ausstehen können constructions are both colligationally stronger than lexical competitors. Evidence from the history of stand indicates that the lexeme stand in the Germanic and other Indo-European languages has a long history of being employed in the relevant sense. The restrictions on usage and the colligational strength of the respective English and German constructions are thus argued to result from the antiquity of the construction and functional competition from other lexemes.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45803461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vocabulary lists of high-frequency lexical items are an important resource in language education and a key product of corpus research. However, no single vocabulary list will be useful for every learning context, with the appropriateness of such lists affected by the corpora on which they are based. This paper investigates the impact of corpus selection on one measure of lexical sophistication, Advanced Guiraud, focusing on two frequency lists originating from an in-house learner corpus (PELIC) and a global learner corpus (Cambridge Learner Corpus). This analysis shows that frequency lists derived from both types of learner corpus can effectively serve as the basis for measuring the development of lexical sophistication, regardless of the specific program of the learners. Therefore, publicly available learner corpus frequency lists can be a reliable resource for stakeholders interested in the lexical gains of language learners.
{"title":"Handle it in-house?","authors":"Ben Naismith, Alan Juffs, Na-Rae Han, Daniel Zheng","doi":"10.1075/ijcl.20024.nai","DOIUrl":"https://doi.org/10.1075/ijcl.20024.nai","url":null,"abstract":"Vocabulary lists of high-frequency lexical items are an important resource in language education and a key product of corpus research. However, no single vocabulary list will be useful for every learning context, with the appropriateness of such lists affected by the corpora on which they are based. This paper investigates the impact of corpus selection on one measure of lexical sophistication, Advanced Guiraud, focusing on two frequency lists originating from an in-house learner corpus (PELIC) and a global learner corpus (Cambridge Learner Corpus). This analysis shows that frequency lists derived from both types of learner corpus can effectively serve as the basis for measuring the development of lexical sophistication, regardless of the specific program of the learners. Therefore, publicly available learner corpus frequency lists can be a reliable resource for stakeholders interested in the lexical gains of language learners.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138518757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}