Abstract In the absence of a diachronic corpus or a synchronic corpus tagged for speakers’ age, substantiating the presence of semantic change and determining its stage (initial or advanced) are challenging tasks. In the present study I introduce three methods for overcoming such difficulties by extracting various kinds of evidence from a synchronic corpus not tagged for speakers’ age. All three methods are based on speakers’ metalinguistic activity. Two of them are of a psycholinguistic nature and the third is of a sociolinguistic nature. Not only do these methods provide data hitherto overlooked by researchers for detecting semantic change, but they can also minimize the researchers’ need for interpretative interventions with regard to speakers’ communicative intentions, thus improving the quality of the analysis.
Let my speakers talk: metalinguistic activity can indicate semantic change. Israela Becker. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-06-08. DOI: 10.1515/cllt-2023-0022
Abstract As a cognitive ability to construe events in alternate ways, aspectuality has attracted considerable scholarly attention; however, the concatenation of aspect markers within a clause remains understudied. The present paper adopts a bidimensional approach to aspect and conducts a corpus-based aspectual analysis of verb concatenation with imperfective markers zhe (henceforth VCIMs zhe) in Mandarin. Specifically, to capture the cognitive inference mechanism of aspect, a multifactorial analysis of VCIMs zhe is carried out using multiple correspondence analysis, conditional inference trees, and conditional random forests. The analysis explores the prototypical temporal features of the verbs in the two slots, predicts the aspectual meanings of the two imperfective markers zhe, and assesses the conditional importance of factors such as durativity, dynamicity, telicity, boundedness, and slot in identifying the situation types of the two verbs or verb phrases in VCIMs zhe. Methodologically, a usage-based multifactorial analysis of VCIMs zhe complements previous introspective studies of aspect marking. Theoretically, a corpus-based aspectual account of VCIMs zhe, one type of complex viewpoint aspect, extends traditional studies of the Chinese aspect system, supplies cross-linguistic evidence for aspect typology, and provides a reference for the second language acquisition of usage patterns of zhe by non-native speakers.
A multifactorial aspectual analysis of verb concatenation with imperfective markers zhe in Mandarin. Junjie Jin, F. Li. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-05-08. DOI: 10.1515/cllt-2022-0080
Aleksandrs Berdicevskis, E. Coussé, Alexander Koplenig, Yvonne Adesam
Abstract We investigate the optional omission of the infinitival marker in a Swedish future tense construction. During the last two decades the frequency of omission has been rapidly increasing, and this process has received considerable attention in the literature. We test whether the knowledge which has been accumulated can yield accurate predictions of language variation and change. We extracted all occurrences of the construction from a very large collection of corpora. The dataset was automatically annotated with language-internal predictors which have previously been shown or hypothesized to affect the variation. We trained several models in order to make two kinds of predictions: whether the marker will be omitted in a specific utterance and how large the proportion of omissions will be for a given time period. For most of the approaches we tried, we were not able to achieve a better-than-baseline performance. The only exception was predicting the proportion of omissions using autoregressive integrated moving average models for one-step-ahead forecast, and in this case time was the only predictor that mattered. Our data suggest that most of the language-internal predictors do have some effect on the variation, but the effect is not strong enough to yield reliable predictions.
To drop or not to drop? Predicting the omission of the infinitival marker in a Swedish future construction. Aleksandrs Berdicevskis, E. Coussé, Alexander Koplenig, Yvonne Adesam. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-05-05. DOI: 10.1515/cllt-2022-0101
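The one-step-ahead forecasting setup mentioned in the abstract above can be illustrated with a minimal sketch. It uses the simplest member of the ARIMA family, a random walk with drift (ARIMA(0,1,0) with a constant), which is an assumption made here for brevity and not necessarily one of the models fitted in the study. The yearly proportions are invented, and the function names are hypothetical.

```python
def one_step_forecasts(series, min_history=3):
    """Rolling one-step-ahead forecasts from a random walk with drift,
    i.e. ARIMA(0,1,0) with a constant. Each forecast sees only past values."""
    forecasts = []
    for t in range(min_history, len(series)):
        history = series[:t]
        # First differences of the history; their mean estimates the drift.
        diffs = [b - a for a, b in zip(history, history[1:])]
        drift = sum(diffs) / len(diffs)
        # Forecast for time t: last observed value plus the estimated drift.
        forecasts.append(history[-1] + drift)
    return forecasts


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented proportions of omission per time period (illustration only).
proportions = [0.10, 0.12, 0.15, 0.19, 0.22, 0.26, 0.31]
forecasts = one_step_forecasts(proportions)
actual = proportions[3:]  # the periods that were forecast
mae = mean_absolute_error(actual, forecasts)
print(forecasts, mae)
```

Time is the only input to this forecaster, which mirrors the abstract's observation that time was the only predictor that mattered in the forecasting setting. A fuller analysis would presumably rely on a dedicated time-series library and compare several model orders.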
Frontmatter. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-05-01. DOI: 10.1515/cllt-2023-frontmatter2
Abstract The methodological debates surrounding keyword analysis have given rise to a wide range of keyness metrics. The present paper delineates four dimensions of keyness, which distinguish between frequency- and dispersion-related perspectives. Existing measures are then organized according to these dimensions and evaluated with regard to their performance on a specific keyword analysis task: The identification of key verbs in academic writing. To this end, the rankings produced by 32 different metrics are evaluated against an established academic word list. Further, the reliability of measures is assessed, to determine whether they produce stable rankings across repeated studies on the same pair of text varieties. We observe notable differences among metrics with regard to these criteria. Our findings provide further support for the superiority of the Wilcoxon rank sum test and text-dispersion–based measures, and allow us to identify, within each dimension of keyness, metrics that may be given preference in applied work.
Evaluation of keyness metrics: performance and reliability. Lukas Sönning. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-04-27. DOI: 10.1515/cllt-2022-0116
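The contrast between frequency-related and dispersion-related perspectives on keyness, as described in the abstract above, can be made concrete with a small sketch. The two scores below (a difference in relative frequency and a difference in the share of texts containing the word) are simple stand-ins chosen for illustration; they are assumptions and not any of the 32 metrics evaluated in the paper. The toy corpora and function names are likewise hypothetical.

```python
def text_dispersion(texts, word):
    """Share of texts (each a list of tokens) containing `word` at least once."""
    if not texts:
        raise ValueError("corpus must contain at least one text")
    return sum(1 for tokens in texts if word in tokens) / len(texts)


def relative_frequency(texts, word):
    """Occurrences of `word` per token, pooled over all texts."""
    total = sum(len(tokens) for tokens in texts)
    hits = sum(tokens.count(word) for tokens in texts)
    return hits / total


def keyness_profile(target, reference, word):
    """Frequency-based and dispersion-based contrast between two corpora."""
    return {
        "freq_diff": relative_frequency(target, word)
        - relative_frequency(reference, word),
        "disp_diff": text_dispersion(target, word)
        - text_dispersion(reference, word),
    }


# Invented toy corpora: each inner list is one tokenized text.
target = [["we", "argue", "that"], ["results", "suggest", "that"],
          ["we", "argue", "argue"]]
reference = [["he", "said", "that"], ["they", "argue", "loudly"],
             ["she", "left"], ["no", "way"]]
profile = keyness_profile(target, reference, "argue")
print(profile)
```

The two scores can diverge: a word repeated many times in a single text raises the frequency contrast but leaves the dispersion contrast almost unchanged, which is one reason to keep the two perspectives apart.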
Abstract Classification trees and random forests offer a number of attractive features to corpus data analysts. However, the way in which these models are typically reported – a decision tree and/or set of variable importance scores – offers insufficient information if interest centers on the (form of) relationship between (multiple) predictors and the outcome. This paper develops predictive margins as an interpretative approach to ensemble techniques such as random forests. These are model summaries in the form of adjusted predictions, which provide a clearer picture of patterns in the data and allow us to query a model on potential nonlinear associations and interactions among predictor variables. The present paper outlines the general strategy for forming predictive margins and addresses methodological issues from an explicitly (corpus) linguistic perspective. For illustration, we use data on the English genitive alternation and provide an R package and code for their implementation.
Seeing the wood for the trees: predictive margins for random forests. Lukas Sönning, Jason Grafmiller. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-03-28. DOI: 10.1515/cllt-2022-0083
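The general idea of adjusted predictions summarized in the abstract above can be sketched as follows: set the focal predictor to one value for every observation, leave the other predictors at their observed values, predict, and average. This is a minimal illustration under stated assumptions, not the authors' R package. The `toy_predict` function is a hypothetical stand-in for a fitted random forest, and the predictors `animacy` and `length` are invented for the example.

```python
def predictive_margins(predict, rows, focal, values):
    """Average predictions with `focal` fixed to each value in turn.

    predict: callable mapping a dict of predictor values to a probability
    rows:    observed data as a list of dicts; non-focal predictors keep
             their observed values, so averaging is over their sample
             distribution
    """
    margins = {}
    for value in values:
        preds = [predict({**row, focal: value}) for row in rows]
        margins[value] = sum(preds) / len(preds)
    return margins


def toy_predict(row):
    """Hypothetical stand-in for a fitted model: P(s-genitive)."""
    p = 0.2
    if row["animacy"] == "animate":
        p += 0.5
    if row["length"] == "short":
        p += 0.2
    return p


# Invented observations (illustration only).
rows = [
    {"animacy": "animate", "length": "short"},
    {"animacy": "inanimate", "length": "short"},
    {"animacy": "inanimate", "length": "long"},
    {"animacy": "animate", "length": "long"},
]
margins = predictive_margins(toy_predict, rows, "animacy",
                             ["animate", "inanimate"])
print(margins)
```

Because the procedure only needs a prediction function, the same routine applies to any fitted model, which is what makes it usable as a summary for ensembles such as random forests.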
Pub Date: 2023-03-27; eCollection Date: 2024-02-01; DOI: 10.1515/cllt-2022-0082
Greg Woodin, Bodo Winter, Jeannette Littlemore, Marcus Perlman, Jack Grieve
This paper describes patterns of number use in spoken and written English and the main factors that contribute to these patterns. We analysed more than 1.7 million occurrences of numbers between 0 and a billion in the British National Corpus, including conversational speech, presentational speech (e.g., lectures, interviews), imaginative writing (e.g., fiction), and informative writing (e.g., academic books). We find that four main factors affect number frequency: (1) Magnitude - smaller numbers are more frequent than larger numbers; (2) Roundness - round numbers are more frequent than unround numbers of a comparable magnitude, and some round numbers are more frequent than others; (3) Cultural salience - culturally salient numbers (e.g., recent years) are more frequent than non-salient numbers; and (4) Register - more informational texts contain more numbers (in writing), types of numbers, decimals, and larger numbers than less informational texts. In writing, we find that the numbers 1-9 are mostly represented by number words (e.g., 'three'), 10-999,999 are mostly represented by numerals (e.g., '14'), and 1 million-1 billion are mostly represented by a mix of numerals and number words (e.g., '8 million'). Altogether, this study builds a detailed profile of number use in spoken and written English.
Large-scale patterns of number use in spoken and written English. Greg Woodin, Bodo Winter, Jeannette Littlemore, Marcus Perlman, Jack Grieve. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-03-27. DOI: 10.1515/cllt-2022-0082. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10853912/pdf/
Abstract Nepali is typologically rare in terms of nominal classification systems, as it is one of the few languages of the world having simultaneously two gender systems (human/non-human, masculine/feminine) and one numeral classifier system (distinguishing features such as human, round-shaped objects, and long objects among others). Such a rare co-occurrence of different nominal classification systems is highly relevant for investigating linguistic complexity, as languages generally do not have several systems of the same type fulfilling the same functions. However, no corpus-based quantitative analyses have been conducted on the productive use of nominal classification systems in Nepali. The current paper aims at filling this gap by providing a token-based study from the Nepali National Corpus (∼20 million words). Our preliminary results show that there is in fact little formal overlap between the classifier and the gender systems.
A corpus-based quantitative study of numeral classifiers in Nepali. Krishna Prasad Parajuli, Marc Allassonnière-Tang. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-02-13. DOI: 10.1515/cllt-2022-0064
Abstract English verbs can combine with an object-like (or Objoid) element consisting of a possessive and a superlative. These Superlative Objoids do not add a participant to the event but function like manner adverbs (they work their hardest, i.e. they work extremely hard). This paper is the first to use diachronic evidence from a corpus of Late Modern American English to trace the recent history of Superlative Objoid Constructions (SOC). In particular, it aims to assess whether the construction has become entrenched to the extent that it can give rise to analogical extension. Secondly, the evidence is used to model, within the framework of Construction Grammar, the horizontal and vertical links between the SOC and its (potential) relatives in the constructional network of transitivity changing constructions.
They worked their hardest on the construction’s history: Superlative Objoid Constructions in Late Modern American English. Tamara Bouso, M. Hundt. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-02-13. DOI: 10.1515/cllt-2022-0088
Frontmatter. Corpus Linguistics and Linguistic Theory. Pub Date: 2023-02-01. DOI: 10.1515/cllt-2023-frontmatter1