{"title":"Review of Egbert & Baker (2019): Using Corpus Methods to Triangulate Linguistic Analysis","authors":"L. Anthony","doi":"10.1075/ijcl.00048.ant","DOIUrl":"https://doi.org/10.1075/ijcl.00048.ant","url":null,"abstract":"","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41640039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents evidence from both corpora and agent-based simulation for the effect of lectal contamination. By doing so, it shows how agent-based simulation can be used as a complementary technique to corpus research in the study of language variation. Lectal contamination is an effect whereby the words that are typical of a language variety more often appear in a morphosyntactic variant typical of that same variety, even among language use from a different variety. This study looks at the Dutch partitive genitive construction, which exhibits variation between a “Netherlandic” variant with -s ending and a “Belgian” variant without -s ending. It is shown that the probability of the Belgian variant without -s increases among more “Belgian” words, in the language use of both Belgians and people from the Netherlands. Meanwhile, an agent-based simulation reveals the crucial theoretical preconditions that lead to this effect.
{"title":"Lectal contamination","authors":"Dirk Pijpops","doi":"10.1075/ijcl.20040.pij","DOIUrl":"https://doi.org/10.1075/ijcl.20040.pij","url":null,"abstract":"This paper presents evidence from both corpora and agent-based simulation for the effect of lectal contamination. By doing so, it shows how agent-based simulation can be used as a complementary technique to corpus research in the study of language variation. Lectal contamination is an effect whereby the words that are typical of a language variety more often appear in a morphosyntactic variant typical of that same variety, even among language use from a different variety. This study looks at the Dutch partitive genitive construction, which exhibits variation between a “Netherlandic” variant with -s ending and a “Belgian” variant without -s ending. It is shown that the probability of the Belgian variant without -s increases among more “Belgian” words, in the language use of both Belgians and people from the Netherlands. Meanwhile, an agent-based simulation reveals the crucial theoretical preconditions that lead to this effect.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45538571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of Carrió-Pastor (2020): Corpus Analysis in Different Genres: Academic Discourse and Learner Corpora","authors":"Shuqiong Wu","doi":"10.1075/ijcl.00049.wu","DOIUrl":"https://doi.org/10.1075/ijcl.00049.wu","url":null,"abstract":"","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48933416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The aim of collostructional analysis or, more precisely, simple collexeme analysis, is to quantify the statistical association between a construction c and a lexeme l that occurs in a particular slot of the construction. The analysis is based on 2×2 contingency tables that ought to represent a cross-classification of the units of analysis. So far, the units of analysis have been identified either as all constructions in the corpus or all instances of a class C of constructions to which construction c belongs. In practice, it is often not possible or feasible to identify these constructions. Therefore, the sample size is typically approximated by heuristic estimates. The bottom-right cell of the contingency table is most affected by these approximations. I suggest that the units of analysis be defined on the word level, instead, as the class W of word forms that satisfy the restrictions on the collexeme slot of c.
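The 2×2 table described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names and all counts are invented, and Fisher's exact test is used only as one common choice of association statistic, since the abstract does not name a test. The point of the sketch is that the bottom-right cell, and with it the resulting p-value, depends directly on how the total number of units of analysis is defined.

```python
from math import comb


def collexeme_table(freq_l_in_c, freq_c, freq_l, n_units):
    """Build the 2x2 table (a, b, c, d) for lexeme l and construction c.

    n_units is the total number of units of analysis (hypothetical here);
    its definition is exactly what the abstract discusses.
    """
    a = freq_l_in_c          # l occurs in the slot of c
    b = freq_c - a           # other lexemes occur in the slot of c
    c = freq_l - a           # l occurs elsewhere
    d = n_units - a - b - c  # bottom-right cell: everything else
    return a, b, c, d


def fisher_attraction_p(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), under the hypergeometric null."""
    n = a + b + c + d
    row1 = a + b  # all tokens of the construction
    col1 = a + c  # all tokens of the lexeme
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Invented counts: the same a, b, c with two different sample-size definitions.
    for n_units in (50_000, 100_000):
        table = collexeme_table(30, 1_000, 200, n_units)
        print(n_units, table, fisher_attraction_p(*table))
```

Only `d` changes between the two runs, yet the p-value changes with it, which is why the choice of units of analysis matters for the resulting association scores.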
{"title":"Use words, not constructions!","authors":"Thomas Proisl","doi":"10.1075/ijcl.20072.pro","DOIUrl":"https://doi.org/10.1075/ijcl.20072.pro","url":null,"abstract":"The aim of collostructional analysis or, more precisely, simple collexeme analysis, is to quantify the statistical association between a construction c and a lexeme l that occurs in a particular slot of the construction. The analysis is based on 2×2 contingency tables that ought to represent a cross-classification of the units of analysis. So far, the units of analysis have been identified either as all constructions in the corpus or all instances of a class C of constructions to which construction c belongs. In practice, it is often not possible or feasible to identify these constructions. Therefore, the sample size is typically approximated by heuristic estimates. The bottom-right cell of the contingency table is most affected by these approximations. I suggest that the units of analysis be defined on the word level, instead, as the class W of word forms that satisfy the restrictions on the collexeme slot of c.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44619049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In word association tasks, participants respond with the first word that comes to mind on seeing a given cue. These responses are generally assumed to be influenced by a number of factors, including cue semantics, form, and textual distribution. Previous studies exploring the third of these influences have used pairwise association measures, such as mutual information, to evaluate the extent to which textual distributions influence response selection. In the current paper, a different approach is taken. Rather than examining co-occurrences between a cue and its observed responses, this paper explores the possibility that the cue’s holistic collocational environment shapes its associative profile. Regression modelling demonstrates that the predictability of this textual distribution is a significant predictor of variance in the cue’s response profile. Overall, however, the amount of variance explained is small. A subsequent qualitative examination of distributional and associative profiles suggests several semantically based constraints to response generation.
{"title":"Exploring the impact of lexical context on word association responses","authors":"P. Thwaites","doi":"10.1075/ijcl.20102.thw","DOIUrl":"https://doi.org/10.1075/ijcl.20102.thw","url":null,"abstract":"In word association tasks, participants respond with the first word that comes to mind on seeing a given cue. These responses are generally assumed to be influenced by a number of factors, including cue semantics, form, and textual distribution. Previous studies exploring the third of these influences have used pairwise association measures, such as mutual information, to evaluate the extent to which textual distributions influence response selection. In the current paper, a different approach is taken. Rather than examining co-occurrences between a cue and its observed responses, this paper explores the possibility that the cue’s holistic collocational environment shapes its associative profile. Regression modelling demonstrates that the predictability of this textual distribution is a significant predictor of variance in the cue’s response profile. Overall, however, the amount of variance explained is small. A subsequent qualitative examination of distributional and associative profiles suggests several semantically based constraints to response generation.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49042248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Teodora Vuković, Anastasia Escher, Barbara Sonnenhauser
This paper presents a corpus-based method for assessing the range of dialect-standard variation and for identifying the samples with the highest prevalence of dialect features. The method provides insight into areal and inter-speaker variation and allows the extraction of maximally non-standard manifestations of the dialect, which may then be sampled and used for the study of language change and variation. The focus is on a non-standard Torlak variety, which has undergone considerable change under the influence of standard Serbian. The degree of variation is assessed by measuring the frequencies of five distinguishing linguistic features: accent position, dative reflexive si, auxiliary omission in the compound perfect, the post-positive article, and analytic case marking in the indirect object and possessive. Hierarchical clustering reveals the locations subject to the greatest and least influence of the standard. A positive correlation between the frequencies of occurrence reveals which non-standard feature is the best predictor of the others.
{"title":"Degrees of non-standardness","authors":"Teodora Vuković, Anastasia Escher, Barbara Sonnenhauser","doi":"10.1075/ijcl.20014.vuk","DOIUrl":"https://doi.org/10.1075/ijcl.20014.vuk","url":null,"abstract":"A corpus-based method for assessing a range of dialect-standard variation is presented for identifying samples exhibiting the highest prevalence of dialect features. This method provides insight into areal and inter-speaker variation and allows the extraction of maximally non-standard manifestations of the dialect, which may then be sampled and used for the study of language change and variation. The focus is on a non-standard Torlak variety, which has undergone considerable change under the influence of standard Serbian. The degree of variation is assessed by measuring the frequencies of five distinguishing linguistic features: accent position, dative reflexive si, auxiliary omission in the compound perfect, the post-positive article, and analytic case marking in the indirect object and possessive. Locations subject to the greatest and least influence of the standard are revealed using hierarchical clustering. A positive correlation between the frequencies of occurrence reveals which non-standard feature is the best predictor of the others.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45507128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Because of the ubiquity and importance of collocations in language use/learning, how to effectively and efficiently identify collocations has been a topic of interest. Although some studies have evaluated many of the existing association measures (AMs) used in the automatic identification of collocations, the results so far have been inconsistent and unclear due to various limitations of the existing studies. Hence, this study makes a multi-dimensional evaluation of the effectiveness and efficiency of seven major AMs in the identification of three types of collocations across five genres and seven corpora of different sizes. The results indicate that while a few AMs, such as Log Likelihood Ratio and Cubic Mutual Information (MI3), are consistently more effective and efficient than the other five AMs being examined, no one AM alone may be adequate in the identification of different types of collocations across different genres and corpus sizes. Research implications are also discussed.
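For orientation, the two measures named above can be sketched from their textbook definitions. This is an illustrative sketch only: the function names and the example counts are hypothetical, and the study's exact formulations, thresholds, and span settings are not given in the abstract and may differ. Here `o11` is the observed co-occurrence frequency of the word pair, `f1` and `f2` are the two words' individual frequencies, and `n` is the sample size.

```python
from math import log, log2


def expected_cooccurrence(f1, f2, n):
    """Expected co-occurrence frequency if the two words were independent."""
    return f1 * f2 / n


def mi3(o11, f1, f2, n):
    """Cubic mutual information: log2(O^3 / E)."""
    return log2(o11 ** 3 / expected_cooccurrence(f1, f2, n))


def log_likelihood(o11, f1, f2, n):
    """Log-likelihood ratio G2 = 2 * sum(O * ln(O / E)) over the four table cells."""
    observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
    expected = [
        f1 * f2 / n,
        f1 * (n - f2) / n,
        (n - f1) * f2 / n,
        (n - f1) * (n - f2) / n,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


if __name__ == "__main__":
    # Invented counts for a single word pair.
    print(mi3(10, 10, 20, 100), log_likelihood(10, 10, 20, 100))
```

Cubing the observed frequency in MI3 counteracts the tendency of plain mutual information to favour very rare pairs, whereas the log-likelihood ratio weighs all four cells of the contingency table.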
{"title":"A multi-dimensional comparison of the effectiveness and efficiency of association measures in collocation extraction","authors":"Yaochen Deng, Dilin Liu","doi":"10.1075/ijcl.19111.den","DOIUrl":"https://doi.org/10.1075/ijcl.19111.den","url":null,"abstract":"Because of the ubiquity and importance of collocations in language use/learning, how to effectively and efficiently identify collocations has been a topic of interest. Although some studies have evaluated many of the existing association measures (AMs) used in the automatic identification of collocations, the results so far have been inconsistent and unclear due to various limitations of the existing studies. Hence, this study makes a multi-dimensional evaluation of the effectiveness and efficiency of seven major AMs in the identification of three types of collocations across five genres and seven corpora of different sizes. The results indicate that while a few AMs, such as Log Likelihood Ratio and Cubic Mutual Information (MI3), are consistently more effective and efficient than the other five AMs being examined, no one AM alone may be adequate in the identification of different types of collocations across different genres and corpus sizes. Research implications are also discussed.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44397793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}