{"title":"Review of Egbert, Biber & Gray (2022): Designing and Evaluating Language Corpora: A Practical Framework for Corpus Representativeness","authors":"Tony McEnery","doi":"10.1075/ijcl.00054.mce","DOIUrl":"https://doi.org/10.1075/ijcl.00054.mce","url":null,"abstract":"","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41831134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper reports a corpus-based, cognitive semantic study profiling the varied uses of the Chinese color term hēi 黑 “black” with regard to its metaphorical polysemy. We hypothesize that the semantic (dis)similarities among the eight metaphorical meanings of hēi “black” can be captured by clustering their contextual features, including collocational patterns, morphosyntactic and semantic properties, and discourse information. The analysis adopts the Behavioral Profiles approach: 800 instances were annotated for 46 contextual features, and a hierarchical agglomerative cluster analysis was conducted on the annotated data. The results show that the eight metaphorical senses of hēi “black” fall into three clusters. This clustering can be explained by the conceptual bases pertaining to color perception and color change, in line with Conceptual Metaphor Theory. This study demonstrates the effectiveness of the corpus-based Behavioral Profiles approach in exploring the cognitive mechanisms underlying metaphorical extension and meaning differentiation.
{"title":"Metaphorical polysemy of the Chinese color term hēi 黑 “black”","authors":"Meichun Liu, Jinmeng Dou","doi":"10.1075/ijcl.21067.liu","DOIUrl":"https://doi.org/10.1075/ijcl.21067.liu","url":null,"abstract":"\u0000This paper reports a corpus-based, cognitive semantic study on profiling the varied uses of the Chinese color term hēi 黑 “black” with regard to its metaphorical polysemy. We hypothesize that the semantic (dis)similarities among the eight metaphorical meanings of hēi “black” can be captured by clustering their contextual features, including collocational patterns, morphosyntactic and semantic properties, and discourse information. The Behavioral Profiles approach is adopted for the analyses with the annotations of 800 instances for 46 contextual features and a hierarchical agglomerative cluster analysis conducted on the annotated data. The results show that the eight metaphorical senses of hēi “black” fall into three clusters. This clustering can be explained by the conceptual bases pertaining to color perceptions and color changes, in line with Conceptual Metaphor Theory. This study demonstrates the effectiveness of the corpus-based Behavioral Profiles approach in exploring the underlying cognitive mechanisms of metaphorical extensions and meaning differentiations.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47787233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
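A Behavioral Profile represents each sense as a vector of relative frequencies of its annotated feature levels, and a hierarchical agglomerative cluster analysis repeatedly merges the two closest groups until the desired number remains. The sketch below illustrates that procedure under explicit assumptions: the sense labels and proportions are invented toy values, not Liu and Dou's data, and it uses Euclidean distance with average linkage, which may differ from the distance measure and amalgamation rule the authors actually chose.

```python
import math
from itertools import combinations

# Hypothetical Behavioral Profiles: each sense maps to the relative frequencies
# of a few annotated feature levels (toy values, NOT the study's data).
profiles = {
    "sense_A": [0.70, 0.20, 0.10, 0.50],
    "sense_B": [0.65, 0.25, 0.10, 0.45],
    "sense_C": [0.10, 0.80, 0.10, 0.20],
    "sense_D": [0.15, 0.75, 0.10, 0.25],
    "sense_E": [0.20, 0.10, 0.70, 0.90],
}

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(c1, c2):
    # Mean pairwise distance between the members of two clusters.
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

def agglomerate(labels, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [frozenset([label]) for label in labels]
    while len(clusters) > k:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

result = agglomerate(sorted(profiles), k=3)
print(sorted(sorted(c) for c in result))
```

Average linkage is used here only because it is a common default; single or complete linkage (or Ward's method) would slot into `average_linkage` without changing the merge loop.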
This paper describes the collection and analysis of the most recent member of the Brown family, the BE21 corpus, which consists of 1 million words of written British English texts published in 2021. Using the coefficient of variation, the frequencies of part-of-speech tags in BE21 are compared against those in the other four British members of the Brown family (from 1931, 1961, 1991 and 2006). Part-of-speech tags whose frequencies increase or decrease steadily across all five corpora, or across the latest three, are examined via concordance lines and their distributions in order to identify long-standing and emerging trends in British English. The analysis points to the continuation of some trends (such as declines in modal verbs and titles of address), along with newer trends like the rise of first-person pronouns. More generally, it indicates that the trends of densification, democratisation and colloquialisation are continuing in British English.
{"title":"A year to remember?","authors":"Paul Baker","doi":"10.1075/ijcl.22007.bak","DOIUrl":"https://doi.org/10.1075/ijcl.22007.bak","url":null,"abstract":"\u0000This paper describes the collection and analysis of the most recent edition of the Brown family, the BE21 corpus, consisting of 1 million words of written British English texts, published in 2021. Using the Coefficient of Variance, the frequencies of part of speech tags in BE21 are compared against the other four British members of the Brown family (from 1931, 1961, 1991 and 2006). Part of speech tags that are steadily increasing or decreasing in all five or the latest three corpora are examined via concordance lines and their distributions in order to identify long-standing and emerging trends in British English. The analysis points to the continuation of some trends (such as declines in modal verbs and titles of address), along with newer trends like the rise of first person pronouns. The analysis indicates that more general trends of densification, democratisation and colloquialisation are continuing in British English.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42148378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
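The coefficient of variation is the standard deviation of a set of values divided by their mean, so it expresses how much a tag's frequency varies across corpora relative to its typical frequency. The sketch below computes it for one tag across the five British Brown-family corpora and checks whether the frequencies move steadily in one direction, either across all five corpora or across the latest three. The per-million figures are invented for illustration, and the use of the population standard deviation is an assumption; Baker's exact computation may differ.

```python
from statistics import mean, pstdev

# Hypothetical per-million frequencies of one POS tag (e.g. modal verbs)
# in the five British Brown-family corpora; toy values, NOT Baker's data.
freqs = {1931: 15000.0, 1961: 14200.0, 1991: 13100.0, 2006: 12300.0, 2021: 11500.0}

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    values = list(values)
    return pstdev(values) / mean(values)

def steady_trend(values):
    """Return 'increasing', 'decreasing', or None for a non-monotonic series."""
    values = list(values)
    steps = list(zip(values, values[1:]))
    if all(b > a for a, b in steps):
        return "increasing"
    if all(b < a for a, b in steps):
        return "decreasing"
    return None

series = [freqs[year] for year in sorted(freqs)]
print(round(coefficient_of_variation(series), 3), steady_trend(series))
print(steady_trend(series[-3:]))  # latest three corpora only
```

Because the coefficient is scale-free, tags with very different overall frequencies can be compared on the same footing, which a raw standard deviation would not allow.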
Eva Zehentner, M. Hundt, G. Schneider, M. Röthlisberger
Prepositional phrases (PPs) play an important part in English argument structure constructions, but pose considerable challenges for linguistic investigations of any kind. Besides PP attachment being notoriously difficult to model computationally, a particularly striking methodological challenge in investigating verb-dependent PPs across (synchronic and/or diachronic) corpora is that such cross-corpus studies may have to rely on material annotated with different tools. This study evaluates the impact that such differences in corpus annotation may have on the retrieval of verb-attached PPs, using data from Early and Late Modern English corpora. Our intrinsic (recall/precision) and extrinsic parser evaluation shows that annotation does play a role, but that the noise introduced is negligible as far as frequency developments are concerned.
{"title":"Differences in syntactic annotation affect retrieval","authors":"Eva Zehentner, M. Hundt, G. Schneider, M. Röthlisberger","doi":"10.1075/ijcl.21104.zeh","DOIUrl":"https://doi.org/10.1075/ijcl.21104.zeh","url":null,"abstract":"\u0000Prepositional phrases (PPs) play an important part in English argument structure constructions, but pose considerable challenges for linguistic investigations of any kind. In addition to the fact that PP-attachment is generally notoriously difficult to model computationally, a particularly striking methodological challenge in investigating verb-dependent PPs across (synchronic and/or diachronic) corpora is that such cross-corpus studies may have to rely on material annotated with different tools. This study evaluates the impact that such differences in corpus annotation may have on retrieval of verb-attached PPs by means of data from Early and Late Modern English corpora. Our intrinsic (recall/precision) and extrinsic parser evaluation shows that annotation does play a role, but that the noise introduced is negligible as far as frequency developments are concerned.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43437237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
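Intrinsic evaluation of this kind compares the verb-attached PPs a parser retrieves against a manually checked gold standard: precision is the share of retrieved PPs that are correct, and recall is the share of gold-standard PPs that were retrieved. A minimal sketch follows; representing each PP as a hypothetical (sentence id, verb index, preposition index) tuple is an assumption made for illustration, not the authors' data format.

```python
from typing import Set, Tuple

# Hypothetical identifier for one verb-attached PP:
# (sentence id, index of governing verb, index of preposition).
PP = Tuple[int, int, int]

def precision_recall_f1(retrieved: Set[PP], gold: Set[PP]):
    """Compare parser output with a gold standard of verb-attached PPs."""
    true_positives = len(retrieved & gold)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

gold = {(1, 2, 4), (2, 1, 3), (3, 5, 6), (4, 0, 2)}
# (3, 5, 8) is a false positive; two gold PPs are missed.
parser_output = {(1, 2, 4), (2, 1, 3), (3, 5, 8)}
print(precision_recall_f1(parser_output, gold))
```

Running the same function on the output of each annotation tool against one shared gold standard is what makes the tools' retrieval quality directly comparable.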
This study investigates the factors that significantly constrain the dative alternation in Chinese, using mixed-effects logistic regression modelling. The analysis showed that the choice of dative variant in Chinese was significantly affected by the animacy, pronominality, and definiteness of the recipient, the accessibility and concreteness of the theme, and the length difference between the theme and the recipient. These findings were compared with those reported for the English dative alternation in the literature. When the theme was recoverable from context or shorter than the recipient, the prepositional dative construction was preferred in both English and Chinese, which can be explained by the principles of end-focus and end-weight. However, when the recipient was animate or definite, the double object construction was preferred in English, whereas the prepositional dative construction was more likely to be used in Chinese. This divergence is due to the different syntactic and semantic features of the recipient markers in the two languages.
{"title":"Dative alternation in Chinese","authors":"Dong Zhang, Jiajin Xu","doi":"10.1075/ijcl.21086.zha","DOIUrl":"https://doi.org/10.1075/ijcl.21086.zha","url":null,"abstract":"\u0000This study investigates the factors significantly constraining dative alternation in Chinese by adopting mixed-effects logistic regression modelling. The analysis showed that such factors significantly affected the choice of dative variants in Chinese, including the animacy, pronominality, and definiteness of the recipient, the accessibility and concreteness of the theme, and the length difference between the theme and the recipient. Findings were compared with those for the English dative alternation discussed in the literature. When the theme was recoverable from context or shorter than the recipient, the prepositional dative construction was preferred in both English and Chinese. This can be explained by the principles of end-focus and end-weight. However, when the recipient was animate or definite, the double object construction was preferred in English, while the prepositional dative construction was more likely to be used in Chinese. This divergence is due to the different syntactic and semantic features of their recipient markers.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42101345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of McCarthy (2020): Innovations and Challenges in Grammar","authors":"Beatrix Busse, Sophie Du Bois","doi":"10.1075/ijcl.00053.bus","DOIUrl":"https://doi.org/10.1075/ijcl.00053.bus","url":null,"abstract":"","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49283960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of McEnery & Brezina (2022): Fundamental Principles of Corpus Linguistics","authors":"Niall Curry","doi":"10.1075/ijcl.00052.cur","DOIUrl":"https://doi.org/10.1075/ijcl.00052.cur","url":null,"abstract":"","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47509136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Overlapping bundles, that is, shorter lexical bundles that are totally or partially embedded in longer expressions, may prove problematic in the structural and functional classification of bundles. For example, many studies in the literature focus only on four-word lexical bundles and conduct extensive structural and functional analyses of those bundles. However, most scholars have not considered that some four-word expressions may be embedded in longer expressions. These longer expressions may not only have a different structure but may also fulfil a different functional role. The present study introduces the Lexical Bundle Identification and Analysis Program (LBiaP), a software tool designed to facilitate lexical bundle research with independent observations of each lexical bundle identified. First, we describe complete overlapping, complete subsumption, and interlocking bundles in detail. We then explain how LBiaP deals with these types of bundles when they are detected.
{"title":"LBiaP","authors":"Viviana Cortes, William M. Lake","doi":"10.1075/ijcl.21100.cor","DOIUrl":"https://doi.org/10.1075/ijcl.21100.cor","url":null,"abstract":"\u0000Overlapping bundles, that is, shorter lexical bundles that are totally or partially embedded in longer expressions, may prove problematic in the structural and functional classification of bundles. For example, many studies in the literature focus only on four-word lexical bundles and conduct extensive structural and functional analysis of those bundles. However, most scholars have not considered the fact that some 4-word expressions may be embedded in longer expressions. These longer expressions may not only have a different structure but may also carry out a different functional role. The present study introduces the Lexical Bundle Identification and Analysis Program (LBiaP), a software tool designed to facilitate lexical bundle research with independent observations of each lexical bundle identified. First, we describe complete overlapping, complete subsumption, and interlocking bundles in detail. We then explain how LBiaP deals with these types of bundles when detected.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47987105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
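The overlap problem can be made concrete by counting four-word and five-word sequences and flagging frequent four-word bundles whose every occurrence falls inside one longer sequence. The sketch below is one plausible operationalisation of that idea, assumed here for illustration; it is not LBiaP's code, and both the frequency threshold (`min_freq`) and the equal-frequency criterion for treating a shorter bundle as embedded in a longer one are hypothetical choices.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def embedded_bundles(tokens, min_freq=2):
    """Map each frequent 4-gram to a 5-gram that contains it in every occurrence."""
    four = Counter(ngrams(tokens, 4))
    five = Counter(ngrams(tokens, 5))
    result = {}
    for gram4, f4 in four.items():
        if f4 < min_freq:
            continue
        for gram5, f5 in five.items():
            # A 5-gram contains a 4-gram as its first or its last four words;
            # equal counts mean the 4-gram never occurs outside that 5-gram.
            if f5 == f4 and gram4 in (gram5[:4], gram5[1:]):
                result[gram4] = gram5
                break
    return result

tokens = "at the end of the day we left and at the end of the day we slept".split()
for short, longer in embedded_bundles(tokens).items():
    print(" ".join(short), "->", " ".join(longer))
```

In this toy text all four frequent four-word bundles are fragments of the single recurring sequence "at the end of the day we", which is exactly the situation in which counting each four-word bundle as an independent observation would inflate the results.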
This article studies you bet and related phrases when they are used as a parenthetical and as a free-standing response. Drawing on a range of corpora, we provide both contemporary and historical perspectives on the set of pragmatic expressions that has largely escaped scholars’ attention. Synchronically, we demonstrate that they are colloquial American pragmatic markers to express speaker certainty/affirmation or to respond to thanks. Diachronically, these markers are hypothesized to have developed out of main clause usage with a clausal complement (‘the matrix clause hypothesis’); however, our historical corpus evidence does not straightforwardly support this hypothesis. Instead, we suggest that multiple constructions might have been involved in the emergence of the pragmatic markers, namely, wh-interrogatives (e.g. what will you bet (that) …?), modal constructions (e.g. you may/can bet (that) …), and main clauses with a reduced complement (e.g. You bet I do).
{"title":"“You betcha I’m a ’Merican”","authors":"Tomoharu Hirota, Laurel J. Brinton","doi":"10.1075/ijcl.21060.hir","DOIUrl":"https://doi.org/10.1075/ijcl.21060.hir","url":null,"abstract":"\u0000This article studies you bet and related phrases when they are used as a parenthetical and as a free-standing response. Drawing on a range of corpora, we provide both contemporary and historical perspectives on the set of pragmatic expressions that has largely escaped scholars’ attention. Synchronically, we demonstrate that they are colloquial American pragmatic markers to express speaker certainty/affirmation or to respond to thanks. Diachronically, these markers are hypothesized to have developed out of main clause usage with a clausal complement (‘the matrix clause hypothesis’); however, our historical corpus evidence does not straightforwardly support this hypothesis. Instead, we suggest that multiple constructions might have been involved in the emergence of the pragmatic markers, namely, wh-interrogatives (e.g. what will you bet (that) …?), modal constructions (e.g. you may/can bet (that) …), and main clauses with a reduced complement (e.g. You bet I do).","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47186541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a method for the automatic induction of categories of Spanish discourse markers using parallel corpora, based on a quantitative and empirical approach that minimises explicit linguistic knowledge. We conducted the analysis using a large Spanish-English parallel corpus. First, we used this corpus to obtain a list of parenthetical discourse markers in each language. Then, we used it as a “semantic mirror”, inspecting the English equivalents and assessing which Spanish discourse markers fulfil a similar function in discourse, and vice versa. The result of this procedure is an emerging categorisation of discourse markers. The main contribution is to offer empirical evidence for the adequacy of existing manually compiled taxonomies and the potential for discovering new, previously unaccounted-for categories. In this article we focus on units pertaining to the Spanish language but, since the method is purely quantitative, it can be applied to other languages as well.
{"title":"A proposal for the inductive categorisation of parenthetical discourse markers in Spanish using parallel corpora","authors":"Hernán Robledo, Rogelio Nazar","doi":"10.1075/ijcl.20017.rob","DOIUrl":"https://doi.org/10.1075/ijcl.20017.rob","url":null,"abstract":"\u0000We propose a method for the automatic induction of categories of Spanish discourse markers using parallel corpora, based on a quantitative and empirical approach that minimises explicit linguistic knowledge. We conducted the analysis the using a large Spanish-English parallel corpus. First, we used this corpus to obtain a list of parenthetical discourse markers in each language. Then, we used it as a “semantic mirror”, inspecting the English equivalences and assessing which Spanish discourse markers fulfil a similar function in discourse and vice versa. The result of this procedure is an emerging categorisation of discourse markers. The main contribution is to offer empirical evidence for the adequacy of existing manually-compiled taxonomies and the potential for discovery of new, unaccounted categories. In this article we focus on units pertaining to the Spanish language but, since the method is purely quantitative, it is possible to apply it to different languages as well.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44216622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
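The “semantic mirror” idea can be illustrated by treating each Spanish marker as a profile of the English markers it is aligned with, and grouping Spanish markers whose profiles look alike. The sketch below assumes the alignments have already been counted; the counts are invented toy values, and both the cosine similarity and the single-link grouping with a `threshold` of 0.5 are hypothetical choices standing in for whatever similarity measure and clustering the authors actually used.

```python
import math
from collections import Counter

# Hypothetical counts of English markers aligned with each Spanish marker
# in a parallel corpus (toy values, NOT Robledo and Nazar's data).
mirror = {
    "sin embargo": Counter({"however": 40, "nevertheless": 10, "yet": 5}),
    "no obstante": Counter({"however": 25, "nevertheless": 12}),
    "por ejemplo": Counter({"for example": 30, "for instance": 15}),
    "por cierto": Counter({"by the way": 20, "incidentally": 4}),
}

def cosine(a, b):
    """Cosine similarity between two translation-count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_markers(profiles, threshold=0.5):
    """Single-link grouping: join markers whose English profiles are similar enough."""
    groups = []
    for marker in profiles:
        linked = [g for g in groups
                  if any(cosine(profiles[marker], profiles[m]) >= threshold for m in g)]
        merged = {marker}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups

print(group_markers(mirror))
```

Because the procedure only compares translation counts, nothing in it is specific to Spanish, which mirrors the abstract's point that the method could be applied to other language pairs.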