Which adjectives tend to occur as attributive (the cute/red dress) versus predicative (the dress is cute/red) and why? Building on findings from Wiegand et al. (2013. Predicative adjectives: An unsupervised criterion to extract subjective adjectives. In Lucy Vanderwende, Hal Daumé III & Katrin Kirchhoff (eds.), Proceedings of the 2013 conference of the North American chapter of the Association for Computational Linguistics: Human language technologies (NAACL-HLT), 534–539. Atlanta, GA: Association for Computational Linguistics) and Vartiainen (2013. Subjectivity, indefiniteness and semantic change. English Language and Linguistics 17(1). 157–179), this paper argues that subjective adjectives such as cute tend to be placed in predicative position not just because they often describe discourse-new information, but because this position serves to foreground information that the hearer may disagree with. This claim is supported using data from the Corpus of Contemporary American English (Davies, Mark. 2008. The corpus of contemporary American English: One billion words, 1990-present. Available at: https://www.english-corpora.org/coca/) combined with human annotations for subjectivity from Scontras et al. (2017. Subjectivity predicts adjective ordering preferences. Open Mind 1(1). 53–66) et seq.; and data from image captions versus descriptions (for seeing versus low-vision people) from the National Gallery of Art. A production experiment manipulates the discourse context to further show that adjectives tend to be placed in predicative position when they express controversial information. Overall, this paper explores how the lexical semantics of adjectives shapes the pragmatic contexts in which they tend to be used, which in turn shapes the syntax of the sentences using them.
The red dress is cute: why subjective adjectives are more often predicative. Lelia Glass. Corpus Linguistics and Linguistic Theory, published 2024-09-05. https://doi.org/10.1515/cllt-2024-0044
Bei sentences in Mandarin Chinese with SOV word order have attracted extensive interest. However, their semantic features lacked quantitative evidence and their cognitive features received insufficient attention. Therefore, the current study aims to quantitatively investigate the semantic and cognitive features through the analysis of nine annotated factors in a corpus. The results regarding bei sentences show that (i) subjects exhibit a tendency to be definite and animate; non-adversative verbs have gained popularity over time, and intransitive verbs are capable of taking objects; (ii) subject relations tend to be long, implying heavy cognitive load, whereas the dependencies governed by subjects are often short, suggesting light cognitive load; and (iii) certain semantic factors significantly impact cognitive factors; for instance, animate subjects tend to govern shorter dependencies. Overall, our study provides empirical support for the semantic features of bei sentences and reveals their cognitive features using dependency distance.
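Dependency distance, which the study uses as its cognitive indicator, is commonly computed as the absolute difference between the linear positions of a dependent and its head. The following is a minimal sketch under that common definition. The function names and the toy parse are hypothetical illustrations and are not taken from the paper, whose exact operationalisation may differ.

```python
from statistics import mean


def dependency_distances(heads):
    """Return |dependent position - head position| for every non-root word.

    heads[i] is the 1-based position of the head of word i+1;
    0 marks the root, which has no dependency distance.
    """
    return [abs(pos - head) for pos, head in enumerate(heads, start=1) if head != 0]


def mean_dependency_distance(heads):
    """Average dependency distance of one sentence (0.0 if it has no dependencies)."""
    distances = dependency_distances(heads)
    return mean(distances) if distances else 0.0


# Hypothetical four-word parse: word 1 depends on word 2, word 2 is the root,
# word 3 depends on word 4, and word 4 depends on word 2.
toy_heads = [2, 0, 4, 2]
print(dependency_distances(toy_heads))      # [1, 1, 2]
print(mean_dependency_distance(toy_heads))
```

Under this definition, a long subject relation simply means that the subject and its governing verb are many words apart, which is the sense in which longer dependencies are read as a heavier cognitive load.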
A corpus-based study on semantic and cognitive features of bei sentences in Mandarin Chinese. Yonghui Xie, Ruochen Niu, Haitao Liu. Corpus Linguistics and Linguistic Theory, published 2024-09-05. https://doi.org/10.1515/cllt-2024-0031
Our study investigates the effect of French verb lemmata on the preverbal (QV) or postverbal (VQ) positioning of interrogative forms equivalent to English ‘what’ (que, quoi, and related forms) within a French–Spanish parallel corpus of subtitles. We highlight and illustrate the corpus’s utility for studying less frequent verbs in combination with specific wh-forms. Our findings suggest that less frequent French verbs exhibit weaker associations with QV compared to their more frequent counterparts. A post-hoc study using Spanish translations reveals that French verbs correlated with QV often denote observable actions involving directly accessible Q-referents. We hypothesise that queries concerning ‘situationally accessible’ referents are predominantly utilised for non-standard, evaluative, or challenging questions, which are typically QV in French.
Verb influence on French wh-placement: a parallel corpus study. Jan Fliessbach, Johanna Rockstroh. Corpus Linguistics and Linguistic Theory, published 2024-09-03. https://doi.org/10.1515/cllt-2024-0001
Usage-based constructionist approaches see language as an inventory of constructions at different levels of schematicity learned from the input. If so, personal constructicons should vary as a function of usage. Repeated use and chunking/entrenchment of concrete instances should lead to reanalysis of their internal structure and change in the level of schematicity. This paper exploits the reduction probability of is in it is as a diagnostic of reanalysis in a 1.75-million-word diachronic corpus of a single blogger over 8 years. All instances of it is/it’s (n = 10,929) were annotated at the constructional and lexical levels. A multilevel logistic regression model showed significant fixed effects of constructional entropy and construction-to-word association on reduction probability. Importantly, there remained substantial variation across lexical types of constructions in the extent to which they associated or became associated with reduction over time, suggesting idiosyncratic entrenchment and potential reanalysis as a function of usage.
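The abstract does not say how constructional entropy is computed. One plausible reading, assumed here purely for illustration, is the Shannon entropy of the distribution of lexical types that fill a construction. The sketch below shows that calculation; the function name and the toy counts are hypothetical.

```python
from math import log2


def shannon_entropy(type_counts):
    """Shannon entropy (in bits) of a frequency distribution over lexical types."""
    total = sum(type_counts.values())
    return -sum((n / total) * log2(n / total) for n in type_counts.values() if n > 0)


# Hypothetical counts of lexical fillers in one construction.
evenly_spread = {"true": 5, "possible": 5}   # two equally frequent types
single_type = {"true": 10}                   # one type only
print(shannon_entropy(evenly_spread))        # 1.0
print(shannon_entropy(single_type) == 0)     # True
```

On this reading, a construction dominated by a single lexical type has low entropy, while one whose slot is filled by many types in similar proportions has high entropy.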
Idiosyncratic entrenchment: tracing change in constructional schematicity with nested random effects. Svetlana Vetchinnikova. Corpus Linguistics and Linguistic Theory, published 2024-08-22. https://doi.org/10.1515/cllt-2023-0092
The dative alternation has been extensively studied in the world’s languages, and the meanings of the verbs participating in the alternation have been shown to play a key role in determining its argument realization options. The present paper presents a multiple distinctive collexeme analysis approach to the dative alternation in Mandarin Chinese, which involves a choice of one of five functionally similar alternants, and it also discusses several ways to improve on how such analyses have been carried out statistically in most previous work. Linguistically, we identify the core semantic differences of the five constructions based on which verbs statistically prefer to occur in which pattern, focusing on semantic potential and direction of transfer. Methodologically, this study contributes to the slowly growing body of studies that use collexeme strengths that are not only less related to frequency than the traditional methods (i.e., association is measured in a less diluted way) but also directional (i.e., we can focus on one direction of association from the verb to the construction).
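The abstract does not name the association measure it uses. As an illustration of what a directional, verb-to-construction measure can look like, the sketch below computes ΔP(construction | verb), a directional measure known from the association literature. It is a stand-in chosen for illustration, not necessarily the authors' measure, and the function name, construction labels, and counts are all hypothetical.

```python
def delta_p_verb_to_construction(counts, verb, construction):
    """Delta-P from verb to construction.

    counts maps each construction label to a dict of verb frequencies.
    Returns P(construction | verb) - P(construction | any other verb).
    Assumes the verb occurs at least once and at least one other verb occurs.
    """
    verb_in_cx = counts[construction].get(verb, 0)
    verb_total = sum(cx.get(verb, 0) for cx in counts.values())
    cx_total = sum(counts[construction].values())
    grand_total = sum(sum(cx.values()) for cx in counts.values())

    others_in_cx = cx_total - verb_in_cx
    others_total = grand_total - verb_total
    return verb_in_cx / verb_total - others_in_cx / others_total


# Hypothetical frequencies for two of several alternating constructions.
toy_counts = {
    "cx_A": {"give": 30, "send": 10},
    "cx_B": {"give": 10, "send": 50},
}
print(delta_p_verb_to_construction(toy_counts, "give", "cx_A"))
```

Because the measure conditions on the verb, swapping the roles of verb and construction generally gives a different value, which is what makes it directional. With five alternants, the same function can be called once per construction for each verb.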
Transfer five ways: applications of multiple distinctive collexeme analysis to the dative alternation in Mandarin Chinese. Shengyu Liao, Stefan Th. Gries, Stefanie Wulff. Corpus Linguistics and Linguistic Theory, published 2024-05-25. https://doi.org/10.1515/cllt-2024-0033
Jesse Egbert, Douglas Biber, Daniel Keller, Marianna Gracheva
During the past 20 years, corpus linguistic research on register variation has yielded important theoretical advances. The first part of this paper discusses these advances and the cumulative body of research that has produced them. In the second part of the paper, we focus on the goals of research on register variation. The traditional goal of the text-linguistic (TxtLx) approach to linguistic variation has been to describe registers and patterns of register variation: describing the linguistic and situational characteristics of registers. In this paper, we explore a related, but distinct, text-linguistic goal: to account for all linguistic variation among texts. Because the TxtLx framework assumes the importance of functional correspondence between linguistic characteristics and situational characteristics, it is reasonable to assume that in addition to register, we can use situational parameters coded continuously at the level of individual texts as additional predictors of text-linguistic variation. We describe the results of an empirical study to show that using both register categories and text-level situational parameters as predictors results in a more comprehensive and explanatory model of text-linguistic variation. In the conclusion we discuss the future of corpus-based register studies, focusing on unanswered questions related to theoretical claims about register.
Register and the dual nature of functional correspondence: accounting for text-linguistic variation between registers, within registers, and without registers. Jesse Egbert, Douglas Biber, Daniel Keller, Marianna Gracheva. Corpus Linguistics and Linguistic Theory, published 2024-05-18. https://doi.org/10.1515/cllt-2024-0011
Over the past decade, learner corpora have gained recognition as valuable data sources in Second Language Acquisition (SLA) research. This development can be attributed to significant progress in Learner Corpus Research (LCR). However, there is still substantial work to be done. This article highlights key issues essential for sustaining the relevance of learner corpora in SLA. More particularly, I focus on the need for more diverse types of learner corpora, stress the importance of detailed metadata, and advocate for multifactorial study designs. I then revisit ongoing debates regarding the role of the native speaker in LCR and propose a practical solution to address this thorny issue. Finally, I also readdress the need for improvement in the quantitative methods and statistics, arguing that the importance of robust quantitative analysis cannot be overstated. In conclusion, I envision an ambitious learner corpus compilation project that adheres to the FAIR principles, with the goal of further elevating study quality in LCR.
Learner corpus research: a critical appraisal and roadmap for contributing (more) to SLA research agendas. Magali Paquot. Corpus Linguistics and Linguistic Theory, published 2024-05-13. https://doi.org/10.1515/cllt-2024-0014
John Gamboa, Kristina Braun, Juhani Järvikivi, Shanley E. M. Allen
Nominal compounds (NCs) are a structure commonly used in scientific texts. Although they are common, very little is known about how they are distributed in scientific articles. Based on the Uniform Information Density hypothesis, which states that speakers communicate information at a constant rate, avoiding peaks and troughs of information transmission, we predict that nominal compounds should cluster toward the end of scientific texts, be preceded by supporting text that facilitates their understanding, and be repeated often after their first use. In this paper, we examine these predictions through a quantitative and a qualitative analysis of a corpus of scientific papers from the fields of Biology, Economics and Linguistics. While our investigation did not reveal definitive findings for the first and third predictions above, it did produce supporting evidence in favor of our second prediction, thus advancing our understanding of NC use and the choices speakers make when transmitting information.
The distributional properties of long nominal compounds in scientific articles: an investigation based on the uniform information density hypothesis. John Gamboa, Kristina Braun, Juhani Järvikivi, Shanley E. M. Allen. Corpus Linguistics and Linguistic Theory, published 2024-04-16. https://doi.org/10.1515/cllt-2023-0028
Monika Bednarek, Martin Schweinberger, Kelvin K. H. Lee
Recent years have seen an increase in data and method reflection in corpus-based discourse analysis. In this article, we first take stock of some of the issues arising from such reflection (covering concepts such as triangulation, objectivity/subjectivity, replication, transparency, reflexivity, consistency). We then introduce a new ‘accountability’ framework for use in corpus-based discourse analysis (and perhaps beyond). We conceptualise such accountability as a multi-faceted phenomenon, covering various aspects of the research process. In the second part of this article, we then link this framework to a new cross-institutional initiative – the Australian Text Analytics Platform (ATAP) – which aims to address a small part of the framework, namely the transparency of analyses through Jupyter notebooks. We introduce the Quotation Tool as an example ATAP notebook of particular relevance to corpus-based discourse analysis. We reflect on how this notebook fosters accountability in relation to transparency of analysis and illustrate key applications using a set of different corpora.
Corpus-based discourse analysis: from meta-reflection to accountability. Monika Bednarek, Martin Schweinberger, Kelvin K. H. Lee. Corpus Linguistics and Linguistic Theory, published 2024-04-16. https://doi.org/10.1515/cllt-2023-0104
Japanese features a general noun-modifying clause construction (NMCC) with a more versatile range of semantic and pragmatic interpretations than equivalent constructions in other languages. Motivated by the learning challenge NMCCs pose to Japanese as a foreign language (JFL) learners, this article examines speech data from the International Corpus of Japanese as a Second Language (I-JAS) to compare learner use of NMCCs against a large L1 Japanese corpus. Instances of the construction from both corpora were analyzed to identify high-frequency part-of-speech categories and subcategories in the modifying clause predicate and head noun slots. A simple collexeme analysis was then employed to identify strongly attracted and repelled lexical items among those identified in realizations of the construction. Taken together, findings from these analyses revealed an important connection between the semantic weight of head nouns in NMCCs and the idiomaticity of the construction, with learner productions demonstrating a tendency toward heavy head nouns. This study lays the groundwork for future research seeking to explore the NMCC at different levels of granularity and to improve its treatment in JFL pedagogical materials.
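Simple collexeme analysis is traditionally run as an exact test on a 2×2 table that crosses "word vs. other words" with "in the construction vs. elsewhere in the corpus". The sketch below computes the one-sided exact (hypergeometric) p-value for attraction using only the standard library. It assumes that traditional setup, since the abstract does not state which statistic the study used, and the function name and toy counts are hypothetical.

```python
from math import comb


def attraction_p_value(word_in_cx, word_total, cx_total, corpus_total):
    """One-sided exact p-value that a word occurs in a construction at least
    word_in_cx times, given its overall frequency (word_total), the
    construction's frequency (cx_total) and the corpus size (corpus_total).
    Small values suggest the word is attracted to the construction.
    """
    upper = min(word_total, cx_total)
    favourable = sum(
        comb(word_total, k) * comb(corpus_total - word_total, cx_total - k)
        for k in range(word_in_cx, upper + 1)
    )
    return favourable / comb(corpus_total, cx_total)


# Hypothetical toy counts: a noun occurs 3 times in a 10-token corpus,
# and all 3 occurrences fall inside a construction with 4 tokens.
print(attraction_p_value(word_in_cx=3, word_total=3, cx_total=4, corpus_total=10))
```

Repelled items can be screened the same way by summing the lower tail, that is, over counts from 0 up to the observed value.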
A collostructional approach to Japanese noun-modifying clause construction use and acquisition: a learner corpus study. Nicole C. De Los Reyes, Ute Römer-Barron. Corpus Linguistics and Linguistic Theory, published 2024-03-22. https://doi.org/10.1515/cllt-2024-0020