
Corpus Linguistics and Linguistic Theory: Latest Articles

Present perfect and preterit variation in the Spanish of Lima and Mexico City: findings from a corpus analysis
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-08-03 | DOI: 10.1515/cllt-2022-0060
Anna Mastrantuono, Brendan Regan
Abstract In many languages, the present perfect has grammaticalized, gradually displacing the preterit. Within Spanish, this has been documented with the grammaticalization of the present perfect in Peninsular Spanish. To examine this possibility in two Latin American varieties, this study examined present perfect/preterit variation of 36 speakers from Lima and Mexico City from the PRESEEA corpus. While Lima Spanish presented overall more present perfect than Mexico City Spanish, a similar internal constraint hierarchy is predictive of present perfect use in both speech communities. However, Lima Spanish demonstrated a change in progress toward an expansion of the preterit among younger speakers with the indeterminate temporal reference as locus of change. The findings suggest that present perfect grammaticalization may not always be the most common cross-linguistic pathway but rather is subject to source constraints, which may lead to another pathway in which the preterit expands at the expense of the present perfect.
Citations: 0
The linguistic organization of grammatical text complexity: comparing the empirical adequacy of theory-based models
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-06-15 | DOI: 10.1515/cllt-2023-0016
D. Biber, Tove Larsson, G. Hancock
Abstract Although there is a long tradition of research analyzing the grammatical complexity of texts (in both linguistics and applied linguistics), there is surprisingly little consensus on the nature of complexity. Many studies have disregarded syntactic (and structural) distinctions in their analyses of grammatical text complexity, treating it instead as if it were a single unified construct. However, other corpus-based studies indicate that different grammatical complexity features pattern in fundamentally different ways. The present study employs methods that are informed by structural equation modeling to test the goodness-of-fit of four models that can be motivated from previous research and linguistic theory: a model treating all complexity features as a single dimension, a model distinguishing among three major structural types of complexity features, a model distinguishing among three major syntactic functions of complexity features, and a model distinguishing among nine combinations of structural type and syntactic functions. The findings show that text complexity is clearly a multi-dimensional construct. Both structural and syntactic distinctions are important. Syntactic distinctions are actually more important than structural distinctions, although the combination of the two best accounts for the ways in which complexity features pattern in texts from different registers.
Citations: 2
The blurring of the boundaries: changes in verb/noun heterosemy in Recent English
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-06-12 | DOI: 10.1515/cllt-2022-0053
Bin Shao, Jing Zheng, Hendrik De Smet
Abstract Conversion is a common feature of present-day English, leading to many ‘heterosemous’ words that express related meanings across multiple word classes. Especially common is verb/noun heterosemy, as in flow or hand, both of which can be used as verbs or as nouns. The prevalence of verb/noun heterosemy sets English apart from closely related Germanic languages and is one respect in which English behaves as a language with high boundary permeability. This paper investigates how verb/noun heterosemy has been evolving in Recent English (1920s–2010s). Using quantitative analysis within a large sample of 877 heterosemous words, it is shown that associations between specific words and word classes have been weakening over the last century. More precisely, within our sample, heterosemous words on average tend to develop towards more balanced heterosemy, whereby their association to either one word class or another becomes less pronounced. The findings suggest that English is in the process of a long-term drift towards greater boundary permeability. As high boundary permeability has been associated with low reliance on inflectional morphology in a language, this could be a long-term consequence of the overall loss of inflections earlier in the history of the language.
Citations: 0
Let my speakers talk: metalinguistic activity can indicate semantic change
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-06-08 | DOI: 10.1515/cllt-2023-0022
Israela Becker
Abstract In the absence of a diachronic corpus or a synchronic corpus tagged for speakers’ age, substantiating the presence of semantic change and the stage of change ― initial or advanced ― are challenging tasks. In the present study I introduce three methods for overcoming such difficulties by extracting various kinds of evidence from a synchronic corpus not tagged for speakers’ age. All three methods are based on speakers’ metalinguistic activity. Two of them are of a psycholinguistic nature and the third is of a sociolinguistic nature. Not only do these methods provide data hitherto overlooked by researchers for detecting semantic change, but they can also minimize the researchers’ need for interpretative interventions with regard to speakers’ communicative intentions, thus improving the quality of the analysis.
Citations: 0
A multifactorial aspectual analysis of verb concatenation with imperfective markers zhe in Mandarin
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-05-08 | DOI: 10.1515/cllt-2022-0080
Junjie Jin, F. Li
Abstract As a cognitive ability to construe events in alternate ways, aspectuality has aroused many researchers’ academic attention; however, the concatenation of aspect markers in a clause is understudied in previous studies. The present paper follows a bidimensional approach of aspect to conduct a corpus-based aspectual analysis of verb concatenation with imperfective markers zhe (henceforth VCIMs zhe) in Mandarin. Specifically, to construe the cognitive inference mechanism of aspect, a multifactorial analysis of VCIMs zhe by the statistical techniques of multiple correspondence analysis, conditional inference trees and conditional random forests is carried out to explore the prototypical temporal features of verbs in two slots, predict the aspectual meanings of two imperfective markers zhe, and also discuss the conditional importance of factors such as durativity, dynamicity, telicity, boundedness, and slot in identifying the situation types of two verbs or verb phrases in VCIMs zhe. Methodologically, a usage-based multifactorial analysis of VCIMs zhe complements previous introspective studies on aspect marking. Theoretically, a corpus-based aspectual account of VCIMs zhe, one type of complex viewpoint aspects, expands traditional studies on Chinese aspect system, supplies evidence for aspect typology cross-linguistically, and provides reference for second language acquisition of usage patterns of zhe by non-native speakers.
Citations: 0
To drop or not to drop? Predicting the omission of the infinitival marker in a Swedish future construction
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-05-05 | DOI: 10.1515/cllt-2022-0101
Aleksandrs Berdicevskis, E. Coussé, Alexander Koplenig, Yvonne Adesam
Abstract We investigate the optional omission of the infinitival marker in a Swedish future tense construction. During the last two decades the frequency of omission has been rapidly increasing, and this process has received considerable attention in the literature. We test whether the knowledge which has been accumulated can yield accurate predictions of language variation and change. We extracted all occurrences of the construction from a very large collection of corpora. The dataset was automatically annotated with language-internal predictors which have previously been shown or hypothesized to affect the variation. We trained several models in order to make two kinds of predictions: whether the marker will be omitted in a specific utterance and how large the proportion of omissions will be for a given time period. For most of the approaches we tried, we were not able to achieve a better-than-baseline performance. The only exception was predicting the proportion of omissions using autoregressive integrated moving average models for one-step-ahead forecast, and in this case time was the only predictor that mattered. Our data suggest that most of the language-internal predictors do have some effect on the variation, but the effect is not strong enough to yield reliable predictions.
Citations: 0
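To make the one-step-ahead forecasting setup in the Swedish infinitival-marker abstract above more concrete, here is a minimal Python sketch. It is not the authors' code and does not reproduce their models: it uses the simplest integrated model, a random walk with drift (ARIMA(0,1,0) with a constant), in which time is the only source of information. The function name `one_step_forecasts`, the `min_history` parameter, and the omission proportions are all invented for illustration.

```python
from typing import List, Sequence


def one_step_forecasts(series: Sequence[float], min_history: int = 3) -> List[float]:
    """Rolling one-step-ahead forecasts from a random walk with drift.

    The forecast for position t uses only series[:t], so each prediction
    is made without seeing the value it is meant to predict.
    """
    if min_history < 2:
        raise ValueError("need at least two observations to estimate the drift")
    forecasts = []
    for t in range(min_history, len(series)):
        history = series[:t]
        # Drift = mean of the first differences seen so far.
        drift = (history[-1] - history[0]) / (len(history) - 1)
        forecasts.append(history[-1] + drift)
    return forecasts


# Hypothetical proportions of marker omission per time period (illustrative only).
proportions = [0.10, 0.15, 0.20, 0.25, 0.30]
predicted = one_step_forecasts(proportions)
observed = proportions[3:]
errors = [abs(p - o) for p, o in zip(predicted, observed)]
print(predicted, sum(errors) / len(errors))
```

A baseline comparison of the kind the abstract mentions would repeat the same loop with a simpler rule (for example, always predicting the last observed proportion) and compare the mean absolute errors.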
Frontmatter
Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-05-01 | DOI: 10.1515/cllt-2023-frontmatter2
Citations: 0
Evaluation of keyness metrics: performance and reliability
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-04-27 | DOI: 10.1515/cllt-2022-0116
Lukas Sönning
Abstract The methodological debates surrounding keyword analysis have given rise to a wide range of keyness metrics. The present paper delineates four dimensions of keyness, which distinguish between frequency- and dispersion-related perspectives. Existing measures are then organized according to these dimensions and evaluated with regard to their performance on a specific keyword analysis task: The identification of key verbs in academic writing. To this end, the rankings produced by 32 different metrics are evaluated against an established academic word list. Further, the reliability of measures is assessed, to determine whether they produce stable rankings across repeated studies on the same pair of text varieties. We observe notable differences among metrics with regard to these criteria. Our findings provide further support for the superiority of the Wilcoxon rank sum test and text-dispersion–based measures, and allow us to identify, within each dimension of keyness, metrics that may be given preference in applied work.
Citations: 0
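The keyness abstract above singles out the Wilcoxon rank sum test as a well-performing metric. As a rough illustration of the idea, and not the paper's implementation, the Python sketch below computes the statistic that underlies that test (the Mann-Whitney U) from per-text frequencies of one word, plus a rank-biserial effect size. It stops short of a p-value. The function names and the per-text frequencies are hypothetical.

```python
from typing import Sequence


def rank_sum_u(target: Sequence[float], reference: Sequence[float]) -> float:
    """Mann-Whitney U for the target sample.

    Counts the (target text, reference text) pairs in which the word is
    more frequent in the target text; ties count as half a pair.
    """
    u = 0.0
    for t in target:
        for r in reference:
            if t > r:
                u += 1.0
            elif t == r:
                u += 0.5
    return u


def rank_biserial(target: Sequence[float], reference: Sequence[float]) -> float:
    """Effect size in [-1, 1]; positive means higher rates in the target texts."""
    n_pairs = len(target) * len(reference)
    return 2.0 * rank_sum_u(target, reference) / n_pairs - 1.0


# Hypothetical per-text rates (per 1,000 words) of one verb in two text varieties.
academic_texts = [4.0, 5.0, 6.0]
general_texts = [1.0, 2.0, 5.0]
print(rank_sum_u(academic_texts, general_texts))
print(rank_biserial(academic_texts, general_texts))
```

Because the comparison works on per-text rates rather than pooled corpus counts, a word that is frequent in only one or two texts does not dominate the score, which is one way to read the abstract's distinction between frequency- and dispersion-related perspectives.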
Seeing the wood for the trees: predictive margins for random forests
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-03-28 | DOI: 10.1515/cllt-2022-0083
Lukas Sönning, Jason Grafmiller
Abstract Classification trees and random forests offer a number of attractive features to corpus data analysts. However, the way in which these models are typically reported – a decision tree and/or set of variable importance scores – offers insufficient information if interest centers on the (form of) relationship between (multiple) predictors and the outcome. This paper develops predictive margins as an interpretative approach to ensemble techniques such as random forests. These are model summaries in the form of adjusted predictions, which provide a clearer picture of patterns in the data and allow us to query a model on potential nonlinear associations and interactions among predictor variables. The present paper outlines the general strategy for forming predictive margins and addresses methodological issues from an explicitly (corpus) linguistic perspective. For illustration, we use data on the English genitive alternation and provide an R package and code for their implementation.
Citations: 0
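The abstract above describes predictive margins as adjusted predictions that summarize a fitted model. The authors provide an R package for this; the Python sketch below is an independent, simplified illustration of the general idea and makes no claim to match their implementation. Each observation is copied with the focal predictor set to a fixed value, the model predicts on those copies, and the predictions are averaged. The function `predictive_margins`, the stand-in model `toy_predict`, and the data rows are all hypothetical.

```python
from statistics import mean
from typing import Any, Callable, Dict, Sequence

Row = Dict[str, Any]


def predictive_margins(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    values: Sequence[Any],
) -> Dict[Any, float]:
    """Average prediction per value of `feature`, other predictors as observed."""
    margins = {}
    for value in values:
        # Counterfactual copy of every row with the focal predictor fixed.
        predictions = [predict({**row, feature: value}) for row in rows]
        margins[value] = mean(predictions)
    return margins


def toy_predict(row: Row) -> float:
    """Stand-in for a fitted forest: made-up probability of the s-genitive."""
    p = 0.2
    if row["animacy"] == "animate":
        p += 0.5
    if row["possessor_length"] <= 2:
        p += 0.2
    return p


# Invented observations, for illustration only.
rows = [
    {"animacy": "animate", "possessor_length": 1},
    {"animacy": "inanimate", "possessor_length": 4},
    {"animacy": "inanimate", "possessor_length": 2},
    {"animacy": "animate", "possessor_length": 5},
]
print(predictive_margins(toy_predict, rows, "animacy", ["animate", "inanimate"]))
```

With a real ensemble, `predict` would wrap the fitted model's prediction function; averaging over the observed rows is what keeps the other predictors at realistic value combinations rather than at arbitrary reference levels.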
A corpus-based quantitative study of numeral classifiers in Nepali
IF 1.6 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-02-13 | DOI: 10.1515/cllt-2022-0064
Krishna Prasad Parajuli, Marc Allassonnière-Tang
Abstract Nepali is typologically rare in terms of nominal classification systems, as it is one of the few languages of the world having simultaneously two gender systems (human/non-human, masculine/feminine) and one numeral classifier system (distinguishing features such as human, round-shaped objects, and long objects among others). Such a rare co-occurrence of different nominal classification systems is highly relevant for investigating linguistic complexity, as languages generally do not have several systems of the same type fulfilling the same functions. However, no corpus-based quantitative analyses have been conducted on the productive use of nominal classification systems in Nepali. The current paper aims at filling this gap by providing a token-based study from the Nepali National Corpus (∼20 million words). Our preliminary results show that there is in fact little formal overlap between the classifier and the gender systems.
Citations: 0