Corpus Linguistics and Linguistic Theory: latest articles, page 10

Prototype-driven alternations: The case of German weak nouns

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-10-25 DOI: 10.1515/cllt-2015-0051

R. Schäfer

Abstract Over the past years, multifactorial corpus-based explorations of alternations in grammar have become an accepted major tool in cognitively oriented corpus linguistics. For example, prototype theory as a theory of similarity-based and inherently probabilistic linguistic categorization has received support from studies showing that alternating constructions and items often occur with probabilities influenced by prototypical formal, semantic or contextual factors. In this paper, I analyze a low-frequency alternation effect in German noun inflection in terms of prototype theory, based on strong hypotheses from the existing literature that I integrate into an established theoretical framework of usage-based probabilistic morphology, which allows us to account for similarity effects even in seemingly regular areas of the grammar. Specifically, the so-called weak masculine nouns in German, which follow an unusual pattern of case marking and often have characteristic lexical properties, sporadically occur in forms of the dominant strong masculine nouns. Using data from the nine-billion-token DECOW12A web corpus of contemporary German, I demonstrate that the probability of the alternation is influenced by the presence or absence of semantic, phonotactic, and paradigmatic features. Token frequency is also shown to have an effect on the alternation, in line with common assumptions about the relation between frequency and entrenchment. I use a version of prototype theory with weighted features and polycentric categories, but I also discuss the question of whether such corpus data can be taken as strong evidence for or against specific models of cognitive representation (prototypes vs. exemplars).

Volume 15, pp. 383-417.

Citations: 9

Against statistical significance testing in corpus linguistics

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-10-25 DOI: 10.1515/cllt-2016-0036

Alexander Koplenig

Abstract In the first volume of Corpus Linguistics and Linguistic Theory, Gries (2005: 285) asked whether corpus linguists should abandon null-hypothesis significance testing (Gries 2005. Null-hypothesis significance testing of word frequencies: A follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory 1(2). doi:10.1515/cllt.2005.1.2.277. http://www.degruyter.com/view/j/cllt.2005.1.issue-2/cllt.2005.1.2.277/cllt.2005.1.2.277.xml). In this paper, I want to revive this discussion by defending the argument that the assumptions that allow inferences about a given population (in this case, the studied languages) based on results observed in a sample (in this case, a collection of naturally occurring language data) are not fulfilled. As a consequence, corpus linguists should indeed abandon null-hypothesis significance testing.

Citations: 52

Definite article bridging relations in L2: A learner corpus study

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-10-25 DOI: 10.1515/cllt-2015-0058

P. Crosthwaite

Abstract Bridging relations are used when the identity of a discourse-new entity can be inferred via lexical relations from an antecedent (e.g. a cake … the slice) or non-lexically via reference to world knowledge or discourse structure (e.g. a war … the survivors). Such relations are marked in English via the definite article, which is considered a difficult feature of the English language for L2 learners to acquire, particularly for L1 speakers of article-less languages. This paper provides an Integrated Contrastive Model (e.g. Granger 1996) of the L1 and L2 production of definite article bridging relations using L2 English learner corpus data produced by native Mandarin and Korean speakers at four L2 proficiency levels, alongside comparative native English data. The data is taken from the International Corpus Network of Asian Learners of English (ICNALE, Ishikawa 2011, 2013), totalling just under 400,000 words with over 1500 bridging NPs identified. Results suggest subtle but significant differences between L1-L2 and L2-L2 groupings in terms of the frequency of particular bridging relation types and lemmatised wordings identified in the data, although there was little evidence of pseudo-longitudinal development. Such differences may suggest an effect of L1-L2 linguistic relativity, influencing the selection of relational links between given/new discourse entities during L2 production.

Volume 15, pp. 297-319.

Citations: 9

Grammatical construction of function words between old and modern written Arabic: A corpus-based analysis

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-10-25 DOI: 10.1515/cllt-2016-0069

Sultan Almujaiwel

Abstract This paper argues that Arabic function words (FWs) vary in usage between old and modern Arabic, thus prompting an experimental investigation into their changeability. This investigation is carried out by testing classical Arabic (CA) in Arabic heritage language (AHL) texts – those labeled as archistratum – and the modern standard Arabic (MSA) of Arabic newspaper texts (ANT), each group of which contains randomly collected 5 million (M) word texts. The linguistic theory of the grammar of Arabic FWs is explained through the differences between CA and MSA, despite Arabic FW changes and the unlearnability and/or unusability of some FW constructions between these two eras of Arabic usage. The dispersion/distribution of the construction grammar (CxG) of FWs and the number (n) of word attractions/repulsions between the two distinct eras is explored using the very latest and most sophisticated Arabic corpus processing tools, and Sketch Engine’s SkeEn gramrels operators. The analysis of a 5 M word corpus from each era of Arabic serves to prove the non-existence of rigorous Arabic CxG. The approach in this study adopts a technique which, by contrasting AHL with ANT, relies on analyzing the frequency distributions of FWs, the co-occurrences of FWs in a span of 2n-grams collocational patterning, and some cases of FW usage changes in terms of lexical cognition (FW grammatical relationships). The results show that the frequencies of FWs, in addition to the case studies, are not the same, and this implies that FWs and their associations with the main part-of-speech class in a fusion language like Arabic have grammatically changed in MSA. Their constructional changes are neglected in Arabic grammar.

Volume 15, pp. 267-296.

Citations: 2

How do English translations differ from non-translated English writings? A multi-feature statistical model for linguistic variation analysis

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-10-01 DOI: 10.1515/cllt-2014-0047

Xianyao Hu, R. Xiao, A. Hardie

Abstract This paper discusses the debatable hypotheses of “Translation Universals”, i.e. the recurring common features of translated texts in relation to original utterances. We propose that, if translational language does have some distinctive linguistic features in contrast to non-translated writings in the same language, those differences should be statistically significant, consistently distributed and systematically co-occurring across registers and genres. Based on the balanced Corpus of Translational English (COTE) and its non-translated English counterpart, the Freiburg-LOB corpus of British English (FLOB), and by deploying a multi-feature statistical analysis on 96 lexical, syntactic and textual features, we try to pinpoint those distinctive features in translated English texts. We also propose that the stylo-statistical model developed in this study will not only be effective in analysing the translational variation of English but also be capable of clustering those variational features into a “translational” dimension, which will facilitate a crosslinguistic comparison of translational languages (e.g. translational Chinese) to test the Translation Universals hypotheses.

Volume 15, pp. 347-382.

Citations: 14

Frontmatter

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-09-27 DOI: 10.1515/cllt-2019-frontmatter2

Citations: 0

Vocabulary complexity and reading and listening comprehension of various physics genres

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-09-27 DOI: 10.1515/cllt-2019-0022

Milica Vuković Stamatović

Abstract This study sheds light on the vocabulary complexity of various physics genres and how it affects reading and listening comprehension of the science of physics. We analysed the vocabulary frequency profile of seven physics genres: research articles, textbooks, lectures, magazines, popular books, TV documentaries and TED talks, to determine the presence of general-purpose, academic and technical vocabulary in them, as well as their vocabulary level and variation. The main research question was whether the vocabulary level of these genres could pose an impediment to typical native and non-native speakers of English in terms of their reading/listening comprehension, and, in general, how accessible these genres are vocabulary-wise. The results suggest that typical native speakers will struggle reading physics research and magazine articles, whereas typical non-native speakers will not read/listen to any of the genres at an optimal level, but will be able to read/listen to four of them at an acceptable level.
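As a rough illustration of what a vocabulary frequency profile measures, the sketch below computes the cumulative lexical coverage of a text across successive word-list bands. It is not the study's procedure: the band names, the toy word lists, the sample sentence, and the function names `coverage_profile` and `level_needed` are all assumptions made for illustration, and the 0.95 threshold is only an example cut-off.

```python
from collections import Counter


def coverage_profile(tokens, bands):
    """Return the cumulative share of tokens covered after each band.

    `bands` is an ordered list of (name, set_of_words) pairs; the sets are
    assumed to be disjoint so that no token is counted twice.
    """
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    covered = 0
    profile = []
    for name, words in bands:
        covered += sum(counts[w] for w in words)
        profile.append((name, covered / total))
    return profile


def level_needed(profile, threshold):
    """Return the first band at which cumulative coverage reaches `threshold`."""
    for name, coverage in profile:
        if coverage >= threshold:
            return name
    return None


# Toy data, invented for illustration only.
tokens = "the force on the particle is the product of mass and acceleration".split()
bands = [
    ("general", {"the", "on", "is", "of", "and"}),
    ("academic", {"product", "force"}),
    ("technical", {"particle", "mass", "acceleration"}),
]

profile = coverage_profile(tokens, bands)
for name, coverage in profile:
    print(f"{name}: {coverage:.2f}")
print("band needed for 95% coverage:", level_needed(profile, 0.95))
```

With real data, the bands would come from published frequency or word lists, and the text would need proper tokenisation and lemmatisation, which this sketch omits.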

Citations: 10

Pluralized non-count nouns across Englishes: A corpus-linguistic approach to variety types

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-07-24 DOI: 10.1515/CLLT-2018-0068

G. Schneider, M. Hundt, D. Schreier

Abstract This corpus-based study of pluralized non-count nouns (informations, advices, etc.) uses collocation-derived measures (determiners vs. bare noun and mass quantifiers) to extract potential candidates of non-count nouns in a bottom-up approach from the British National Corpus (BNC), allowing the detection of grammatical categories from distributional features. We then use this token list to retrieve data on pluralization of non-counts from nine annotated components of the International Corpus of English (ICE). While the distinction between count and non-count nouns is gradient rather than categorical, it is still possible to distinguish between standard and non-standard pluralization of non-counts. Qualitative analyses of our data show that non-standard pluralization of non-count nouns is regularly attested in second-language varieties, including previously unrecorded types; however, it is also occasionally found in first-language varieties. We discuss implications of our corpus results for common explanations of pluralized non-count nouns, such as substrate influence, language learning effects and historical input. By combining a bottom-up corpus-based approach with fine-grained qualitative analyses we can provide a more nuanced view of pluralization of non-counts across ENL and ESL for the investigation of World Englishes.

Volume 16, pp. 515-546.

Citations: 7

Shell nouns as grammatical metaphor revealing disparate construals: Investigating the differences between British English and China English based on a comparable corpus

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-06-14 DOI: 10.1515/CLLT-2018-0047

Min Dong, A. Fang

Abstract This article describes a study of shell nouns (SNs) complemented by appositive that-clauses observed in a two-million-word corpus of media English by British and Chinese writers. The grammatical metaphor theory was applied to the data in the light of a novel proposal that the metaphorical forms of SN+that constructions, in their contextual semantic settings, serve to re-construe various transitivity processes. The study produced significant findings, including: (1) the two writer groups demonstrate significantly different preferences for SN types but the British and the Chinese uses are instantiated from a common core set; (2) the Chinese group prefers the re-construal of Identifying Relational processes of facts and evidence as markers of neutral and impersonal discourse; (3) British writers favour the re-construal of Verbal processes of assertion and stance and tend to re-construe Attributive Relational processes with varying degrees of commitment to the encapsulated propositional truth; (4) both groups are inclined towards the re-construal of Mental processes of cognition with a common preference for the re-construal of the experience of knowing, believing and thinking. The findings above lend important empirical support to systemic functional theories and suggest further research in the future regarding SNs as indicators of disparate construals in discourse.

Volume 17, pp. 743-779.

Citations: 1

Toward an optimal code for communication: The case of scientific English

IF 1.6 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2019-06-13 DOI: 10.1515/CLLT-2018-0088

Stefania Degaetano-Ortlieb, E. Teich

Abstract We present a model of the linguistic development of scientific English from the mid-seventeenth to the late-nineteenth century, a period that witnessed significant political and social changes, including the evolution of modern science. There is a wealth of descriptive accounts of scientific English, both from a synchronic and a diachronic perspective, but only a few attempts at a unified explanation of its evolution. The explanation we offer here is a communicative one: while external pressures (specialization, diversification) push for an increase in expressivity, communicative concerns pull toward convergence on particular options (conventionalization). What emerges over time is a code which is optimized for written, specialist communication, relying on specific linguistic means to modulate information content. As we show, this is achieved by the systematic interplay between lexis and grammar. The corpora we employ are the Royal Society Corpus (RSC) and, for comparative purposes, the Corpus of Late Modern English (CLMET). We build various diachronic, computational n-gram language models of these corpora and then apply formal measures of information content (here: relative entropy and surprisal) to detect the linguistic features significantly contributing to diachronic change, estimate the (changing) level of information of features and capture the time course of change.
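The two information measures named in the abstract can be sketched in a few lines. This is a simplified illustration, not the authors' models: it uses add-one-smoothed unigram distributions rather than the n-gram language models the abstract mentions, the two "period" samples are invented toy sentences, and the function names `unigram_probs`, `surprisal`, and `kld_contributions` are hypothetical.

```python
import math
from collections import Counter


def unigram_probs(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def surprisal(p):
    """Surprisal, in bits, of an event with probability p."""
    return -math.log2(p)


def kld_contributions(p, q):
    """Per-word contributions to the relative entropy D(P || Q), in bits."""
    return {w: p[w] * math.log2(p[w] / q[w]) for w in p}


# Toy samples standing in for an earlier and a later period (invented).
early = "we did observe that the air was found to be heavy".split()
late = "the observation of the pressure of the gas shows the increase".split()
vocab = set(early) | set(late)

p_early = unigram_probs(early, vocab)
p_late = unigram_probs(late, vocab)

# How much the later period diverges from the earlier one, and which word
# contributes most to that divergence.
contrib = kld_contributions(p_late, p_early)
kld = sum(contrib.values())
top = max(contrib, key=contrib.get)

print(f"D(late || early) = {kld:.3f} bits")
print("largest contribution:", top)
print(f"surprisal of '{top}' in the later sample: {surprisal(p_late[top]):.3f} bits")
```

Ranking words by their contribution to the divergence is one simple way to see which features distinguish one period from another; smoothing is needed here only so that words unseen in one sample do not produce a division by zero.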

Volume 18, pp. 175-207.

Citations: 24