Corpus Linguistics and Linguistic Theory: Latest Articles

Seeing the wood for the trees: predictive margins for random forests

IF 1.6, Zone 2 (Literature), LANGUAGE & LINGUISTICS

Corpus Linguistics and Linguistic Theory

Pub Date : 2023-03-28 DOI: 10.1515/cllt-2022-0083

Lukas Sönning, Jason Grafmiller

Abstract Classification trees and random forests offer a number of attractive features to corpus data analysts. However, the way in which these models are typically reported – a decision tree and/or set of variable importance scores – offers insufficient information if interest centers on the (form of) relationship between (multiple) predictors and the outcome. This paper develops predictive margins as an interpretative approach to ensemble techniques such as random forests. These are model summaries in the form of adjusted predictions, which provide a clearer picture of patterns in the data and allow us to query a model on potential nonlinear associations and interactions among predictor variables. The present paper outlines the general strategy for forming predictive margins and addresses methodological issues from an explicitly (corpus) linguistic perspective. For illustration, we use data on the English genitive alternation and provide an R package and code for their implementation.

Citations: 0

A corpus-based quantitative study of numeral classifiers in Nepali

Pub Date : 2023-02-13 DOI: 10.1515/cllt-2022-0064

Krishna Prasad Parajuli, Marc Allassonnière-Tang

Abstract Nepali is typologically rare in terms of nominal classification systems, as it is one of the few languages of the world having simultaneously two gender systems (human/non-human, masculine/feminine) and one numeral classifier system (distinguishing features such as human, round-shaped objects, and long objects among others). Such a rare co-occurrence of different nominal classification systems is highly relevant for investigating linguistic complexity, as languages generally do not have several systems of the same type fulfilling the same functions. However, no corpus-based quantitative analyses have been conducted on the productive use of nominal classification systems in Nepali. The current paper aims at filling this gap by providing a token-based study from the Nepali National Corpus (∼20 million words). Our preliminary results show that there is in fact little formal overlap between the classifier and the gender systems.

Citations: 0

They worked their hardest on the construction’s history: Superlative Objoid Constructions in Late Modern American English

Pub Date : 2023-02-13 DOI: 10.1515/cllt-2022-0088

Tamara Bouso, M. Hundt

Abstract English verbs can combine with an object-like (or Objoid) element consisting of a possessive and a superlative. These Superlative Objoids do not add a participant to the event but function like manner adverbs (they work their hardest, i.e. they work extremely hard). This paper is the first to use diachronic evidence from a corpus of Late Modern American English to trace the recent history of Superlative Objoid Constructions (SOC). In particular, it aims to assess whether the construction has become entrenched to the extent that it can give rise to analogical extension. Secondly, the evidence is used to model, within the framework of Construction Grammar, the horizontal and vertical links between the SOC and its (potential) relatives in the constructional network of transitivity changing constructions.

Citations: 0

Frontmatter

Pub Date : 2023-02-01 DOI: 10.1515/cllt-2023-frontmatter1

Citations: 0

Register variation explains stylometric authorship analysis

Pub Date : 2023-01-02 DOI: 10.1515/cllt-2022-0040

J. Grieve

Abstract For centuries, investigations of disputed authorship have shown that people have unique styles of writing. Given sufficient data, it is generally possible to distinguish between the writings of a small group of authors, for example, through the multivariate analysis of the relative frequencies of common function words. There is, however, no accepted explanation for why this type of stylometric analysis is successful. Authorship analysts often argue that authors write in subtly different dialects, but the analysis of individual words is not licensed by standard theories of sociolinguistic variation. Alternatively, stylometric analysis is consistent with standard theories of register variation. In this paper, I argue that stylometric methods work because authors write in subtly different registers. To support this claim, I present the results of parallel stylometric and multidimensional register analyses of a corpus of newspaper articles written by two columnists. I demonstrate that both analyses not only distinguish between these authors but identify the same underlying patterns of linguistic variation. I therefore propose that register variation, as opposed to dialect variation, provides a basis for explaining these differences and for explaining stylometric analyses of authorship more generally.

Citation: J. Grieve. Register variation explains stylometric authorship analysis. Corpus Linguistics and Linguistic Theory 38(1), pp. 47-77. Published 2023-01-02. https://doi.org/10.1515/cllt-2022-0040

Citations: 3

Metaphorical language change is Self-Organized Criticality

Pub Date : 2022-12-12 DOI: 10.1515/cllt-2022-0016

Xuri Tang, Huifang Ye

One way to resolve the actuation problem of metaphorical language change is to provide a statistical profile of metaphorical constructions and generative rules with antecedent conditions. Based on arguments from the view of language as complex systems and the dynamic view of metaphor, this paper argues that metaphorical language change qualifies as a Self-Organized Criticality state and the linguistic expressions of a metaphor can be profiled as a fractal with spatio-temporal correlations. Synchronously, these metaphorical expressions self-organize into a self-similar, scale-invariant fractal that follows a power-law distribution; temporally, long range interdependence constrains the self-organization process by the way of transformation rules that are intrinsic of a language system. This argument is verified in the paper with statistical analyses of twelve randomly selected Chinese verb metaphors in a large-scale diachronic corpus.

Citations: 0

Register variation and corpus linguistics: empirical findings and emerging theories. Special issue introduction of Corpus Linguistics and Linguistic Theory in honor of Douglas Biber

Pub Date : 2022-12-12 DOI: 10.1515/cllt-2022-0093

Jesse Egbert, Bethany Gray, Tove Larsson

Citations: 0

Clausal and phrasal coordination in recent American English

Pub Date : 2022-11-25 DOI: 10.1515/cllt-2022-0035

Merja Kytö, Erik Smitterberg

Abstract Several studies have shown that there is considerable cross-genre variation as regards what linguistic units tend to be coordinated by and. While literate, expository writing favors coordination of phrasal units such as noun phrases, coordinated units are more often clausal (e.g., main or subordinate clauses) in speech-related texts. This difference has been attested in studies that focus exclusively on coordination as well as in macro-level studies of co-variation among a large number of linguistic features. However, this register differentiation has increased over time: studies of Early and Late Modern English point to less pronounced differences among registers than those attested in the present-day language. This study fills a gap in research by considering data on coordination by and from the middle of the 20th century, a period that does not belong fully to either Late Modern or Present-Day English, and the late 20th and early 21st century, and thus ties diachronic and synchronic research on register variation in coordination together. We also examine language from films and television in order to complement historical findings for speech-related language with data on registers that arose in the 20th century.

Citation: Merja Kytö, Erik Smitterberg. Clausal and phrasal coordination in recent American English. Corpus Linguistics and Linguistic Theory 19(1), pp. 23-46. Published 2022-11-25. https://doi.org/10.1515/cllt-2022-0035

Citations: 0

Pub Date : 2022-11-25 DOI: 10.1515/cllt-2022-0032

Susan Conrad

Abstract This article provides an overview of Douglas Biber’s work on register and his central role in establishing register as both an empirical focus and a theoretical construct in corpus linguistics. I identify four general phases of his work. Each has a slightly different emphasis, but each also advances intertwined threads of research that lead to an increased understanding of register variation. Biber’s work has made major contributions to distinct areas within the study of registers, from cross-linguistic speech-writing differences to English grammar, but he has advanced the field especially by integrating the findings from different areas. He has offered conceptualizations of register that account for findings from multiple areas of study, and he continues to refine the conceptualization as he engages in new lines of inquiry today.

Citations: 0

Metaphorical language change is Self-Organized Criticality

Pub Date : 2022-11-19 DOI: 10.48550/arXiv.2211.10709

Xuri Tang, Huifang Ye

Abstract One way to resolve the actuation problem of metaphorical language change is to provide a statistical profile of metaphorical constructions and generative rules with antecedent conditions. Based on arguments from the view of language as complex systems and the dynamic view of metaphor, this paper argues that metaphorical language change qualifies as a Self-Organized Criticality state and the linguistic expressions of a metaphor can be profiled as a fractal with spatio-temporal correlations. Synchronously, these metaphorical expressions self-organize into a self-similar, scale-invariant fractal that follows a power-law distribution; temporally, long range interdependence constrains the self-organization process by the way of transformation rules that are intrinsic of a language system. This argument is verified in the paper with statistical analyses of twelve randomly selected Chinese verb metaphors in a large-scale diachronic corpus.

Citations: 1
