International Journal of Corpus Linguistics: Latest Articles

Strategies in tracing linguistic variation in a corpus of Old Irish texts (CorPH)

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-09-20 DOI: 10.1075/ijcl.22018.sti

D. Stifter, Fangzhe Qiu, M. Aquino-López, Bernhard Bauer, E. Lash, Nora White

This article introduces Corpus PalaeoHibernicum (CorPH), a corpus currently consisting of 78 texts in Early Irish (c. 7th–10th cent.) created by the ERC-funded Chronologicon Hibernicum (ChronHib) project by bringing together pre-existing lexical and syntactic databases and adding further crucial texts from the period. In addition to being annotated for POS, morphological and syntactic information, CorPH has been given another layer of annotation – ‘Variation Tagging’, i.e. a tagset that numerically encodes synchronic language variation during the Early Irish period, thus allowing for much improved research on the chronological variation among the material. Another new pillar of studying linguistic variation is Bayesian Language Variation Analysis (BLaVA), which addresses the challenge that “not-so-big data” poses to statistical corpus methods. Instead of reflecting feature frequencies, BLaVA models language variation as probabilities of variation.

Citations: 0

“In barbarous times and in uncivilized countries”

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-09-09 DOI: 10.1075/ijcl.22016.ale

Marc Alexander, Andrew Struan

The ways in which politicians have discussed who, what, and where was considered “uncivilized” across the past two centuries give an insight into how speakers in a position of authority classified and constructed the world around them, and how those in power in Britain see the country and themselves. This article uses the Hansard Corpus 1803–2003 of speeches in the UK Parliament alongside data from the Historical Thesaurus of English to analyse diachronic variation in usage of words for persons, places and practices considered uncivil. It proposes new methods and offers quantitative data to describe the period’s shift in political attitudes towards not just the so-called “uncivil” but also the country as a whole.

Citations: 0

Volatile concepts

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-09-06 DOI: 10.1075/ijcl.22005.fit

S. Fitzmaurice, Seth Mehl

This paper demonstrates the value of studying co-occurrence ‘quads’ – constellations of four non-adjacent lemmas that consistently co-occur across spans of up to 100 tokens – for understanding discursive change. We map meaning onto quads as ‘discursive concepts’, which encompass encyclopaedic semantics, pragmatics, and context. We investigate a high-frequency quad with high co-occurrence strength in EEBO-TCP: world-heaven-earth-power. We conduct semantic and pragmatic analysis to generate hypotheses regarding discursive change. The quad’s components are semantically underspecified; thus, although the quad indicates a discursive concept, each instantiation of the quad is variable, contingent, and dependent upon context and pragmatic processes for interpretation. We observe how the vague lexemes that constitute building blocks of religious discourse are employed to generate new, timely secular discourses; and we argue that semantic underspecification is the site and source of discursive change. Indeed, the volatile, unstable nature of the component lexical meanings renders them indispensable to early modern debate.

Citations: 4

A corpus-based study of anglicized neologisms in Korea

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-09-06 DOI: 10.1075/ijcl.20055.kim

E. Kim

This study examines usage changes of English-based loanwords and Korean replacement words promoted by the National Institute of Korean Language in a six-year span, using two corpora. It focuses on 18 Korean and anglicized word pairs appearing on the National Institute of Korean Language’s website that purportedly showcase the Institute’s successful efforts to curtail the usage of English words by promoting Korean replacement words. The results indicate that promoting Korean does not necessarily decrease the usage of English, and that the usage of English-based words seems to increase in conjunction with the Korean words. Several Korean words promoted by the National Institute of Korean Language have extremely low frequencies, and some loanwords are being used with various meanings. Commentaries are provided to explain various patterns of observed usage change.

Citations: 0

Keywords through time

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-08-29 DOI: 10.1075/ijcl.22011.cla

Isobelle Clarke, Gavin Brookes, Tony McEnery

This paper applies a new approach to the identification of discourses, based on Multiple Correspondence Analysis (MCA), to the study of discourse variation over time. The MCA approach to keywords addresses a major issue in the use of keywords to identify discourses: the allocation of individual keywords to multiple discourses. Yet, as this paper demonstrates, the approach also allows us to observe variation in the prevalence of discourses over time. The MCA approach to keywords allows the allocation of individual texts to multiple discourses based on patterns of keyword co-occurrence. Metadata in the corpus data analysed (here, UK newspaper articles about Islam) can then be used to map those discourses over time, resulting in a clear view of how the discourses vary relative to one another as time progresses. The paper argues that the drivers for these fluctuations are language-external: the real-world events reported on in the newspapers.

Citations: 2

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-08-23 DOI: 10.1075/ijcl.20177.lii

A. Liimatta

This paper explores variation in lexico-grammatical register features across text lengths in a large-scale sample of Reddit comments. Very short texts are known to be problematic for many statistical methods, so understanding their nature is important for the corpus-linguistic study of social media, where most contributions are short. I show that the frequencies of linguistic features change with comment length, even between longer comments, although longer texts are often considered similar in statistical terms. Moreover, I classify the variation found between short comments of different lengths into two main patterns, although other patterns can also be found, and there is variation even within these patterns. Furthermore, I interpret the observed differences in terms of register variation. For example, shorter comments appear to be more casual and less edited in terms of their feature makeup, whereas narrative and informational registers seem to favor longer comments.

Citations: 2

New methods for analysing diachronic suffix competition across registers

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-08-19 DOI: 10.1075/ijcl.22014.rod

Paula Rodríguez-Puente, Tanja Säily, J. Suomela

This paper tracks stylistic variation in the use of two roughly synonymous suffixes, the Romance -ity and the native -ness, during the Early Modern English period. We seek to verify from a statistical viewpoint the claims of Rodríguez-Puente (2020), who reports on a decrease of -ness in favour of -ity in registers representative of the speech-written and formal-informal continua at that time. To this end, we develop new methods of statistical and visual analysis that enable diachronic comparisons of competing processes across subcorpora, building upon an earlier method by Säily and Suomela (2009). Our results confirm that -ity gained ground first in written registers and then spread towards speech-related registers, and we are able to time this change more accurately thanks to a novel periodisation. We also provide strong statistical support indicating that the proportion of -ity was significantly higher in legal registers than in other registers.

Citations: 1

Annotating dialogue acts in speech data

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-08-08 DOI: 10.1075/ijcl.20165.ver

D. Verdonik

The aims of this paper are to detect the most problematic issues related to dialogue act annotation in speech corpora and to define basic categories of dialogue acts. I critically examine and test generic schemes that represent different lines of dialogue act annotation: AMI, DART, ISO 24617-2 and SWBD-DAMSL. It is found that the most problematic issues regarding dialogue act annotation are related to the distinction between the semantic and pragmatic meanings of utterances, the annotation of metadiscourse, and the adequacy and informativeness of the tagset. The identified basic dialogue act categories are information providing, information seeking, actions, social acts and metadiscourse. The findings help improve dialogue act annotation.

Citations: 0

Derivation and semantic autonomy

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-07-21 DOI: 10.1075/ijcl.20074.kra

Iwona Kraska-Szlenk, Beata Wójtowicz

The article focuses on the polysemy and usage patterns of the Polish lexeme głowa “head” and its diminutive główka. Based on corpus methodology and cognitive linguistics analysis, it is argued that the two lexemes are more autonomous in their meanings than their morphological relatedness would predict. As the two words cover different semantic domains, we observe that the diminutive suffix has developed a new function which signals lexicalization of meaning toward a non-human semantic domain, for example, material objects, plants, etc. Our research contributes to studies on Polish morphology and lexical semantics and to theoretical research on the polysemy of body part terms.

Citations: 1

Question illocutionary force indicating devices in academic writing

IF 1 | Zone 2 (Literature) | 0 | LANGUAGE & LINGUISTICS

International Journal of Corpus Linguistics

Pub Date : 2022-07-18 DOI: 10.1075/ijcl.20065.cur

Niall Curry

Corpus research on questions as reader engagement markers in academic writing typically focuses on direct questions. Such questions are signalled by question marks and are relatively easily searchable in a corpus. However, indirect questions can be more challenging to identify, as they can be introduced by a range of forms. Based on a contrastive analysis of a corpus of English, French, and Spanish economics research articles, this paper provides pertinent evidence on direct and indirect questions as reader engagement markers. Firstly, it shows that direct and indirect questions as reader engagement markers are a rhetorical and generic feature of academic writing in the economics research article and, secondly, it presents a comprehensive list of indirect question illocutionary force indicating devices, valuable for future studies of indirect questions. Methodologically, this paper illustrates a replicable process for functional analysis and discusses the value of theoretically merging corpus and contrastive linguistic approaches.

Citations: 1
