The value of the Janes corpus for Slovenian language standardization

Slovenscina 2.0 (Q2, Arts and Humanities). Pub Date: 2016-09-27. DOI: 10.4312/slo2.0.2016.2.1-37

Špela Arhar Holdt, K. Dobrovoljc

Volume 4, pages 1-37. Journal article.

Citations: 0

Abstract

The main objective of this article is to assess the value of the Janes corpus for research on language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content consists mostly of texts that have not been modified by a proofreader. It therefore offers a more realistic insight into trends in language use, and into how intuitive the existing language rules are, within a wider language community. We illustrate this methodological potential with a case study of nominal phrases with non-agreeing premodifiers, such as solo petje and RTV prispevek, comparing their usage in Janes and in the reference corpus Kres. The results show that phrases of this type are used more often in Janes, which also contains a longer list of candidates than Kres; that both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and, somewhat surprisingly, that Janes displays more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and extended version of an earlier conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates the inclusion of Janes in Slovenian language standardization methodology.


