The value of the Janes corpus for Slovenian language standardization

Slovenscina 2.0 (Q2, Arts and Humanities) · Pub Date: 2016-09-27 · DOI: 10.4312/slo2.0.2016.2.1-37
Špela Arhar Holdt, K. Dobrovoljc
Volume 4 1, pages 1-37. Publication type: Journal Article. Link: https://doi.org/10.4312/slo2.0.2016.2.1-37
Citations: 0

Abstract

The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content consists mostly of texts that have not been modified by a proofreading expert; it therefore offers a more realistic insight into trends in language use, as well as into the intuitiveness of existing language rules, within a wider language community. We illustrate this methodological potential in a case study of nominal phrases with nonagreeing premodifiers, such as solo petje and RTV prispevek, by comparing their usage in Janes and in the reference Kres corpus. The results reveal three things: this type of phrase is used more often in Janes and includes a longer list of candidates than in Kres; both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and, somewhat surprisingly, Janes displays more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and enhanced extension of a prior conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates the inclusion of Janes in Slovenian language standardization methodology.
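The abstract does not state how frequency or consistency was measured, so the following is only a minimal sketch of one way such a corpus comparison could be set up. The function names, the corpus sizes, and all counts are hypothetical and are not taken from the article: relative frequency is normalised per million tokens so that corpora of different sizes can be compared, and consistency is approximated as the share of occurrences that use the dominant spelling variant (one word versus two words).

```python
def per_million(count: int, corpus_size: int) -> float:
    """Relative frequency of a phrase, normalised to one million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def variant_consistency(variant_counts: dict[str, int]) -> float:
    """Share of all occurrences that use the most frequent spelling variant.

    1.0 means every occurrence uses the same spelling; values near
    1 / len(variant_counts) mean usage is split evenly between variants.
    """
    total = sum(variant_counts.values())
    if total == 0:
        raise ValueError("no occurrences to measure")
    return max(variant_counts.values()) / total


# Invented counts for illustration only (NOT results from the article).
corpus_a = {"size": 2_000_000, "variants": {"solo petje": 30, "solopetje": 10}}
corpus_b = {"size": 1_000_000, "variants": {"solo petje": 6, "solopetje": 6}}

for name, corpus in (("corpus_a", corpus_a), ("corpus_b", corpus_b)):
    occurrences = sum(corpus["variants"].values())
    print(
        name,
        f"per million: {per_million(occurrences, corpus['size']):.1f}",
        f"consistency: {variant_consistency(corpus['variants']):.2f}",
    )
```

A dominant-variant share is only one possible proxy for consistency; an entropy-based measure would be an alternative when more than two spellings compete.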
Source journal: Slovenscina 2.0 (Arts and Humanities: Language and Linguistics)
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 0
Review time: 16 weeks