Omogočanje dostopa do korpusov slovenskih spletnih besedil v luči pravnih omejitev (Enabling access to corpora of Slovene web texts in light of legal constraints)

Slovenscina 2.0 (Q2, Arts and Humanities). Pub Date: 2016-09-27. DOI: 10.4312/SLO2.0.2016.2.189-219

T. Erjavec, Jaka Čibej, Darja Fišer

Citations: 2

Abstract

Web texts are becoming increasingly relevant sources of information, and web corpora are useful for corpus-linguistic studies and the development of language technologies. Even though web texts are directly accessible, which substantially simplifies the collection procedure, the compilation of web corpora is still complex, time-consuming and expensive. It is crucial that such efforts are not duplicated, which is why the created corpora must be made easily and widely accessible both to researchers and to a wider audience. While this is logistically and technically straightforward, legal constraints such as copyright, privacy and terms of use severely hinder the dissemination of web corpora. This paper discusses the legal conditions and actual practice in this area, gives an overview of current practices and, using the Janes corpus of Slovene user-generated content as an example, proposes a range of mitigation measures to ensure free and open dissemination of Slovene web corpora.


