Omogočanje dostopa do korpusov slovenskih spletnih besedil v luči pravnih omejitev (Enabling Access to Corpora of Slovene Web Texts in Light of Legal Constraints)

Slovenščina 2.0, vol. 4, pp. 189-219. Published 2016-09-27. DOI: 10.4312/SLO2.0.2016.2.189-219. JCR: Q2 (Arts and Humanities).
T. Erjavec, Jaka Čibej, Darja Fišer
Citations: 2

Abstract

Web texts are becoming increasingly relevant sources of information, and web corpora are useful both for corpus-linguistic studies and for the development of language technologies. Even though web texts are directly accessible, which substantially simplifies the collection procedure, the compilation of web corpora is still complex, time-consuming and expensive. It is therefore crucial that such efforts are not duplicated, which is why the resulting corpora should be made easily and widely accessible both to researchers and to a wider audience. While this is logistically and technically straightforward, legal constraints such as copyright, privacy and terms of use severely hinder the dissemination of web corpora. This paper discusses the legal conditions and actual practice in this area, gives an overview of current approaches and, using the example of the Janes corpus of Slovene user-generated content, proposes a range of mitigation measures intended to ensure the free and open dissemination of Slovene web corpora.
Source journal: Slovenščina 2.0 (Arts and Humanities, Language and Linguistics)
CiteScore: 0.90
Self-citation rate: 0.00%
Publication count: 0
Review time: 16 weeks