Analyzing the unrestricted web: The Finnish corpus of online registers

Nordic Journal of Linguistics (LANGUAGE & LINGUISTICS; IF 0.5; Zone 3, Literature) | Pub Date: 2023-03-13 | DOI: 10.1017/s0332586523000021
Valtteri Skantsi, Veronika Laippala
Citations: 1

Abstract

This article introduces the Finnish Corpus of Online Registers (FinCORE), which represents the full range of registers – situationally defined text varieties such as news and blogs – on the Finnish Internet. The extreme range of language use found online has challenged the study of registers. It has been unclear what registers the entire Internet includes and whether they can be sufficiently defined to allow for their analysis or classification, as previous studies have focused on restricted sets of registers and on English. FinCORE features 10,754 texts from the unrestricted web, manually annotated for their register using a scheme originally established for the Corpus of Online Registers of English (CORE). We present the FinCORE registers and compare them to CORE. Finally, we show that the FinCORE registers are sufficiently well-defined to allow for their automatic identification, thus opening novel possibilities for both linguistics and web-as-corpus research. FinCORE is published under an open license.