Historical Corpus of Dutch: A new multi-genre corpus of Early and Late Modern Dutch

Iris Van de Voorde, Gijsbert Rutten, Rik Vosters, Marijke van der Wal, Wim Vandenbussche
Journal article, Taal en Tongval: Language Variation in the Low Countries, published September 2023. DOI: 10.5117/tet2023.1.006.vand
Citations: 1

Abstract

In this contribution, we present the Historical Corpus of Dutch (HCD), a new multi-genre, diachronic corpus of Early and Late Modern Dutch (ca. 1550–1850). It consists of a digitised collection of handwritten administrative texts (e.g. town council meeting reports), handwritten ego-documents (e.g. diaries and travelogues), and printed pamphlets (e.g. of a political or religious nature). The corpus is also balanced between northern and southern material, with data from the provinces of Holland and Zeeland for the North, and from Flanders and Brabant for the South. After discussing its structure and composition, we illustrate the value of the new corpus with several smaller case studies. Based on our experiences with the corpus, we conclude with a plea that historical corpus building should not focus too heavily on the quantity of data (‘big data’) but should instead shift attention to data quality.