Die Grenzboten on its Way to Virtual Research Environments and Infrastructures

Manfred Nölte, M. Blenkle
Journal of European Periodical Studies. Journal article, published 2019-06-30. DOI: 10.21825/jeps.v4i1.10171 (https://doi.org/10.21825/jeps.v4i1.10171)

Abstract

The State and University Library Bremen (SuUB) is dedicated to digitizing its historical collections. Digitization is an important instrument for improving access to the valuable information contained in fragile historical documents. It facilitates academic research and teaching and is indispensable to the digital humanities. Research on digital serial publications in particular benefits from ‘recent systematic digitization efforts, often initiated by libraries […]. More and more historical periodicals and other serial publications are now digitally available in full, i.e., all of their issues’ [Piotrowski, this volume]. The historical journal presented in this article is one of these, and the final section discusses why it can be considered a complete corpus. Digitization projects usually produce digital images, metadata for cataloguing and web navigation, and OCR full text for searching. This information is made available through the library's web portal for digital collections. Digital humanists, however, need high-quality full texts enriched with metadata in an appropriate format in order to analyse them with powerful software tools.
The historical journal Die Grenzboten serves as an exemplary model for bridging the gap between library digitization projects and research infrastructures. Die Grenzboten is a long-running serial publication (1841–1922) that can be classified as a literary journal that also covered politics and the arts. We demonstrate that OCR post-correction and page-wise structuring are prerequisites for creating a high-quality TEI version of a full text. The TEI version was created in cooperation with the Deutsches Textarchiv (DTA) at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW). A fully automated OCR post-correction procedure developed at the SuUB Bremen is freely available on GitHub.
Transferring high-quality full texts to research infrastructures is a necessary step in enabling researchers to work with powerful software tools. We describe such transfers of full text and the experience we have gained, but some general questions persist: What has to be done to prepare raw OCR output for this purpose in a reasonable and cost-effective manner? What quality is needed or expected? Which metadata and file formats are needed? Should research infrastructures and the libraries handling the digitization not cooperate more closely? OCR full texts, even post-corrected, are not perfect, but character recognition rates of around 99% certainly allow more uses than a search index alone. A vast amount of textual resources is available and ready to be made fully accessible for scientific research. Finally, we offer some suggestions for scholars and researchers working on digital serial publications.
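To make the figure of around 99% character recognition concrete, the following is a minimal sketch of one common way to score OCR output against a manually corrected reference: character accuracy as one minus the edit distance divided by the reference length. The abstract does not state how the rate was measured, so this metric, the function names `levenshtein` and `character_accuracy`, and the sample strings are all assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Assumed metric: 1 - (edit distance / reference length), floored at 0."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    errors = levenshtein(ground_truth, ocr_text)
    return max(0.0, 1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    # Hypothetical sample: one misread character in a ten-character word.
    reference = "Grenzboten"
    recognized = "Grcnzboten"
    print(f"{character_accuracy(reference, recognized):.2%}")
```

Under this assumed metric, a rate of 99% corresponds to roughly one erroneous character per hundred characters of reference text, so a page of 2,000 characters would still contain about 20 errors.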