Design and Implementation of the first Generic Archive Storage Service for Research Data in Germany

Felix Bach, B. Schembera, J. V. Wezel
International Journal of Digital Curation. Published 2020-07-22. DOI: 10.2218/ijdc.v15i1.553 (https://doi.org/10.2218/ijdc.v15i1.553)

Abstract

Research data, the truly valuable good in science, must be saved and then kept findable, accessible and reusable for several years for reasons of proper scientific conduct. However, managing long-term storage of research data is a burden for institutes and researchers. Because of the sheer size of the data and the required retention time, apt storage providers are hard to find.

Aiming to solve this puzzle, the bwDataArchive project started development of a long-term research data archive that is reliable, cost-effective and able to store multiple petabytes of data. The hardware consists of data storage on magnetic tape, interfaced with disk caches and nodes for data movement and access. On the software side, the High Performance Storage System (HPSS) was chosen for its proven ability to reliably store huge amounts of data. However, the implementation of bwDataArchive is not dependent on HPSS. For authentication, the bwDataArchive is integrated into the federated identity management for educational institutions in the State of Baden-Württemberg in Germany.

The archive features data protection by means of a dual copy at two distinct locations on different tape technologies, data accessibility by common storage protocols, data retention assurance for more than ten years, data preservation with checksums, and data management capabilities supported by a flexible directory structure allowing sharing and publication. As of September 2019, the bwDataArchive holds over 9 PB and 90 million files and sees a constant increase in usage and users from many communities.
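
As an illustration of what "data preservation with checksums" combined with a dual copy can look like in practice, the following is a minimal Python sketch, not the bwDataArchive implementation. The abstract does not name a checksum algorithm or any interface, so SHA-256 and the function names `file_checksum` and `verify_copies` are assumptions made purely for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def file_checksum(path: Path, algorithm: str = "sha256",
                  chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use.

    SHA-256 is an assumed default; the archive's actual algorithm is not
    stated in the abstract.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(recorded: str, copies: Iterable[Path]) -> Dict[str, bool]:
    """Compare each copy's current checksum with the value recorded at ingest.

    Hypothetical helper: a copy maps to False if its content no longer
    matches the recorded checksum.
    """
    return {str(copy): file_checksum(copy) == recorded for copy in copies}
```

In such a scheme, the checksum is recorded once when a file enters the archive, and each of the two copies is later re-read and compared against it. A mismatch on one copy indicates that it should be repaired from the other.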