Precisely and Persistently Identifying and Citing Arbitrary Subsets of Dynamic Data

A. Rauber, Bernhard Gößwein, C. Zwölf, C. Schubert, Florian Wörister, James Duncan, Katharina Flicker, K. Zettsu, Kristof Meixner, L. McIntosh, R. Jenkyns, Stefan Pröll, Tomasz Miksa, M. Parsons
DOI: 10.1162/99608f92.be565013 (https://doi.org/10.1162/99608f92.be565013)
Published in: Issue 3.4, Fall 2021
Publication date: 2021-10-28
Publication type: Journal Article
Citation count: 6

Abstract

Precisely identifying arbitrary subsets of data so that they can be reproduced is a daunting challenge in data-driven science, the more so if the underlying data source is dynamically evolving. Yet most settings exhibit exactly those characteristics: ever larger amounts of data are continuously ingested from a range of sources, with error correction and quality improvement processes adding to the dynamics. For studies to be reproducible, for decision-making to be transparent, and for meta-studies to be performed conveniently, a precise identification mechanism to reference, retrieve, and work with such data is essential. The RDA Working Group on Dynamic Data Citation has published 14 recommendations that center on timestamping and versioning evolving data sources and on identifying subsets dynamically via persistent identifiers assigned to the queries that select the respective subsets. These principles are generic and work for virtually any kind of data. In the past few years, numerous repositories around the globe have implemented these recommendations and deployed solutions. This paper provides an overview of the recommendations, reference implementations, and pilot systems deployed, and analyses key lessons learned from these. This provides a solid
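The mechanism named in the abstract (versioned, timestamped data plus a persistent identifier assigned to the selecting query) can be illustrated with a minimal sketch. It is not the working group's reference implementation. The table layout, the function names (`write`, `select_as_of`, `cite`, `resolve`), the `pid:` identifier format, the integer logical clock, and the result fingerprint are all assumptions made for illustration.

```python
import hashlib
import json
import sqlite3
import uuid


def make_store():
    """Create an in-memory database with a versioned table and a query store."""
    db = sqlite3.connect(":memory:")
    # Each row version is valid in the interval [valid_from, valid_to).
    db.execute(
        "CREATE TABLE obs (id INTEGER, value REAL, "
        "valid_from INTEGER NOT NULL, valid_to INTEGER)"
    )
    # Hypothetical query store: one row per cited subset.
    db.execute(
        "CREATE TABLE query_store (pid TEXT PRIMARY KEY, predicate TEXT, "
        "params TEXT, ts INTEGER, result_hash TEXT)"
    )
    return db


def write(db, rec_id, value, ts):
    """Record a new version; the previous version is closed, never overwritten."""
    db.execute(
        "UPDATE obs SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (ts, rec_id),
    )
    db.execute(
        "INSERT INTO obs (id, value, valid_from, valid_to) VALUES (?, ?, ?, NULL)",
        (rec_id, value, ts),
    )


def select_as_of(db, predicate, params, ts):
    """Run the selection against the data as it stood at logical time ts."""
    # The predicate is interpolated for brevity; a real system would not
    # accept untrusted SQL fragments this way.
    sql = (
        "SELECT id, value FROM obs "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        f"AND ({predicate}) ORDER BY id"  # fixed sort order keeps results stable
    )
    return db.execute(sql, (ts, ts, *params)).fetchall()


def fingerprint(rows):
    """Hash a result set so a later re-execution can be checked against it."""
    return hashlib.sha256(json.dumps(rows).encode("utf-8")).hexdigest()


def cite(db, predicate, params, ts):
    """Store the query, its timestamp, and a result hash; return a new identifier."""
    rows = select_as_of(db, predicate, params, ts)
    pid = "pid:" + uuid.uuid4().hex  # assumed identifier format
    db.execute(
        "INSERT INTO query_store VALUES (?, ?, ?, ?, ?)",
        (pid, predicate, json.dumps(params), ts, fingerprint(rows)),
    )
    return pid


def resolve(db, pid):
    """Re-execute the stored query as of its stored timestamp and verify it."""
    row = db.execute(
        "SELECT predicate, params, ts, result_hash FROM query_store WHERE pid = ?",
        (pid,),
    ).fetchone()
    if row is None:
        raise KeyError(pid)
    predicate, params, ts, expected = row
    rows = select_as_of(db, predicate, json.loads(params), ts)
    if fingerprint(rows) != expected:
        raise ValueError("subset no longer matches its stored fingerprint")
    return rows


db = make_store()
write(db, 1, 20.5, ts=1)
write(db, 2, 31.0, ts=1)
pid = cite(db, "value > ?", [25], ts=2)   # cite the subset as of time 2
write(db, 2, 19.0, ts=3)                  # later correction to record 2
write(db, 3, 40.0, ts=3)                  # newly ingested record
cited = resolve(db, pid)                  # the subset exactly as it was cited
current = select_as_of(db, "value > ?", [25], ts=3)  # same query, current data
```

In this sketch the identifier points at the query and its timestamp rather than at a stored copy of the subset, so the data can keep changing while `resolve` still returns the cited rows. Here `cited` and `current` differ because record 2 was corrected and record 3 was added after the citation was made.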