Toward reliable biodiversity dataset references

Michael Elliott, J. Poelen, J. Fortes
Ecol. Informatics, journal article, published 2020-01-03. DOI: 10.32942/osf.io/mysfp (https://doi.org/10.32942/osf.io/mysfp)
Citations: 10

Abstract

No systematic approach has yet been adopted to reliably reference and provide access to digital biodiversity datasets. Based on accumulated evidence, we argue that location-based identifiers such as URLs are not sufficient to ensure long-term data access. We introduce a method that uses dedicated data observatories to evaluate long-term URL reliability.

From March 2019 through May 2020, we took periodic inventories of the data provided to major biodiversity aggregators, including GBIF, iDigBio, DataONE, and BHL, by accessing the URL-based dataset references from which the aggregators retrieve data. Over the period of observation, we found that, for the URL-based dataset references available in each of the aggregators' data provider registries, 5% to 70% of URLs were intermittently or consistently unresponsive, 0% to 66% produced unstable content, and 20% to 75% became either unresponsive or unstable.

We propose the use of cryptographic hashing to generate content-based identifiers that can reliably reference datasets. We show that content-based identifiers facilitate decentralized archival and reliable distribution of biodiversity datasets to enable long-term accessibility of the referenced datasets.
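To make the idea of a content-based identifier concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not name a hash function or an identifier format, so SHA-256 and the `hash://sha256/` prefix are assumptions, and the helper names `content_id` and `classify` are hypothetical. The sketch derives an identifier from a dataset's bytes, then uses it to label a series of retrievals from one URL as unresponsive (some retrieval failed) or unstable (the content changed between retrievals).

```python
import hashlib
from typing import Optional, Sequence


def content_id(data: bytes) -> str:
    """Return an identifier derived only from the bytes of the dataset."""
    # Assumption: SHA-256 and the "hash://sha256/" prefix are illustrative
    # choices; the abstract only says "cryptographic hashing".
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def classify(observations: Sequence[Optional[bytes]]) -> dict:
    """Summarize repeated retrievals of a single URL.

    Each observation is the retrieved content, or None when the URL
    did not respond at that point in time.
    """
    ids = {content_id(o) for o in observations if o is not None}
    unresponsive = any(o is None for o in observations)
    unstable = len(ids) > 1  # more than one distinct content over time
    return {
        "unresponsive": unresponsive,
        "unstable": unstable,
        "unreliable": unresponsive or unstable,
        "content_ids": sorted(ids),
    }


if __name__ == "__main__":
    # Hypothetical observation history: two versions and one failed retrieval.
    history = [b"occurrence data v1", None, b"occurrence data v2"]
    print(classify(history))
```

Because the identifier depends only on the content, any copy of the dataset, wherever it is stored, can be checked against the identifier by recomputing the hash. This is what lets a reference stay valid even when the original URL stops responding.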