Biodiversity Information Science and Standards: Latest Articles

Semantic Mapping of the Geologic Time Scale: A temporal reference
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112232
Susan Edelstein, Ben Norton
The Geologic Time Scale is an ordered hierarchical set of terms representing specific time intervals in Earth’s history. The hierarchical structure is correlated to the geologic record and major geologic events in Earth’s history (Gradstein et al. 2020). In the absence of quantitative numeric age values from absolute dating methods, the relative time intervals in the geologic time scale provide the vocabulary needed for deciphering Earth’s history and for chronological reconstruction. This temporal frame of reference is critical to establishing correlations between specimens and how they fit within the Earth’s 4.567 Ga (giga annum) history.
Due to spatial and temporal variations in the stratigraphic record, the terminology used in conjunction with geologic time scales is largely inconsistent. For a detailed discussion regarding term use in geologic timescales, see Cohen et al. (2013). As a result, published values for geologic timescale terms are often ambiguous and highly variable, limiting interoperability and hindering temporal correlations among specimens. A solution is to map verbatim geologic timescale values to a controlled vocabulary, constructing a single temporal frame of reference. The harmonization process is governed by an established set of business rules that can ultimately become fully or partially automated.
In this study, we examined the distinct verbatim values published to the Global Biodiversity Information Facility (GBIF) for terms in the Darwin Core GeologicalContext class to assess the use of chronostratigraphic terms, a process highlighted in Sahdev et al. (2017). Preservation of these verbatim values, the initial unmapped set of published values, is important. Many are derived directly from primary source material and possess special historical and regional significance. These include land mammal ages (e.g., Lindsay (2003)), biostratigraphic zones, regional terms, and terms with higher granularity than the International Commission on Stratigraphy’s (ICS) timescale allows (e.g., subages/substages). For the purposes of this study, we selected the 2023/6 version of the ICS chronostratigraphic timescale as the controlled vocabulary (Cohen et al. 2023). The ICS is the most widely adopted timescale, comprising the most generalized and universally applicable intervals of geologic time.
After semantic analysis of the verbatim values (see Table 1 for comparative statistics), we established a comprehensive set of business rules to map to the ICS timescale controlled vocabulary. This process yielded a collection of documented procedures to transform the heterogeneous collection of published terms into a semantically consistent dataset. The end result is a single temporal frame of reference for published geologic and paleontological specimens, built through semantic mapping, that improves temporal correlations among specimens globally through data interoperability. This talk will highlight the process of harmonizing the heterogeneous collection of published verbatim geologic timescale values with an established controlled vocabulary through semantic mapping.
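The mapping idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vocabulary below is a tiny hand-picked subset of ICS unit names, and the two normalization rules (dropping uncertainty marks, and treating "late"/"early" as "upper"/"lower") are hypothetical stand-ins for the business rules the abstract mentions. The verbatim value is always kept alongside the mapped term, since the abstract stresses its preservation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of ICS unit names only; NOT the full 2023/6 chart.
ICS_TERMS = {
    "Phanerozoic", "Mesozoic", "Jurassic", "Cretaceous",
    "Lower Cretaceous", "Upper Cretaceous", "Maastrichtian",
}

# Hypothetical business rule: geochronologic qualifiers are rewritten to
# their chronostratigraphic counterparts before lookup.
POSITION_SYNONYMS = {"late": "upper", "early": "lower", "mid": "middle"}

# Case-insensitive lookup table from normalized key to canonical term.
_LOOKUP = {term.casefold(): term for term in ICS_TERMS}


@dataclass(frozen=True)
class Mapping:
    verbatim: str          # the published value, preserved unchanged
    mapped: Optional[str]  # canonical vocabulary term, or None if unmapped


def map_verbatim(value: str) -> Mapping:
    """Map one verbatim timescale value to the controlled vocabulary."""
    # Hypothetical rule: question marks and brackets carry no term content.
    cleaned = re.sub(r"[?()\[\]]", " ", value)
    words = [POSITION_SYNONYMS.get(w, w) for w in cleaned.casefold().split()]
    return Mapping(verbatim=value, mapped=_LOOKUP.get(" ".join(words)))
```

With these assumptions, `map_verbatim("  late CRETACEOUS? ")` maps to `"Upper Cretaceous"`, while a regional term such as the land mammal age `"Clarkforkian"` stays unmapped (`mapped` is `None`) and can be routed to manual review instead of being discarded.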
Regional Data Platform of West and Central African Herbaria
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112180
Alice Ainsa, Sophie Pamerlon, Anne-Sophie Archambeau, Rémi Beauvieux, R. Radji, Hervé Chevillotte
In April 2021, a Biodiversity Information for Development (BID) project was launched to deliver a regional data platform of West and Central African herbaria; it concluded in April 2023. A dataset containing 168,545 herbarium specimens from six countries (Togo, Gabon, Ivory Coast, Benin, Guinea Conakry and Cameroon) is now visible on the Global Biodiversity Information Facility (GBIF) website and will be regularly updated. A checklist dataset (Radji 2023a) and an occurrence dataset (Radji 2023b) obtained from herbarium parts are also available on GBIF.
In addition, a Living Atlases portal for herbaria in West and Central Africa has been created to allow users to search, display, filter, and download these data. This application reuses open source modules developed by the Atlas of Living Australia (ALA) community (Morin et al. 2021).
The RIHA platform (Réseau Informatique des Herbiers d'Afrique / Digital Network of African Herbaria) also enables herbarium administrators to manage their own data. Thanks to all these tools, the workflow (Fig. 1) for data publication on GBIF is carried out regularly and easily, and new member herbaria from West and Central Africa can be easily incorporated.
Retiring TDWG Standards and How Mapping Standards Could Support Agility
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112258
Kristen "Kit" Lewers
Since its genesis in September 1985, TDWG (formerly the Taxonomic Databases Working Group, now Biodiversity Information Standards) has become the steward of standards for the biodiversity informatics community; however, there is not yet a process for retiring standards. This talk will educate community members on the TDWG Standard categories of "Current Standard", "Prior Standard", and "2005 Standard", and on the history and context of how these categories came to be. It will also report on the progress the TAG (Technical Architecture Group) has made towards creating a process for retiring standards through auditing, community participation, and other methods.
Mapping TDWG standards can provide the agility to address overlaps, gaps, contradictions, and/or inconsistencies between standards in a proactive manner. Mapping standards' relationships provides infrastructure to support decision-making, combat information overload, and give context to the community as it continues to progress at a rapid pace. More specifically for TDWG, it gives a clear picture of how updating, ratifying, and/or retiring a single standard impacts the greater TDWG information ecosystem, and of how to update adjacent standards to preserve clarity and consistency for the community as a whole.
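One way to picture the impact analysis described above is as a traversal of a dependency graph between standards. The sketch below is only an illustration under stated assumptions: the standard names and edges are placeholders, not actual TDWG relationships, and the abstract does not say that the TAG uses a graph model.

```python
from collections import deque
from typing import Dict, List

# Hypothetical map: each standard -> standards that reuse or reference it.
# Placeholder names only; these are not real TDWG relationships.
DEPENDENTS: Dict[str, List[str]] = {
    "StandardA": ["StandardB", "StandardC"],
    "StandardB": ["StandardD"],
    "StandardC": ["StandardD"],
    "StandardD": [],
}


def affected_by(standard: str, dependents: Dict[str, List[str]]) -> List[str]:
    """Return every standard reachable from `standard`, in breadth-first order.

    These are the standards that would need review if `standard` were
    updated or retired.
    """
    seen = set()
    order: List[str] = []
    queue = deque([standard])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen and nxt != standard:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

In this toy graph, retiring `StandardA` flags `StandardB`, `StandardC`, and `StandardD` for review, whereas retiring `StandardD` affects nothing else.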
Elevating the Fitness of Use of GBIF Occurrence Datasets: A proposal for peer review
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112237
Vijay Barve
Biodiversity data plays a pivotal role in understanding and conserving our natural world. As the largest occurrence data aggregator, the Global Biodiversity Information Facility (GBIF) serves as a valuable platform for researchers and practitioners to access and analyze biodiversity information from across the globe (Ball-Damerow et al. 2019). However, ensuring the quality of GBIF datasets remains a critical challenge (Chapman 2005).
The community emphasizes the importance of data quality and its direct impact on the fitness of use for biodiversity research and conservation efforts (Chapman et al. 2020). While GBIF continues to grow in terms of the quantity of data it provides, the quality of these datasets varies significantly (Zizka et al. 2020). The biodiversity informatics community has been working diligently to ensure data quality at every step of data creation, curation, publication (Waller et al. 2021), and end-use (Gueta et al. 2019) by employing automated tools and flagging systems to identify and address issues. However, there is still more work to be done to effectively address data quality problems and enhance the fitness of use for GBIF-mediated data.
I highlight a missing component in GBIF's data publication process: the absence of formal peer reviews. Despite GBIF encompassing the essential elements of a data paper, including detailed metadata, data accessibility, and robust data citation mechanisms, the lack of peer review hinders the credibility and reliability of the datasets mobilized through GBIF.
To bridge this gap, I propose the implementation of a comprehensive peer review system within GBIF. Peer reviews would involve subjecting GBIF datasets to rigorous evaluation by domain experts and data scientists, ensuring the accuracy, completeness, and consistency of the data. This process would enhance the trustworthiness and usability of datasets, enabling researchers and policymakers to make informed decisions based on reliable biodiversity information.
Furthermore, the establishment of a peer review system within GBIF would foster collaboration and knowledge exchange among the biodiversity community, as experts provide constructive feedback to dataset authors. This iterative process would not only improve data quality but also encourage data contributors to adhere to best practices, thereby elevating the overall standards of biodiversity data mobilization through GBIF.
On the Long Tails of Specimen Data
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112151
Arturo H. Ariño
A recent article by K.R. Johnson and I.F.P. Owens in Science (Johnson and Owens 2023) suggested that the 73 main natural history museums around the world collectively hold over 1 billion records of accessioned "specimens" (taken as collection units), a result remarkably close to, but obtained through a completely different method from, research published a decade earlier by A.H. Ariño in Biodiversity Informatics (Ariño 2010). Both sets of approaches have benefitted from information available at the Global Biodiversity Information Facility (GBIF), which in the intervening years has grown by an order of magnitude, although mostly through observation-based occurrences rather than through accretion of specimen records in collections. When comparing the estimated size of collections and the amount of digital data from those collections, there is still a huge gap, as there was then. Digitization efforts have been progressing, but they are still far from reaching the goal of bringing information about all specimens into the digital domain. While the larger institutions may doubtlessly have greater overall resources to try and make their data available than smaller institutions, how do they compare in terms of data mobilization and sharing? Not surprisingly, the distribution of the collection sizes shows a long tail of small institutions that, nonetheless, are also embarking on digitization efforts. Will this long tail of science actually manage to have all their biodiversity data available sooner than the larger institutions? It is becoming more widely recognized that data usability is predicated on data becoming findable, accessible, interoperable and reusable (FAIR, Wilkinson et al. 2016). What could be the consequences of having a data availability bias towards having many tiny collections available for ready use, rather than a much smaller (although surely very significant) fraction of larger collections of a comparable type? 
This presentation explores and compares the distribution of potential versus readily available data in 2010 and in 2023, examines what trends might exist in the race to universal specimen data availability, and whether the digitization efforts might be better targeted to achieve greater overall scientific benefit.
OpenObs: Living Atlases platform for French biodiversity data
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112179
Alice Ainsa, Sophie Pamerlon, Anne-Sophie Archambeau, Solène Robert, Rémi Beauvieux
The OpenObs project, led by Patrinat, was launched in September 2017, and the first version of this tool was released in October 2020. OpenObs is based on the Atlas of Living Australia platform, supported by the Global Biodiversity Information Facility (GBIF) community, particularly the Living Atlases (LA) collective.
OpenObs enables the visualization and downloading of observation data on species available in the National Inventory of Natural Heritage (INPN), the national platform of SINP (Information System for the Inventory of Natural Heritage). It provides open access to non-sensitive public data and includes all available observations, whether they are occurrence or synthesis data.
As of July 2023, OpenObs has 134,922,015 observation records, and new data are regularly added (at least twice a year). Furthermore, the project is constantly evolving with new developments planned, such as a user validation interface and new cartographic tools.
We will present the architecture of this LA-based national biodiversity portal (Fig. 1), as well as its planned new functionality and development roadmap.
An Australian Model of Cooperative Data Publishing to OBIS and GBIF
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112228
Katherine Tattersall, P. Newman, Sachit Rajbhandari, Dave Watts, Mahmoud Sadeghi
The Australian Commonwealth Scientific and Industrial Research Organisation (CSIRO) hosts both the Australian Ocean Biodiversity Information System (OBIS) and Global Biodiversity Information Facility (GBIF) nodes within the National Collections and Marine Infrastructure (NCMI) business unit. OBIS-AU is led by the NCMI Information and Data Centre and publishes marine biodiversity data in the Darwin Core (DwC) standard via an Integrated Publishing Toolkit (IPT), with over 450 marine datasets at present. The Australian GBIF node is hosted by a separate team at the Atlas of Living Australia (ALA), a national-scale biodiversity analytical and knowledge delivery portal. The ALA aggregates and publishes over 800 terrestrial and marine datasets from a wide variety of research institutes, museums and collections, governments and citizen science agencies, including OBIS-AU. Many datasets published by OBIS-AU are harvested and republished by ALA, and vice versa. OBIS-AU identifies, quality-controls and formats marine biodiversity and observation data, then publishes it directly to the OBIS international data repository and portal using GBIF IPT technology. The ALA data processing pipeline harvests and aggregates datasets from many sources and enhances them with authoritative taxonomic and spatial reference data before passing the data on to GBIF. OBIS-AU and ALA are working together to ensure that the publication pathways for any datasets managed by both (with potential for duplication of records and incomplete metadata harvests) are rationalised and that a single collaborative workflow across both units is followed for publication to GBIF. Recently, the data management groups have established an agreement to cooperatively publish marine data and eDNA data. OBIS-AU has commenced publishing datasets directly to GBIF with ALA endorsement.
We present the convergent evolution of OBIS and GBIF data publishing in Australia, adaptive data workflows to maintain data and metadata integrity, challenges encountered, how domain expertise ensures data quality, and the benefits of sharing data skills and code, especially in publishing eDNA data types in DwC (using the DNA-derived data extension) and exploring the new CamTrap Data Package using Frictionless Data. We also present the work that both data groups are doing toward adopting the new GBIF Unified Data Model for publishing data. This Australian case study demonstrates the strengths of collaborative data publishing and offers a model that minimises replication of data in global aggregators through the development of regional integrated data publishing pipelines.
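The duplication risk mentioned above, where the same dataset reaches an aggregator through two publishing pathways, can be illustrated with a minimal sketch. It is not the OBIS-AU or ALA workflow: it simply assumes that records carry the Darwin Core term occurrenceID and that a shared occurrenceID means the same record. The function name merge_without_duplicates and the sample records are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def merge_without_duplicates(
    primary: Iterable[dict],
    secondary: Iterable[dict],
    key: str = "occurrenceID",
) -> list[dict]:
    """Merge two record lists, keeping the first record seen for each key value.

    Assumption: two records with the same `key` value describe the same
    occurrence. Records lacking the key cannot be matched and are kept.
    """
    seen: set[str] = set()
    merged: list[dict] = []
    for record in list(primary) + list(secondary):
        identifier = record.get(key)
        if identifier is None:
            merged.append(record)  # unmatched record: keep it rather than guess
            continue
        if identifier in seen:
            continue  # already published through the other pathway
        seen.add(identifier)
        merged.append(record)
    return merged


# Made-up sample records standing in for two publishing pathways.
pathway_a = [
    {"occurrenceID": "urn:example:1", "scientificName": "Thunnus maccoyii"},
    {"occurrenceID": "urn:example:2", "scientificName": "Thunnus maccoyii"},
]
pathway_b = [
    {"occurrenceID": "urn:example:2", "scientificName": "Thunnus maccoyii"},
    {"occurrenceID": "urn:example:3", "scientificName": "Thunnus maccoyii"},
]

merged_records = merge_without_duplicates(pathway_a, pathway_b)
print([r["occurrenceID"] for r in merged_records])
```

Matching on a single identifier only works when both pathways preserve it unchanged, which is one reason a single agreed workflow is simpler than reconciling records after the fact.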
Making Schemas and Mappings Available and FAIR: A metadata and schema crosswalk registry from the FAIRCORE4EOSC project
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112223
T. Suominen, Joonas Kesäniemi, Hanna Koivula
Community standards like the Darwin Core (Darwin Core Task Group 2009) together with semantic artefacts (controlled vocabularies, ontologies, thesauri, and other knowledge organisation systems) are key building blocks for the implementation of the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al. 2016), specifically as emphasized in the Interoperability principle I2 “(Meta)data use vocabularies that follow FAIR principles”. However, most of these artefacts are actually not FAIR themselves (Le Franc et al. 2020). To address this, the FAIRCORE4EOSC project (2022-25) is developing a Metadata Schema and Crosswalk Registry (MSCR) that will allow registered users and communities to create, register and version schemas and crosswalks that all have persistent identifiers (PIDs). The published content can then be searched, browsed and downloaded without restrictions. The MSCR will also provide an API to facilitate the transformation of data from one schema to another via registered crosswalks. It will give projects and individual researchers the possibility to manage their metadata schemas and/or relevant metadata schema crosswalks. The schemas and crosswalks will be shared with the community for reuse and extension, supported by a proper versioning mechanism. The registry tool will facilitate better interoperability between resource catalogues and information systems using different (metadata) schemas. It will also encourage organisations, and especially researchers, to make their interoperability work visible by publishing the metadata crosswalks used in their workflows, which are currently hidden (FAIRification). By providing an easy-to-use graphical user interface (GUI) for creating crosswalks, the MSCR aims to attract users currently relying on project-specific solutions.
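To make the idea of a crosswalk concrete, the following minimal sketch treats a crosswalk as a plain field-to-field mapping and applies it to one record. It does not use or describe the MSCR API, which the abstract does not specify. The source field names are Darwin Core terms, while the target field names, CROSSWALK and apply_crosswalk are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

# Hypothetical crosswalk: Darwin Core term -> field in an invented target schema.
CROSSWALK: dict[str, str] = {
    "scientificName": "taxon_name",
    "eventDate": "observed_on",
    "decimalLatitude": "lat",
    "decimalLongitude": "lon",
}


def apply_crosswalk(
    record: dict[str, object],
    crosswalk: dict[str, str],
) -> tuple[dict[str, object], dict[str, object]]:
    """Rename the fields of `record` according to `crosswalk`.

    Returns (mapped, unmapped): fields with a crosswalk entry are renamed,
    and fields without one are reported separately instead of being dropped.
    """
    mapped: dict[str, object] = {}
    unmapped: dict[str, object] = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


# Made-up source record using Darwin Core terms.
source_record = {
    "scientificName": "Parus major",
    "eventDate": "2023-05-14",
    "decimalLatitude": 60.17,
    "decimalLongitude": 24.94,
    "recordedBy": "Example Observer",
}

mapped_fields, unmapped_fields = apply_crosswalk(source_record, CROSSWALK)
print(mapped_fields)
print(unmapped_fields)
```

A real crosswalk would usually also carry value transformations (units, date formats, vocabulary lookups) and a version, which is the kind of information a registry with persistent identifiers is meant to keep track of.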
The Impossible Museum: A national infrastructure to digitise the UK’s natural science collections
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112294
Vincent Smith, Helen Hardy, Laurence Livermore, Lisa French, Tara Wainwright, Josh Humphries
The Distributed System of Scientific Collections UK (DiSSCo United Kingdom, Smith et al. 2022) is a proposal to the UK Research and Innovation (UKRI) Infrastructure Programme to revolutionise how we manage, share and use the UK’s natural science collections, creating a distributed network that provides a step change in research infrastructure for the UK. While the physical integration of such a collection would be almost inconceivable, its digital integration is within reach. Building on the UK Natural History Museum’s (NHM) digitisation programme and in partnership with more than 90 collection-holding institutions across the length and breadth of the UK, DiSSCo UK seeks to unlock the full scientific, economic and social benefits of the UK’s natural science collections, which are presently constrained by the limits of physical access. With just 8% of the UK’s 137 million specimens currently available digitally, their role in the emerging biodiversity data revolution is diminished. Through nationally coordinated action, DiSSCo UK seeks to massively accelerate the digitisation of these collections and the impact of these data. Five options to digitise UK collections are presently under consideration. These options iterate across the collection groups, number and type of institution, technical infrastructure level and "catalysis" to capitalise on the benefits of unlocking data and accelerating data production. Subject to UKRI approval, the full business case for a preferred option will go through an 18–24 month approval process starting November 2023, unlocking tens to hundreds of millions of pounds of investment in UK collections. We will outline the strategic case, options and operational model for DiSSCo UK, updating on our coordination, digitisation and catalysis activities.
Mobilising Long-Term Natural Environment and Biodiversity Data and Exposing it for Federated, Semantic Queries
Pub Date : 2023-09-07 DOI: 10.3897/biss.7.112221
Hanna Koivula, Christoph Wohner, Barbara Magagna, Paolo Tagliolato Acquaviva d’Aragona, A. Oggioni
Biodiversity and ecosystems cannot be studied without assessing the impacts of changing environmental conditions. Since the 1980s, the U.S. National Science Foundation's Long Term Ecological Research (LTER) Network has been a major force in the field of ecology to better understand ecosystems. In Europe, the LTER developments are led by the Integrated European Long-Term Ecosystem, critical zone and socio-ecological system Research Infrastructure (eLTER RI), a currently project-based infrastructure initiative with the aim to facilitate high impact research and catalyse new insights about the compounded impacts of climate change, biodiversity loss, soil degradation, pollution, and unsustainable resource use on a range of European ecosystems and socio-ecological systems. The European LTER network, which forms the basis for the upcoming eLTER RI, is active in 26 countries and has 500 registered sites that provide legacy data, e.g., historical time-series data about the environment (not only biodiversity). Its site information and dataset metadata with the measured variables are available to be searched at the Dynamic Ecological Information Management System - Site and dataset registry (DEIMS-SDR, Wohner et al. 2019). While DEIMS-SDR data models utilize parts of the Ecological Metadata Language (EML) schema 2.0.0, location information follows the European INSPIRE specification. The future eLTER data is planned to consist of site-based, long-term time-series of ecological data. The eLTER projects have defined eLTER Standard Observations (SO), which will include the minimum set of variables as well as the associated method protocols that can characterise adequately the state and future trends of the Earth's systems (Masó et al. 2020, Reyers et al. 2017).
The current eLTER network consists of sites that differ in terms of infrastructure maturity or environment type; some may focus on one or several of the future SOs, while others are not yet executing any holistic monitoring scheme. The main objective is to convert the eLTER site network into a distributed research infrastructure that incorporates a clearly outlined mandatory monitoring program. Essential to this effort are the suggested variables for eLTER SOs and the corresponding methods and protocols for relevant habitat types according to the European Nature Information System (EUNIS) in each domain. eLTER variables are described by using the eLTER thesaurus "EnvThes". These descriptions are currently enhanced by the use of the InteroperAble Descriptions of Observable Property Terminology (I-ADOPT, Magagna et al. 2022) framework to provide the necessary level of detail required for seamless data discovery and integration. Variables and their associated methods and protocols will be formalised to enable automatic site classifications, by building on existing observation representations such as the Extensible Observation Ontology (OBOE), Open Geospatial Consortium's Observations and Measurements, a
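GBIF's new data model and the enrichment of raw data with semantic artefacts may prove to be ways of delivering thematic data products that combine data from multiple sources.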