Since early 2022, biodiversity informatics researchers at Kew have been developing echinopscis: an "extensible notebook for open science on specimens". This aims to build on the early experiments that our community conducted with "e-taxonomy": the development of tools and techniques to enable taxonomic research to be conducted online. Early e-taxonomic tools (e.g., Scratchpads; Smith et al. 2011) had to perform a wide range of functions, but in the past decade or so the move towards open science has brought better support for generic functionality, such as reference management (Zotero) and document production (pandoc); skills development in automation and revision control to support reproducible science, as documented by The Turing Way (The Turing Way Community 2022); and an awareness of the importance of community building. We have developed echinopscis at Kew via a cross-departmental collaboration between researchers in biodiversity informatics and accelerated taxonomy. We have also benefitted from valuable input and advice from our many colleagues in associated projects and organisations around the world.
OLS (originally Open Life Sciences) is a training and mentoring program for Open Science leaders with a focus on community building. In 2023 the name was made more generic, "Open Seeds", whilst retaining the well-known acronym "OLS". OLS is a 16-week cohort-based mentoring program. Participants apply to join a cohort with a project that is developed through the 16 weeks. The syllabus alternates week by week between time with a dedicated Open Science mentor and cohort calls, which are used to develop skills in project design, community building, open development and licensing, and inclusivity. Over 500 practitioners, experts and learners have participated across the seven completed cohorts of OLS' Open Seeds training and mentoring. Through this program, over 300 researchers and open leaders from across six continents have designed, launched and supported 200 projects from different disciplines worldwide. The next cohort will run between September 2023 and January 2024, and will be the eighth iteration of the program.
This talk will briefly outline the work that we have done to set up and experiment with echinopscis, but will focus on the impact that the OLS program has had on its development. We will also cover the use of techniques learned through OLS in other biodiversity informatics projects. OLS acknowledges that its program receives relatively few applications from project leads in biodiversity, and we hope that this talk will be informative for Biodiversity Information Standards (TDWG) participants and can be used to build productive links between these communities.
{"title":"The Role of the OLS Program in the Development of echinopscis (an Extensible Notebook for Open Science on Specimens)","authors":"Nicky Nicolson, Eve Lucas","doi":"10.3897/biss.7.112318","DOIUrl":"https://doi.org/10.3897/biss.7.112318","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77794142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carla Novoa Sepúlveda, Stephan Biebl, Nadja Pöllath, S. Seifert, Markus Weiss, Tanja Weibulat, Dagmar Triebel
There is a growing demand for monitoring pests in natural history collections (NHCs) and establishing integrated pest management (IPM) solutions (Crossman and Ryde 2022). In this context, up-to-date taxonomic reference lists and controlled vocabularies following standard schemes are crucial and facilitate recording organisms detected in collections. The data pipeline described here results in the publication of a taxon reference list based on information from online resources and standard IPM literature. Most of the over 140 pest taxa at species level and above are insects; the rest belong to other animal groups and fungi. The complete taxon names, synonyms, English and German common names, and the hierarchical classification (parent-child relationships) are organised in a client-server installation of DiversityTaxonNames (DTN) at the Bavarian Natural History Collections (SNSB). DTN is a Microsoft Structured Query Language (MS SQL) database tool of the Diversity Workbench (DWB) framework with a published Entity Relation (ER) diagram (Hagedorn et al. 2019). The list is managed using the Global Biodiversity Information Facility (GBIF) backbone taxonomy as an external name resource, with linkage to the respective Wikidata Q item ID as an external persistent identifier (PID). Moreover, information on pest occurrence in NHCs is given, distinguishing the Consortium of European Taxonomic Facilities (CETAF) major NHC collection types affected (i.e., heritage sciences, life sciences and earth sciences) and the object categories, e.g., natural objects/specimens damaged. Data management in DTN enables long-running curation by list curators. The generic data pipeline for the management and publication of a Global Taxonomic Reference List of Pests in NHCs is based on the DTN taxon lists concept and architecture and is described under About "Taxon list of pest organisms for IPM at natural history collections compiled at the SNSB". 
It includes four steps (A–D) with significant results for best practices of data processing (Fig. 1).
A. The data is managed and processed for publication by list curators in the database DiversityTaxonNames (DTN). As a result, the list can be kept up-to-date and is ready, without transformation, to be used for IPM solutions at any NHC with a DiversityCollection installation and as part of the DWB cloud services.
B. The up-to-date data is publicly available via the DTN REST Webservice for Taxon Lists with a machine-readable Application Programming Interface (API). As a result, the dynamic list publication service can be used as a reference backbone for establishing IPM solutions for pest monitoring at any NHC.
C. The data is provided via the GBIF checklist data publication pipeline of the SNSB through GBIF validation tools and as a Darwin Core Archive (DwC-A, zip format) for GBIF. As a result, the checklist information becomes part of the GBIF network with GBIF ChecklistBank and GBIF Global Taxonomy. This ensures future compliance of the data with the Findability, Accessibility, Interoperability, and Reuse (FAIR) guiding principles.
D. The DTN REST Webservice for Taxon Lists (currently with 60 lists) is registered at and accessed through the German Federation for Biological Data (GFBio) Terminology Service. As a result, the lists, with external PIDs and additional information, are available as a service (see the DTN list overview). In the Research Data Commons of the upcoming German National Research Data Infrastructure (NFDI) initiative (Diepenbroek et al. 2021), it will become part of a standardised API layer with agreed interface schemes for improved accessibility.
The tools, API, and data provided are part of the upcoming NFDI4Biodiversity service portfolio. Future scenarios include using the list entries and their properties for diagnostic purposes with DiversityNaviKey (Triebel et al. 2021), including the publication of images for pest identification.
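The parent-child classification and the Darwin Core checklist export mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the SNSB pipeline or the DTN schema: the column names are standard Darwin Core terms, but the identifiers (`t1`, `t2`, `t3`), the three example rows, and the helper names `orphaned_taxa`, `lineage`, and `to_taxon_core` are assumptions made purely for illustration.

```python
import csv
import io

# Darwin Core term names used as column headers for a taxon core table.
DWC_COLUMNS = ["taxonID", "parentNameUsageID", "scientificName",
               "taxonRank", "vernacularName"]

# Illustrative rows only: the identifiers are invented and the rows are
# not taken from the published SNSB pest list.
EXAMPLE_TAXA = [
    {"taxonID": "t1", "parentNameUsageID": "", "scientificName": "Dermestidae",
     "taxonRank": "family", "vernacularName": "skin beetles"},
    {"taxonID": "t2", "parentNameUsageID": "t1", "scientificName": "Anthrenus",
     "taxonRank": "genus", "vernacularName": ""},
    {"taxonID": "t3", "parentNameUsageID": "t2",
     "scientificName": "Anthrenus verbasci", "taxonRank": "species",
     "vernacularName": "varied carpet beetle"},
]


def orphaned_taxa(rows):
    """Return IDs of taxa whose declared parent is missing from the list."""
    known = {row["taxonID"] for row in rows}
    return [row["taxonID"] for row in rows
            if row["parentNameUsageID"] and row["parentNameUsageID"] not in known]


def lineage(rows, taxon_id):
    """Walk parent links upward and return names from root to the taxon."""
    by_id = {row["taxonID"]: row for row in rows}
    names, seen = [], set()
    current = by_id.get(taxon_id)
    while current is not None and current["taxonID"] not in seen:
        seen.add(current["taxonID"])  # guard against accidental cycles
        names.append(current["scientificName"])
        current = by_id.get(current["parentNameUsageID"])
    return list(reversed(names))


def to_taxon_core(rows):
    """Serialise rows as tab-delimited text, as used for a DwC-A core file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Checking for orphaned parents before export is one simple way to catch a broken hierarchy early; a real Darwin Core Archive would additionally need a `meta.xml` descriptor and would go through the GBIF validation tools named above.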
{"title":"GBIF-Compliant Data Pipeline for the Management and Publication of a Global Taxonomic Reference List of Pests in Natural History Collections","authors":"Carla Novoa Sepúlveda, Stephan Biebl, Nadja Pöllath, S. Seifert, Markus Weiss, Tanja Weibulat, Dagmar Triebel","doi":"10.3897/biss.7.112391","DOIUrl":"https://doi.org/10.3897/biss.7.112391","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84719431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elizabeth R. Ellwood, Wouter Addink, John Bates, Andrew Bentley, Jutta Buschbom, Alina Freire-Fierro, Jose Fortes, David Jennings, Kerstin Lehnert, Bertram Ludäscher, Keping Ma, James Macklin, Austin Mast, Joe Miller, Gil Nelson, Nicky Nicolson, Jyotsna Pandey, Deborah Paul, Sinlan Poo, Richard Rabeler, Pamela S. Soltis, Elycia Wallis, Michael Webster, Andrew Young, Breda Zimkus
Thanks to substantial support for biodiversity data mobilization in recent decades, billions of occurrence records are openly available, documenting life on Earth and enabling timely research, awareness raising, and policy-making. Initiatives across local to global scales have been separately funded to serve different, yet often overlapping audiences of data users, and have developed a variety of platforms and infrastructures to meet the needs of these audiences. The independent progress of biodiversity data providers has led to innovations as well as challenges for the community at large as we move towards connecting and linking a diversity of information from disparate sources as Digital Extended Specimens (DES). Recognizing a need for deeper and more frequent opportunities for communication and collaboration across the globe, an ad-hoc group of representatives of various international, national, and regional organizations have been meeting virtually since 2020 to provide a forum for updates, announcements, and shared progress. This group is provisionally named International Partners for the Digital Extended Specimen (IPDES), and is guided by these four concepts: Biodiversity, Connection, Knowledge and Agency. Participants in IPDES include representatives of the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), American Institute of Biological Sciences (AIBS), Biodiversity Collections Network (BCoN), Natural Science Collections Alliance (NSCA), Distributed System of Scientific Collections (DiSSCo), Atlas of Living Australia (ALA), Biodiversity Information Standards (TDWG), Society for the Preservation of Natural History Collections (SPNHC), National Specimen Information Infrastructure of China (NSII), and South African National Biodiversity Institute (SANBI), as well as individuals involved with biodiversity informatics initiatives, natural science collections, museums, herbaria, and universities. 
Our global partners group strives to increase representation from around the globe as we aim to enable research that contributes to novel discoveries and addresses the societal challenges leading to the biodiversity crisis. Our overarching mission is to expand on the community-driven successes to connect biodiversity data and knowledge through coordination of a globally integrated network of stakeholders to enable an extensible technical and social infrastructure of data, tools, and working practices in support of our vision. The main work of our group thus far includes publishing a paper on the Digital Extended Specimen (Hardisty et al. 2022), organizing and hosting an array of activities at conferences, and asynchronous online work and forum-based exchanges. We aim to advance discussion on topics of broad interest to our community such as social and technical capacity building, broadening participation, expanding social and data networks, improving data models and building a backbone for the DES, and identifying international funding solutions. This presentation will highlight some of these activities and detail progress towards a roadmap for developing the human network and technical infrastructure needed to support the DES. It offers stakeholder communities such as TDWG, and other initiatives focused on data standards and biodiversity informatics, an opportunity for feedback and engagement as we consolidate our future plans in support of integrated and interconnected biodiversity data, and in recognition of those who do this work.
{"title":"Connecting the Dots: Aligning human capacity through networks toward a globally interoperable Digital Extended Specimen (DES) infrastructure","authors":"Elizabeth R. Ellwood, Wouter Addink, John Bates, Andrew Bentley, Jutta Buschbom, Alina Freire-Fierro, Jose Fortes, David Jennings, Kerstin Lehnert, Bertram Ludäscher, Keping Ma, James Macklin, Austin Mast, Joe Miller, Gil Nelson, Nicky Nicolson, Jyotsna Pandey, Deborah Paul, Sinlan Poo, Richard Rabeler, Pamela S. Soltis, Elycia Wallis, Michael Webster, Andrew Young, Breda Zimkus","doi":"10.3897/biss.7.112390","DOIUrl":"https://doi.org/10.3897/biss.7.112390","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136299015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Atlas of Living Australia (ALA), Australia's national online biodiversity database, is partnering with the Noongar Boodjar Language Centre (NBALC) to promote Indigenous language and knowledge by including Noongar names for plants and animals in the ALA. Names are included in the ALA species page for each plant and animal, and knowledge is built into the Noongar Plant and Animal online Encyclopedia, hosted in the ALA. We demonstrate the use of CARE principles (Collective Benefit, Authority to Control, Responsibility, and Ethics; Carroll et al. 2020) to engage, support, and deliver the project and outcomes to the Noongar people and communities working with us.
The ALA addresses the FAIR principles (Wilkinson et al. 2016) for data management and stewardship, ensuring data are findable, accessible, interoperable, and reusable. The ALA is partnering with NBALC in Perth to ensure all sharing of Noongar data is on Noongar terms. NBALC and ALA have been working with Noongar-Wadjari, a southern clan from the Fitzgerald River area in Western Australia, to collect, protect and share their language and traditional knowledge for local species.
The Noongar Encyclopedia project exhibits Collective Benefit because it is a co-innovation project that was co-designed by NBALC and ALA. The project’s activities were designed by the Community-endorsed representatives, the Knowledge Holders. The aims and aspirations of the Community were included in the project design to ensure equitable outcomes. NBALC’s more than 25-year relationship with the Community, and the fact that its staff are Noongar people themselves, meant they had a good understanding of what the Community might want from the project. These assumptions were tested and refined during the first Community consultation, before the project plan was finalised. The Community are keen for their traditional knowledge to be shared and freely available to their Community. The ALA only shares knowledge that has passed through strict consent processes. 
The ALA is seen as a safe and stable digital environment, now and in the future, where the traditional knowledge can be accessed freely and easily. The link to western science knowledge is secondary to knowledge sharing for most of the Aboriginal and Torres Strait Islander Communities that the ALA is working with, although scientists having access to both knowledge systems is seen as a positive step in care for Country into the future.
The Noongar Encyclopedia project ensures Noongar Authority to Control these data because NBALC, as an Aboriginal organisation led by Noongar people, understands the rights and interests of the Communities we are working with. Protection of these rights and inclusion of Community interests are written into the project methodology as part of the project co-design. It is important to ensure the project is working with the right people within the Community. NBALC facilitates this by finding people who hold traditional knowledge and can trace the origin of their stories. Appropriate governance of the data is ensured because all collected data are stored and managed by NBALC. The project design includes rolling consent from the Knowledge Holders, who review all data collected, add or edit as needed, and consent to or refuse the public sharing of knowledge through the ALA.
The Noongar Encyclopedia project is designed to ensure that we understand our Responsibility (the CARE "R") for the collection, protection, management, and sharing of Indigenous data. Through its partnership with the ALA, NBALC is expanding its capacity and capability for digital data collection and management. The Community is building its capacity to work with linguists and scientists. Including Noongar language and traditional knowledge in the ALA shows non-Indigenous users of the ALA another way of naming, observing, talking about, and recording knowledge of species. This view differs from that of western science. Noongar people see all things as connected and group things according to their use and connectivity. Western science tends to classify species according to their physical attributes. Language is the key to this alternative world view. The ALA now publishes scientific names, English names, and Noongar words. The ALA links to this alternative view of science for these species through the Noongar Encyclopedia and two other ecological knowledge encyclopedias (Kamilaroi and South East Arnhem Land).
The Noongar Encyclopedia project is continually subject to Ethics assessment by the Community and has passed rigorous western ethics assessment and review. The Community ethics assessment comprises a series of assessments carried out before a project begins. Projects are co-designed with NBALC to ensure they meet protocols and Community expectations. The ALA is then introduced to the Community. The Community decides whether they are interested in the project, whether it meets their aspirations, and whether they are willing to work with the ALA and, potentially, other scientists. Contributing scientists or academics are introduced to the Community by NBALC. The Community has the right to refuse to work with any scientist or academic introduced. All contributors are informed of this protocol before being introduced to the Community.
The Noongar Plant and Animal Encyclopedia was published in September 2021 (NBALC 2021).
{"title":"Implementing CARE Principles to Link Noongar Language and Knowledge to Western Science through the Atlas of Living Australia","authors":"N. Raisbeck‐Brown, Denise Smith-Ali","doi":"10.3897/biss.7.112349","DOIUrl":"https://doi.org/10.3897/biss.7.112349","url":null,"PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85507465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taxonomy, and biodiversity science in general, mainly revolve around four types of entities, which are available digitally in ever increasing numbers from different services: (1) Physical specimens (kept in museums and other collections around the world) and observations are available digitally via the Global Biodiversity Information Facility (GBIF). (2) DNA sequences (often derived from preserved specimens) are available from the European Nucleotide Archive (ENA) and National Center for Biotechnology Information (NCBI), having accession numbers as their primary means of citation. (3) Taxa, identified by taxon names, are increasingly registered to nomenclatural reference databases (ZooBank, International Plant Names Index (IPNI)) and aggregated in the Catalogue of Life (CoL). (4) Taxonomic treatments combine the former three; they define taxa, express scientific opinions about existing taxa, based upon specimens as well as DNA sequences derived from them, and coin respective names; they are available from TreatmentBank (as well as Zenodo/Biodiversity Literature Repository (BLR), Swiss Institute of Bioinformatics Literature Services (SIBiLS), and GBIF). Traditionally, treatments cite specimens, taxa, and other treatments in mainly human-centric ways, describing where to find the cited object, but they are not immediately actionable in a digital sense. Specimen citations use institution and collection codes and catalog numbers (often combined with geographical and environmental data). Taxon names are a type of self-citing entity, especially when given in combination with their (bibliographic) authorship, as they represent a historical approach to human-readable taxon identifiers. Citations of treatments are very similar to those of taxon names, adding (bibliographic) information of subsequent name usages as needed. Accession numbers for DNA sequences are the closest to modern digital identifiers. 
However, none of these means of citation, as usually found in literature, are readily machine actionable, which makes them hard to process at scale and analyze programmatically. Identifiers coined by the various data providers, in combination with APIs to resolve them, alleviate this problem and enable computational navigation of such links. However, this alone only defers the problem, as actionable identifiers (e.g., HTTP URIs) at some point still need to be inferred from the information given in the traditional means of citation where the latter occur in data. Recent projects, like BiCIKL, aim to add machine navigable links to the various entities (or respective data records) at scale, in pursuit of (ideally) fully intermeshed records, connecting (1) treatments to subject taxon names and concepts, cited specimens and DNA sequences, as well as cited treatments (with explicit nomenclatorial implications, e.g., taxon name synonymies or rebuttals thereof), (2) (digital) specimens to assigned taxon names, citing treatments, and any derived DNA sequences,
(3) DNA sequences to their source specimens (or their digital counterparts) and, where applicable, to assigned taxon names and citing treatments, and (4) taxon names to defining and synonymizing treatments, associated (digital) specimens, and any derived DNA sequences. This removes the problems that transitive dependencies in chains of links can cause by acting as intermediate points of failure. All major data providers have been doing this to varying degrees for some time, which provides a good starting point, but several challenges and pitfalls remain. For valid technical reasons, the systems of individual data providers are (and need to be) self-contained, which comes at the cost of a certain amount of duplication (e.g., the GBIF and ENA/NCBI backbone taxonomies). This is not a problem in itself, but it slows the propagation of updates and can lead to discrepancies. Furthermore, traditional human-readable identifiers can be somewhat ambiguous: (1) some institution and collection codes are not unique, or authors use them in non-standard ways (e.g., some codes in the Global Registry of Scientific Collections (GRSciColl) point to six different institutions); (2) some catalog numbers of museum specimens are also valid (resolvable) accession numbers, with their actual semantics emerging only from context; (3) the lack of such context makes the semantics of data in tables especially hard to infer; (4) no provider has complete data coverage, so at any given point linking is not even technically possible in all cases, and some links can only be added over time as coverage and the overlap between datasets increase (e.g., a newly published name cannot be in CoL at the time the defining treatment is digitized); (5) occasional complete recomputation or reprocessing is impractical and wasteful. In this presentation, we will discuss various approaches to overcoming these challenges and avoiding these pitfalls, and offer related recommendations for APIs to better support the respective mechanisms.
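The abstract notes that specimen citations rely on institution codes, collection codes and catalog numbers, and that institution codes are not always unique. The following is a minimal Python sketch of how a linking step might detect that ambiguity before minting a link. The colon-separated citation layout, the registry contents, and every name in the code are assumptions made for illustration; none of them comes from the abstract or from any provider's API.

```python
from dataclasses import dataclass

# Hypothetical registry excerpt: institution code -> institutions using it.
# The codes and names are invented; a real lookup would query a registry.
EXAMPLE_REGISTRY = {
    "AAA": ["Example Museum One"],
    "BBB": ["Example Museum Two", "Example Herbarium Three"],
}


@dataclass(frozen=True)
class SpecimenCitation:
    institution_code: str
    collection_code: str
    catalog_number: str


def parse_citation(text):
    """Split an 'institution:collection:catalog' string into its parts."""
    parts = [part.strip() for part in text.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'institution:collection:catalog', got {text!r}")
    return SpecimenCitation(*parts)


def link_status(citation, registry=EXAMPLE_REGISTRY):
    """Classify a citation by how many institutions its code could mean."""
    candidates = registry.get(citation.institution_code, [])
    if not candidates:
        return "unresolved"   # unknown code: no link can be made yet
    if len(candidates) == 1:
        return "resolved"     # exactly one match: safe to link
    return "ambiguous"        # several matches: needs context to decide


if __name__ == "__main__":
    for raw in ["AAA:Birds:12345", "BBB:Herps:77", "ZZZ:Fish:1"]:
        print(raw, "->", link_status(parse_citation(raw)))
```

Keeping "ambiguous" and "unresolved" as separate outcomes matters here: an ambiguous code needs extra context (for example geography), while an unresolved one may simply become linkable later as coverage grows.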
{"title":"Bidirectional Linking: Benefits, challenges, pitfalls, and solutions","authors":"Guido Sautter, D. Agosti","doi":"10.3897/biss.7.112344","DOIUrl":"https://doi.org/10.3897/biss.7.112344","url":null,"abstract":"Taxonomy, and biodiversity science in general, mainly revolve around four types of entities, which are available digitally in ever increasing numbers from different services: (1) Physical specimens (kept in museums and other collections around the world) and observations are available digitally via the Global Biodiversity Information Facility (GBIF). (2) DNA sequences (often derived from preserved specimens) are available from the European Nucleotide Archive (ENA) and National Center for Biotechnology Information (NCBI), having accession numbers as their primary means of citation. (3) Taxa, identified by taxon names, are increasingly registered to nomenclatural reference databases (ZooBank, International Plant Names Index (IPNI)) and aggregated in the Catalogue of Life (CoL). (4) Taxonomic treatments combine the former three; they define taxa, express scientific opinions about existing taxa, based upon specimens as well as DNA sequences derived from them, and coin respective names; they are available from TreatmentBank (as well as Zenodo/Biodiversity Literature Repository (BLR), Swiss Institute of Bioinformatics Literature Services (SIBiLS), and GBIF).\u0000 Traditionally, treatments cite specimens, taxa, and other treatments in mainly human-centric ways, describing where to find the cited object, but they are not immediately actionable in a digital sense. Specimen citations use institution and collection codes and catalog numbers (often combined with geographical and environmental data). Taxon names are a type of self-citing entity, especially when given in combination with their (bibliographic) authorship, as they represent a historical approach to human-readable taxon identifiers. 
Citations of treatments are very similar to those of taxon names, adding (bibliographic) information of subsequent name usages as needed. Accession numbers for DNA sequences are the closest to modern digital identifiers. However, none of these means of citation, as usually found in literature, are readily machine actionable, which makes them hard to process at scale and analyze programmatically. Identifiers coined by the various data providers, in combination with APIs to resolve them, alleviate this problem and enable computational navigation of such links. However, this alone only defers the problem, as actionable identifiers (e.g., HTTP URIs) at some point still need to be inferred from the information given in the traditional means of citation where the latter occur in data.\u0000 Recent projects, like BiCIKL, aim to add machine navigable links to the various entities (or respective data records) at scale, in pursuit of (ideally) fully intermeshed records, connecting (1) treatments to subject taxon names and concepts, cited specimens and DNA sequences, as well as cited treatments (with explicit nomenclatorial implications, e.g., taxon name synonymies or rebuttals thereof), (2) (digital) specimens to assigned taxon names, citing treatments, and any derived DNA sequences,","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87815824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Progress in life and biomedical sciences depends absolutely on biodata resources: databases comprising biological data, and the services around those databases. Supporting scientists in data operations spanning the management, analysis and publication of newly generated data and access to pre-existing reference data, these biodata resources together comprise a critical infrastructure for life science and biomedical research. Familiar scientific infrastructures, for example the Conseil Européen pour la Recherche Nucléaire (CERN) or the Square Kilometre Array, are distinct, constructed, physical entities that are centrally funded and managed at one or more identifiable locations. By contrast, the primary infrastructure of the life sciences, which comprises databases and other biological data resources, is globally distributed, virtually connected, funded from multiple sources, and not managed as a coordinated entity. While this configuration supports innovation, it lends itself poorly to the long-term sustainability of individual biodata resources and of the infrastructure as a whole. The Global Biodata Coalition (GBC) brings together life science research funding organisations that recognise these challenges and acknowledge the threat that the lack of sustainability poses. They agree to work together to find ways to improve sustainability. In the presentation, we will provide an overview of the global biodata resource infrastructure, focusing in particular on challenges to providing sustained long-term funding to the resources that comprise the infrastructure. This will provide a global context to other presentations in the session, which focus on biodata resources in Australia. 
Covering some of the work that GBC has carried out to understand and classify biodata resources and the entire biodata resource infrastructure, we will outline the Global Core Biodata Resource programme and Inventory project and also introduce the stakeholder consultation processes around approaches to sustainability and open data. Finally, we will lay out the path GBC is taking to engage researchers, informaticians, funding organisations and other stakeholders in moving towards greater sustainability for these critical resources.
{"title":"The Global Biodata Coalition: Towards a sustainable biodata infrastructure","authors":"Chuck Cook, Guy Cochrane","doi":"10.3897/biss.7.112303","DOIUrl":"https://doi.org/10.3897/biss.7.112303","url":null,"abstract":"Progress in life and biomedical sciences depends absolutely on biodata resources: databases comprising biological data, and the services around those databases. Supporting scientists in data operations spanning the management, analysis and publication of newly generated data and access to pre-existing reference data, these biodata resources together comprise a critical infrastructure for life science and biomedical research. Familiar scientific infrastructures, for example the Conseil Européen pour la Recherche Nucléaire (CERN) or the Square Kilometre Array, are distinct, constructed, physical entities that are centrally funded and managed at one or more identifiable locations. By contrast, the primary infrastructure of the life sciences, which comprises databases and other biological data resources, is globally distributed, virtually connected, funded from multiple sources, and not managed as a coordinated entity. While this configuration supports innovation, it lends itself poorly to the long-term sustainability of individual biodata resources and of the infrastructure as a whole. The Global Biodata Coalition (GBC) brings together life science research funding organisations that recognise these challenges and acknowledge the threat that the lack of sustainability poses. They agree to work together to find ways to improve sustainability.\u0000 In the presentation, we will provide an overview of the global biodata resource infrastructure, focusing in particular on challenges to providing sustained long-term funding to the resources that comprise the infrastructure. 
This will provide a global context to other presentations in the session, which focus on biodata resources in Australia.\u0000 Covering some of the work that GBC has carried out to understand and classify biodata resources and the entire biodata resource infrastructure, we will outline the Global Core Biodata Resource programme and Inventory project and also introduce the stakeholder consultation processes around approaches to sustainability and open data. Finally, we will lay out the path GBC is taking to engage researchers, informaticians, funding organisations and other stakeholders in moving towards greater sustainability for these critical resources","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"197 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76232603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BHL Australia, the Australian branch of the Biodiversity Heritage Library (BHL), was launched in 2010 and began operation with a single organisation, Museums Victoria in Melbourne. Since then, it has grown considerably. Funded by the Atlas of Living Australia, BHL Australia now digitises biodiversity literature on behalf of 42 organisations across the country. These organisations include museums, herbaria, state libraries, royal societies, government agencies, field naturalist clubs and natural history publishers, many of whom lack the resources to do this work themselves. BHL Australia’s national consortium model, which makes biodiversity literature accessible on behalf of so many organisations, is unique amongst the BHL global community. Most BHL operations digitise material on behalf of a single organisation. BHL Australia has now made over 530,000 pages of Australia’s biodiversity knowledge freely accessible online. The BHL Australia Collection includes both published works (books and journals) and unpublished material (collection registers, field diaries and correspondence). The pages of these works are filled with species descriptions, references to historically significant people and, most importantly, scientific data that is critical to ongoing research and conservation efforts. Providing access to materials published as far back as the 1600s and as recently as the current year, the collection chronicles the scientific discovery and understanding of Australia’s biodiversity. BHL Australia also leads the global initiative to bring the world's historic biodiversity and taxonomic literature into the modern linked network of scholarly research by incorporating article data into BHL and retrospectively assigning DOIs (Digital Object Identifiers) (Kearney et al. 2021). BHL has now assigned more than 162,000 DOIs to historic publications, making them persistently citable and trackable, both within BHL and beyond. 
This paper will celebrate the achievements of BHL Australia by journeying through the (now accessible, discoverable and DOI'd) Tasmanian Tiger literature. It will showcase: (1) previously elusive descriptions (and beautiful illustrations) of Thylacines, including those by Gerhard Krefft (1871) https://doi.org/10.5962/p.314741, and John Gould (1863) https://doi.org/10.5962/p.312790; (2) the invaluable creation of links to open access versions from paywalled publications that should be in the public domain, such as the first description of the Thylacine (Harris 1808), which is open access on BHL but paywalled by Oxford Academic; (3) the many citations of historic taxonomic descriptions that are now appearing as clickable DOI links in modern scholarly articles, taxonomic databases, social media, and Wikipedia (Kearney and Page 2022); and (4) the efforts being made to encourage more authors to cite the authoritative source of taxonomic names (Benichou 2022).
The extinction of the Thylacine is a stark reminder of the irreversible consequences of a lack of understanding and appreciation of the natural world. Likewise, a lack of access to biodiversity knowledge, and/or an inability to find it, hinders our ability to learn from the past and impedes scientific progress and conservation efforts. The Biodiversity Heritage Library was created to address a major obstacle to scientific research: the lack of access to natural history literature (BHL 2019). BHL Australia has made a significant contribution to this global mission and has played a major role in BHL's transition to a fully searchable, persistently linkable component of the biodiversity knowledge graph (Kearney 2020, Page 2016).
{"title":"Celebrating BHL Australia through the Eye of the (Tasmanian) Tiger","authors":"Nicole Kearney","doi":"10.3897/biss.7.112352","DOIUrl":"https://doi.org/10.3897/biss.7.112352","url":null,"abstract":"BHL Australia, the Australian branch of the Biodiversity Heritage Library (BHL), was launched in 2010 and began operation with a single organisation, Museums Victoria in Melbourne. Since then, it has grown considerably. Funded by the Atlas of Living Australia, BHL Australia now digitises biodiversity literature on behalf of 42 organisations across the country. These organisations include museums, herbaria, state libraries, royal societies, government agencies, field naturalist clubs and natural history publishers, many of whom lack the resources to do this work themselves. BHL Australia’s national consortium model, which makes biodiversity literature accessible on behalf of so many organisations, is unique amongst the BHL global community. Most BHL operations digitise material on behalf of a single organisation.\u0000 BHL Australia has now made over 530,000 pages of Australia’s biodiversity knowledge freely accessible online. The BHL Australia Collection includes both published works (books and journals) and unpublished material (collection registers, field diaries and correspondence). The pages of these works are filled with species descriptions, references to historically significant people and, most importantly, scientific data that is critical to ongoing research and conservation efforts. 
Providing access to materials published as far back as the 1600s and as recently as the current year, the collection chronicles the scientific discovery and understanding of Australia’s biodiversity.\u0000 BHL Australia also leads the global initiative to bring the world's historic biodiversity and taxonomic literature into the modern linked network of scholarly research by incorporating article data into BHL and retrospectively assigning DOIs (Digital Object Identifiers) (Kearney et al. 2021). BHL has now assigned more than 162,000 DOIs to historic publications, making them persistently citable and trackable, both within BHL and beyond. \u0000 This paper will celebrate the achievements of BHL Australia by journeying through the (now accessible, discoverable and DOI'd) Tasmanian Tiger literature. It will showcase:\u0000 \u0000 \u0000 \u0000 previously elusive descriptions (and beautiful illustrations) of Thylacines, including those by Gerhard Krefft (1871) https://doi.org/10.5962/p.314741, and John Gould (1863) https://doi.org/10.5962/p.312790;\u0000 \u0000 \u0000 the invaluable creation of links to open access versions from paywalled publications that should be in the public domain, such as the first description of the Thylacine (Harris 1808): open access on BHL; paywalled by Oxford Academic;\u0000 \u0000 \u0000 the many citations of historic taxonomic descriptions that are now appearing as clickable DOI links in modern scholarly articles, taxonomic databases, social media, and Wikipedia (Kearney and Page 2022); and\u0000 \u0000 \u0000 the efforts being made to encourage more authors to cite the authoritative source of taxonomic names (Benichou 2022).\u0000 \u0000 \u0000 \u0000 previously elusive descriptions (and beautiful illustrations) of Thylacines, i","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84286904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital specimens are new information objects on the internet, which act as digital surrogates of the physical objects they represent. They are designed to be extended with data derived from the specimen, such as genetic, morphological and chemical data, and with data that puts the specimen in the context of its gathering event and the environment it was derived from. This requires linking the digital specimens and their related entities to information about agents, locations, publications, taxa and the environment. To establish reliable links and (re-)connect data to specimens, a new framework is needed, which creates persistent identifiers (PIDs) for the digital specimen and its related entities. These PIDs should be actionable by machines but can also be used by humans for data citation and communication purposes. The framework that enables this is a new PID infrastructure, produced by the European Commission-funded BiCIKL project (Biodiversity Community Integrated Knowledge Library), which creates persistent and actionable identifiers. It is a generic PID infrastructure that will be used by the Distributed System for Scientific Collections research infrastructure (DiSSCo), but it can also be used by other infrastructures and institutions. PIDs minted by DiSSCo will be linked to the digital specimens and samples provided through DiSSCo. The new PIDs are a key element in enabling the concept of Digital Extended Specimens (Webster et al. 2021) and provide unique and resolvable references to enable bidirectional linking. DiSSCo has done extensive work to select the most appropriate PID scheme (Hardisty et al. 2021) and to design a PID infrastructure for the pan-European specimens. 
The draft design has been discussed with technical specialists in the joint DiSSCo and Consortium of European Taxonomic Facilities (CETAF) community, with international stakeholders like the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio) and was discussed at the 2022 conference of the Society for the Preservation of Natural History Collections (SPNHC). A first implementation was demonstrated in the Biodiversity Information Standards (TDWG) annual conference in 2022 and illustrated key elements in the design. To be able to provide digital specimen identifiers as DOIs (Digital Object Identifiers), a pilot project was started in 2023 with DataCite to investigate if Digital Specimen DOIs in the new PID infrastructure can be created using the DataCite service. The pilot aim was to create metadata crosswalks to the DataCite schema in consultation with the DataCite Metadata Working Group, to evaluate synergies with the IGSN (International Generic Sample Number) metadata schema, to develop and test PID kernel metadata registration, and to evaluate performance and the impact of using DataCite services. There are around two billion specimens and creating PIDs for them as DOIs requires creating DOIs at an unprecedented scale. Also,
PID kernel metadata registration is a new feature for DOIs. The specimen metadata included will complement existing biodiversity information standards such as Darwin Core and will support the new MIDS (Minimum Information about a Digital Specimen) standard, which is under development. The design, development and testing of the new PID infrastructure is part of the BiCIKL project, which aims to foster collaboration between infrastructures and to develop bidirectional linking (Penev et al. 2022). At the conference, we will present the results of developing the PID infrastructure as part of the BiCIKL toolbox for linking biodiversity data, and discuss progress in creating Digital Specimen DOIs.
{"title":"A Novel Part in the Swiss Army Knife for Linking Biodiversity Data: The digital specimen identifier service","authors":"W. Addink, Soulaine Theocharides, Sharif Islam","doi":"10.3897/biss.7.112283","DOIUrl":"https://doi.org/10.3897/biss.7.112283","url":null,"abstract":"Digital specimens are new information objects on the internet, which act as digital surrogates of the physical objects they represent. They are designed to be extended with data derived from the specimen, such as genetic, morphological and chemical data, and with data that puts the specimen in the context of its gathering event and the environment it was derived from. This requires linking the digital specimens and their related entities to information about agents, locations, publications, taxa and the environment. To establish reliable links and (re-)connect data to specimens, a new framework is needed, which creates persistent identifiers (PIDs) for the digital specimen and its related entities. These PIDs should be actionable by machines but can also be used by humans for data citation and communication purposes.\u0000 The framework that enables this is a new PID infrastructure, produced by the European Commission-funded BiCIKL project (Biodiversity Community Integrated Knowledge Library), which creates persistent and actionable identifiers. It is a generic PID infrastructure that will be used by the Distributed System for Scientific Collections research infrastructure (DiSSCo), but it can also be used by other infrastructures and institutions. PIDs minted by DiSSCo will be linked to the digital specimens and samples provided through DiSSCo. The new PIDs are a key element in enabling the concept of Digital Extended Specimens (Webster et al. 2021) and provide unique and resolvable references to enable bidirectional linking. \u0000 DiSSCo has done extensive work to select the most appropriate PID scheme (Hardisty et al. 2021) and to design a PID infrastructure for the pan-European specimens. 
The draft design has been discussed with technical specialists in the joint DiSSCo and Consortium of European Taxonomic Facilities (CETAF) community, with international stakeholders like the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio) and was discussed at the 2022 conference of the Society for the Preservation of Natural History Collections (SPNHC). A first implementation was demonstrated in the Biodiversity Information Standards (TDWG) annual conference in 2022 and illustrated key elements in the design. To be able to provide digital specimen identifiers as DOIs (Digital Object Identifiers), a pilot project was started in 2023 with DataCite to investigate if Digital Specimen DOIs in the new PID infrastructure can be created using the DataCite service. The pilot aim was to create metadata crosswalks to the DataCite schema in consultation with the DataCite Metadata Working Group, to evaluate synergies with the IGSN (International Generic Sample Number) metadata schema, to develop and test PID kernel metadata registration, and to evaluate performance and the impact of using DataCite services. There are around two billion specimens and creating PIDs for them as DOIs requires creating DOIs at an unprecedented scale. Also,","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89183646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Sica, Wesley Hochachka, Yi-Ming Gan, Kate Ingenloff, Dmitry Schigel, Robert Stevenson, Steven Baskauf, Peter Brenton, Anahita J. N. Kazem, John Wieczorek
Access to high-quality ecological data is critical to assessing and modeling biodiversity and its changes through space and time. The Darwin Core standard has proven to be immensely helpful in sharing species occurrence data (see Wieczorek et al. 2012, Global Biodiversity Information Facility, GBIF) and promoting biodiversity research following the FAIR principles of findability, accessibility, interoperability and reusability (Wilkinson et al. 2016). However, it is limited in its ability to fully accommodate inventory data (i.e., linked records of multiple taxa at a specific place and time). Information about the inventory processes is often either unreported or described in an unstructured manner, limiting its potential re-use for larger-scale analyses. Two key aspects that are not captured in a structured manner yet are: i) information about the species that were not detected during an inventory, and ii) ancillary information about sampling effort and completeness. Non-detections (i.e., reported counts of zero) potentially enable more accurate and precise estimates of distribution, abundance, and changes in abundance. This becomes possible when variation in effort is used to estimate the likelihood that a non-detection represents a true absence of that taxon during the inventory. Currently, ecological inventory data, when shared at all, are typically discoverable through dataset catalogs (e.g., governmental data repositories) and supplementary materials to publications. With few exceptions, indexing of such data with the detail and structure needed has not been attempted at broad temporal and spatial scales, despite the potentially high value resulting from making inventory data more readily accessible. To address these limitations in documenting inventory data using the Darwin Core, Guralnick et al. (2018) proposed the Humboldt Core. 
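The point about non-detections can be made concrete with a small sketch: once an inventory declares which taxa it targeted and how much effort it spent, every target taxon without a record becomes an explicit zero, and counts can be expressed per unit of effort. The Python below is only an illustration under those assumptions; the function names, the dictionary layout and the use of hours as the effort unit are hypothetical and are not terms from Darwin Core or the Humboldt Extension.

```python
def zero_fill(target_taxa, detections):
    """Return a count for every target taxon, using 0 for non-detections.

    target_taxa: taxa the inventory set out to record (its declared scope).
    detections:  mapping of taxon -> count actually reported.
    """
    out_of_scope = set(detections) - set(target_taxa)
    if out_of_scope:
        # A detection outside the declared scope means the scope is wrong,
        # so zeros inferred from it would not be trustworthy.
        raise ValueError(f"detections outside declared scope: {sorted(out_of_scope)}")
    return {taxon: detections.get(taxon, 0) for taxon in target_taxa}


def counts_per_effort(counts, effort_hours):
    """Scale counts by sampling effort so inventories can be compared."""
    if effort_hours <= 0:
        raise ValueError("effort must be positive to interpret non-detections")
    return {taxon: count / effort_hours for taxon, count in counts.items()}


if __name__ == "__main__":
    scope = ["Taxon A", "Taxon B", "Taxon C"]   # invented taxa
    reported = {"Taxon B": 3}                   # invented counts
    filled = zero_fill(scope, reported)
    print(filled)
    print(counts_per_effort(filled, effort_hours=2.0))
```

Without the declared scope, the absence of a record for "Taxon A" could not be told apart from "Taxon A was never looked for", which is why the sketch refuses to infer zeros when scope and detections disagree.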
Subsequent discussions within the biodiversity standards community made it clear that greater integration could be achieved by creating an extension of the Darwin Core, rather than developing a new standard in isolation. Extension design work began in 2021 and progress has been reported by Brenton (2021) and Sica et al. (2022). Over the last year the Humboldt Extension Task Group has sought advice from data providers and aggregators and updated its vocabulary terms. A challenging aspect has been creating terminology for the parent-child relationships (see Properties of Hierarchical Events) needed to describe surveys that may be as simple as a collection of checklists (one level of hierarchy) or as complex as species records from traps within plots along transects across habitats over multiple years (at least four levels of hierarchy). The Task Group has committed to completing a User Guide for the Humboldt Extension. Group members who contributed to the Darwin Core (Darwin Core Task Group 2009) and the Vocabulary Maintenance Specification (Vocabulary Maintenance Specification Task Group 2017) have provided va
Y. Sica, Wesley Hochachka, Yi-Ming Gan, Kate Ingenloff, Dmitry Schigel, Robert Stevenson, Steven Baskauf, Peter Brenton, Anahita J. N. Kazem, John Wieczorek. "Want to Describe and Share Biodiversity Inventory and Monitoring Data? The Humboldt Extension for Ecological Inventories Can Help!" Biodiversity Information Science and Standards, published 2023-09-07. https://doi.org/10.3897/biss.7.112229
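The parent-child relationships and zero-count records described in the abstract above can be illustrated with a minimal sketch. It assumes events are linked through the existing Darwin Core terms eventID and parentEventID, with effort held in samplingEffort and counts in individualCount; it does not use any Humboldt Extension term, since the abstract does not list them. The survey, the records, and the helper names (`Event`, `hierarchy_depth`, `non_detections`) are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: str                   # Darwin Core eventID
    parent_event_id: Optional[str]  # Darwin Core parentEventID (None for a root event)
    sampling_effort: str = ""       # Darwin Core samplingEffort (free text)


def hierarchy_depth(event_id: str, events: dict[str, Event]) -> int:
    """Count the levels from this event up to its root (a root has depth 1)."""
    depth = 0
    seen: set[str] = set()
    current: Optional[str] = event_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent-child links at {current!r}")
        seen.add(current)
        depth += 1
        current = events[current].parent_event_id
    return depth


# Hypothetical survey: traps within a plot along a transect in one habitat.
EVENTS = {
    e.event_id: e
    for e in [
        Event("habitat-1", None),
        Event("transect-1", "habitat-1"),
        Event("plot-1", "transect-1"),
        Event("trap-1", "plot-1", sampling_effort="3 trap-nights"),
        Event("trap-2", "plot-1", sampling_effort="1 trap-night"),
    ]
}

# Hypothetical occurrence records; a count of zero is a reported non-detection.
OCCURRENCES = [
    {"eventID": "trap-1", "scientificName": "Rattus rattus", "individualCount": 2},
    {"eventID": "trap-1", "scientificName": "Mus musculus", "individualCount": 0},
    {"eventID": "trap-2", "scientificName": "Mus musculus", "individualCount": 0},
]


def non_detections(occurrences, events):
    """Pair each zero-count record with the effort of the event it belongs to."""
    return [
        (o["scientificName"], o["eventID"], events[o["eventID"]].sampling_effort)
        for o in occurrences
        if o["individualCount"] == 0
    ]


if __name__ == "__main__":
    print(hierarchy_depth("trap-1", EVENTS))  # levels: trap, plot, transect, habitat
    for row in non_detections(OCCURRENCES, EVENTS):
        print(row)
```

Keeping effort on the event rather than on each occurrence means every zero count inherits the effort of the event it was recorded in, which is the information needed to judge how likely a non-detection is to be a true absence. The sketch stops short of any such statistical estimate.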
In current life science practice, digital data are associated with all parts of the research lifecycle. Data generation and management are planned for during project conception; the data are then collected from numerous instruments or existing sources, prepared for analysis, analysed to generate new knowledge and information, and (hopefully) preserved so that they may be found, shared and re-used by others when appropriate.
This session will begin with a scan of the biodata and biodata infrastructure landscape within Australia. We will explore which organisations fund biodata generation, where data are processed and stored, and how data are made available for reuse by others. Important global and complementary data resources that are hosted offshore will also be discussed. To guarantee reproducibility and integrity for life sciences research, it is critical that each of these infrastructures (whether hosted on- or off-shore) is maintained for the long term.
As an example of a resource that utilises a mixture of existing on- and off-shore data infrastructures to underpin a critical research need, the Australian Reference Genome Atlas (ARGA) will be discussed. ARGA is solving the problem of genomics data obscurity for Australian-relevant species by creating an online platform where life sciences researchers can comprehensively and confidently search for genomic data for taxa relevant to Australian research. Publicly available genomics (and genetics) data are aggregated and indexed from multiple sources (both on- and off-shore), and then integrated with occurrence records and the taxonomic frameworks of the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) to enrich the genomic data and make them searchable using taxonomy, location, ecological characteristics and selected phenotypic data.
The presentation sets the scene for a subsequent talk by members of the Global Biodata Coalition (GBC), who will outline the challenges in sustaining the types of disseminated infrastructure discussed and the GBC’s work with the funders who support many of these resources to ensure long-term funding for existing infrastructure, while also channelling support to underpin future growth in data volumes and new technologies.
Jeff Christiansen, Kathryn Hall. "Biodata Infrastructure within Australia and Beyond: Landscapes and horizons." Biodiversity Information Science and Standards, published 2023-09-07. https://doi.org/10.3897/biss.7.112274