
Latest Publications in Biodiversity Information Science and Standards

Data Standards and Interoperability Challenges for Biodiversity Digital Twin: A novel and transformative approach to biodiversity research and application
Pub Date : 2023-09-11 DOI: 10.3897/biss.7.112373
Sharif Islam, Hanna Koivula, Dag Endresen, Erik Kusch, Dmitry Schigel, Wouter Addink
The Biodiversity Digital Twin (BioDT) project (2022-2025) aims to create prototypes that integrate various data sets, models, and expert domain knowledge enabling prediction capabilities and decision-making support for critical issues in biodiversity dynamics. While digital twin concepts have been applied in industries for continuous monitoring of physical phenomena, their application in biodiversity and environmental sciences presents novel challenges (Bauer et al. 2021, de Koning et al. 2023). In addition, successfully developing digital twins for biodiversity requires addressing interoperability challenges in data standards. BioDT is developing prototype digital twins based on use cases that span various data complexities, from point occurrence data to bioacoustics, covering nationwide forest states to specific communities and individual species. The project relies on FAIR principles (Findable, Accessible, Interoperable, and Reusable) and FAIR enabling resources like standards and vocabularies (Schultes et al. 2020) to enable the exchange, sharing, and reuse of biodiversity information, fostering collaboration among participating research infrastructures (DiSSCo, eLTER, GBIF, and LifeWatch) and data providers. It also involves creating a harmonised abstraction layer using Persistent Identifiers (PID) and FAIR Digital Object (FDO) records, alongside semantic mapping and crosswalk techniques to provide machine-actionable metadata (Schultes and Wittenburg 2019, Schwardmann 2020). Governance and engagement with research infrastructure stakeholders play crucial roles in this regard, with a focus on aligning technical and data standards discussions. In addition to data, models and workflows are key elements in BioDT. Models in the BioDT context are formal representations of problems or processes, implemented through equations, algorithms, or a combination of both, which can be executed by machine entities. 
The current twin prototypes are considering both statistical and mechanistic models, introducing significant variations in (1) data requirements, (2) modelling approaches and philosophy, and (3) model output. The BioDT consortium will develop guidelines and protocols for how to describe these models, what metadata to include, and how they will interact with the diverse datasets. While discussions on this topic exist within the broader context of biodiversity and ecological sciences (Jeltsch et al. 2013, Fer et al. 2020), the BioDT project is strongly committed to finding a solution within its scope. In the twinning context, data and models need to be executed within a computing infrastructure and also need to adhere to FAIR principles. Software within BioDT includes a suite of tools that facilitate data acquisition, storage, processing, and analysis. While some of these tools already exist, the challenge lies in integrating them within the digital twinning framework. One approach to achieving integration is through workflow representation, encompassing standardised processes and protocols that guide data acquisition, packaging, processing, and analysis. The project is exploring an implementation of Research Object Crate (RO-Crate) (Soiland-Reyes et al. 2022). Implemented workflows ensure reproducibility, scalability, and transparency in research practice, enabling scientists to validate and replicate findings. The BioDT project offers a novel and transformative approach to biodiversity research and application. By leveraging collaborative research infrastructures and adhering to data standards, BioDT aims to harness the power of data, software, supercomputers, models, and expertise to deliver new insights. The foundation provided by data standards, including those of Biodiversity Information Standards (TDWG), is essential to realising the full potential of digital twins, enabling the seamless integration of diverse data sources and their combination with models.
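The abstract above mentions a harmonised abstraction layer of Persistent Identifiers (PIDs) and FAIR Digital Object (FDO) records, alongside semantic mapping and crosswalk techniques. As a minimal illustrative sketch (the PID value, type name, profile URI, and all field names below are assumptions, not BioDT's actual schema), a crosswalk can be expressed as a simple term mapping applied mechanically to source records:

```python
# Illustrative FDO-style record: a PID resolving to typed, machine-actionable
# metadata. The PID, type, and profile URI are hypothetical examples.
FDO_RECORD = {
    "pid": "21.T11148/abc123",  # hypothetical Handle-style persistent identifier
    "type": "SpeciesOccurrenceDataset",
    "profile": "https://example.org/fdo-profile/occurrence",
    "metadata": {
        "title": "Point occurrence data, prototype use case",
        "license": "CC-BY-4.0",
    },
}

# A crosswalk maps source field names onto a target standard -- here Darwin Core
# terms, which the participating infrastructures (e.g., GBIF) already share.
CROSSWALK = {
    "species_name": "dwc:scientificName",
    "obs_date": "dwc:eventDate",
    "lat": "dwc:decimalLatitude",
    "lon": "dwc:decimalLongitude",
}

def apply_crosswalk(record: dict, crosswalk: dict) -> dict:
    """Rename source keys to their target-standard terms; drop unmapped keys."""
    return {crosswalk[k]: v for k, v in record.items() if k in crosswalk}

source = {"species_name": "Elephas maximus", "obs_date": "2020-01-15",
          "lat": 11.3, "lon": 103.6, "internal_id": 42}
mapped = apply_crosswalk(source, CROSSWALK)
# mapped == {"dwc:scientificName": "Elephas maximus", "dwc:eventDate": "2020-01-15",
#            "dwc:decimalLatitude": 11.3, "dwc:decimalLongitude": 103.6}
```

In practice such mappings would be driven by published vocabularies rather than a hard-coded dict; the point is that once both sides are formalised, the transformation becomes machine-actionable.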
Citations: 0
Combining Camera Trap Data and Environmental Data to Estimate the Effects of Environmental Gradients on Abundance of the Asian Elephant Elephas maximus in Cambodia
Pub Date : 2023-09-11 DOI: 10.3897/biss.7.112100
Ret Thaung, Jackson Frechette, Matthew Luskin, Zachary Amir
Asian elephant (Elephas maximus) populations in Cambodia are currently declining, and the effect of environmental degradation on the abundance and health of elephants is poorly understood. We used camera trap data from 42 locations between 2016 and 2020 in the southern Cardamom Mountains to investigate the impact of environmental degradation on the abundance and condition of Asian elephants. Camera trap data were organized using CameraSweet software to retrieve both the number of individuals and their condition. For counts of individuals, we defined independent captures spatially and temporally. To assess condition, we created a visual scoring system based on past research (Wemmer et al. 2006, Fernando et al. 2009, Morfeld et al. 2014, Wijeyamohan et al. 2014, Morfeld et al. 2016, Schiffmann et al. 2020). This scoring system relies on visual assessment of muscle and fat in relation to the pelvis, ribs, and backbone. To validate this subjective scoring system, two scorers reviewed elephant captures using 10 reference photos and then cross-checked each other's assessments on the first five images showing each elephant's body condition. This method minimizes subjective assessment between the two scorers. Environmental variables (Suppl. material 1) such as distance to forest edge, forest integrity index, elevation, global human settlements, distance to road, distance to river, night light, and forest cover were obtained, then reclassified in ArcGIS to a common 1 km grid. We implemented hierarchical N-mixture models to investigate the impacts of environmental variables on abundance and used cumulative link models to investigate the impact of the same environmental variables on condition. We found that Asian elephant abundance exhibited a significant positive relationship with distance to forest edge: abundance was greater further away from a forest edge.
We found that body condition score exhibited a relationship with forest cover and the Forest Landscape Integrity Index, suggesting that grassland and less dense forest support better condition. Moreover, males exhibited significantly higher body condition scores than females, while babies, juveniles, and subadults all exhibited lower body condition scores compared to adults. The significantly lower body condition of young elephants is concerning and suggests that conservation managers in the region should prioritize environmental conditions that support young elephant health. Our results identify key environmental variables that appear to promote Asian elephant abundance and health in the Cardamom Mountains, thus informing relevant conservation actions to support this endangered species in Cambodia and beyond.
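The abstract notes that independent captures were defined spatially and temporally. A minimal sketch of the temporal half, assuming a 30-minute independence threshold per camera station (the study's actual criteria are not stated in the abstract, so the threshold and rolling-window rule here are illustrative assumptions):

```python
# Sketch: keep only detections separated from the previous detection at the
# same station by at least `gap`. The 30-minute value is an assumed example.
from datetime import datetime, timedelta

def independent_captures(detections, gap=timedelta(minutes=30)):
    """detections: list of (station_id, datetime) tuples.
    Returns detections at least `gap` after the previous detection
    at the same station (rolling window: every detection resets the clock)."""
    kept, last_seen = [], {}
    for station, ts in sorted(detections, key=lambda d: (d[0], d[1])):
        if station not in last_seen or ts - last_seen[station] >= gap:
            kept.append((station, ts))
        last_seen[station] = ts
    return kept

dets = [
    ("cam01", datetime(2019, 3, 1, 6, 0)),
    ("cam01", datetime(2019, 3, 1, 6, 10)),  # 10 min later -> not independent
    ("cam01", datetime(2019, 3, 1, 7, 0)),   # 50 min after last -> independent
    ("cam02", datetime(2019, 3, 1, 6, 5)),   # different station -> independent
]
events = independent_captures(dets)
# events contains 3 of the 4 detections
```

The spatial side (treating nearby stations as one site) would layer a distance rule on top of this; CameraSweet packages both steps, but the logic reduces to filters like the one above.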
Citations: 0
Unearthing the Past for a Sustainable Future: Extracting and transforming data in the Biodiversity Heritage Library for climate action
Pub Date : 2023-09-11 DOI: 10.3897/biss.7.112436
JJ Dearborn, Mike Lichtenberg, Joel Richard, Joseph deVeer, Michael Trizna, Katie Mika
As the urgency to address the climate crisis intensifies, the availability of accurate and comprehensive biodiversity data has become crucial for informing climate change studies, tracking key environmental indicators, and building global biodiversity monitoring platforms. The Biodiversity Heritage Library (BHL) plays a vital role in the core biodiversity infrastructure, housing over 60 million pages of digitized literature about life on Earth. Recognizing the value of over 500 years of data in BHL, a global network of BHL staff is working to establish a scalable data pipeline to provide actionable occurrence data from BHL’s vast and diverse collections. However, transforming textual content into FAIR (findable, accessible, interoperable, reusable) data poses challenges due to missing descriptive metadata and error-ridden unstructured outputs from commercial text engines. (Fig. 1) Despite the wealth of knowledge in BHL now available to global audiences, the underutilization of biodiversity and climate data contained in BHL's textual corpus hinders scientific research, hampers informed decision-making for conservation efforts, and limits our understanding of biodiversity patterns crucial for addressing the climate crisis. By leveraging recent advancements in text recognition engines, along with cutting-edge AI (Artificial Intelligence) models like OpenAI’s CLIP (Contrastive Language-Image Pre-Training) and nascent features in transcription platforms, BHL staff are beginning to process vast amounts of textual and image data and transform centuries worth of data from BHL collections into computationally usable formats. Recent technological breakthroughs now offer a transformative opportunity to empower the global biodiversity community with prescient insights from our shared past and facilitate the integration of historical knowledge into climate action initiatives. 
To bridge gaps in the historical record and unlock the potential of the Biodiversity Heritage Library (BHL), a multi-pronged effort utilizing innovative cross-disciplinary approaches is being piloted. These technical approaches were selected for their efficiency and ability to generate rapid results that could be applied across the diverse range of materials in BHL (Fig. 2). Piloting a data pipeline that is scalable to 60 million pages requires considerable investigation, experimentation, and resources but will have an appreciable impact on global conservation efforts by informing and establishing historic baselines deeper into time. This presentation will focus on the identification, extraction, and transformation of OCR into structured data outputs in BHL. Approaches include: upgrading legacy OCR text using the Tesseract OCR engine to improve data quality by 20% and openly publish 40 GB of textual data as FAIR data; evaluating handwritten text recognition (HTR) engines (Microsoft Azure Computer Vision, Google Cloud Vision API (Application Programming Interface), and Amazon Textract) to improve scientific name-finding in BHL's handwritten archival materials using algorithms developed by Global Names Architecture; using HTR coordinate outputs to extract data from collecting events and create structured data with the Python library Pandas DataFrame; classifying BHL page-level images with OpenAI's CLIP neural network model to accurately identify the handwritten sub-corpus of BHL's primary source materials; and running A/B tests to evaluate the efficiency and accuracy of human-keyed transcription data extraction, providing high-quality, human-reviewed datasets that can be deposited into data aggregators. The continued development of scalable data pipelines for BHL's biodiversity- and climate-related datasets requires sustained support and partnership from the biodiversity community. Preliminary results show that liberating data from archival and handwritten field notes is daunting but feasible. Extending these methods to the broader scientific literature offers new research opportunities. Extracting and normalizing data from unstructured textual sources can significantly advance biodiversity research and inform environmental policy. BHL staff are committed to building multiple scalable data pipelines, with the ultimate goal of a global biodiversity knowledge graph rich in interconnected data and semantics, enabling informed decision-making for the conservation and sustainable management of Earth's biodiversity.
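The abstract centres on transforming OCR/HTR output into structured data with the Python library Pandas. A minimal sketch, using a hypothetical word-level output format (text plus bounding-box coordinates and confidence) rather than any real engine's schema:

```python
# Sketch: regroup word-level HTR output (text + coordinates) into reading-order
# lines with pandas. The input layout is hypothetical example data, not actual
# BHL or engine output.
import pandas as pd

htr_words = [
    {"text": "Elephas", "page": 12, "x": 100, "y": 240, "conf": 0.97},
    {"text": "maximus", "page": 12, "x": 180, "y": 240, "conf": 0.95},
    {"text": "1894",    "page": 12, "x": 300, "y": 240, "conf": 0.88},
]

df = pd.DataFrame(htr_words)

# Words sharing a baseline (same page and y coordinate) form one line;
# x position preserves reading order within the line.
lines = (
    df.sort_values(["page", "y", "x"])
      .groupby(["page", "y"])["text"]
      .apply(" ".join)
      .reset_index(name="line_text")
)
# lines has one row whose line_text is "Elephas maximus 1894"
```

Real page scans need a tolerance when matching baselines (y coordinates jitter by a few pixels), but the coordinate-driven regrouping shown here is the core of turning positional OCR output into rows of a table.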
Citations: 0
"Publish First": A Rapid, GPT-4 Based Digitisation System for Small Institutes with Minimal Resources
Pub Date : 2023-09-11 DOI: 10.3897/biss.7.112428
Rukaya Johaadien, Michal Torma
We present a streamlined technical solution ("Publish First") designed to assist smaller, resource-constrained herbaria in rapidly publishing their specimens to the Global Biodiversity Information Facility (GBIF). Specimen data from smaller herbaria, particularly those in biodiversity-rich regions of the world, provide a valuable and often unique contribution to the global pool of biodiversity knowledge (Marsico et al. 2020). However, these institutions often face challenges not applicable to larger herbaria, including a lack of staff with technical skills, limited staff hours for digitization work, inadequate financial resources for specialized scanning equipment, cameras, lights, and imaging stands, limited (or no) access to computers and collection management software, and unreliable internet connections. Data-scarce and biodiversity-rich countries are also often linguistically diverse (Gorenflo et al. 2012), and staff may not have English skills, which means pre-existing online data publication resources and guides are of limited use. The "Publish First" method we are trialing addresses several of these issues: it drastically simplifies the publication process so technical skills are not necessary; it minimizes administrative tasks, saving time; it uses simple, cheap, and easily available hardware; it does not require any specialized software; and the process is so simple that there is little to no need for any written instructions. "Publish First" requires staff to attach QR code labels containing identifiers to herbarium specimen sheets, scan these sheets using a document scanner costing around €300, then drag and drop the files to an S3 bucket (a cloud container that specialises in storing files). Subsequently, these images are automatically processed through an Optical Character Recognition (OCR) service to extract text, which is then passed to OpenAI's Generative Pre-trained Transformer 4 (GPT-4) Application Programming Interface (API) for standardization.
The standardized data are integrated into a Darwin Core Archive file that is automatically published through GBIF's Integrated Publishing Toolkit (IPT) (GBIF 2021). The most technically challenging aspect of this project has been the standardization of OCR data to Darwin Core using the GPT-4 API, particularly crafting precise prompts to address the inherent inconsistency and unreliability of these Large Language Models (LLMs). Despite this, GPT-4 outperformed our manual scraping efforts. Our choice of GPT-4 as a model was a naive one: we implemented the workflow on some pre-digitized specimens from previously published Norwegian collections, compared the published data on GBIF with GPT-4's Darwin Core standardized output, and found the results satisfactory. Moving forward, we plan to undertake more rigorous research to compare the effectiveness and cost-efficiency of different LLMs as Darwin Core standardization engines. We are also particularly interested in exploring the new "function calling" feature added to the GPT-4 API, as it promises to let us retrieve standardized data in a more consistent and structured format. This workflow is currently being trialed in Tajikistan and may be used in Uzbekistan, Armenia, and Italy in the near future.
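The abstract describes passing OCR text to the GPT-4 API for standardization to Darwin Core. A hedged sketch of how such a request could be assembled using a function-calling tool schema: the schema name, field list, and prompt wording are illustrative assumptions, and no network call is made here (an actual call would go through the `openai` client, e.g. `client.chat.completions.create(**req)`).

```python
# Sketch: build a chat-completions payload asking GPT-4 to emit Darwin Core
# fields via a function-calling tool. Schema name and fields are hypothetical;
# this constructs pure data and performs no network call.
DWC_TOOL = {
    "type": "function",
    "function": {
        "name": "emit_darwin_core",
        "description": "Return Darwin Core fields extracted from a specimen label.",
        "parameters": {
            "type": "object",
            "properties": {
                "scientificName": {"type": "string"},
                "eventDate": {"type": "string"},
                "locality": {"type": "string"},
                "recordedBy": {"type": "string"},
            },
            "required": ["scientificName"],
        },
    },
}

def build_request(ocr_text: str) -> dict:
    """Payload for a chat-completions call; pure data, safe to unit test."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system",
             "content": "Standardize this herbarium label text to Darwin Core."},
            {"role": "user", "content": ocr_text},
        ],
        "tools": [DWC_TOOL],
    }

req = build_request("Viola tricolor L., Oslo, 12 May 1902, leg. A. Blytt")
```

Constraining the model to a declared schema is one way to mitigate the output inconsistency the abstract describes: the response arrives as structured arguments rather than free text that must be re-parsed.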
{"title":"\"Publish First\": A Rapid, GPT-4 Based Digitisation System for Small Institutes with Minimal Resources","authors":"Rukaya Johaadien, Michal Torma","doi":"10.3897/biss.7.112428","DOIUrl":"https://doi.org/10.3897/biss.7.112428","url":null,"abstract":"We present a streamlined technical solution (\"Publish First\") designed to assist smaller, resource-constrained herbaria in rapidly publishing their specimens to the Global Biodiversity Information Facility (GBIF). Specimen data from smaller herbaria, particularly those in biodiversity-rich regions of the world, provide a valuable and often unique contribution to the global pool of biodiversity knowledge (Marsico et al. 2020). However, these institutions often face challenges not applicable to larger herbaria, including a lack of staff with technical skills, limited staff hours for digitization work, inadequate financial resources for specialized scanning equipment, cameras, lights, and imaging stands, limited (or no) access to computers and collection management software, and unreliable internet connections. Data-scarce and biodiversity rich countries are also often linguistically diverse (Gorenflo et al. 2012), and staff may not have English skills, which means pre-existing online data publication resources and guides are of limited use. The \"Publish First\" method we are trialing, addresses several of these issues: it drastically simplifies the publication process so technical skills are not necessary; it minimizes administrative tasks saving time; it uses simple, cheap and easily available hardware; it does not require any specialized software; and the process is so simple that there is little to no need for any written instructions. 
\"Publish first\" requires staff to attach QR code labels containing identifiers to herbarium specimen sheets, scan these sheets using a document scanner costing around €300, then drag and drop these files to an S3 bucket (a cloud container that specialises in storing files). Subsequently, these images are automatically processed through an Optical Character Recognition (OCR) service to extract text, which is then passed on to OpenAI's Generative Pre-Transformer 4 (GPT-4) Application Programming Interface (API), for standardization. The standardized data is integrated into a Darwin Core Archive file that is automatically published through GBIF's Integrated Publishing Toolkit (IPT) (GBIF 2021). The most technically challenging aspect of this project has been the standardization of OCR data to Darwin Core using the GPT-4 API, particularly in crafting precise prompts to address the inherent inconsistency and lack of reliability in these Large Language Models (LLMs). Despite this, GPT-4 outperformed our manual scraping efforts. Our choice of GPT-4 as a model was a naive one: we implemented the workflow on some pre-digitized specimens from previously published Norwegian collections, compared the published data on GBIF with GPT-4's Darwin Core standardized output, and found the results satisfactory. Moving forward, we plan to undertake more rigorous additional research to compare the effectiveness and cost-efficiency of different LLMs as Darwin Core standardization engines. 
We are also particularly interested in exploring ","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135981805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
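As an illustration of the "Publish First" standardization step described above, here is a minimal sketch of mapping OCR'd label text to Darwin Core terms with an LLM. The prompt wording, field list, and `mock_llm` stand-in are assumptions for demonstration, not the authors' published pipeline; a real implementation would replace the mock with a call to a hosted model API.

```python
import json

# Sketch of the OCR-text -> Darwin Core standardization step.
# The LLM call is mocked so the example is self-contained.

DWC_FIELDS = ["catalogNumber", "scientificName", "recordedBy",
              "eventDate", "country", "locality"]

def build_prompt(ocr_text: str) -> str:
    """Ask the model to map free-form label text to Darwin Core terms."""
    return (
        "Extract the following Darwin Core terms from this herbarium "
        "label text and answer with a JSON object using exactly these "
        f"keys: {', '.join(DWC_FIELDS)}. Use null for missing values.\n\n"
        f"Label text:\n{ocr_text}"
    )

def mock_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call; returns a plausible response.
    return json.dumps({
        "catalogNumber": "O-V-12345", "scientificName": "Poa annua L.",
        "recordedBy": "J. Smith", "eventDate": "1987-06-12",
        "country": "Norway", "locality": "Oslo, Bygdøy"
    })

def standardize(ocr_text: str, llm=mock_llm) -> dict:
    """OCR text -> Darwin Core dict, dropping any unexpected keys
    the model might invent (one guard against LLM inconsistency)."""
    raw = json.loads(llm(build_prompt(ocr_text)))
    return {k: raw.get(k) for k in DWC_FIELDS}

record = standardize(
    "POA ANNUA L. Oslo, Bygdøy, 12 Jun 1987, leg. J. Smith, O-V-12345")
print(record["scientificName"])
```

Restricting the output to a fixed key list, as above, is one simple way to contain the "inherent inconsistency" of LLM output the abstract mentions.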
Safeguarding Access to 500 Years of Biodiversity Data: Sustainability planning for the Biodiversity Heritage Library
Pub Date : 2023-09-11 DOI: 10.3897/biss.7.112430
Martin Kalfatovic, Bianca Crowley, JJ Dearborn, Colleen Funkhouser, David Iggulden, Kelli Trei, Elisa Herrmann, Kevin Merriman
The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at Smithsonian Libraries and Archives (SLA), BHL is a global consortium of research institutions working together to build and maintain a critical piece of biodiversity data infrastructure. BHL provides free access to over 60 million pages of biodiversity content from the 15th–21st centuries. BHL works with the biodiversity community to develop tools and services to facilitate greater access, interoperability, and reuse of content and data. Through taxonomic intelligence tools developed by Global Names Architecture, BHL has indexed more than 230 million instances of taxonomic names throughout its collection, allowing researchers to locate publications about specific taxa. BHL also works to bring historical literature into the modern network of scholarly research by retroactively assigning DOIs (digital object identifiers) and making this historical content more discoverable and trackable. Biodiversity databases such as the Catalogue of Life, International Plant Names Index, Tropicos, World Register of Marine Species, and iNaturalist, rely on literature housed in BHL. Locked within its 60 million pages are valuable species occurrence data and observations from expeditions. To make this data FAIR (findable, accessible, interoperable, and reusable), BHL and its partners are working on a data pipeline to transform textual content into actionable data that can be deposited into data aggregators such as the Global Biodiversity Information Facility (GBIF). BHL’s shared vision began in 2006 among a small community of passionate librarians, technologists, and biodiversity researchers. Uniting as a consortium, BHL received grant funding to build and launch its digital library. BHL partners received additional grant funding for further technical development and targeted digitization projects. 
When initial grant funding ended in 2012, BHL established an annual dues model for its Members and Affiliates to help support central BHL operating expenses and technical development. This dues model continues today, along with in-kind contributions of staff time from Members and Affiliates. Significant funding is also provided by the Smithsonian in the form of an annual U.S. federal allocation, endowment funds, and SLA cost subvention, to host the technical infrastructure and Secretariat staff. BHL also relies on user donations to support its program. Though BHL has diversified funding streams over the years, it relies heavily on a few key institutions to cover operating costs. Though these institutions have overarching open access, research, and sustainability goals, priorities and resources to achieve these goals shift over time. Without long-term commitments, institutions may choose to prioritize new projects over established programs. Many BHL contributors have experienced funding loss for digitization projects, reducing the rate at which new content is added to BHL. Further loss of funding for central staff and technical infrastructure would reduce BHL from a data-rich technical program to an unsupported and abandoned platform. Without a long-term commitment to maintaining and improving its technical infrastructure, the termination of BHL would break countless links from biodiversity databases, library catalogues, Wikidata, and other network aggregators; adversely affect existing third-party projects that rely on BHL citations and species data; and eliminate opportunities for more equitable and free access to biodiversity knowledge. To continue fulfilling its mission, BHL must increase and improve its data integration with the broader biodiversity infrastructure and secure a sustainable future. Securing that future will require external expertise to diversify funding sources, renew the support of existing partners, and identify new stakeholder support. During BHL's founding discussions, stakeholders agreed that the only way to conduct biodiversity science at a global scale was through collaboration; no single institution could lead alone. Moving forward, this imperative must also extend to collaborative funding models. Partnering with initiatives such as the Global Biodata Coalition (GBC) can build a stronger, more resilient biodiversity infrastructure. Through continued collaboration, innovation, and an unwavering commitment to open access, BHL will continue to transform research worldwide and provide researchers with the tools they need to study, explore, and protect life on Earth.
{"title":"Safeguarding Access to 500 Years of Biodiversity Data: Sustainability planning for the Biodiversity Heritage Library","authors":"Martin Kalfatovic, Bianca Crowley, JJ Dearborn, Colleen Funkhouser, David Iggulden, Kelli Trei, Elisa Herrmann, Kevin Merriman","doi":"10.3897/biss.7.112430","DOIUrl":"https://doi.org/10.3897/biss.7.112430","url":null,"abstract":"The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at Smithsonian Libraries and Archives (SLA), BHL is a global consortium of research institutions working together to build and maintain a critical piece of biodiversity data infrastructure. BHL provides free access to over 60 million pages of biodiversity content from the 15th–21st centuries. BHL works with the biodiversity community to develop tools and services to facilitate greater access, interoperability, and reuse of content and data. Through taxonomic intelligence tools developed by Global Names Architecture, BHL has indexed more than 230 million instances of taxonomic names throughout its collection, allowing researchers to locate publications about specific taxa. BHL also works to bring historical literature into the modern network of scholarly research by retroactively assigning DOIs (digital object identifiers) and making this historical content more discoverable and trackable. Biodiversity databases such as the Catalogue of Life, International Plant Names Index, Tropicos, World Register of Marine Species, and iNaturalist, rely on literature housed in BHL. Locked within its 60 million pages are valuable species occurrence data and observations from expeditions. To make this data FAIR (findable, accessible, interoperable, and reusable), BHL and its partners are working on a data pipeline to transform textual content into actionable data that can be deposited into data aggregators such as the Global Biodiversity Information Facility (GBIF). 
BHL’s shared vision began in 2006 among a small community of passionate librarians, technologists, and biodiversity researchers. Uniting as a consortium, BHL received grant funding to build and launch its digital library. BHL partners received additional grant funding for further technical development and targeted digitization projects. When initial grant funding ended in 2012, BHL established an annual dues model for its Members and Affiliates to help support central BHL operating expenses and technical development. This dues model continues today, along with in-kind contributions of staff time from Members and Affiliates. Significant funding is also provided by the Smithsonian in the form of an annual U.S. federal allocation, endowment funds, and SLA cost subvention, to host the technical infrastructure and Secretariat staff. BHL also relies on user donations to support its program. Though BHL has diversified funding streams over the years, it relies heavily on a few key institutions to cover operating costs. Though these institutions have overarching open access, research, and sustainability goals, priorities and resources to achieve these goals shift over time. Without long-term commitments, institutions may choose to prioritize new projects over established programs. Many BHL contributors have experienced funding loss for digitization projects, reducin","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135980587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FAIR but not Necessarily Open: Sensitive data in the domain of biodiversity
Pub Date : 2023-09-08 DOI: 10.3897/biss.7.112296
Patricia Mergen, S. Meeus, F. Leliaert
In the framework of implementing the European Open Science Cloud (EOSC), there is still confusion between the concept of data FAIRness (Findable, Accessible, Interoperable and Re-usable, Wilkinson et al. 2016) and the idea of open and freely accessible data, which are not necessarily the same. Data can indeed comply with the requirements of FAIRness even if their access is moderated or behind a paywall. Therefore the motto of EOSC is actually “As open as possible, as closed as necessary”. This confusion or misinterpretation of definitions has raised concerns among potential data providers who fear being obligated to make sensitive data openly accessible and freely available, even if there are valid reasons for restrictions, or to forfeit any charges or hamper profit making if the data generate revenue. As a result, there has been some reluctance to fully engage in the activities related to FAIR data and the EOSC. When addressing sensitive data, what comes to mind are personal data governed by the General Data Protection Regulation (GDPR), as well as clinical, security, military, or commercially valuable data protected by patents. In the domain of biodiversity or natural history collections, it is often reported that these issues surrounding sensitive data regulations have less impact, especially when contributors are properly cited and embargo periods are respected. However, there are cases in this domain where sensitive data must be considered for legal or ethical purposes. Examples include protected or endangered species, where the exact geographic coordinates might not be shared openly to avoid poaching; cases of Access and Benefit sharing (ABS), depending on the country of origin of the species; the respect of traditional knowledge; and a desire to limit the commercial exploitation of the data. 
The requirements of the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity, as well as the upcoming Digital Sequence Information regulations (DSI), play an important role here. The Digital Services Act (DSA) was recently adopted with the aim of protecting the digital space against the spread of illegal content, and it sets interoperability requirements for operators of data spaces. This raises questions about the actual definition of data spaces and how they would be affected by this new European legislation, which nonetheless has a worldwide impact on widely used social media and content platforms such as Google or YouTube. During the implementation and updating activities in projects and initiatives like the Biodiversity Community Integrated Knowledge Library (BiCIKL), it became clear that there is a need to offer a secure data repository and management system that can deal with both open and non-open data in order to effectively include all potential data providers and mobilise their content while adhering to FAIR requirements. In this talk, after a general introduction to sensitive data, we will give several examples of how sensitive data and their management are handled in the biodiversity and natural-science domain, for instance as recommended by GBIF. Last but not least, we will highlight the importance of achieving these developments using internationally recognised standards, such as those of Biodiversity Information Standards (TDWG), in the context of the Biodiversity Knowledge Hub (BKH) implemented by BiCIKL. Notably, providing clear metadata on terms of use, citation requirements, and licensing enables the legal and effective reuse of data.
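To make the poaching example above concrete: one widely used safeguard is to publish generalized coordinates together with an explicit uncertainty, rather than withholding the record entirely. The sketch below is purely illustrative — the 0.25° grid size and the flat 111 km-per-degree conversion are assumptions for demonstration, not a GBIF rule.

```python
import math

def generalize_coordinates(lat: float, lon: float, grid_deg: float = 0.25):
    """Snap a sensitive occurrence to the centre of a coarser grid cell,
    returning the blurred point plus a rough uncertainty radius in km
    (half a cell, using ~111 km per degree at the equator)."""
    def snap(value: float) -> float:
        return (math.floor(value / grid_deg) + 0.5) * grid_deg
    uncertainty_km = grid_deg * 111 / 2
    return snap(lat), snap(lon), uncertainty_km

# A fictitious orchid locality near Brussels, blurred before publication:
blurred = generalize_coordinates(50.8503, 4.3517)
print(blurred)  # (50.875, 4.375, 13.875)
```

Publishing the uncertainty alongside the blurred point keeps the record usable for coarse-scale analyses while hiding the exact site.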
{"title":"FAIR but not Necessarily Open: Sensitive data in the domain of biodiversity","authors":"Patricia Mergen, S. Meeus, F. Leliaert","doi":"10.3897/biss.7.112296","DOIUrl":"https://doi.org/10.3897/biss.7.112296","url":null,"abstract":"In the framework of implementing the European Open Science Cloud (EOSC), there is still confusion between the concept of data FAIRness (Findable, Accessible, Interoperable and Re-usable, Wilkinson et al. 2016) and the idea of open and freely accessible data, which are not necessarily the same. Data can indeed comply with the requirements of FAIRness even if their access is moderated or behind a paywall. Therefore the motto of EOSC is actually “As open as possible, as closed as necessary”. This confusion or misinterpretation of definitions has raised concerns among potential data providers who fear being obligated to make sensitive data openly accessible and freely available, even if there are valid reasons for restrictions, or to forfeit any charges or hamper profit making if the data generate revenue. As a result, there has been some reluctance to fully engage in the activities related to FAIR data and the EOSC.\u0000 When addressing sensitive data, what comes to mind are personal data governed by the General Data Protection Regulation (GDPR), as well as clinical, security, military, or commercially valuable data protected by patents. In the domain of biodiversity or natural history collections, it is often reported that these issues surrounding sensitive data regulations have less impact, especially when contributors are properly cited and embargo periods are respected. However, there are cases in this domain where sensitive data must be considered for legal or ethical purposes. 
Examples include protected or endangered species, where the exact geographic coordinates might not be shared openly to avoid poaching; cases of Access and Benefit sharing (ABS), depending on the country of origin of the species; the respect of traditional knowledge; and a desire to limit the commercial exploitation of the data. The requirements of the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity, as well as the upcoming Digital Sequence Information regulations (DSI), play an important role here. The Digital Services Act (DSA) was recently adopted with the aim of the protection of the digital space against the spread of illegal content, which sets the interoperability requirements for operators of data spaces. This raises questions on the actual definition of data spaces and how they would be affected by this new European legislation but with a worldwide impact on widely used social media and content platforms such as Google or YouTube.\u0000 During the implementation and updating activities in projects and initiatives like Biodiversity Community Integrated Knowledge Library (BiCIKL), it became clear that there is a need to offer a secure data repository and management system that can deal with both open and non-open data in order to effectively include all potential data providers and mobilise their content while adhering to FAIR requirements.\u0000 In this talk, a","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76593348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integration of Ecosystem Services and Habitats into the Biodiversity Atlas Austria
Pub Date : 2023-09-08 DOI: 10.3897/biss.7.112315
Tanja Lumetsberger, Georg Neubauer, Reinhardt Wenzina
The Biodiversity Atlas Austria (“Biodiversitäts-Atlas Österreich”) is a data portal to explore Austria’s biodiversity. It is based on the open-source infrastructure of the Atlas of Living Australia (ALA) and was launched with the support of the Living Atlas (LA) community in late 2019 by the Biodiversity Hub of the University of Continuing Education Krems, funded by the Government of Lower Austria. At present, it stores more than 8.5 million species occurrence records from various data partners and institutions and is available in both English and German. The Atlas runs on two virtual machines with 4 TB storage and hosts many of the ALA-developed tools and services, such as collectory, biocache, biodiversity information explorer, regions, spatial portal, sensitive data service, lists, images, and dashboard. In the project “ÖKOLEITA” (2021-2023), two new tools were developed within the existing LA infrastructure and will be launched in late 2023 to allow users to work with ecosystem services and habitat data. The “ecosys” tool will allow management, visualization, and analysis of ecosystem services by uploading different (raster or vector) TIFF files containing mapped ecosystem services to the geoserver. Users will be able to inspect various ecosystem services at a specific geolocation, or compare the ecosystem service potential of different geolocations or along a transect. The ecosystem service values are presented both as pictograms, in which each value is transformed into quintiles following the approach of Schreder et al. (2018), and as a bar chart showing the true values. The “habitat” tool will store and manage datasets of habitat mappings (shapefiles) and allow users to spatially explore those various habitat mappings on a map. Users will be able to search for specific habitats across all datasets or within a specific one and get all occurrences of this habitat type returned. 
Through linkage to the biocache, a click on a specific area reveals the list of species found within that habitat recording, as well as all the species occurrences within that area stored in the database. A “habitat backbone” of the most used habitat classifications in Austria will allow dealing with habitat mappings that use different classifications. Both tools are integrated into the Living Atlases infrastructure and communicate with the other tools and services of the Biodiversity Atlas Austria (Fig. 1). They share a common administration back-end but have different front-ends, where the users can explore the ecosystem services and habitats spatially and in connection with species occurrence records and other contextual information.
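The quintile transformation used for the pictograms can be sketched as follows. This is a generic illustration using Python's standard `statistics` module, not the Atlas's actual implementation (which follows Schreder et al. 2018); the cut-point method is an assumption.

```python
from statistics import quantiles

def quintile_class(value: float, all_values: list) -> int:
    """Map a raw ecosystem-service value to a 1-5 class by counting how
    many quintile cut points of the full distribution it exceeds."""
    cuts = quantiles(all_values, n=5)  # four cut points
    return 1 + sum(value > cut for cut in cuts)

# Example: service potentials of 100 hypothetical grid cells (values 1..100)
cells = list(range(1, 101))
print(quintile_class(10, cells),
      quintile_class(50, cells),
      quintile_class(99, cells))  # -> 1 3 5
```

Mapping raw values onto a fixed 1-5 scale like this is what makes visually comparable pictograms possible across services with very different value ranges.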
{"title":"Integration of Ecosystem Services and Habitats into the Biodiversity Atlas Austria","authors":"Tanja Lumetsberger, Georg Neubauer, Reinhardt Wenzina","doi":"10.3897/biss.7.112315","DOIUrl":"https://doi.org/10.3897/biss.7.112315","url":null,"abstract":"The Biodiversity Atlas Austria (“Biodiversitäts-Atlas Österreich”) is a data portal to explore Austria’s biodiversity. It is based on the open-source infrastructure of the Atlas of Living Australia (ALA) and was launched with support of the Living Atlas (LA) community in late 2019 by the Biodiversity Hub of the University of Continuing Education Krems funded by the Government of Lower Austria. At present, it stores more than 8.5 million species occurrence records from various data partners and institutions and is available in both English and German. The Atlas is running on two virtual machines with 4 TB storage and is hosting many of the ALA-developed tools and services such as collectory, biocache, biodiversity information explorer, regions, spatial portal, sensitive data service, lists, images, and dashboard.\u0000 In the project “ÖKOLEITA” (2021-2023), two new tools were developed within the existing LA infrastructure and will be launched in late 2023 to allow users to deal with ecosystem services and habitat data.\u0000 The “ecosys”-tool will allow management, visualization, and analysis of ecosystem services by uploading different (raster or vector) TIFF files containing mapped ecosystem services to the geoserver. Users will be able to inspect various ecosystem services at a specific geolocation or compare different geolocations or a transect on their respective ecosystem service potential. The ecosystem service values are presented on the one hand as pictograms, where the value is transformed into quintiles, orienting on the work by Schreder et al. 
(2018), and as bar chart showing the true values.\u0000 The “habitat” tool will store and manage datasets of habitat mappings (shapefiles) and allow users to spatially explore those various habitat mappings on a map. Users will be able to search for specific habitats across all datasets or a specific one and get all occurrences of this habitat type returned. Through linkage to the biocache, a click on a specific area reveals the list of species found within that habitat recording, as well as all the species occurrences within that area stored in the database. A “habitat backbone” of the most used habitat classifications in Austria will allow dealing with habitat mappings that use different classifications.\u0000 Both tools are integrated into the Living Atlases infrastructure and communicate with the other tools and services of the Biodiversity Atlas Austria (Fig. 1). They share a common administration back-end but have different front-ends, where the users can explore the ecosystem services and habitats spatially and in connection with species occurrence records and other contextual information.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84171243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From the Shadows to the Spotlight: Unveiling Nepal's hidden kingdom of mushrooms and lichens through digitization
Pub Date : 2023-09-08 DOI: 10.3897/biss.7.112376
Shiva Devkota
The digitization of herbarium collections has been a transformative journey, bringing Nepal's hidden kingdom of mushrooms and lichens from the shadows into the spotlight. Through collaborative work within the framework of the Global Biodiversity Information Facility's Biodiversity Information Fund for Asia (GBIF-BIFA), involving two herbaria (KATH: Nepal's National Herbarium and Plant Laboratories; TUCH: Natural History Museum, Tribhuvan University, Nepal) and the research institute Global Institute for Interdisciplinary Studies (GIIS), Nepal's mycological treasures have been successfully unveiled through digital means. A comprehensive effort has resulted in the complete digitization of 3,971 mushroom specimens and 2,462 lichen specimens, illuminating a wealth of information for researchers, citizen scientists, and the general public. GBIF and the online database maintained by Nepal's National Herbarium and Plant Laboratories, Department of Plant Resources, serve as the gateway to this work (KATH 2021). Prior to this work, the specimens resided in the shadows, lacking the recognition they deserved. Through meticulous collection management, sorting, curation, and labeling, their secrets were unveiled and their stories brought to our fingertips. These previously obscured specimens now possess registered individual catalogue numbers, allowing the quantification of Nepal's fungal wealth within the participating institutions. This project serves as a testament to the vital role of capturing available field-level data, preserving specimens, and harnessing the power of digitization to showcase Nepal's mycological and lichenological wonders to a global audience.
{"title":"From the Shadows to the Spotlight: Unveiling Nepal's hidden kingdom of mushrooms and lichens through digitization","authors":"Shiva Devkota","doi":"10.3897/biss.7.112376","DOIUrl":"https://doi.org/10.3897/biss.7.112376","url":null,"abstract":"The digitization of herbarium collections has brought forth a transformative journey, transitioning Nepal's hidden kingdom of mushrooms and lichens from the shadows into the spotlight. Through a collaborative work within the framework of Global Biodiversity Information Facility's Biodiversity Information Fund for Asia (GBIF-BIFA), involving the herbaria (KATH: Nepal's National Herbarium and Plant Laboratories and TUCH: Natural History Museum, Tribhuvan University, Nepal), and the research institute, Global Institute for Interdisciplinary Studies (GIIS), a successful unveiling of Nepal's mycological treasures has been achieved through digital means. A comprehensive digitization effort has resulted in the complete digitization of 3,971 mushroom specimens and 2,462 lichen specimens, illuminating a wealth of information for researchers, citizen scientists, and the general public. GBIF and the online database maintained by Nepal's National Herbarium and Plant Laboratories, Department of Plant Resources, serve as the gateway to this work (KATH 2021). Prior to this work, the specimens resided in the shadows, lacking the recognition they deserved. Through meticulous collection management, sorting, curation, and labeling, their secrets were unveiled, and their stories brought to our fingertips. These previously obscured specimens now possess registered individual catalogue numbers, allowing the quantification of Nepal's fungal wealth within the participating institutions. This project serves as a testament to the vital role of capturing available field-level data, preserving specimens, and harnessing the power of digitization to showcase Nepal's mycological and lichenological wonders to a global audience. 
Meanwhile, it has also emphasized the significance of sharing this knowledge and fostering appreciation for the overlooked world of mushrooms and lichens.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91252599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Improved Data Flows for the Management of Invasive Alien Species and Wildlife: A LIFE RIPARIAS use case
Pub Date : 2023-09-08 DOI: 10.3897/biss.7.112386
Lien Reyserhove, Pieter Huybrechts, J. Hillaert, T. Adriaens, B. D'hondt, Damiano Oldoni
Invasive alien species (IAS) are recognised as a major threat to biodiversity. To prevent the introduction and spread of IAS, the European Union Regulation (EU) 1143/2014 imposes an obligation on Member States to both develop management strategies for IAS of Union Concern and report on those interventions. For this, we need to collect and combine management data and streamline management actions. This is still a major challenge: the landscape of IAS management is diverse and includes different authorities, managers, businesses, and non-governmental organizations. Some organizations have developed their own specific software applications for recording management actions. For other organizations, such a software system is lacking; their management data are scattered, not harmonized, and often not openly available. For EU reporting, a workflow is needed to centralize all information about the applied management method, management effort, cost, effectiveness, and impact of the performed actions on other biota or the environment. At this moment, such a workflow is lacking in Belgium. One of the aims of the LIFE RIPARIAS project is to set up a workflow for harmonizing IAS management data in Belgium. Based on input from the IAS management community in Belgium, we were able to: draft a community-driven data model and exchange format called manIAS (MANagement of Invasive Alien Species, Reyserhove et al. 2022), and identify the minimal requirements a software application must meet to be used successfully in the field (Hillaert et al. 2022). In this presentation, we will explore both outputs, the lessons learned, and the way forward. 
With our work, we aim to facilitate coordination and transfer of information between the different actors involved in IAS and wildlife management, not only on a Belgian scale, but also within an international context.
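To make the idea of a shared exchange format concrete, here is a minimal sketch of serialising an IAS management-action record to a tabular exchange file. The field names and values below are illustrative assumptions in the spirit of a Darwin Core-aligned format, not the actual manIAS terms (see Reyserhove et al. 2022 for the real data model).

```python
import csv
import io

# Hypothetical column set for a management-action exchange file; the real
# manIAS model defines its own terms for method, effort, cost, etc.
FIELDS = [
    "eventID", "eventDate", "scientificName", "countryCode",
    "managementMethod", "effortValue", "effortUnit", "cost",
]

# One illustrative record: a crayfish trapping action in Belgium.
records = [
    {
        "eventID": "riparias:event:0001",
        "eventDate": "2023-05-17",
        "scientificName": "Procambarus clarkii",
        "countryCode": "BE",
        "managementMethod": "trapping",
        "effortValue": "12",
        "effortUnit": "trap-nights",
        "cost": "150",
    }
]

def to_tabular(rows, fields):
    """Serialise records to a tab-separated exchange file, one row per action."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_tabular(records, FIELDS))
```

A shared, flat format like this is what lets data scattered across authorities, managers and NGOs be centralized for EU reporting without each organization changing its internal software.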
Towards Improved Data Flows for the Management of Invasive Alien Species and Wildlife: A LIFE RIPARIAS use case
Pub Date : 2023-09-08 DOI: 10.3897/biss.7.112386
EMODnet Biology: Unlocking European marine biodiversity data
Pub Date : 2023-09-08 DOI: 10.3897/biss.7.112147
Ruben Perez Perez, J. Beja, L. Vandepitte, Marina Lipizer, B. Weigel, B. Vanhoorne
EMODnet Biology (hosted and coordinated by the Flanders Marine Institute (VLIZ)) is one of the seven themes within the European Marine Observation and Data network (EMODnet). The EMODnet Biology consortium aims to facilitate the accessibility and usage of marine biodiversity data. With the principle of "collect once, use many times" at its core, EMODnet Biology fosters collaboration across various sectors, including research, policy-making, industry, and individual citizens, to enhance knowledge sharing and inform decision-making. EMODnet Biology focuses on providing free and open access to comprehensive historical and recent data on the occurrence of marine species and their traits in all European regional seas. It achieves this through partnerships and collaboration with diverse international initiatives, such as the World Register of Marine Species (WoRMS), Marine Regions and the European node of the Ocean Biodiversity Information System (EurOBIS) among others. By promoting the usage of the Darwin Core Standard (Wieczorek et al. 2012), EMODnet Biology fosters data interoperability and ensures seamless integration with wider networks such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biodiversity Information System (OBIS), serving as a significant data provider of the latter, as it is responsible for most of its data generated in Europe. 
Since its inception, EMODnet Biology has undertaken actions in various areas, including: providing access to marine biological data with spatio-temporal, taxonomic, environmental and sampling-related information, among others; developing an exhaustive data quality control tool based on the Darwin Core standard, the British Oceanographic Data Centre and Natural Environment Research Council Vocabulary Server (BODC NVS2) parameters and other controlled vocabularies used; creating and providing training courses to guide data providers; performing gap analyses to identify data quality and coverage shortcomings; creating and publishing marine biological distribution maps for various species or species groups; and interacting with international and European initiatives, projects and organizations. Furthermore, EMODnet Biology contributes to the wider EMODnet initiative, which covers multidisciplinary data and products. Thanks to the use of standard protocols and tools across disciplines, EMODnet Biology products can feed multidisciplinary analyses of pressures and impacts on key marine species and habitats and, ultimately, support better management and planning of marine space. In summary, EMODnet Biology plays a key role in biodiversity informatics by providing users with a wealth of accessible and reusable marine biodiversity data and products. Its collaborative approach, broad partnerships, and adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles (Wilkinson et al. 2016), the Infrastructure for Spatial Information in Europe (INSPIRE) metadata technical guidelines (European Commission Joint Research Centre 2013) and Open Geospatial Consortium (OGC) standards make it a key resource for advancing knowledge, informing policy and supporting the sustainable management of marine ecosystems.
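To illustrate the kind of record-level checks a Darwin Core-based quality control tool runs, here is a minimal sketch. The rules and field names are illustrative assumptions (standard Darwin Core occurrence terms plus a simple coordinate-range test), not the actual EMODnet Biology implementation.

```python
# Darwin Core terms assumed mandatory for this sketch; the real EMODnet
# Biology QC tool checks many more terms and controlled-vocabulary values.
REQUIRED = {"occurrenceID", "scientificName", "eventDate",
            "decimalLatitude", "decimalLongitude"}

def qc_occurrence(rec):
    """Return a list of QC flags for one Darwin Core occurrence record."""
    flags = []
    missing = REQUIRED - rec.keys()
    if missing:
        flags.append(f"missing required terms: {sorted(missing)}")
    try:
        lat = float(rec.get("decimalLatitude", "nan"))
        lon = float(rec.get("decimalLongitude", "nan"))
        # NaN comparisons are False, so absent coordinates are also flagged.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            flags.append("coordinates out of range")
    except ValueError:
        flags.append("coordinates not numeric")
    return flags

good = {"occurrenceID": "urn:x:1", "scientificName": "Mytilus edulis",
        "eventDate": "2022-07-01",
        "decimalLatitude": "51.2", "decimalLongitude": "2.9"}
bad = {"occurrenceID": "urn:x:2", "scientificName": "Mytilus edulis",
       "eventDate": "2022-07-01",
       "decimalLatitude": "123.4", "decimalLongitude": "2.9"}

print(qc_occurrence(good))  # []
print(qc_occurrence(bad))   # ['coordinates out of range']
```

Flagging rather than rejecting records, as sketched here, lets data providers see exactly which terms or values need correction before the data flow on to aggregators such as EurOBIS, OBIS and GBIF.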