Sharif Islam, Hanna Koivula, Dag Endresen, Erik Kusch, Dmitry Schigel, Wouter Addink
The Biodiversity Digital Twin (BioDT) project (2022-2025) aims to create prototypes that integrate various data sets, models, and expert domain knowledge, enabling prediction capabilities and decision-making support for critical issues in biodiversity dynamics. While digital twin concepts have been applied in industry for continuous monitoring of physical phenomena, their application in biodiversity and environmental sciences presents novel challenges (Bauer et al. 2021, de Koning et al. 2023). In addition, successfully developing digital twins for biodiversity requires addressing interoperability challenges in data standards. BioDT is developing prototype digital twins based on use cases that span various data complexities, from point occurrence data to bioacoustics, and from nationwide forest states to specific communities and individual species. The project relies on FAIR principles (Findable, Accessible, Interoperable, and Reusable) and FAIR-enabling resources such as standards and vocabularies (Schultes et al. 2020) to enable the exchange, sharing, and reuse of biodiversity information, fostering collaboration among participating research infrastructures (DiSSCo, eLTER, GBIF, and LifeWatch) and data providers. It also involves creating a harmonised abstraction layer using Persistent Identifiers (PID) and FAIR Digital Object (FDO) records, alongside semantic mapping and crosswalk techniques, to provide machine-actionable metadata (Schultes and Wittenburg 2019, Schwardmann 2020). Governance and engagement with research infrastructure stakeholders play crucial roles in this regard, with a focus on aligning technical and data standards discussions. In addition to data, models and workflows are key elements in BioDT. Models in the BioDT context are formal representations of problems or processes, implemented through equations, algorithms, or a combination of both, which can be executed by machine entities.
The current twin prototypes are considering both statistical and mechanistic models, introducing significant variations in (1) data requirements, (2) modelling approaches and philosophy, and (3) model output. The BioDT consortium will develop guidelines and protocols for how to describe these models, what metadata to include, and how they will interact with the diverse datasets. While discussions on this topic exist within the broader context of biodiversity and ecological sciences (Jeltsch et al. 2013, Fer et al. 2020), the BioDT project is strongly committed to finding a solution within its scope. In the twinning context, data and models need to be executed within a computing infrastructure and also need to adhere to FAIR principles. Software within BioDT includes a suite of tools that facilitate data acquisition, storage, processing, and analysis. While some of these tools already exist, the challenge lies in integrating them within the digital twinning framework. One approach to achieving integration is through workflow representation, encompassing standardised processes and protocols that guide data acquisition, packaging, processing, and analysis. The project is exploring an implementation of RO-Crate (Research Object Crate) (Soiland-Reyes et al. 2022). Implementing workflows ensures reproducibility, scalability, and transparency in research practice, enabling scientists to validate and replicate findings. The BioDT project offers a novel and transformative approach to biodiversity research and application. By leveraging collaborative research infrastructures and adhering to data standards, BioDT aims to harness the power of data, software, supercomputers, models, and expertise to deliver new insights. The foundation provided by data standards, including those of Biodiversity Information Standards (TDWG), is essential for realising the full potential of digital twins, facilitating the seamless integration of diverse data sources and their combination with models.
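The project's exploration of RO-Crate for packaging workflows, data, and models can be illustrated with a minimal sketch. The identifiers, file names, and dataset description below are hypothetical examples, not BioDT's actual crate layout:

```python
import json

# Minimal, illustrative RO-Crate 1.1 metadata descriptor for a package that
# bundles a dataset with the model code that consumed it. All file names and
# the dataset name are invented for illustration.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@type": "CreativeWork",
            "@id": "ro-crate-metadata.json",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@type": "Dataset",
            "@id": "./",
            "name": "Example species-distribution twin run",
            "hasPart": [{"@id": "occurrences.csv"}, {"@id": "run_model.py"}],
        },
        {"@type": "File", "@id": "occurrences.csv", "encodingFormat": "text/csv"},
        {"@type": ["File", "SoftwareSourceCode"], "@id": "run_model.py"},
    ],
}

# Serialised, this dict is what would be written to ro-crate-metadata.json
# at the root of the crate directory.
metadata_json = json.dumps(crate, indent=2)
```

Because the descriptor is plain JSON-LD, the same record that makes the package human-readable also makes it machine-actionable, which is the property FAIR Digital Objects require.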
Sharif Islam, Hanna Koivula, Dag Endresen, Erik Kusch, Dmitry Schigel, Wouter Addink (2023). "Data Standards and Interoperability Challenges for Biodiversity Digital Twin: A novel and transformative approach to biodiversity research and application." Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112373. Published 2023-09-11.
Ret Thaung, Jackson Frechette, Matthew Luskin, Zachary Amir
Asian elephant (Elephas maximus) populations in Cambodia are currently declining, and the effect of environmental degradation on the abundance and health of elephants is poorly understood. We used camera trap data from 42 locations between 2016 and 2020 in the southern Cardamom Mountains to investigate the impact of environmental degradation on the abundance and condition of Asian elephants. Camera trap data were organized using CameraSweet software to retrieve both the number of individuals and their condition. To count individuals, we defined independent captures spatially and temporally. To assess condition, we created a visual scoring system based on past research (Wemmer et al. 2006, Fernando et al. 2009, Morfeld et al. 2014, Wijeyamohan et al. 2014, Morfeld et al. 2016, Schiffmann et al. 2020). This scoring system relies on visual assessment of the muscle and fat in relation to the pelvis, ribs, and backbone. To validate this subjective scoring system, two scorers reviewed elephant captures using 10 reference photos and then reviewed each other's assessments of the first five images showing the elephant's body condition. This method minimizes subjectivity between the two scorers. Environmental variables (Suppl. material 1) such as distance to forest edge, forest integrity index, elevation, global human settlements, distance to road, distance to river, night light, and forest cover were obtained, then reclassified in ArcGIS to a common 1 km grid. We implemented hierarchical N-mixture models to investigate the impacts of environmental variables on abundance and used cumulative link models to investigate the impact of the same environmental variables on condition. We found that Asian elephant abundance exhibited a significant positive relationship with distance to forest edges, where abundance was greater further away from a forest edge.
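The spatio-temporal independence rule for captures described above can be sketched minimally. The 30-minute threshold below is an assumed value for illustration, not necessarily the study's actual choice:

```python
from datetime import datetime, timedelta

# Illustrative temporal-independence filter: repeated detections at the same
# camera station count as one capture unless separated by a minimum gap.
# The 30-minute gap is an assumption, not the study's documented threshold.
def independent_captures(timestamps, min_gap=timedelta(minutes=30)):
    events = sorted(timestamps)
    kept = []
    for t in events:
        if not kept or t - kept[-1] >= min_gap:
            kept.append(t)
    return kept

detections = [
    datetime(2018, 3, 1, 6, 0),
    datetime(2018, 3, 1, 6, 10),  # within 30 min of previous: same capture
    datetime(2018, 3, 1, 7, 0),   # new independent capture
]
n_independent = len(independent_captures(detections))  # 2
```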
We found that body condition score exhibited a relationship with forest cover and the Forest Landscape Integrity Index, suggesting that grassland and less dense forest support better condition. Moreover, males exhibited significantly higher body condition scores than females, while babies, juveniles, and subadults all exhibited lower body condition scores compared to adults. The significantly lower body condition of young elephants is concerning and suggests that conservation managers in the region should prioritize environmental conditions that support young elephant health. Our results identify key environmental variables that appear to promote Asian elephant abundance and health in the Cardamom Mountains, thus informing relevant conservation actions to support this endangered species in Cambodia and beyond.
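The hierarchical N-mixture approach mentioned above estimates true abundance from repeated counts while accounting for imperfect detection. A bare-bones likelihood sketch (no covariates, toy data, constant detection probability) looks like this; the real analysis would add the environmental covariates on the abundance term:

```python
import numpy as np
from scipy.stats import poisson, binom
from scipy.optimize import minimize

# Minimal N-mixture likelihood sketch: latent site abundance N_i ~ Poisson(lam),
# repeat counts y_it ~ Binomial(N_i, p), marginalising N_i up to n_max.
# Toy data and a covariate-free model; illustrative only.
def nll(params, y, n_max=50):
    lam = np.exp(params[0])                 # abundance rate, log link
    p = 1.0 / (1.0 + np.exp(-params[1]))    # detection prob, logit link
    ll = 0.0
    for counts in y:                        # one row of repeat counts per site
        n_vals = np.arange(counts.max(), n_max + 1)
        site_lik = poisson.pmf(n_vals, lam)
        for c in counts:
            site_lik = site_lik * binom.pmf(c, n_vals, p)
        ll += np.log(site_lik.sum())
    return -ll

y = np.array([[2, 3, 1], [0, 1, 0], [4, 2, 3]])   # 3 sites x 3 visits (toy)
fit = minimize(nll, x0=[np.log(3.0), 0.0], args=(y,), method="Nelder-Mead")
lam_hat = np.exp(fit.x[0])                        # estimated mean abundance
```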
Ret Thaung, Jackson Frechette, Matthew Luskin, Zachary Amir (2023). "Combining Camera Trap Data and Environmental Data to Estimate the Effects of Environmental Gradients on Abundance of the Asian Elephant Elephas maximus in Cambodia." Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112100. Published 2023-09-11.
JJ Dearborn, Mike Lichtenberg, Joel Richard, Joseph deVeer, Michael Trizna, Katie Mika
As the urgency to address the climate crisis intensifies, the availability of accurate and comprehensive biodiversity data has become crucial for informing climate change studies, tracking key environmental indicators, and building global biodiversity monitoring platforms. The Biodiversity Heritage Library (BHL) plays a vital role in the core biodiversity infrastructure, housing over 60 million pages of digitized literature about life on Earth. Recognizing the value of over 500 years of data in BHL, a global network of BHL staff is working to establish a scalable data pipeline to provide actionable occurrence data from BHL's vast and diverse collections. However, transforming textual content into FAIR (findable, accessible, interoperable, reusable) data poses challenges due to missing descriptive metadata and error-ridden unstructured outputs from commercial text engines (Fig. 1). Despite the wealth of knowledge in BHL now available to global audiences, the underutilization of biodiversity and climate data contained in BHL's textual corpus hinders scientific research, hampers informed decision-making for conservation efforts, and limits our understanding of biodiversity patterns crucial for addressing the climate crisis. By leveraging recent advancements in text recognition engines, along with cutting-edge AI (Artificial Intelligence) models like OpenAI's CLIP (Contrastive Language-Image Pre-Training) and nascent features in transcription platforms, BHL staff are beginning to process vast amounts of textual and image data and transform centuries' worth of data from BHL collections into computationally usable formats. Recent technological breakthroughs now offer a transformative opportunity to empower the global biodiversity community with prescient insights from our shared past and facilitate the integration of historical knowledge into climate action initiatives.
To bridge gaps in the historical record and unlock the potential of the Biodiversity Heritage Library (BHL), a multi-pronged effort utilizing innovative cross-disciplinary approaches is being piloted. These technical approaches were selected for their efficiency and ability to generate rapid results that could be applied across the diverse range of materials in BHL (Fig. 2). Piloting a data pipeline that is scalable to 60 million pages requires considerable investigation, experimentation, and resources but will have an appreciable impact on global conservation efforts by informing and establishing historic baselines deeper into time. This presentation will focus on the identification, extraction, and transformation of OCR into structured data outputs in BHL. Approaches include: upgrading legacy OCR text using the Tesseract OCR engine to improve data quality by 20% and openly publish 40 GB of textual data as FAIR data; evaluating handwritten text recognition (HTR) engines (Microsoft Azure Computer Vision, Google Cloud Vision API (Application Programming Interface), and Amazon Textract) to im
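Turning page-level OCR into usable text at BHL scale starts with mundane normalisation steps. A small sketch of one such step, re-joining words hyphenated across OCR line breaks, is shown below; BHL's actual normalisation rules are certainly more involved than this:

```python
import re

# Illustrative OCR post-processing: join words hyphenated across line breaks
# and unwrap remaining newlines into single spaces. A sketch only.
def normalise_ocr(text: str) -> str:
    text = re.sub(r"-\s*\n\s*", "", text)   # re-join hyphenated line breaks
    text = re.sub(r"\s*\n\s*", " ", text)   # unwrap remaining line breaks
    return re.sub(r"[ \t]+", " ", text).strip()

raw = "The speci-\nmens were col-\nlected in\n1887."
clean = normalise_ocr(raw)  # "The specimens were collected in 1887."
```

Small deterministic passes like this one are what make downstream steps (taxonomic name finding, occurrence extraction) tractable on error-ridden legacy OCR.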
JJ Dearborn, Mike Lichtenberg, Joel Richard, Joseph deVeer, Michael Trizna, Katie Mika (2023). "Unearthing the Past for a Sustainable Future: Extracting and transforming data in the Biodiversity Heritage Library for climate action." Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112436. Published 2023-09-11.
We present a streamlined technical solution ("Publish First") designed to assist smaller, resource-constrained herbaria in rapidly publishing their specimens to the Global Biodiversity Information Facility (GBIF). Specimen data from smaller herbaria, particularly those in biodiversity-rich regions of the world, provide a valuable and often unique contribution to the global pool of biodiversity knowledge (Marsico et al. 2020). However, these institutions often face challenges not applicable to larger herbaria, including a lack of staff with technical skills, limited staff hours for digitization work, inadequate financial resources for specialized scanning equipment, cameras, lights, and imaging stands, limited (or no) access to computers and collection management software, and unreliable internet connections. Data-scarce, biodiversity-rich countries are also often linguistically diverse (Gorenflo et al. 2012), and staff may not have English skills, which means pre-existing online data publication resources and guides are of limited use. The "Publish First" method we are trialing addresses several of these issues: it drastically simplifies the publication process so technical skills are not necessary; it minimizes administrative tasks, saving time; it uses simple, cheap, and easily available hardware; it does not require any specialized software; and the process is so simple that there is little to no need for any written instructions. "Publish First" requires staff to attach QR code labels containing identifiers to herbarium specimen sheets, scan these sheets using a document scanner costing around €300, then drag and drop these files to an S3 bucket (a cloud container that specialises in storing files). Subsequently, these images are automatically processed through an Optical Character Recognition (OCR) service to extract text, which is then passed on to OpenAI's Generative Pre-trained Transformer 4 (GPT-4) Application Programming Interface (API) for standardization.
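The QR-label-then-upload step above can be sketched as follows. The identifier scheme, key layout, bucket name, and the "TASH" institution code are all illustrative assumptions, not the project's documented conventions:

```python
import uuid

# Sketch of the "Publish First" pre-publication step: mint an identifier for
# each specimen sheet (this is what the QR code label would encode) and derive
# a predictable object key for the scanned image in the S3 bucket.
def mint_specimen_id() -> str:
    return str(uuid.uuid4())

def s3_key(institution_code: str, specimen_id: str) -> str:
    # Hypothetical key layout: <institution>/incoming/<id>.jpg
    return f"{institution_code}/incoming/{specimen_id}.jpg"

sid = mint_specimen_id()
key = s3_key("TASH", sid)  # "TASH" is a hypothetical institution code

# The actual upload (boto3 shown as one common option) would then be:
#   boto3.client("s3").upload_file(local_scan_path, "herbarium-inbox", key)
# where "herbarium-inbox" is an assumed bucket name.
```

Keeping the identifier in both the QR label and the object key is what lets the downstream OCR and standardization stages stay a fully automatic, hands-off pipeline.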
The standardized data is integrated into a Darwin Core Archive file that is automatically published through GBIF's Integrated Publishing Toolkit (IPT) (GBIF 2021). The most technically challenging aspect of this project has been the standardization of OCR data to Darwin Core using the GPT-4 API, particularly in crafting precise prompts to address the inherent inconsistency and lack of reliability of these Large Language Models (LLMs). Despite this, GPT-4 outperformed our manual scraping efforts. Our choice of GPT-4 as a model was a naive one: we implemented the workflow on some pre-digitized specimens from previously published Norwegian collections, compared the published data on GBIF with GPT-4's Darwin Core standardized output, and found the results satisfactory. Moving forward, we plan to undertake more rigorous additional research to compare the effectiveness and cost-efficiency of different LLMs as Darwin Core standardization engines. We are also particularly interested in exploring the new "function calling" feature added to the GPT-4 API, as it promises to let us retrieve standardized data in a more consistent and structured format. This workflow is currently being trialed in Tajikistan and may soon be used in Uzbekistan, Armenia, and Italy.
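The LLM standardization step can be sketched as prompt construction plus strict validation of the model's JSON output. The field list and prompt wording below are illustrative assumptions, not the project's actual prompt; the commented API call shows the general shape of an OpenAI chat completions request:

```python
import json

# A minimal subset of Darwin Core terms, chosen for illustration only.
DWC_FIELDS = ["scientificName", "recordedBy", "eventDate", "country", "locality"]

def build_prompt(ocr_text: str) -> str:
    """Hypothetical prompt asking the model for strict, key-complete JSON."""
    return (
        "Extract the following Darwin Core terms from this herbarium label text "
        "and answer with a single JSON object using exactly these keys: "
        f"{', '.join(DWC_FIELDS)}. Use null for missing values.\n\n"
        f"Label text:\n{ocr_text}"
    )

# The call itself (against the OpenAI chat completions API) would look like:
#   client = openai.OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4",
#       messages=[{"role": "user", "content": build_prompt(label_text)}])
#   raw = resp.choices[0].message.content

def parse_response(raw: str) -> dict:
    """Validate the model output before it enters the Darwin Core Archive."""
    record = json.loads(raw)
    if set(record) != set(DWC_FIELDS):
        raise ValueError("unexpected keys in model output")
    return record
```

Validating every response against a fixed key set is one cheap defence against the inconsistency the authors describe; function calling promises to push that schema enforcement into the API itself.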
Rukaya Johaadien, Michal Torma (2023). "'Publish First': A Rapid, GPT-4 Based Digitisation System for Small Institutes with Minimal Resources." Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112428. Published 2023-09-11.
Martin Kalfatovic, Bianca Crowley, JJ Dearborn, Colleen Funkhouser, David Iggulden, Kelli Trei, Elisa Herrmann, Kevin Merriman
The Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. Headquartered at Smithsonian Libraries and Archives (SLA), BHL is a global consortium of research institutions working together to build and maintain a critical piece of biodiversity data infrastructure. BHL provides free access to over 60 million pages of biodiversity content from the 15th–21st centuries. BHL works with the biodiversity community to develop tools and services to facilitate greater access, interoperability, and reuse of content and data. Through taxonomic intelligence tools developed by Global Names Architecture, BHL has indexed more than 230 million instances of taxonomic names throughout its collection, allowing researchers to locate publications about specific taxa. BHL also works to bring historical literature into the modern network of scholarly research by retroactively assigning DOIs (digital object identifiers) and making this historical content more discoverable and trackable. Biodiversity databases such as the Catalogue of Life, International Plant Names Index, Tropicos, World Register of Marine Species, and iNaturalist, rely on literature housed in BHL. Locked within its 60 million pages are valuable species occurrence data and observations from expeditions. To make this data FAIR (findable, accessible, interoperable, and reusable), BHL and its partners are working on a data pipeline to transform textual content into actionable data that can be deposited into data aggregators such as the Global Biodiversity Information Facility (GBIF). BHL’s shared vision began in 2006 among a small community of passionate librarians, technologists, and biodiversity researchers. Uniting as a consortium, BHL received grant funding to build and launch its digital library. BHL partners received additional grant funding for further technical development and targeted digitization projects. 
When initial grant funding ended in 2012, BHL established an annual dues model for its Members and Affiliates to help support central BHL operating expenses and technical development. This dues model continues today, along with in-kind contributions of staff time from Members and Affiliates. Significant funding is also provided by the Smithsonian in the form of an annual U.S. federal allocation, endowment funds, and SLA cost subvention, to host the technical infrastructure and Secretariat staff. BHL also relies on user donations to support its program. Though BHL has diversified funding streams over the years, it relies heavily on a few key institutions to cover operating costs. Though these institutions have overarching open access, research, and sustainability goals, priorities and resources to achieve these goals shift over time. Without long-term commitments, institutions may choose to prioritize new projects over established programs. Many BHL contributors have experienced funding loss for digitization projects, reducin
Safeguarding Access to 500 Years of Biodiversity Data: Sustainability planning for the Biodiversity Heritage Library. Martin Kalfatovic, Bianca Crowley, JJ Dearborn, Colleen Funkhouser, David Iggulden, Kelli Trei, Elisa Herrmann, Kevin Merriman. Biodiversity Information Science and Standards 7 (2023-09-11). https://doi.org/10.3897/biss.7.112430
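The text-to-data pipeline described in the abstract above, turning literature content into aggregator-ready records, can be illustrated with a minimal sketch: mapping a species mention extracted from a digitized page onto Darwin Core terms. Only the Darwin Core term names and the `MaterialCitation` basis of record are standard; the extraction record and page URL are invented for illustration.

```python
import csv
import io

# Hypothetical record extracted from an OCR'd expedition report page.
extracted = {
    "name": "Puma concolor",
    "locality": "Rio Negro, Brazil",
    "date": "1912-07-15",
    "page_url": "https://www.biodiversitylibrary.org/page/12345",  # illustrative URL
}

# Map the extraction onto standard Darwin Core terms so an aggregator
# such as GBIF can ingest the row without custom handling.
dwc_row = {
    "scientificName": extracted["name"],
    "verbatimLocality": extracted["locality"],
    "eventDate": extracted["date"],
    "associatedReferences": extracted["page_url"],
    "basisOfRecord": "MaterialCitation",  # an occurrence cited in literature
}

# Serialize as CSV, the simplest carrier for a Darwin Core Archive core file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(dwc_row))
writer.writeheader()
writer.writerow(dwc_row)
print(buf.getvalue())
```

In practice such rows would be bundled with metadata into a Darwin Core Archive, but a flat CSV of term-named columns is the essential shape.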
In the framework of implementing the European Open Science Cloud (EOSC), there is still confusion between the concept of data FAIRness (Findable, Accessible, Interoperable and Re-usable; Wilkinson et al. 2016) and the idea of open and freely accessible data, which are not necessarily the same. Data can comply with the requirements of FAIRness even if access to them is moderated or behind a paywall; the motto of EOSC is therefore “As open as possible, as closed as necessary”. This confusion or misinterpretation of definitions has raised concerns among potential data providers, who fear being obliged to make sensitive data openly accessible and freely available even when there are valid reasons for restrictions, or to forfeit charges and forgo profit if the data generate revenue. As a result, there has been some reluctance to fully engage in activities related to FAIR data and the EOSC.

When addressing sensitive data, what first comes to mind are personal data governed by the General Data Protection Regulation (GDPR), as well as clinical, security, military, or commercially valuable data protected by patents. In the domain of biodiversity and natural history collections, it is often reported that issues surrounding sensitive data regulations have less impact, especially when contributors are properly cited and embargo periods are respected. However, there are cases in this domain where sensitive data must be considered for legal or ethical reasons. Examples include protected or endangered species, whose exact geographic coordinates might not be shared openly to avoid poaching; cases of Access and Benefit-sharing (ABS), depending on the country of origin of the species; respect for traditional knowledge; and a desire to limit the commercial exploitation of the data.

The requirements of the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity, as well as upcoming Digital Sequence Information (DSI) regulations, play an important role here. The Digital Services Act (DSA), recently adopted to protect the digital space against the spread of illegal content, sets interoperability requirements for operators of data spaces. This raises questions about how data spaces are actually defined and how they will be affected by this new European legislation, which has a worldwide impact on widely used social media and content platforms such as Google and YouTube.

During implementation and updating activities in projects and initiatives like the Biodiversity Community Integrated Knowledge Library (BiCIKL), it became clear that there is a need for a secure data repository and management system that can handle both open and non-open data, in order to effectively include all potential data providers and mobilise their content while adhering to FAIR requirements.

In this talk, after a general introduction to sensitive data, we will present several examples of how sensitive data and their management are handled in the biodiversity and natural sciences domains, for instance as recommended by GBIF. Last but not least, we will highlight the importance of implementing these developments using internationally recognised standards, such as those of Biodiversity Information Standards (TDWG), in the context of the Biodiversity Knowledge Hub (BKH) implemented in BiCIKL. Notably, clear metadata about terms of use, citation requirements, and licensing enables the legal and effective reuse of the data.
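Coordinate generalization, the safeguard mentioned above for protected species, is commonly implemented by rounding coordinates to a coarser grid and declaring the resulting uncertainty. A minimal sketch, with an invented watch list and a simple rounding rule (not any authority’s prescribed method):

```python
import math

# Illustrative watch list; real systems consult national sensitive-species lists.
SENSITIVE_TAXA = {"Panthera tigris"}

def generalize(lat, lon, taxon, decimals=1):
    """For sensitive taxa, round coordinates to a coarser grid (0.1 degree
    by default, roughly 11 km north-south) and return a matching
    coordinateUncertaintyInMeters; other taxa pass through unchanged."""
    if taxon not in SENSITIVE_TAXA:
        return lat, lon, None
    cell_m = 111_320 * 10 ** -decimals            # N-S size of one grid cell, metres
    uncertainty = math.ceil(cell_m * math.sqrt(2) / 2)  # half-diagonal of the cell
    return round(lat, decimals), round(lon, decimals), uncertainty

print(generalize(10.123456, -84.567890, "Panthera tigris"))
```

Publishing the generalized point together with its uncertainty keeps the record FAIR and usable for coarse analyses while withholding the exact locality.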
FAIR but not Necessarily Open: Sensitive data in the domain of biodiversity. Patricia Mergen, S. Meeus, F. Leliaert. Biodiversity Information Science and Standards 7 (2023-09-08). https://doi.org/10.3897/biss.7.112296
Tanja Lumetsberger, Georg Neubauer, Reinhardt Wenzina
The Biodiversity Atlas Austria (“Biodiversitäts-Atlas Österreich”) is a data portal to explore Austria’s biodiversity. It is based on the open-source infrastructure of the Atlas of Living Australia (ALA) and was launched with support of the Living Atlas (LA) community in late 2019 by the Biodiversity Hub of the University of Continuing Education Krems, funded by the Government of Lower Austria. At present, it stores more than 8.5 million species occurrence records from various data partners and institutions and is available in both English and German. The Atlas runs on two virtual machines with 4 TB of storage and hosts many of the ALA-developed tools and services, such as collectory, biocache, biodiversity information explorer, regions, spatial portal, sensitive data service, lists, images, and dashboard.

In the project “ÖKOLEITA” (2021-2023), two new tools were developed within the existing LA infrastructure and will be launched in late 2023 to allow users to work with ecosystem services and habitat data.

The “ecosys” tool will allow management, visualization, and analysis of ecosystem services by uploading different (raster or vector) TIFF files containing mapped ecosystem services to the geoserver. Users will be able to inspect various ecosystem services at a specific geolocation, or compare different geolocations or a transect on their respective ecosystem service potential. Ecosystem service values are presented both as pictograms, in which each value is transformed into quintiles following the work of Schreder et al. (2018), and as a bar chart showing the true values.

The “habitat” tool will store and manage datasets of habitat mappings (shapefiles) and allow users to spatially explore those habitat mappings on a map. Users will be able to search for specific habitats across all datasets or within a specific one and get all occurrences of that habitat type returned. Through linkage to the biocache, a click on a specific area reveals the list of species found within that habitat recording, as well as all the species occurrences within that area stored in the database. A “habitat backbone” of the most used habitat classifications in Austria will make it possible to work with habitat mappings that use different classifications.

Both tools are integrated into the Living Atlases infrastructure and communicate with the other tools and services of the Biodiversity Atlas Austria (Fig. 1). They share a common administration back-end but have different front-ends, where users can explore the ecosystem services and habitats spatially and in connection with species occurrence records and other contextual information.
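The quintile transformation behind the pictograms can be sketched generically: each ecosystem service value is assigned a class from 1 to 5 using the layer-wide 20/40/60/80th percentile cut points. This illustrates the general technique, not the tool’s actual code:

```python
from bisect import bisect_right
from statistics import quantiles

def to_quintiles(values):
    """Assign each ecosystem-service value a class 1-5 based on the
    layer-wide quintile cut points (20/40/60/80th percentiles)."""
    cuts = quantiles(values, n=5)  # four interior cut points
    return [bisect_right(cuts, v) + 1 for v in values]

# Ten illustrative cell values from one ecosystem-service layer.
layer = [0.2, 1.5, 3.1, 4.8, 7.9, 9.6, 12.0, 15.3, 18.7, 22.4]
print(to_quintiles(layer))  # each value mapped to a 1-5 pictogram class
```

Showing the class alongside the raw value (as the bar chart does) avoids the information loss that binning alone would introduce.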
Integration of Ecosystem Services and Habitats into the Biodiversity Atlas Austria. Tanja Lumetsberger, Georg Neubauer, Reinhardt Wenzina. Biodiversity Information Science and Standards 7 (2023-09-08). https://doi.org/10.3897/biss.7.112315
The digitization of herbarium collections has set in motion a transformative journey, transitioning Nepal's hidden kingdom of mushrooms and lichens from the shadows into the spotlight. Through collaborative work within the framework of the Global Biodiversity Information Facility's Biodiversity Information Fund for Asia (GBIF-BIFA), involving the herbaria KATH (Nepal's National Herbarium and Plant Laboratories) and TUCH (Natural History Museum, Tribhuvan University, Nepal) and the research institute Global Institute for Interdisciplinary Studies (GIIS), Nepal's mycological treasures have been successfully unveiled through digital means. A comprehensive digitization effort has resulted in the complete digitization of 3,971 mushroom specimens and 2,462 lichen specimens, illuminating a wealth of information for researchers, citizen scientists, and the general public. GBIF and the online database maintained by Nepal's National Herbarium and Plant Laboratories, Department of Plant Resources, serve as the gateway to this work (KATH 2021). Prior to this work, the specimens resided in the shadows, lacking the recognition they deserved. Through meticulous collection management, sorting, curation, and labeling, their secrets were unveiled and their stories brought to our fingertips. These previously obscured specimens now possess registered individual catalogue numbers, allowing the quantification of Nepal's fungal wealth within the participating institutions. This project serves as a testament to the vital role of capturing available field-level data, preserving specimens, and harnessing the power of digitization to showcase Nepal's mycological and lichenological wonders to a global audience. At the same time, it has emphasized the significance of sharing this knowledge and fostering appreciation for the overlooked world of mushrooms and lichens.
From the Shadows to the Spotlight: Unveiling Nepal's hidden kingdom of mushrooms and lichens through digitization. Shiva Devkota. Biodiversity Information Science and Standards 7 (2023-09-08). https://doi.org/10.3897/biss.7.112376
Lien Reyserhove, Pieter Huybrechts, J. Hillaert, T. Adriaens, B. D'hondt, Damiano Oldoni
Invasive alien species (IAS) are recognised as a major threat to biodiversity. To prevent the introduction and spread of IAS, European Union Regulation (EU) 1143/2014 obliges Member States both to develop management strategies for IAS of Union Concern and to report on those interventions. For this, we need to collect and combine management data and streamline management actions. This is still a major challenge: the landscape of IAS management is diverse and includes different authorities, managers, businesses and non-governmental organizations. Some organizations have developed their own software applications for recording management actions. For other organizations, such a software system is lacking; their management data are scattered, not harmonized, and often not openly available. For EU reporting, a workflow is needed to centralize all information about the management method applied, the management effort, cost, effectiveness, and impact of the performed actions on other biota or the environment. At this moment, such a workflow is lacking in Belgium.

One of the aims of the LIFE RIPARIAS project is to set up a workflow for harmonizing IAS management data in Belgium. Based on input from the IAS management community in Belgium, we were able to (1) draft a community-driven data model and exchange format called manIAS (MANagement of Invasive Alien Species, Reyserhove et al. 2022), and (2) identify the minimal requirements a software application should meet to be used successfully in the field (Hillaert et al. 2022). In this presentation, we will explore both outputs, the lessons learned, and the way forward.
With our work, we aim to facilitate coordination and transfer of information between the different actors involved in IAS and wildlife management, not only on a Belgian scale, but also within an international context.
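A management-action record of the kind such a workflow needs to centralize (method, effort, cost, effectiveness) might look like the following sketch. The field names are illustrative placeholders only and do not reproduce the published manIAS exchange format:

```python
import json

# Hypothetical management event; field names are illustrative only and
# are not taken from the manIAS schema.
event = {
    "scientificName": "Hydrocotyle ranunculoides",  # an IAS of Union Concern
    "eventDate": "2023-05-04",
    "managementMethod": "manual removal",
    "effort": {"value": 12, "unit": "person-hours"},
    "cost": {"value": 840, "currency": "EUR"},
    "effectiveness": "local population removed; follow-up visit planned",
}

# Serialising to a common JSON carrier is one way different organisations'
# recording tools could all export harmonized records for EU reporting.
print(json.dumps(event, indent=2))
```

The point of a shared exchange format is precisely that each tool can keep its own internal model as long as its exports conform to one agreed structure.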
Towards Improved Data Flows for the Management of Invasive Alien Species and Wildlife: A LIFE RIPARIAS use case. Lien Reyserhove, Pieter Huybrechts, J. Hillaert, T. Adriaens, B. D'hondt, Damiano Oldoni. Biodiversity Information Science and Standards 7 (2023-09-08). https://doi.org/10.3897/biss.7.112386
Ruben Perez Perez, J. Beja, L. Vandepitte, Marina Lipizer, B. Weigel, B. Vanhoorne
EMODnet Biology (hosted and coordinated by the Flanders Marine Institute (VLIZ)) is one of the seven themes within the European Marine Observation and Data network (EMODnet). The EMODnet Biology consortium aims to facilitate the accessibility and usage of marine biodiversity data. With the principle of "collect once, use many times" at its core, EMODnet Biology fosters collaboration across various sectors, including research, policy-making, industry, and individual citizens, to enhance knowledge sharing and inform decision-making.

EMODnet Biology focuses on providing free and open access to comprehensive historical and recent data on the occurrence of marine species and their traits in all European regional seas. It achieves this through partnerships and collaboration with diverse international initiatives, such as the World Register of Marine Species (WoRMS), Marine Regions, and the European node of the Ocean Biodiversity Information System (EurOBIS), among others. By promoting the usage of the Darwin Core standard (Wieczorek et al. 2012), EMODnet Biology fosters data interoperability and ensures seamless integration with wider networks such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biodiversity Information System (OBIS); it serves as a significant data provider for the latter, supplying most of its European-generated data.
Since its inception, EMODnet Biology has undertaken actions covering various areas, including: providing access to marine biological data with spatio-temporal, taxonomic, environmental and sampling-related information, among others; developing an exhaustive data quality control tool based on the Darwin Core standard, the British Oceanographic Data Centre and Natural Environment Research Council Vocabulary Server (BODC NVS2) parameters, and other controlled vocabularies in use; creating and providing training courses to guide data providers; performing gap analyses to identify data quality and coverage shortcomings; creating and publishing marine biological distribution maps for various species or species groups; and interacting with international and European initiatives, projects and organizations.

Furthermore, EMODnet Biology contributes to the wider EMODnet programme, which spans multidisciplinary data and products. Because standard protocols and tools are used across disciplines, EMODnet Biology products can feed multidisciplinary analyses of pressures and impacts on key marine species and habitats and, ultimately, support better management and planning of marine space. In summary, EMODnet Biology plays a key role in biodiversity informatics by providing users with a wealth of accessible and reusable marine biodiversity data and products. Its collaborative approach, broad partnerships, and adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles (Wilkinson et al. 2016), the Infrastructure for Spatial Information in Europe (INSPIRE) metadata technical guidelines (European Commission Joint Research Centre 2013), and Open Geospatial Consortium (OGC) standards make it a key resource for advancing knowledge, informing policy, and supporting the sustainable management of marine ecosystems.
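The flavour of the quality control described above can be sketched as a check that a Darwin Core occurrence record carries a set of expected terms and a well-formed WoRMS LSID. This is a simplified illustration, not the EMODnet tool itself, and the required-term subset is an assumption:

```python
import re

# Terms an occurrence record is expected to carry (a simplified subset).
REQUIRED_TERMS = {"occurrenceID", "scientificName", "scientificNameID",
                  "eventDate", "decimalLatitude", "decimalLongitude"}

# WoRMS identifiers in marine occurrence data are LSIDs of this shape.
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:\d+$")

def qc_issues(record):
    """Return a list of human-readable issues for one Darwin Core record."""
    issues = [f"missing term: {t}" for t in sorted(REQUIRED_TERMS - record.keys())]
    lsid = record.get("scientificNameID", "")
    if lsid and not WORMS_LSID.match(lsid):
        issues.append("scientificNameID is not a WoRMS LSID")
    return issues

record = {
    "occurrenceID": "ex-001",
    "scientificName": "Mytilus edulis",
    "scientificNameID": "urn:lsid:marinespecies.org:taxname:140480",
    "eventDate": "2022-06-01",
    "decimalLatitude": 51.5,
    "decimalLongitude": 2.9,
}
print(qc_issues(record))
```

The real tool additionally validates values against BODC NVS2 vocabularies and other controlled term lists; the structure, however, is the same: per-record, per-term checks producing actionable issue reports for data providers.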
{"title":"EMODnet Biology: Unlocking European marine biodiversity data","authors":"Ruben Perez Perez, J. Beja, L. Vandepitte, Marina Lipizer, B. Weigel, B. Vanhoorne","doi":"10.3897/biss.7.112147","DOIUrl":"https://doi.org/10.3897/biss.7.112147","url":null,"abstract":"EMODnet Biology (hosted and coordinated by the Flanders Marine Institute (VLIZ)) is one of the seven themes within the European Marine Observation and Data network (EMODnet). The EMODnet Biology consortium aims to facilitate the accessibility and usage of marine biodiversity data. With the principle of \"collect once, use many times\" at its core, EMODnet Biology fosters collaboration across various sectors, including research, policy-making, industry, and individual citizens, to enhance knowledge sharing and inform decision-making.\u0000 EMODnet Biology focuses on providing free and open access to comprehensive historical and recent data on the occurrence of marine species and their traits in all European regional seas. It achieves this through partnerships and collaboration with diverse international initiatives, such as the World Register of Marine Species (WoRMS), Marine Regions and the European node of the Ocean Biodiversity Information System (EurOBIS) among others. By promoting the usage of the Darwin Core Standard (Wieczorek et al. 
2012), EMODnet Biology fosters data interoperability and ensures seamless integration with wider networks such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biodiversity Information System (OBIS), serving as a significant data provider of the latter, as it is responsible for most of its data generated in Europe.\u0000 Since its inception, EMODnet Biology has undertaken actions covering various areas, including\u0000 \u0000 \u0000 \u0000 providing access to marine biological data with spatio-temporal, taxonomic, environmental- and sampling-related information among others;\u0000 \u0000 \u0000 developing an exhaustive data quality control tool based on the Darwin Core standard, the British Oceanographic Data Centre and Natural Environment Research Council Vocabulary Server (BODC NVS2) parameters and other controlled vocabularies used;\u0000 \u0000 \u0000 creating and providing training courses to guide data providers;\u0000 \u0000 \u0000 performing gap analyses to identify data quality and coverage shortcomings;\u0000 \u0000 \u0000 creating and publishing marine biological distribution maps for various species or species groups; and\u0000 \u0000 \u0000 interacting with international and European initiatives, projects and organizations.\u0000 \u0000 \u0000 \u0000 providing access to marine biological data with spatio-temporal, taxonomic, environmental- and sampling-related information among others;\u0000 developing an exhaustive data quality control tool based on the Darwin Core standard, the British Oceanographic Data Centre and Natural Environment Research Council Vocabulary Server (BODC NVS2) parameters and other controlled vocabularies used;\u0000 creating and providing training courses to guide data providers;\u0000 performing gap analyses to identify data quality and coverage shortcomings;\u0000 creating and publishing marine biological distribution maps for various species or species groups; and\u0000 interacting with international and 
European initiatives, projects and organizations.\u0000 Furthermore, EMODnet Biology contributes to the ","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88483631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
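The abstract mentions a quality control tool built on the Darwin Core standard and controlled vocabularies. The sketch below is not that tool; it is a minimal illustration, in Python, of the kind of record-level checks such a pipeline might apply to a Darwin Core occurrence record (required terms present, coordinates within valid ranges). The term names (occurrenceID, scientificName, eventDate, decimalLatitude, decimalLongitude) are standard Darwin Core terms; the example record itself is hypothetical.

```python
# Illustrative sketch only (not the EMODnet Biology QC tool): basic checks
# of the kind a Darwin Core quality-control pipeline might perform.

REQUIRED_TERMS = {"occurrenceID", "scientificName", "eventDate"}

def check_record(record: dict) -> list[str]:
    """Return a list of quality-control issues found in one occurrence record."""
    # Flag any missing mandatory Darwin Core terms.
    issues = [f"missing required term: {t}"
              for t in sorted(REQUIRED_TERMS - record.keys())]
    # Validate coordinate ranges when coordinates are supplied.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is not None and not -90 <= float(lat) <= 90:
        issues.append("decimalLatitude out of range [-90, 90]")
    if lon is not None and not -180 <= float(lon) <= 180:
        issues.append("decimalLongitude out of range [-180, 180]")
    return issues

# Hypothetical occurrence record for demonstration.
record = {
    "occurrenceID": "urn:example:obs:1",
    "scientificName": "Mytilus edulis",
    "eventDate": "2021-06-15",
    "decimalLatitude": 51.2,
    "decimalLongitude": 2.9,
}
print(check_record(record))  # an empty list means the record passed these checks
```

Real pipelines would additionally resolve names against WoRMS and validate measurement units against BODC NVS2 vocabularies, which is well beyond this sketch.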