A. P. van de Putte, Yi-Ming Gan, Alyce Hancock, Ben Raymond
The Southern Ocean (SO), delineated to the north by the Antarctic Convergence, is a unique environment that experiences rapid change in some areas while remaining relatively untouched by human activities. At the same time, these ecosystems are under severe threat from climate change and other stressors. While our understanding of SO biological processes (e.g., species distributions, feeding ecology, reproduction) has greatly improved in recent years, biological data for the region remain patchy, sparse, and unstandardised, with coverage varying by taxonomic group (Griffiths et al. 2014).
Due to the scarcity of standardised observations and data, it is difficult to model and predict SO ecosystem responses to climate change, which is often accompanied by other anthropogenic pressures such as fishing and tourism. Understanding the dynamics of, and change in, the SO necessitates a comprehensive system of observations, data management, scientific analysis, and ensuing policy recommendations. Such a system should be built as far as feasible on current platforms and standards, and it should be visible, verifiable, and shared in accordance with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles (Van de Putte and Griffiths 2021). For this we need to identify stakeholders' needs, sources of data, the algorithms for analysing the data, and the infrastructure on which to run those algorithms (Benson and Brooks 2018). Existing synergistic approaches to identifying selected variables for (life) monitoring include Essential Biodiversity Variables (EBVs; Pereira and Ferrier 2013), Essential Ocean Variables (EOVs; Miloslavich and Bax 2018), Essential Climate Variables (ECVs; Bojinski and Verstraete 2014), and ecosystem Essential Ocean Variables (eEOVs; Constable and Costa 2016). (For an overview, see Muller-Karger and Miloslavich 2018.)
These variables can be integrated into the Southern Ocean Observing System (SOOS) and SOOSmap, but also into national or global systems (e.g., the Group on Earth Observations Biodiversity Observation Network (GEO-BON)). The resulting data products can in turn be used to inform policy makers. The use of Essential Variables (EVs) marks a significant step forward in the monitoring and assessment of SO ecosystems. However, these EVs will necessitate prioritising certain variables and data collection efforts. Here we present the outcomes of a workshop organised in August 2023 that aimed to outline the set of Essential Variables and workflows required for a distributed system that can translate biodiversity data (and environmental data) into policy-relevant data products. The goals of the workshop were to:
create an inventory of EVs relevant to the Southern Ocean, based on existing efforts by GEO-BON and the Marine Biodiversity Observation Network (MBON);
identify data requirements and data gaps for calculating such EVs, and prioritise EVs to work on;
identify existing workflows and tools; and
develop a framework for building the workflows needed to turn public biodiversity data into the relevant EVs.
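As a sketch of the kind of workflow the workshop targets, the snippet below aggregates occurrence records into a per-grid-cell species richness layer, a toy stand-in for an Essential Biodiversity Variable data product. The records, field names, and 5-degree grid are illustrative assumptions, not outputs of the workshop.

```python
from collections import defaultdict

def species_richness_by_cell(occurrences, cell_size=5.0):
    """Aggregate occurrence records into per-cell species richness."""
    cells = defaultdict(set)
    for occ in occurrences:
        # Snap coordinates to a coarse grid (floor to the cell origin).
        cell = (occ["lat"] // cell_size * cell_size,
                occ["lon"] // cell_size * cell_size)
        cells[cell].add(occ["species"])
    return {cell: len(species) for cell, species in cells.items()}

# Illustrative Southern Ocean records; a real workflow would pull
# standardised occurrences from a system such as OBIS or GBIF.
occurrences = [
    {"species": "Euphausia superba",       "lat": -62.1, "lon": 58.3},
    {"species": "Pleuragramma antarctica", "lat": -63.7, "lon": 57.9},
    {"species": "Euphausia superba",       "lat": -63.2, "lon": 59.4},
    {"species": "Euphausia superba",       "lat": -71.5, "lon": 10.2},
]
richness = species_richness_by_cell(occurrences)
```

A production workflow would add taxonomic name resolution and sampling-effort corrections before such a layer could inform policy.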
Title: Towards a Distributed System for Essential Variables for the Southern Ocean. Biodiversity Information Science and Standards. Published 2023-09-07. DOI: 10.3897/biss.7.112289
Y. Bakiş, Xiaojun Wang, B. Altıntaş, Dom Jebbia, Henry Bart Jr.
A new science discipline has emerged within the last decade at the intersection of informatics, computer science and biology: Imageomics. Like most other -omics fields, Imageomics uses emerging technologies to analyze biological data, but it extracts that data from images. One of the most widely applied data analysis methods for image datasets is Machine Learning (ML). In 2019, we started working on a United States National Science Foundation (NSF) funded project, known as Biology Guided Neural Networks (BGNN), with the purpose of extracting biological information by using neural networks together with biological guidance such as species descriptions, identifications, phylogenetic trees and morphological annotations (Bart et al. 2021). Even though the variety and abundance of biological data are sufficient for some ML analyses and the data are openly accessible, researchers still spend up to 80% of their time preparing data into a usable, AI-ready format, leaving only 20% for exploration and modeling (Long and Romanoff 2023). For this reason, we have built a dataset composed of digitized fish specimens, taken either directly from collections or from specialized repositories. The range of digital representations we cover is broad and growing, from photographs and radiographs to CT scans and even illustrations. We have added new groups of vocabularies to the dataset management system, including image quality metadata, extended image metadata and batch metadata. With the image quality metadata and extended image metadata, we aim to extract information from the digital objects that can help ML scientists with filtering, image processing and object recognition routines.
Image quality metadata provides information about objects contained in the image, the features and condition of the specimen, and some basic visual properties of the image, while extended image metadata provides information about technical properties of the digital file and the digital multimedia object (Bakış et al. 2021, Karnani et al. 2022, Leipzig et al. 2021, Pepper et al. 2021, Wang et al. 2021) (see details on the Fish-AIR vocabulary web page). Batch metadata is used for separating different datasets and facilitates downloading and uploading data in batches, with additional batch information and supplementary files. Additional flexibility, built into the database infrastructure using an RDF framework, will enable the system to host different taxonomic groups, which might require new metadata features (Jebbia et al. 2023). By combining these features with the FAIR (Findable, Accessible, Interoperable, Reusable) principles and reproducibility, we provide Artificial Intelligence Readiness (AIR; Long and Romanoff 2023) for the dataset. Fish-AIR provides an easy-to-access, filtered, annotated and cleaned biological dataset for researchers from different backgrounds, and facilitates the integration of biological knowledge based on digitized preserved specimens into ML pipelines. Thanks to the flexible database infrastructure and the addition of new datasets, researchers will in the near future also be able to access other types of data, such as landmarks, specimen outlines, annotated parts and quality scores. The dataset is already the largest and most detailed AI-ready fish image dataset with an integrated image quality management system (Jebbia et al. 2023, Wang et al. 2021).
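To illustrate how image quality metadata can feed an ML pipeline, the sketch below filters records on a few quality fields before training. The field names (pixel_width, view, parts_visible) and thresholds are invented for illustration; they are not the actual Fish-AIR vocabulary.

```python
def filter_air_ready(records, min_width=1000, allowed_views=("lateral",)):
    """Keep only specimen images whose quality metadata meets
    ML-training criteria; all criteria here are illustrative."""
    ready = []
    for rec in records:
        quality = rec.get("quality", {})
        if quality.get("pixel_width", 0) < min_width:
            continue  # too low-resolution for the model
        if quality.get("view") not in allowed_views:
            continue  # e.g., only lateral views are comparable
        if not quality.get("parts_visible", False):
            continue  # specimen partially occluded or damaged
        ready.append(rec)
    return ready

records = [
    {"id": "img-1", "quality": {"pixel_width": 2400, "view": "lateral", "parts_visible": True}},
    {"id": "img-2", "quality": {"pixel_width": 640,  "view": "lateral", "parts_visible": True}},
    {"id": "img-3", "quality": {"pixel_width": 3000, "view": "dorsal",  "parts_visible": True}},
]
ready = filter_air_ready(records)  # only img-1 passes all three checks
```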
Title: On Image Quality Metadata, FAIR in ML, AI-Readiness and Reproducibility: Fish-AIR example. Biodiversity Information Science and Standards. Published 2023-09-07. DOI: 10.3897/biss.7.112178
Kathryn Hall, Matt Andrews, Keeva Connolly, Yasima Kankanamge, Christopher Mangion, Winnie Mok, Lars Nauheimer, Goran Sterjov, Nigel Ward, Peter Brenton
Fundamental to the capacity of Australia’s 15,000 biosciences researchers to answer questions in taxonomy, phylogeny, evolution, conservation, and applied fields like crop improvement and biosecurity, is access to trusted genomics (and genetics) datasets. Historically, researchers turned to single points of origin, like GenBank (part of the United States' National Center for Biotechnology Information), to find the reference or comparative data they needed. But the rapidity of data generation with next-generation sequencing, and the enormous size and diversity of the resulting datasets, mean that no single database now contains all data of a specific class attributable to individual taxa, nor the full breadth of data types relevant to those taxa. Comprehensively searching for taxonomically relevant data, and for data of the types germane to a research question, is a significant challenge for researchers. Data are openly available online, but they may be stored under synonyms or indexed via unconventional taxonomies. Data repositories are largely disconnected, and researchers must visit multiple sites to have confidence that their searches have been exhaustive. Databases may focus on single data types and not store or reference other data assets, even though those may be relevant for the taxon of interest. Additionally, our survey of the genomics community indicated that researchers are less likely to trust data with inadequately evidenced provenance metadata. This means that genomics data are hard to find and are often untrusted. Moreover, even once found, the data are in formats that do not interoperate with occurrence and ecological datasets, such as those housed in the Atlas of Living Australia.
We built the Australian Reference Genome Atlas (ARGA) to overcome the barriers faced by researchers in finding and collating genomics data for Australia’s species, and we have built it so that researchers can search for data within taxonomically accepted contexts and defined intersections and conjunctions with verified and expert ecological datasets. Using a series of ingestion scripts, the ARGA data team has implemented new and customised data mappings that effectively integrate genomics data, ecological traits, and occurrence data within an extended Darwin Core Event framework (GBIF 2018). Here, we will demonstrate how the architecture we derived for the ARGA application works, and how it can be extended as new data sources emerge. We then demonstrate how our flexible model can be used to: locate genomics data for taxa of interest; explore data within an ecological context; and calculate metrics for data availability for provincial bioregions.
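The integration described above can be pictured as wrapping a genomics record in a Darwin Core Event envelope. This is a minimal sketch of the idea, not ARGA's actual ingestion code: the input field names and the sample record are hypothetical, while the dwc: terms (eventID, eventDate, samplingProtocol, scientificName, basisOfRecord, associatedSequences) are genuine Darwin Core terms.

```python
def to_dwc_event(genome_record):
    """Map a (hypothetical) genomics record onto Darwin Core Event and
    Occurrence terms so it can sit alongside occurrence and trait data."""
    return {
        "dwc:Event": {
            "eventID": f"seq-event:{genome_record['accession']}",
            "eventDate": genome_record["collection_date"],
            "samplingProtocol": genome_record["sequencing_method"],
        },
        "dwc:Occurrence": {
            "scientificName": genome_record["organism"],
            "basisOfRecord": "MaterialSample",
            "associatedSequences": genome_record["accession"],
        },
    }

record = {
    "accession": "ACC-0001",  # placeholder accession, not a real GenBank ID
    "organism": "Phascolarctos cinereus",
    "collection_date": "2020-11-04",
    "sequencing_method": "long-read assembly",
}
event = to_dwc_event(record)
```

Keeping the genomics payload inside an Event record is what lets it intersect with occurrence-driven queries downstream.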
Title: The Australian Reference Genome Atlas (ARGA): Finding, sharing and reusing Australian genomics data in an occurrence-driven context. Biodiversity Information Science and Standards. Published 2023-09-06. DOI: 10.3897/biss.7.112129
Robert Stevenson, Elizabeth R. Ellwood, Peter Brenton, P. Flemons, Jeff Gerbracht, Wesley Hochachka, Scott Loarie, Carrie Seltzer
The collection, archiving and use of biodiversity data depend on a network of pipelines herein called the Biodiversity Data Enterprise (BDE), best understood globally through the work of the Global Biodiversity Information Facility (GBIF). Efforts to sustain and grow the BDE require information about the data pipeline and the infrastructure that supports it. A host of metrics from GBIF, including institutional participation (member countries, institutional contributors, data publishers), biodiversity coverage (occurrence records, species, geographic extent, data sets) and data usage (records downloaded, published papers using the data) (Miller 2021), documents the rapid growth and successes of the BDE (GBIF Secretariat 2022). Heberling et al. (2021) make a convincing case that the data integration process is working. The Biodiversity Information Standards (TDWG) Basis of Record term provides information about the underlying infrastructure. It categorizes the kinds of processes*1 that teams undertake to capture biodiversity information, and GBIF quantifies their contributions*2 (Table 1). Currently, 83.4% of observations come from human observations, of which 63% are of birds. Museum preserved specimens account for 9.5% of records. In both cases, a combination of volunteers (who make observations, collect specimens, digitize specimens, transcribe specimen labels) and professionals work together to make records available. To better understand how the BDE is working, we suggest that it would be of value to know the number of contributions and contributors, and their hours of engagement, for each data set. This can help the community address questions such as, "How many volunteers do we need to document birds in a given area?" or "How much professional support is required to run a camera trap network?"
For example, millions of observations were made by tens of thousands of observers in two recent BioBlitz events: one called Big Day, focusing on birds, sponsored by the Cornell Laboratory of Ornithology, and the other called the City Nature Challenge, addressing all taxa, sponsored jointly by the California Academy of Sciences and the Natural History Museums of Los Angeles County (Table 2). In our presentation we will suggest approaches to deriving metrics that could be used to document the collaborations and contributions of volunteers and staff, using examples from both Human Observation (eBird, iNaturalist) and Preserved Specimen (DigiVol, Notes from Nature) record types. The goal of the exercise is to start a conversation about how such metrics can further the development of the BDE.
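Breakdowns like the Basis of Record percentages quoted above are simple to derive from any occurrence export. The toy records below are invented for illustration and do not reproduce GBIF's actual proportions.

```python
from collections import Counter

def basis_of_record_shares(records):
    """Return the fraction of records in each Basis of Record category."""
    counts = Counter(rec["basisOfRecord"] for rec in records)
    total = sum(counts.values())
    return {basis: n / total for basis, n in counts.items()}

# Ten invented records for illustration only.
records = (
    [{"basisOfRecord": "HumanObservation"}] * 8
    + [{"basisOfRecord": "PreservedSpecimen"}] * 1
    + [{"basisOfRecord": "MachineObservation"}] * 1
)
shares = basis_of_record_shares(records)  # HumanObservation -> 0.8
```

Extending such tallies with per-record contributor counts and engagement hours, where datasets record them, would yield the volunteer-effort metrics the abstract calls for.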
Title: Can Biodiversity Data Scientists Document Volunteer and Professional Collaborations and Contributions in the Biodiversity Data Enterprise? Biodiversity Information Science and Standards. Published 2023-09-06. DOI: 10.3897/biss.7.112126
Predictability is one of the core requirements for creating machine-actionable data. The more predictable the data, the more generic the services acting on them can be; and the more generic the services, the easier it is to exchange ideas, collaborate on initiatives and leverage machines to do the work. Predictability is essential for implementing the FAIR principles (Findable, Accessible, Interoperable, Reusable), as it provides the "I" for Interoperability (Jacobsen et al. 2020). The FAIR principles emphasise machine actionability because the amount of data generated is far too large for humans to handle. While Biodiversity Information Standards (TDWG) standards have massively improved the standardisation of biodiversity data, there is still room for improvement. Within the Distributed System of Scientific Collections (DiSSCo), we aim to harmonise all scientific data derived from European specimen collections, including geological specimens, into a single data specification. We call this data specification the open Digital Specimen (openDS). It is being built on top of existing and developing biodiversity information standards such as Darwin Core (DwC), the Minimum Information about a Digital Specimen (MIDS), Latimer Core, the Access to Biological Collection Data (ABCD) Schema with its Extension for Geosciences (EFG), and the new Global Biodiversity Information Facility (GBIF) Unified Model. In openDS we leverage the existing standards within the TDWG community, but combine them with stricter constraints and controlled vocabularies, with the aim of improving the FAIRness of the data. This will not only make the data easier to use, but will also increase their quality and machine actionability. As a first step towards this harmonisation of terms, we make sure that similar values are keyed by the same standard term. This enables the next step, in which we harmonise the values themselves: we can transform free-text values into standardised or controlled vocabularies.
For example, instead of using the names J. Doe, John Doe and J. Doe sr. for a collector, we aim to standardise these to J. Doe, with a person identifier that connects this name with more information about the collector. Biodiversity information standards such as DwC were developed to lower the bar for data sharing. The downside of imposing minimal restraints and maximal flexibility is that they provide room for ambiguity, leading to multiple ways of interpretation. This limits interoperability and hampers machine actionability. In DiSSCo, data will come from different sources that use different biodiversity information standards. To cover this, we need to harmonise terms between these standards. To complicate things further, different serialisation methods are used for data exchange: Darwin Core Archives (DwC-A; GBIF 2021) use comma-separated values (CSV) files, ABCD(EFG) exposed through the Biological Collection Access Service (BioCASe) uses XML, and most custom formats use JavaScript Object Notation (JSON). In this lightning talk, we will dive into DiSSCo's technical implementation of the harmonisation process. DiSSCo currently supports two biodiversity information standards, DwC and ABCD(EFG), and maps data to our openDS specification on a record-by-record basis. We will highlight some of the more problematic mappings, but also show how a harmonised model can simplify generic actions at scale, such as the calculation of MIDS levels, which provide information about the completeness of specimen digitisation. Finally, we will take a quick look at the next steps and hope to start a discussion on controlled vocabularies. High-quality, standardised data built on strict specifications, with controlled vocabularies rooted in community-accepted standards, can have a huge impact on biodiversity research and is an important step towards scaling up research with computational support.
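The two harmonisation steps described above (the same key for the same concept, then controlled values) can be sketched as follows. The term map, the agent identifier, and the record layouts are invented for illustration; they are not the openDS specification itself.

```python
# Term harmonisation: source-specific keys map to one shared term.
TERM_MAP = {
    "recordedBy": "collector",      # Darwin Core style key
    "GatheringAgent": "collector",  # ABCD style key
}

# Value harmonisation: free-text name variants map to one identified agent.
# The identifier below is a placeholder, not a real person identifier.
AGENT = {"name": "J. Doe", "identifier": "agent:example-0001"}
NAME_MAP = {"J. Doe": AGENT, "John Doe": AGENT, "J. Doe sr.": AGENT}

def harmonise(record):
    """Rewrite a source record onto shared terms and controlled values."""
    out = {}
    for key, value in record.items():
        term = TERM_MAP.get(key, key)
        if term == "collector":
            value = NAME_MAP.get(value, {"name": value, "identifier": None})
        out[term] = value
    return out

dwc = {"recordedBy": "John Doe", "country": "Belgium"}
abcd = {"GatheringAgent": "J. Doe sr.", "country": "Belgium"}
assert harmonise(dwc) == harmonise(abcd)  # both sources converge on one shape
```

Once records from CSV-, XML- and JSON-based sources are parsed into key–value form, the same harmonisation function can serve all of them.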
Leeflang S, Addink W (2023). Harmonised Data is Actionable Data: DiSSCo’s solution to data mapping. Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112137
Hanane Ariouat, Youcef Sklab, M. Pignal, Régine Vignes Lebbe, Jean-Daniel Zucker, Edi Prifti, E. Chenin
Herbarium specimen scans constitute a valuable source of raw data. Herbarium collections are gaining interest in the scientific community as their exploration can lead to understanding serious threats to biodiversity. Data derived from scanned specimen images can be analyzed to answer important questions such as how plants respond to climate change, how different species respond to biotic and abiotic influences, or what role a species plays within an ecosystem. However, exploiting such large collections is challenging and requires automatic processing. A promising solution lies in the use of computer-based processing techniques, such as Deep Learning (DL). But herbarium specimens can be difficult to process and analyze as they contain several kinds of visual noise, including information labels, scale bars, color palettes, envelopes containing seeds or other organs, collection-specific barcodes, stamps, and other notes that are placed on the mounting sheet. Moreover, the paper on which the specimens are mounted can degrade over time for multiple reasons, and often the paper's color darkens and, in some cases, approaches the color of the plants. Neural network models are well-suited to the analysis of herbarium specimens, provided they can abstract away the presence of such visual noise. However, in some cases a model can focus on these elements, which eventually leads to poor generalization when analyzing new data in which these visual elements are not present (White et al. 2020). It is therefore important to remove the noise from specimen scans before using them in model training and testing, to improve model performance. Studies have used basic cropping techniques (Younis et al. 2018), but they do not guarantee that the visual noise is removed from the cropped image. For instance, the labels are frequently placed at random positions in the scans, resulting in cropped images that still contain noise. White et al. (2020) used the Otsu binarization method followed by a manual post-processing and a blurring step to adjust the pixels that should have been assigned to black during segmentation. Hussein et al. (2020) used an image labeler application, followed by a median filtering method to reduce the noise. However, both White et al. (2020) and Hussein et al. (2020) consider only two organs: stems and leaves. Triki et al. (2022) used a polygon-based deep learning object detection algorithm. But in addition to being laborious and difficult, this approach does not give good results when it comes to fully identifying specimens. In this work, we aim to create clean high-resolution mask extractions with the same resolution as the original images. These masks can be used by other models for a variety of purposes, for instance to distinguish the different plant organs. Here, we proceed by combining object detection and image segmentation techniques, using a dataset of scanned herbarium specimens. We propose an algorithm that identifies and retains the pixels belonging to the plant specimen, and removes the other pixels belonging to non-plant elements considered as noise. The removed pixels are set to zero (black). Fig. 1 presents the complete masking pipeline, which consists of two main stages: object detection and image segmentation. In the first stage, we manually annotated the images of a 950-image dataset with bounding boxes, identifying the visual elements considered as noise (e.g., scale bars, barcodes, stamps, text boxes, color palettes, envelopes; Fig. 2). A model was then trained to remove the noise elements automatically. We split the dataset into 80% for training, 10% for validation and 10% for testing, and ultimately obtained a precision score of 98.2%, a 3% improvement over the baseline. The output of this stage then serves as input for image segmentation, which produces the final mask. We blacken the pixels covered by the detected noise elements, then use HSV (Hue, Saturation, Value) color segmentation to select only the pixels whose values fall within a range corresponding to plant colors. Finally, a morphological opening operation is applied to remove noise and separate objects, followed by a closing operation to fill gaps and eliminate the remaining noise, as described by Sunil Bhutada et al. (2022). The output is a generated mask that retains only the pixels belonging to the plant. Unlike other approaches that focus mainly on leaves and stems, our method covers all plant organs (Fig. 3). Our approach removes the background noise from herbarium specimen scans and extracts clean plant images, an important step before using these images in different deep learning models. However, the quality of the extraction depends on the quality of the scan, the condition of the specimen, and the paper used.
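The HSV colour selection and morphological opening described above can be sketched in miniature with plain Python. The hue band and the toy 5×5 image below are assumptions for illustration, not the authors' parameters; their pipeline runs on full-resolution scans, typically with an image-processing library.

```python
# Sketch of two cleanup steps: select plant-coloured pixels by an assumed
# "greenish" hue band, then apply a morphological opening (erosion followed
# by dilation) over a 3x3 neighbourhood to drop isolated noise pixels.

def hsv_mask(image, h_lo=60, h_hi=180):
    """image: 2-D grid of (h, s, v) tuples; returns a binary mask."""
    return [[1 if h_lo <= px[0] <= h_hi else 0 for px in row] for row in image]

def _neighbourhood(mask, r, c):
    """3x3 window around (r, c), clipped at the image border."""
    rows, cols = len(mask), len(mask[0])
    return [mask[i][j]
            for i in range(max(r - 1, 0), min(r + 2, rows))
            for j in range(max(c - 1, 0), min(c + 2, cols))]

def erode(mask):
    return [[1 if all(_neighbourhood(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]

def dilate(mask):
    return [[1 if any(_neighbourhood(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]

def opening(mask):
    return dilate(erode(mask))

# Toy 5x5 image: a 3x3 "plant" patch plus one isolated noise pixel at (0, 4).
G, P = (90, 200, 120), (20, 30, 200)  # greenish plant pixel vs. paper pixel
image = [[P, P, P, P, G],
         [P, G, G, G, P],
         [P, G, G, G, P],
         [P, G, G, G, P],
         [P, P, P, P, P]]
mask = opening(hsv_mask(image))  # the isolated pixel is removed, the patch kept
```

The closing operation mentioned in the abstract is the dual (`erode(dilate(mask))`) and would fill small gaps inside the plant region.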
Ariouat H, Sklab Y, Pignal M, Vignes Lebbe R, Zucker J-D, Prifti E, Chenin E (2023). Extracting Masks from Herbarium Specimen Images Based on Object Detection and Image Segmentation Techniques. Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112161
Some Biodiversity Information Standards (TDWG) standards have had mappings to other standards for years or even decades. However, each standard uses its own approach to documenting those mappings; some mappings are incomplete, and many are hard to find. There is no TDWG-recommended approach for how mappings should be documented, the way the Standards Documentation Standard (SDS) prescribes for the standards themselves. During TDWG 2022 in Sofia, Bulgaria, the topic of mapping between standards was mentioned several times throughout the conference, which led to an impromptu discussion about standards mappings at the Unconference slot on the last conference day. Afterwards a dedicated channel within the TDWG Slack workspace was added to continue the conversation (#mappings-between-standards). During further discussions, both within the Technical Architecture Group (TAG) of TDWG and during separate video conferences on the topic, it was decided to form a dedicated task group under the umbrella of the TAG. This task group is still in the process of formation. The goal of the group is to review the current state of mappings for TDWG standards, align the approaches of the different standards to foster interoperability, and give recommendations for current and future standards on how to specify mappings. Further work to define the strategy and scope for achieving these goals is needed, particularly to gain community input and acceptance. Consideration has been given to a range of possible types of mappings, which serve different use cases and expectations, such as machine actionability and improved documentation of the TDWG standards landscape to aid user understanding and implementation. In this talk we will show the work that has already been done, outline our planned steps and invite the community to give input on our process.
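What a machine-actionable mapping might look like can be sketched along the lines of SSSOM-style subject–predicate–object records. The term pairs, predicate labels, and lookup helper below are hypothetical illustrations, not an output of the task group.

```python
# Illustrative mapping records between standards; the abbreviated ABCD paths
# and predicate choices are examples only, not an agreed TDWG product.
MAPPINGS = [
    {"subject": "dwc:recordedBy",
     "predicate": "skos:closeMatch",
     "object": "abcd:Gathering/Agents/GatheringAgent",
     "comment": "Both name the collecting agent; ABCD allows structured agents."},
    {"subject": "dwc:eventDate",
     "predicate": "skos:closeMatch",
     "object": "abcd:Gathering/DateTime/ISODateTimeBegin",
     "comment": "DwC allows date ranges within a single term."},
]

def targets_for(term, mappings=MAPPINGS):
    """Look up candidate target terms for a given source term."""
    return [m["object"] for m in mappings if m["subject"] == term]

candidates = targets_for("dwc:recordedBy")
```

Keeping mappings as data rather than prose is what makes them machine-actionable: the same records can drive documentation pages and automated crosswalks alike.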
Fichtmueller D (2023). The Journey to a TDWG Mappings Task Group and its Plans for the Future. Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112148
In Belgium, a federal country in the heart of Europe, the competencies for nature conservation and nature policy lie within the regions. The Research Institute for Nature and Forest (INBO) is an independent research institute, funded by the Flemish regional government, which underpins and evaluates biodiversity policy and management by means of applied scientific research, and sharing of data and knowledge. One of the 12 strategic goals in the 2009-2015 INBO strategic planning was that: 'INBO manages data and makes them accessible. It looks into appropriate data gathering methods and means by which to disseminate data and make them readily available'. Since 2009, the INBO has steadily evolved into a research institute with a strong emphasis on open data and open science. In 2010 INBO became a data publisher for the Global Biodiversity Information Facility (GBIF), adopted an open data and open access policy and is known for being an open science institute in Flanders, Belgium. In 2021, a question arose from the council of ministers on the possibility and availability of a public portal for biodiversity data. The goal of this portal should be to ensure findability, availability, and optimal usability of biodiversity data, initially for policy makers, but also for the wider public. With the Living Atlas project already high on our radar, an analysis project, funded by the Flemish government, started in December 2021. All the entities in the department of 'Environment' contributed to a requirements and feasibility study, a proof of concept (POC) Living Atlas for Flanders was set up and the required budget was calculated. 
During the requirements and feasibility study, with the help of IPSOS, a professional survey agency, we questioned the Agency for Nature and Forests (ANB), the Flanders Environment Agency (VMM), the Flemish Land Agency (VLM) and the Department of Environment on the possible policy relevance of a Flemish biodiversity portal, the need for high-resolution data (on geographical and temporal scales), and the availability of biodiversity data in Flanders, focussed on key species, protected species and other Flemish priority species. During the technical proof of concept, we tested the Living Atlases (LA) software suite as the most mature candidate for a Flemish Living Atlas. We checked how we could set up an LA installation in our own Amazon Web Services (AWS) environment, evaluated all the technologies used, and estimated the maintenance and infrastructure cost, the required profiles, and the number of full-time equivalent personnel we would need to run a performant Atlas of Living Flanders. The goal of this talk is to inform the audience of the steps we took, the hurdles we encountered, and how we are trying to convince our policy makers of the benefits of an Atlas of Living Flanders.
Brosens D, Migerode S, De Wever A (2023). Towards the Atlas of Living Flanders, a Challenging Path. Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112155
Early detection of new incursions of species of biosecurity concern is crucial to protecting Australia’s environment, agriculture, and cultural heritage. As Australia’s largest biodiversity data repository, the Atlas of Living Australia (ALA) is often the first platform where new species incursions are recorded. The ALA holds records of more than 2,380 exotic species and over 1.9 million occurrences of pests, weeds, and diseases—many of which are reported through citizen science. However, until recently there has been no systematic mechanism for notifying relevant biosecurity authorities of potential biosecurity threats. To address this, the ALA partnered with the (Australian) Commonwealth Department of Agriculture, Fisheries and Forestry to develop the Biosecurity Alerts System. Two years on, the project has demonstrated the benefits of biosecurity alerts, but significant barriers exist as we now work to expand this system to State and Territory biosecurity agencies and seek new sources of biosecurity data. In our presentation, we discuss a brief history of invasive alien species in Australia, the Biosecurity Alerts System, and how we are approaching issues with taxonomy, data standards, and cultural sensitivities in aggregating biosecurity data. We conclude by detailing our progress in expanding the alerts system and tackling systemic issues to help elevate Australia’s biosecurity system.
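At its core, such an alerts system screens incoming occurrence records against a watchlist of taxa of concern. The record layout and watchlist below are illustrative assumptions; the real system must additionally resolve taxonomy (synonyms, misapplied names) and route alerts to the relevant jurisdiction.

```python
# Minimal sketch of watchlist screening; species and fields are examples only.
WATCHLIST = {"Vespa velutina", "Solenopsis invicta"}  # taxa of biosecurity concern

def screen(occurrences, watchlist=WATCHLIST):
    """Return the occurrence records that should trigger an alert."""
    return [o for o in occurrences if o["scientificName"] in watchlist]

occurrences = [
    {"scientificName": "Solenopsis invicta", "stateProvince": "Queensland"},
    {"scientificName": "Petroica boodang", "stateProvince": "Victoria"},
]
alerts = screen(occurrences)  # only the watchlisted taxon is flagged
```

Exact string matching is the weak point of this sketch: in practice, name resolution against a taxonomic backbone has to happen before the watchlist lookup, which is precisely the taxonomy issue the abstract raises.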
Turley A, Roger E (2023). Invasions, Plagues, and Epidemics: The Atlas of Living Australia’s deep dive into biosecurity. Biodiversity Information Science and Standards 7. https://doi.org/10.3897/biss.7.112127
DiversityIndia, founded in 2001, is an online community dedicated to promoting meaningful discussions and facilitating the exchange of diverse perspectives on lesser-known taxonomic groups, including butterflies, moths, dragonflies, spiders, and more. The core idea behind DiversityIndia is to establish a network of like-minded individuals who possess a deep passion for these subjects and actively participate in various aspects of biodiversity observation and research. Initially, the taxonomic focus of DiversityIndia centered around butterflies, which led to the creation of the ButterflyIndia Yahoo email group. The group quickly gained recognition for its significant contributions in sharing valuable insights about butterflies, including information about their habitats and lesser-known species. ButterflyIndia also played a vital role in facilitating connections among scientists and researchers who were dedicated to studying Lepidoptera. As a result of its collaborative efforts, the group actively contributed to major book projects and web portals, further enhancing the knowledge and resources available to the butterfly research community. As time progressed, the group expanded its presence to include various social media platforms like Orkut, Facebook, Flickr and more, thereby expanding its influence and reach. The realization of a significant need for empirical research on butterflies, requiring the involvement of both specialists and enthusiasts across diverse habitats, led to the first ButterflyIndia Meet in 2004 at Shendurney, Kerala. This pioneering concept garnered immense success, attracting participants from diverse regions of the country and backgrounds. Since then, several ButterflyIndia Meets have been organized, resulting in the documentation of numerous butterfly species. Building upon this success, DragonflyIndia and SpiderIndia were established with similar objectives and have successfully coordinated multiple gatherings (Fig. 1). 
One of the most notable DiversityIndia Meets occurred in April 2022, held in Sundarbans, West Bengal. This particular meet marked a significant milestone: the documented dataset, comprising information on all taxonomic groups observed during the event, was published through the Global Biodiversity Information Facility (GBIF) (Roy et al. 2022). This publication allowed for wider accessibility and utilization of the valuable biodiversity data collected during the meet.

In addition to the Sundarbans meet, efforts are currently underway to gather occurrence data from all the previous meetings conducted by DiversityIndia (Table 1). The aim is to compile and mobilize these data on GBIF as datasets, with active participation from the members who attended the meetings. This endeavor seeks to maximize the availability and usefulness of the biodiversity information gathered through the various DiversityIndia Meets over time.

According to the records published so far (Global Biodiversity Information Facility 2023), there are 1,663 recorded occurrence events of 859 taxa across 14 taxonomic categories, covering India's varied biogeography. These records provide valuable insights into the country's biodiversity. DiversityIndia has played a pioneering role in online citizen science in India, with origins tracing back to the initial Yahoo groups. By bringing experts and enthusiasts together, this community has contributed substantially to the comprehensive documentation and understanding of India's rich biodiversity. The collective efforts of its members are a testament to the enduring impact of citizen science in pushing the boundaries of our knowledge of the natural world.
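As a rough illustration of how such GBIF-published datasets can be queried programmatically, the sketch below builds a request against the public GBIF occurrence search API to retrieve a dataset's total record count. The dataset key shown is a placeholder, not the actual key of the Sundarbans meet dataset.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GBIF occurrence search endpoint.
GBIF_API = "https://api.gbif.org/v1/occurrence/search"

def count_query(dataset_key: str) -> str:
    """Build a search URL that asks only for a dataset's record count.

    With limit=0, the API returns just the result envelope, whose
    "count" field is the total number of occurrence records.
    """
    return f"{GBIF_API}?{urlencode({'datasetKey': dataset_key, 'limit': 0})}"

def occurrence_count(dataset_key: str) -> int:
    """Fetch the total occurrence count for a dataset (network call)."""
    with urlopen(count_query(dataset_key)) as resp:
        return json.load(resp)["count"]

# Placeholder UUID; substitute the real datasetKey from the GBIF dataset page.
url = count_query("00000000-0000-0000-0000-000000000000")
```

The same endpoint accepts further filters (e.g., `taxonKey`, `country`) for slicing a meet's records by taxon or region.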
{"title":"DiversityIndia Meets: Pioneering citizen science through collaborative data mobilization","authors":"Vijay Barve, Nandita Barman, Arjan Basu Roy, Amol Patwardhan, Purab Chowdhury","doi":"10.3897/biss.7.112163","DOIUrl":"https://doi.org/10.3897/biss.7.112163","url":null,"abstract":"DiversityIndia, founded in 2001, is an online community dedicated to promoting meaningful discussions and facilitating the exchange of diverse perspectives on lesser-known taxonomic groups, including butterflies, moths, dragonflies, spiders, and more. The core idea behind DiversityIndia is to establish a network of like-minded individuals who possess a deep passion for these subjects and actively participate in various aspects of biodiversity observation and research.\u0000 Initially, the taxonomic focus of DiversityIndia centered around butterflies, which led to the creation of the ButterflyIndia Yahoo email group. The group quickly gained recognition for its significant contributions in sharing valuable insights about butterflies, including information about their habitats and lesser-known species. ButterflyIndia also played a vital role in facilitating connections among scientists and researchers who were dedicated to studying Lepidoptera. As a result of its collaborative efforts, the group actively contributed to major book projects and web portals, further enhancing the knowledge and resources available to the butterfly research community. As time progressed, the group expanded its presence to include various social media platforms like Orkut, Facebook, Flickr and more, thereby expanding its influence and reach.\u0000 The realization of a significant need for empirical research on butterflies, requiring the involvement of both specialists and enthusiasts across diverse habitats, led to the first ButterflyIndia Meet in 2004 at Shendurney, Kerala. This pioneering concept garnered immense success, attracting participants from diverse regions of the country and backgrounds. 
Since then, several ButterflyIndia Meets have been organized, resulting in the documentation of numerous butterfly species. Building upon this success, DragonflyIndia and SpiderIndia were established with similar objectives and have successfully coordinated multiple gatherings (Fig. 1).\u0000 One of the most notable DiversityIndia Meets occurred in April 2022, held in Sundarbans, West Bengal. This particular meet marked a significant milestone as the documented dataset, comprising information on all taxonomic groups observed during the event, was published through the Global Biodiversity Information Facility (GBIF) (Roy et al. 2022). This publication allowed for wider accessibility and utilization of the valuable biodiversity data collected during the meet.\u0000 In addition to the Sundarbans meet, ongoing efforts are currently underway to gather occurrence data from all the previous meetings conducted by DiversityIndia (Table 1). The aim is to compile and mobilize this data on GBIF as datasets, involving active participation from the members who attended these meetings. This endeavor seeks to maximize the availability and usefulness of the biodiversity information gathered through the various DiversityIndia Meets over time.\u0000 According to the published records so far (Global Biodiversity Informa","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80036704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}