
Latest publications in Biodiversity Information Science and Standards

NBN Atlas: Our transformation and re-alignment with the Living Atlas community
Pub Date : 2023-09-18 DOI: 10.3897/biss.7.112813
Helen Manders-Jones, Keith Raven
The National Biodiversity Network (NBN) Atlas is the largest repository of publicly available biodiversity data in the United Kingdom (UK). Built on the open-source Atlas of Living Australia (ALA) platform, it was launched in 2017 and is part of a global network of over 20 Living Atlases (live or in development). Notably, the NBN Atlas is the largest, with almost twice as many records as the Atlas of Living Australia. In order to meet the needs of the UK biological recording community, the NBN Atlas was considerably customised. Regrettably, these customisations were applied directly to the platform code, resulting in divergence from the parent ALA platform and creating major obstacles to upgrading. To address these challenges, we initiated the Fit for the Future Project. We will outline our journey to decouple the customisations, realign with the ALA, upgrade the NBN Atlas, regain control of the infrastructure, and modernise DevOps practices. Each of these steps played a crucial role in our overall transformation. Additionally, we will discuss a new project that will allow data providers to set the public resolution of all records in a dataset and give individuals and organisations access to the supplied location information. We will also highlight our efforts to leverage contributions from volunteer developers.
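The dataset-level "public resolution" control described above amounts to publishing coordinates snapped to a coarser grid than the ones supplied. A minimal sketch of the idea (the function and grid scheme are illustrative assumptions, not the NBN Atlas implementation):

```python
# Sketch of coordinate generalisation for a "public resolution" setting.
# Illustrative assumption: snap coordinates to the south-west corner of a
# square grid, using ~111 km per degree (a common simplification).

def generalise(lat: float, lon: float, resolution_km: float) -> tuple:
    """Return coordinates snapped to a grid of the given resolution."""
    step = resolution_km / 111.0  # grid cell size in degrees

    def snap(v: float) -> float:
        return round((v // step) * step, 6)

    return snap(lat), snap(lon)

# A 10 km public resolution hides the precise locality of a record:
public_lat, public_lon = generalise(51.507351, -0.127758, 10.0)
```

The supplied full-precision coordinates would be stored separately and released only to authorised individuals and organisations.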
Citations: 0
AI-Accelerated Digitisation of Insect Collections: The next generation of Angled Label Image Capture Equipment (ALICE)
Pub Date : 2023-09-15 DOI: 10.3897/biss.7.112742
Arianna Salili-James, Ben Scott, Laurence Livermore, Ben Price, Steen Dupont, Helen Hardy, Vincent Smith
The digitisation of natural science specimens is a shared ambition of many of the largest collections, but the scale of these collections, estimated to number at least 1.1 billion specimens (Johnson et al. 2023), continues to challenge even the most resource-rich organisations. The Natural History Museum, London (NHM) has been pioneering work to accelerate the digitisation of its 80 million specimens. Since the inception of the NHM Digital Collection Programme in 2014, more than 5.5 million specimen records have been made digitally accessible. This has enabled the museum to deliver a tenfold increase in digitisation, compared to when rates were first measured by the NHM in 2008. Even with this investment, it will take circa 150 years to digitise its remaining collections, leading the museum to pursue technology-led solutions alongside increased funding to deliver the next increase in digitisation rate. Insects comprise approximately half of all described species and, at the NHM, represent more than one-third (c. 30 million specimens) of the museum's overall collection. Their most common preservation method, attached to a pin alongside a series of labels with metadata, makes insect specimens challenging to digitise. Early Artificial Intelligence (AI)-led innovations (Price et al. 2018) resulted in the development of ALICE, the museum's Angled Label Image Capture Equipment, in which a pinned specimen is placed inside a multi-camera setup that captures a series of partial views of the specimen and its labels. Centred around the pin, these images can be digitally combined and reconstructed, using the accompanying ALICE software, to provide a clean image of each label. To do this, a Convolutional Neural Network (CNN) model is incorporated to locate all labels within the images, followed by various image processing tools that transform the labels into a two-dimensional viewpoint, align the associated label images, and merge them into one label. This allows users to extract label data from the processed label images manually or computationally, e.g., using Optical Character Recognition (OCR) tools (Salili-James et al. 2022). With the ALICE setup, a user might image an average of 800 specimens per day and, exceptionally, up to 1,300. This compares with an average of 250 specimens or fewer daily using more traditional methods, which involve removing the labels from the pin and photographing them separately. Despite this, the original version of ALICE was only suited to a small subset of the collection: when the specimen is very large, there are too many labels, or the labels are too close together, ALICE fails (Dupont and Price 2019). Using a combination of updated AI processing tools, we hereby present ALICE version 2. This new version provides faster rates, improved software accuracy, and a more streamlined pipeline. It includes the following updates. Hardware: after conducting various tests, we have optimised the camera setup; further updates include a light-emitting diode (LED) ring light and modified camera mounts. Software: our latest software combines machine learning and other computer vision tools to segment labels from ALICE images and stitch them together faster and with greater accuracy, substantially reducing the failure rate of image processing; the processed label images can then be combined with the latest OCR tools for automated transcription and data segmentation. Buildkit: we aim to provide a toolkit that any individual or institution can incorporate into their digitisation pipeline, comprising hardware instructions, an extensive guide detailing the pipeline, and new software code accessible via GitHub. We present test data and workflows that demonstrate the potential of ALICE version 2 as an effective, accessible and cost-saving solution for digitising pinned insect specimens, and we describe potential modifications that would enable it to work with other types of specimens.
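The label-flattening step described above, transforming an angled partial view of a label into a front-on two-dimensional viewpoint, is in essence a planar homography estimated from the label's detected corners. A minimal numpy sketch of that geometric step (an illustration of the general technique, not the actual ALICE software):

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 homography mapping four src corners to four dst
    corners via the direct linear transform (DLT): build the 8x9 system
    and take the null vector from the SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def warp_point(H, x, y):
    """Apply the homography to one point (homogeneous coordinates)."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Corners of a label seen at an angle, and the flat rectangle to map onto:
src = [(10, 12), (118, 30), (112, 95), (6, 80)]
dst = [(0, 0), (100, 0), (100, 60), (0, 60)]
H = homography_from_corners(src, dst)
```

In a full pipeline, the warp would be applied to every pixel of the label region (e.g., with an image library) before the rectified labels are aligned and merged.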
Citations: 0
Mapping between Darwin Core and the Australian Biodiversity Information Standard: A linked data example
Pub Date : 2023-09-15 DOI: 10.3897/biss.7.112722
Mieke Strong, Piers Higgs
The Australian Biodiversity Information Standard (ABIS) is a data standard that has been developed to represent and exchange biodiversity data expressed using the Resource Description Framework (RDF). ABIS has the TERN ontology at its core, which is a conceptual information model that represents plot-based ecological surveys. The RDF-linked data structure is self-describing, composed of “triples”. This format is quite different from tabular data. During the Australian federal government Biodiversity Data Repository pilot project, occurrence data in tabular Darwin Core format was converted into ABIS linked data. This lightning talk will describe the approach taken, the challenges that arose, and the ways in which data using Darwin Core terms can be represented in a different way using linked data technologies.
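The tabular-to-triples conversion at the heart of such a mapping can be sketched in a few lines. The predicate URIs below are genuine Darwin Core terms; the subject namespace and the plain N-Triples-style string output (rather than a full RDF library or the TERN ontology) are simplifying assumptions for illustration, not the ABIS mapping itself:

```python
# Convert one row of tabular Darwin Core into N-Triples-style statements.
DWC = "http://rs.tdwg.org/dwc/terms/"

def row_to_triples(row, base="https://example.org/occurrence/"):
    """Emit one literal-valued triple per populated Darwin Core column.
    `base` is a made-up namespace for this example."""
    subject = f"<{base}{row['occurrenceID']}>"
    triples = []
    for term, value in row.items():
        if term == "occurrenceID" or value in (None, ""):
            continue
        triples.append(f'{subject} <{DWC}{term}> "{value}" .')
    return triples

row = {
    "occurrenceID": "abc-123",
    "scientificName": "Eucalyptus globulus",
    "eventDate": "2023-09-15",
    "decimalLatitude": "-42.88",
}
triples = row_to_triples(row)
```

Because every triple names its own predicate, the result is self-describing in the way the abstract notes, at the cost of being far more verbose than the source table.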
Citations: 0
Lognom, Assisting in the Decision-Making and Management of Zoological Nomenclature
Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112710
Elie Saliba, Régine Vignes Lebbe, Annemarie Ohler
Nomenclature is the discipline of taxonomy responsible for managing the scientific names of groups of organisms. It ensures continuity in the transmission of all kinds of data and knowledge accumulated about taxa. Zoologists use the International Code of Zoological Nomenclature (International Commission on Zoological Nomenclature 1999), currently in its fourth edition. The Code contains the rules that allow the correct understanding and application of nomenclature, e.g., how to choose between two names applying to the same taxon. Nomenclature became more complex over the centuries, as rules appeared, disappeared, or evolved to adapt to scientific and technological changes (e.g., the inclusion of digital media) (International Commission on Zoological Nomenclature 2012). By adhering to nomenclatural rules, taxonomic databases such as the Catalogue of Life (Bánki et al. 2023) can maintain the integrity and accuracy of taxon names, preventing confusion and ambiguity. Nomenclature also facilitates the linkage and integration of data across different databases, allowing for seamless collaboration and information exchange among researchers. However, unlike its final result, which is also called a nomenclature, the discipline itself has remained relatively impervious to computerisation, until now. Lognom *1 is a free web application based on algorithms that facilitate decision-making in zoological nomenclature. It is not based on a pre-existing database; instead, it provides answers based on user input, relying on interactive form-based queries. The software aims to help taxonomists determine whether a name or work is available, whether spelling rules have been correctly applied, and whether all the relevant rules have been respected before a new name or work is published. Lognom also allows the user to obtain the valid name from among several pre-registered candidate names, including the list of synonyms and the reason for their synonymy. It further includes tools for answering various nomenclatural questions, such as determining whether two different species names with the same derivation and meaning should be treated as homonyms; whether a name should be treated as a nomen oblitum under Art. 23.9 of the Code; and a tool to determine a genus-series name's grammatical gender. Lognom covers most of the rules regarding availability and validity, with the exception of those needing human interpretation, usually pertaining to Latin grammar. At this point in its development, homonymy is not completely covered by the web app, nor are the rules linked to the management of type specimens (e.g., lectotypification, neotypification), beyond their use in determining the availability of a name. With enough data entered by users, Lognom should be able to model a modification of the rules and calculate its impact on the potential availability or spelling of existing names. Other prospects include the possibility of working simultaneously on common projects, which would produce dynamic lists of existing names, and of automatically extracting nomenclatural data from existing databases and disseminating the relevant information through them. Attaching semantic web tags to the names throughout Zoonom (Saliba et al. 2021) or NOMEN (Yoder et al. 2017) is also under consideration.
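As a flavour of the kind of rule such a tool encodes: when judging homonymy, the Code deems certain variant spellings of species-group names to be identical (Art. 58 lists pairs such as ae/e and y/i). The toy check below uses a deliberately partial, illustrative substitution list and is not Lognom's implementation:

```python
# Toy homonym check in the spirit of ICZN Art. 58, which deems certain
# variant spellings of species-group names identical. The substitution
# list below is partial and purely illustrative.
RULES = [("ae", "e"), ("oe", "e"), ("ph", "f"), ("ck", "c"),
         ("k", "c"), ("y", "i")]

def normalise(epithet: str) -> str:
    """Reduce an epithet to a crude canonical spelling."""
    s = epithet.lower()
    for old, new in RULES:
        s = s.replace(old, new)
    return s

def deemed_homonyms(a: str, b: str) -> bool:
    """True if two species-group epithets normalise to the same spelling."""
    return normalise(a) == normalise(b)
```

For example, `deemed_homonyms("caeruleus", "ceruleus")` is true under these rules, while unrelated epithets normalise to distinct strings.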
Citations: 0
Leveraging Multimodality for Biodiversity Data: Exploring joint representations of species descriptions and specimen images using CLIP
Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112666
Maya Sahraoui, Youcef Sklab, Marc Pignal, Régine Vignes Lebbe, Vincent Guigue
In recent years, the field of biodiversity data analysis has witnessed significant advancements, with a number of models emerging to process and extract valuable insights from various data sources. One notable area of progress lies in the analysis of species descriptions, where structured knowledge extraction techniques have gained prominence. These techniques aim to automatically extract relevant information, such as taxonomic classifications and morphological traits, from unstructured text (Sahraoui et al. 2022, Sahraoui et al. 2023). By applying natural language processing (NLP) and machine learning methods, structured knowledge extraction enables the conversion of textual species descriptions into a structured format, facilitating easier integration, searchability, and analysis of biodiversity data. Furthermore, object detection on specimen images has emerged as a powerful tool in biodiversity research. By leveraging computer vision algorithms (Triki et al. 2020, Triki et al. 2021, Ott et al. 2020), researchers can automatically identify and classify objects of interest within specimen images, such as organs, anatomical features, or specific taxa. Object detection techniques allow for the efficient and accurate extraction of valuable information, contributing to tasks like species identification, morphological trait analysis, and biodiversity monitoring. These advancements have been particularly significant in the context of herbarium collections and digitisation efforts, where large volumes of specimen images need to be processed and analysed. On the other hand, multimodal learning, an emerging field in artificial intelligence (AI), focuses on developing models that can effectively process and learn from multiple modalities, such as text and images (Li et al. 2020, Li et al. 2021, Li et al. 2019, Radford et al. 2021, Sun et al. 2021, Chen et al. 2022). By incorporating information from different modalities, multimodal learning aims to capture the rich and complementary characteristics present in diverse data sources. This approach enables a model to leverage the strengths of each modality, leading to enhanced understanding, improved performance, and more comprehensive representations. Structured knowledge extraction from species descriptions and object detection on specimen images thus synergistically enhance biodiversity data analysis: structured information extracted from descriptions improves the search, classification, and correlation of biodiversity data, while object detection enriches textual descriptions, providing visual evidence for the verification and validation of species characteristics. To tackle the challenges posed by the massive volume of specimen images available at the Herbarium of the National Museum of Natural History in Paris, we have chosen to implement the CLIP (Contrastive Language-Image Pretraining) model (Radford et al. 2021) developed by OpenAI. CLIP uses a contrastive learning framework to learn joint representations of text and images. The model is trained on a large-scale dataset of text-image pairs from the internet, enabling it to capture the semantic relationships between textual descriptions and visual content. Fine-tuning the CLIP model on our dataset of species descriptions and specimen images is crucial for adapting it to our domain: by training the model on our labelled dataset, we enhance its ability to understand and represent biodiversity characteristics and to adapt to biodiversity patterns. Leveraging the fine-tuned CLIP model, we aim to develop an efficient search engine for the Herbarium's vast biodiversity collections. Users will be able to query the engine with morphological keywords, and it will match textual descriptions against specimen images and return relevant results. This research aligns with the current trajectory of artificial intelligence in biodiversity data, paving the way for innovative approaches to the conservation and understanding of Earth's biodiversity.
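Once text and images share CLIP's joint embedding space, the keyword search described above reduces to ranking specimen images by cosine similarity to the embedded query. A sketch with random placeholder vectors standing in for real CLIP text and image embeddings:

```python
import numpy as np

def top_k(query_emb, image_embs, k=3):
    """Rank images by cosine similarity to a query embedding and return
    the indices and scores of the k best matches."""
    q = query_emb / np.linalg.norm(query_emb)
    ims = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = ims @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Placeholder vectors standing in for CLIP text/image embeddings:
rng = np.random.default_rng(0)
query = rng.normal(size=512)            # e.g., "leaves opposite, serrate"
images = rng.normal(size=(1000, 512))   # one row per herbarium sheet
idx, scores = top_k(query, images, k=5)
```

In a real deployment the image embeddings would be precomputed once per sheet, so each query costs a single matrix-vector product (or an approximate nearest-neighbour lookup at larger scale).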
How Reproducible are the Results Gained with the Help of Deep Learning Methods in Biodiversity Research?
Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112698
Waqas Ahmed, Vamsi Krishna Kommineni, Birgitta Koenig-Ries, Sheeba Samuel
In recent years, deep learning methods in the biodiversity domain have gained significant attention due to their ability to handle the complexity of biological data and to make processing of large volumes of data feasible. However, these methods are not easy to interpret, so the opacity of new scientific research and discoveries makes them somewhat untrustworthy. Reproducibility is a fundamental aspect of scientific research, which enables validation and advancement of methods and results. If results obtained with the help of deep learning methods were reproducible, this would increase their trustworthiness. In this study, we investigate the state of reproducibility of deep learning methods in biodiversity research. We propose a pipeline to investigate the reproducibility of deep learning methods in the biodiversity domain. In our preliminary work, we systematically mined the existing literature from Google Scholar to identify publications that employ deep-learning techniques for biodiversity research. By carefully curating a dataset of relevant publications, we extracted reproducibility-related variables for 61 publications using a manual approach, such as the availability of datasets and code that serve as fundamental criteria for reproducibility assessment. Moreover, we extended our analysis to include advanced reproducibility variables, such as the specific deep learning methods, models, hyperparameters, etc., employed in the studies. To facilitate the automatic extraction of information from publications, we plan to leverage the capabilities of large language models (LLMs). By using the latest natural language processing (NLP) techniques, we aim to identify and extract relevant information pertaining to the reproducibility of deep learning methods in the biodiversity domain. This study seeks to contribute to the establishment of robust and reliable research practices. 
The findings will not only aid in validating existing methods but also guide the development of future approaches, ultimately fostering transparency and trust in the application of deep learning techniques in biodiversity research.
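The basic availability checks described above can be sketched as keyword heuristics over publication text. The patterns below are illustrative stand-ins, not the criteria actually used in the study:

```python
import re

# Hypothetical heuristics for two reproducibility variables (dataset and
# code availability); the study's manual extraction criteria may differ.
PATTERNS = {
    "dataset_available": re.compile(
        r"\b(dataset|data)\b.{0,40}\b(available|deposited|zenodo|figshare)\b",
        re.I | re.S),
    "code_available": re.compile(
        r"\b(code|scripts?|software)\b.{0,40}\b(available|github|gitlab)\b",
        re.I | re.S),
}

def reproducibility_flags(text: str) -> dict:
    """Flag which availability criteria a publication's text mentions."""
    return {name: bool(p.search(text)) for name, p in PATTERNS.items()}

abstract = ("All data are available on Zenodo; "
            "analysis scripts can be found on GitHub.")
print(reproducibility_flags(abstract))
```

A large-language-model pipeline, as planned by the authors, would replace these brittle patterns with extraction prompts, but the output schema (one boolean per reproducibility variable) could stay the same.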
A Simple Recipe for Cooking your AI-assisted Dish to Serve it in the International Digital Specimen Architecture
Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112678
Wouter Addink, Sam Leeflang, Sharif Islam
With the rise of Artificial Intelligence (AI), a large set of new tools and services is emerging that supports specimen data mapping, standards alignment, quality enhancement and enrichment of the data. These tools currently operate in isolation, targeted to individual collections, collection management systems and institutional datasets. To address this challenge, DiSSCo, the Distributed System of Scientific Collections, is developing a new infrastructure for digital specimens, transforming them into actionable information objects. This infrastructure incorporates a framework for annotation and curation that allows the objects to be enriched or enhanced by both experts and machines. This creates the unique possibility to plug-in AI-assisted services that can then leverage digital specimens through this infrastructure, which serves as a harmonised Findable, Accessible, Interoperable and Reusable (FAIR) abstraction layer on top of individual institutional systems or datasets. Early examples of such services are the ones developed in the Specimen Data Refinery workflow (Hardisty et al. 2022). The new architecture, DS Arch or Digital Specimen Architecture, is built on the concept of FAIR Digital Objects (FDO) (Islam et al. 2020). All digital specimens and related objects are served with persistent identifiers and machine-readable FDO records with information for machines about the object together with a pointer to its machine-readable type description. The type describes the structure of the object, its attributes and the allowed operations. The digital specimen type and specimen media type are based on existing Biodiversity Information Standards (TDWG) such as Darwin Core, Access to Biological Collection Data (ABCD) Schema and Audiovisual Core Multimedia Resources Metadata Schema, and include support for annotation operations based on the World Wide Web Consortium (W3C) Annotations Data Model. 
This enables AI-assisted services registered with DS Arch to autonomously discover digital specimen objects and determine the actions they are authorised to perform. AI-assisted services can facilitate various tasks such as digitising specimens, extracting new information from specimen images, creating relations with other objects or standardising data. These operations can be done autonomously, upon user request, or in tandem with expert validation. AI-assisted services registered with DS Arch can interact in the same way with all digital specimens worldwide when served through DS Arch with their uniform FDO representation, even if the content richness, level of standardisation and scope of the specimens differ. DS Arch has been designed to serve digital specimens for living and preserved specimens, and for preserved environmental, earth system and astrogeology samples. With the AI-assisted services, data can be annotated with new data, alternative values, corrections, and with new entity relationships. As a result, the digital specimens become Digital Extended Specimens, enabling new science and applications (Webster et al. 2021). With the implementation of DS Arch's trust model for community acceptance, these annotations become part of the data itself and become available for inclusion in source systems such as collection management systems, and in aggregators such as the Global Biodiversity Information Facility (GBIF), the Geoscience Collections Access Service (GeoCASe) and the Catalogue of Life. We aim to demonstrate at the conference how AI-assisted services can be registered and used to annotate specimen data. Although DiSSCo's DS Arch is still in development and is scheduled to go into production in 2025, a sandbox environment is already available in which the concepts can be tested and AI-assisted services can be piloted to operate on digital specimen data. For testing purposes, operations on specimens are currently limited to single specimens and open data, but batch operations will also be possible in the future production environment.
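The FDO pattern described above — a persistent identifier, a pointer to a machine-readable type, and annotations shaped after the W3C Web Annotation model — can be sketched as follows. All field names and the example PID are invented for illustration, not taken from the actual openDS specification:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class DigitalSpecimen:
    """Illustrative FDO-style record: a PID plus a pointer to a
    machine-readable type; attribute names are invented for this sketch."""
    pid: str
    fdo_type: str
    attributes: dict
    annotations: list = field(default_factory=list)

    def annotate(self, target_field: str, body: dict, motivation: str = "editing"):
        # Shaped loosely after the W3C Web Annotation data model.
        self.annotations.append({
            "type": "Annotation",
            "motivation": motivation,
            "target": {"source": self.pid, "field": target_field},
            "body": body,
        })

ds = DigitalSpecimen(
    pid="https://example.org/ds/123",          # placeholder, not a real handle
    fdo_type="DigitalSpecimen",
    attributes={"scientificName": "Quercus robur L."},
)
ds.annotate("scientificName",
            {"value": "Quercus robur", "source": "ai-service:name-parser"})
print(json.dumps(asdict(ds), indent=2))
```

An AI-assisted service in this scheme would resolve the PID, read the type to learn which operations are permitted, and append annotations rather than mutate the source record — which is what lets expert validation and community trust rules sit between annotation and acceptance.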
Combining Ecological and Socio-Environmental Data and Networks to Achieve Sustainability
Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112703
Laure Berti-Equille, Rafael L. G. Raimundo
Environmental degradation in Brazil has been recently amplified by the expansion of agribusiness, livestock and mining activities with dramatic repercussions on ecosystem functions and services. The anthropogenic degradation of landscapes has substantial impacts on indigenous peoples and small organic farmers whose lifestyles are intimately linked to diverse and functional ecosystems. Understanding how we can apply science and technology to benefit from biodiversity and promote socio-ecological transitions ensuring equitable and sustainable use of common natural resources is a critical challenge brought on by the Anthropocene. We present our approach to combine biodiversity and environmental data, supported by two funded research projects: DATAPB (Data of Paraíba) to develop tools for FAIR (Findable, Accessible, Interoperable and Reusable) data sharing for governance and educational projects and the International Joint Laboratory IDEAL (artificial Intelligence, Data analytics, and Earth observation applied to sustAinability Lab) launched in 2023 by the French Institute for Sustainable Development (IRD, Institut de Recherche pour le Développement) and co-coordinated by the authors, with 50 researchers in 11 Brazilian and French institutions working on Artificial Intelligence and socio-ecological research in four Brazilian Northeast states: Paraíba, Rio Grande do Norte, Pernambuco, and Ceará (Berti-Equille and Raimundo 2023). As the keystone of these transdisciplinary projects, the concept-paradigm of socio-ecological coviability (Barrière et al. 2019) proposes that we should explore multiple ways by which relationships between humans and nonhumans (fauna, flora, natural resources) can reach functional and persistent states. Transdisciplinary approaches to agroecological transitions are urgently needed to address questions such as: How can researchers, local communities, and policymakers co-produce participatory diagnoses that depict the coviability of a territory? 
How can we conserve biodiversity and ecosystem functions, promote social inclusion, value traditional knowledge, and strengthen bioeconomies at local and regional scales? How can biodiversity, social and environmental data, and networks help local communities in shaping adaptation pathways towards sustainable agroecological practices? These questions require transdisciplinary approaches and effective collaboration among environmental, social, and computer scientists, with the involvement of local stakeholders (Biggs et al.). A large-scale study of the socio-ecological determinants of coviability across nine states and 1,794 municipalities of Northeast Brazil combines multiple data sources from IBGE (Instituto Brasileiro de Geografia e Estatística), IPEA (Instituto de Pesquisa Econômica Aplicada), MapBiomas, the Brazil Data Cube and our partners GBIF (Global Biodiversity Information Facility), INCT ODISSEIA (Observatory of the Dynamics of Interactions between Societies and their Environment) and ICMBio (Instituto Chico Mendes de Conservação da Biodiversidade), enabling the computation of proxies and indicators of biodiversity structure, ecosystem functions and socio-economic organisation at different scales. We will perform exploratory data analysis and use AI (Rolnick et al. 2022) to identify proxies of adaptability, resilience and vulnerability. Multilayer network approaches for modelling interactions between socio-ecological and governance systems will be designed and tested using adaptive network modelling (Raimundo et al. 2018). Beyond multilayer networks for modelling socio-ecological dynamics (Keyes et al. 2021), we will incorporate the evolution of governance systems at the landscape scale and apply Latin hypercube methods to explore parameter space (Raimundo et al. 2014), obtaining broad characterisations of the model dynamics and insight into how the interactions of coupled adaptive systems affect socio-ecological resilience under multiple ecological and socio-economic scenarios. The overall approach and study cases will be presented.
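The Latin hypercube exploration mentioned in the abstract can be sketched in a few lines: each parameter range is split into equal strata, one sample is drawn per stratum, and the strata are permuted independently per dimension so the dimensions are de-correlated. Parameter names and bounds here are hypothetical:

```python
import numpy as np

def latin_hypercube(n_samples: int, bounds: list, rng=None) -> np.ndarray:
    """Latin hypercube sample: each parameter's range is split into
    n_samples equal strata, with exactly one draw per stratum."""
    rng = rng or np.random.default_rng(42)
    d = len(bounds)
    # One point per stratum in [0, 1), then an independent permutation
    # of the strata for each dimension.
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, d))) / n_samples
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + u * (hi - lo)

# Hypothetical model parameters (names are illustrative only).
bounds = [(0.0, 1.0),   # e.g., interaction rewiring rate
          (0.1, 5.0)]   # e.g., resource growth rate
samples = latin_hypercube(100, bounds)
print(samples.shape)
```

Each of the 100 samples then seeds one run of the adaptive-network model, giving uniform coverage of the parameter space at a fraction of the cost of a full grid.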
Leveraging AI in Biodiversity Informatics: Ethics, privacy, and broader impacts
Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112701
Kristen "Kit" Lewers
Artificial Intelligence (AI) has been heralded as a hero by some and rejected as a harbinger of destruction by others. While many in the community are excited about the functionality and promise AI brings to the field of biodiversity informatics, others have reservations regarding its widespread use. This talk will specifically address Large Language Models (LLMs), highlighting both the pros and cons of their use. Like any tool, LLMs are neither good nor bad in and of themselves, but AI does need to be used properly and within the appropriate scope of its abilities. Topics to be covered include model opacity (Franzoni 2023), privacy concerns (Wu et al. 2023), the potential for algorithmic harm (Marjanovic et al. 2021) and model bias (Wang et al. 2020) in the context of generative AI, along with how these topics differ from similar concerns when using traditional Machine Learning (ML) applications. The potential for implementation and training to ensure the fairest environment when leveraging AI, keeping FAIR (Findability, Accessibility, Interoperability, and Reproducibility) principles in mind, will also be discussed. The topics covered will be framed mainly through the Biodiversity Information Standards (TDWG) community, focusing on sociotechnical aspects and implications of implementing LLMs and generative AI. Finally, this talk will explore the potential applicability of TDWG standards pertaining to uniform prompting vocabulary when using generative AI and employing it as a tool for biodiversity informatics.
Mapping across Standards to Calculate the MIDS Level of Digitisation of Natural Science Collections
Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112672
Elspeth Haston, Mathias Dillen, Sam Leeflang, Wouter Addink, Claus Weiland, Dagmar Triebel, Eirik Rindal, Anke Penzlin, Rachel Walcott, Josh Humphries, Caitlin Chapman
The Minimum Information about a Digital Specimen (MIDS) standard is being developed within Biodiversity Information Standards (TDWG) to provide a framework for organisations, communities and infrastructures to define, measure, monitor and prioritise the digitisation of specimen data to achieve increased accessibility and scientific use. MIDS levels indicate different levels of completeness in digitisation and range from Level 0: not yet meeting minimal required information needs for scientific use to Level 3: fulfilling the requirements for Digital Extended Specimens (Hardisty et al. 2022) by inclusion of persistent identifiers (PIDs) that connect the specimen with derived and related data. MIDS Levels 0–2 are generic for all specimens. From MIDS Level 2 onwards we make a distinction between biological, geological and palaeontological specimens. While MIDS represents a minimum specification, defining and publishing more extensive sets of information elements (extensions) is readily feasible and explicitly recommended. The MIDS level of a digital specimen can be calculated based on the availability of certain information elements. The MIDS standard applies to published data. The ability to map from, to and between TDWG standards is key to being able to measure the MIDS level of the digitised specimen(s). Each MIDS term is being mapped across TDWG standards involving Darwin Core (DwC), the Access to Biological Collections Data (ABCD) Schema and Latimer Core (LtC, Woodburn et al. 2022), using mapping properties provided by the Simple Knowledge Organization System (SKOS) ontology. In this presentation, we will show selected case studies that demonstrate the implementation of the MIDS standard supplemented by MIDS mappings to ABCD, to LtC, and to the Distributed System of Scientific Collections' (DISSCo) Open Digital Specimen specification. The studies show the mapping exercise in practice, with the aim of enabling fully automated and accurate calculations. 
To provide a reliable indicator for the level of digitisation completeness, it is important that calculations are done consistently in all implementations.
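A MIDS-level calculation of the kind described can be sketched as a check for the presence of required information elements per level, where each level also presupposes the lower ones. The element lists below are simplified placeholders keyed to Darwin Core-style terms; the MIDS specification defines the authoritative sets:

```python
# Simplified, hypothetical element lists per MIDS level for a biological
# specimen; the published MIDS standard defines the authoritative sets
# and the mappings to DwC/ABCD terms.
MIDS_REQUIREMENTS = {
    1: ["dwc:scientificName", "dwc:catalogNumber"],
    2: ["dwc:decimalLatitude", "dwc:decimalLongitude", "dwc:eventDate"],
    3: ["ods:mediaPid", "ods:relatedResourcePid"],
}

def mids_level(record: dict) -> int:
    """Highest MIDS level whose elements (and those of all lower levels)
    are present and non-empty in the record; Level 0 otherwise."""
    level = 0
    for lvl in sorted(MIDS_REQUIREMENTS):
        if all(record.get(term) for term in MIDS_REQUIREMENTS[lvl]):
            level = lvl
        else:
            break
    return level

record = {
    "dwc:scientificName": "Quercus robur L.",
    "dwc:catalogNumber": "12345",
    "dwc:decimalLatitude": "55.965",
    "dwc:decimalLongitude": "-3.175",
    "dwc:eventDate": "1903-06-14",
}
print(mids_level(record))  # prints 2: georeferenced, but no PIDs to related data
```

The SKOS mappings discussed in the abstract are what would populate `MIDS_REQUIREMENTS` consistently across source standards, so that the same record scores the same level whether it was published in DwC or ABCD.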
生物多样性信息标准(TDWG)正在制定关于数字标本的最低信息(MIDS)标准,为组织、社区和基础设施提供一个框架,以定义、测量、监测和优先考虑标本数据的数字化,以实现更大的可访问性和科学使用。MIDS级别表示数字化的不同完成程度,范围从0级:尚未满足科学使用所需的最低信息需求,到3级:通过包含将标本与衍生数据和相关数据连接起来的持久标识符(pid)来满足数字扩展标本的要求(Hardisty等人,2022)。所有标本的MIDS等级为0-2。从MIDS 2级开始,我们对生物、地质和古生物标本进行区分。虽然MIDS代表了最小的规范,但是定义和发布更广泛的信息元素(扩展)集是非常可行的,并且明确推荐。数字样本的MIDS水平可以根据某些信息元素的可用性来计算。MIDS标准适用于已发布的数据。从TDWG标准到TDWG标准之间进行映射的能力是能够测量数字化标本的MIDS水平的关键。使用简单知识组织系统(SKOS)本体提供的映射属性,每个MIDS术语都跨TDWG标准进行映射,包括达尔文核心(DwC)、生物馆藏数据访问(ABCD)模式和拉蒂默核心(LtC, Woodburn等人,2022)。在这次演讲中,我们将展示一些案例研究,这些案例研究展示了MIDS标准的实施,并辅以MIDS映射到ABCD、LtC和分布式科学收藏品系统(DISSCo)开放数字标本规范。这些研究展示了在实践中的测绘练习,目的是实现全自动和准确的计算。为了提供数字化完成程度的可靠指标,重要的是在所有实现中计算都是一致的。
{"title":"Mapping across Standards to Calculate the MIDS Level of Digitisation of Natural Science Collections","authors":"Elspeth Haston, Mathias Dillen, Sam Leeflang, Wouter Addink, Claus Weiland, Dagmar Triebel, Eirik Rindal, Anke Penzlin, Rachel Walcott, Josh Humphries, Caitlin Chapman","doi":"10.3897/biss.7.112672","DOIUrl":"https://doi.org/10.3897/biss.7.112672","url":null,"abstract":"The Minimum Information about a Digital Specimen (MIDS) standard is being developed within Biodiversity Information Standards (TDWG) to provide a framework for organisations, communities and infrastructures to define, measure, monitor and prioritise the digitisation of specimen data to achieve increased accessibility and scientific use. MIDS levels indicate different levels of completeness in digitisation and range from Level 0: not yet meeting minimal required information needs for scientific use to Level 3: fulfilling the requirements for Digital Extended Specimens (Hardisty et al. 2022) by inclusion of persistent identifiers (PIDs) that connect the specimen with derived and related data. MIDS Levels 0–2 are generic for all specimens. From MIDS Level 2 onwards we make a distinction between biological, geological and palaeontological specimens. While MIDS represents a minimum specification, defining and publishing more extensive sets of information elements (extensions) is readily feasible and explicitly recommended. The MIDS level of a digital specimen can be calculated based on the availability of certain information elements. The MIDS standard applies to published data. The ability to map from, to and between TDWG standards is key to being able to measure the MIDS level of the digitised specimen(s). Each MIDS term is being mapped across TDWG standards involving Darwin Core (DwC), the Access to Biological Collections Data (ABCD) Schema and Latimer Core (LtC, Woodburn et al. 2022), using mapping properties provided by the Simple Knowledge Organization System (SKOS) ontology. 
In this presentation, we will show selected case studies that demonstrate the implementation of the MIDS standard supplemented by MIDS mappings to ABCD, to LtC, and to the Distributed System of Scientific Collections' (DISSCo) Open Digital Specimen specification. The studies show the mapping exercise in practice, with the aim of enabling fully automated and accurate calculations. To provide a reliable indicator for the level of digitisation completeness, it is important that calculations are done consistently in all implementations.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134912310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Journal: Biodiversity Information Science and Standards