No Pain No Gain: Standards mapping in Latimer Core development

Matt Woodburn, Jutta Buschbom, Sharon Grant, Janeen Jones, Ben Norton, Maarten Trekels, Sarah Vincent, Kate Webbink

Biodiversity Information Science and Standards. Published 2023-09-21. DOI: https://doi.org/10.3897/biss.7.113053

Abstract

Latimer Core (LtC) is a newly proposed Biodiversity Information Standards (TDWG) data standard that supports the representation and discovery of natural science collections by structuring data about the groups of objects that those collections and their subcomponents encompass (Woodburn et al. 2022). It is designed to be applicable to a range of use cases that include high-level collection registries, rich textual narratives and semantic networks of collections, as well as more granular, quantitative breakdowns of collections to aid collection discovery and digitisation planning.

As a standard that is (in this first version) focused on natural science collections, LtC has significant intersections with existing data standards and models (Fig. 1) that represent individual natural science objects and occurrences and their associated data (e.g., Darwin Core (DwC), Access to Biological Collection Data (ABCD), Conceptual Reference Model of the International Committee on Documentation (CIDOC-CRM)). LtC’s scope also overlaps with standards for more generic concepts like metadata, organisations, people and activities (i.e., Dublin Core, World Wide Web Consortium (W3C) ORG Ontology and PROV Ontology, Schema.org). LtC represents just one element of this extended network of data standards for the natural sciences and related concepts. Mapping between LtC and intersecting standards is therefore crucial both for avoiding duplication of effort in the standard development process and for ensuring that data stored using the different standards are as interoperable as possible, in alignment with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. In particular, it is vital to make robust associations between records representing groups of objects in LtC and records (where available) that represent the objects within those groups.

During LtC development, efforts were made to identify and align with relevant standards and vocabularies, and to adopt existing terms from them where possible. During expert review, a more structured approach was proposed and implemented using the Simple Knowledge Organization System (SKOS) mappingRelation vocabulary. This exercise helped to better describe the nature of the mappings between new LtC terms and related terms in other standards, and to validate decisions around the borrowing of existing terms for LtC. A further exercise used elements of the Simple Standard for Sharing Ontological Mappings (SSSOM) to start developing a more comprehensive set of metadata around these mappings. At present, these mappings (Suppl. material 1 and Suppl. material 2) are provisional and not considered to be comprehensive, but should be further refined and expanded over time.

Even with the support provided by the SKOS and SSSOM standards, the LtC experience has shown the mapping process to be far from straightforward. Standards vary in how they are structured: for example, DwC is a ‘bag of terms’, with informal classes and no structural constraints, while more structured standards and ontologies like ABCD and PROV employ different approaches to how structure is defined and documented. The various standards use different metadata schemas and serialisations (e.g., Resource Description Framework (RDF), XML) for their documentation, and different approaches to providing persistent, resolvable identifiers for their terms. There are also many subtle nuances involved in assessing the alignment between the concepts that the source and target terms represent, particularly when assessing whether a match is exact enough to allow the existing term to be adopted. These factors make the mapping process quite manual and labour-intensive. Approaches and tools, such as developing decision trees (Fig. 2) to represent the logic involved and further exploration of the SSSOM standard, could help to streamline this process.

In this presentation, we will discuss the LtC experience of the standard mapping process, the challenges faced and methods used, and the potential to contribute this experience to collaborative standards mapping within the anticipated TDWG Standards Mapping Interest Group.
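As an illustration of how a SKOS mapping relation and SSSOM-style metadata might be recorded together, the following is a minimal Python sketch and not the authors' tooling. The function `choose_predicate` encodes a hypothetical decision logic (it is not the decision tree in Fig. 2), and the term identifiers `ltc:ExampleTerm` and `other:ExampleTerm` are placeholders rather than real terms. The column names follow SSSOM slot names, and the predicates are the SKOS properties that specialise `skos:mappingRelation`.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

# SKOS mapping properties (each is a sub-property of skos:mappingRelation).
SKOS_PREDICATES = {
    "exact": "skos:exactMatch",
    "close": "skos:closeMatch",
    "broad": "skos:broadMatch",
    "narrow": "skos:narrowMatch",
    "related": "skos:relatedMatch",
}


def choose_predicate(similar_meaning: bool,
                     interchangeable_everywhere: bool,
                     target_scope: str = "overlapping") -> str:
    """Hypothetical decision logic for picking a mapping predicate.

    target_scope describes the target term relative to the source term:
    "broader", "narrower" or "overlapping".
    """
    if similar_meaning:
        key = "exact" if interchangeable_everywhere else "close"
        return SKOS_PREDICATES[key]
    if target_scope == "broader":
        return SKOS_PREDICATES["broad"]
    if target_scope == "narrower":
        return SKOS_PREDICATES["narrow"]
    return SKOS_PREDICATES["related"]


@dataclass
class MappingRecord:
    """One mapping, using a small subset of SSSOM slot names."""
    subject_id: str
    predicate_id: str
    object_id: str
    mapping_justification: str
    comment: str = ""


def to_tsv(records):
    """Serialise mapping records as a tab-separated table."""
    buffer = io.StringIO()
    names = [f.name for f in fields(MappingRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buffer.getvalue()


record = MappingRecord(
    subject_id="ltc:ExampleTerm",      # placeholder, not a real LtC term
    predicate_id=choose_predicate(True, False),
    object_id="other:ExampleTerm",     # placeholder target term
    mapping_justification="semapv:ManualMappingCuration",
    comment="Provisional; definitions differ slightly.",
)
print(to_tsv([record]))
```

Keeping the predicate choice in a single function makes the reasoning behind each mapping explicit and repeatable, which is the kind of streamlining a documented decision tree is meant to provide. A full SSSOM file would carry further metadata that this sketch omits.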