
Journal of Biomedical Semantics: Latest Articles

Optimized continuous homecare provisioning through distributed data-driven semantic services and cross-organizational workflows.
IF 1.9 | Zone 3, Engineering & Technology | Q3 Mathematical & Computational Biology | Pub Date: 2024-06-06 | DOI: 10.1186/s13326-024-00303-4
Mathias De Brouwer, Pieter Bonte, Dörthe Arndt, Miel Vander Sande, Anastasia Dimou, Ruben Verborgh, Filip De Turck, Femke Ongenae

Background: In healthcare, collaboration between different caregivers is increasing, especially given the shift to homecare. To provide optimal patient care, efficient coordination of data and workflows between these different stakeholders is required. To achieve this, data should be exposed in a machine-interpretable, reusable manner. In addition, there is a need for smart, dynamic, personalized and performant services provided on top of this data. Flexible workflows should be defined that realize their desired functionality, adhere to use case specific quality constraints and improve coordination across stakeholders. User interfaces should allow configuring all of this in an easy, user-friendly way.

Methods: A distributed, generic, cascading reasoning reference architecture can solve the presented challenges. It can be instantiated with existing tools built upon Semantic Web technologies that provide data-driven semantic services and construct cross-organizational workflows. These tools include RMLStreamer to generate Linked Data, DIVIDE to adaptively manage contextually relevant local queries, Streaming MASSIF to deploy reusable services, AMADEUS to compose semantic workflows, and RMLEditor and Matey to configure rules to generate Linked Data.

Results: A use case demonstrator is built on a scenario that focuses on personalized smart monitoring and cross-organizational treatment planning. The performance and usability of the demonstrator's implementation are evaluated. The performance evaluation shows that the monitoring pipeline efficiently processes a stream of 14 observations per second: RMLStreamer maps JSON observations to RDF in 13.5 ms, a C-SPARQL query to generate fever alarms is executed on a window of 5 s in 26.4 ms, and Streaming MASSIF generates a smart notification for fever alarms based on severity and urgency in 1539.5 ms. DIVIDE derives the C-SPARQL queries in 7249.5 ms, while AMADEUS constructs a colon cancer treatment plan and performs conflict detection with it in 190.8 ms and 1335.7 ms, respectively.

Conclusions: Existing tools built upon Semantic Web technologies can be leveraged to optimize continuous care provisioning. The evaluation of the building blocks on a realistic homecare monitoring use case demonstrates their applicability, usability and good performance. Further extending the available user interfaces for some tools is required to increase their adoption.
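
The windowed fever query mentioned in the Results can be pictured with a minimal Python sketch. It is not C-SPARQL and not the authors' implementation: the `Observation` type, the function name `fever_alarms`, the use of a per-patient mean, and the 38.0 °C threshold are all assumptions made for illustration. Only the 5 s window length comes from the abstract.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Iterator, Tuple

WINDOW_S = 5.0          # window length taken from the abstract
FEVER_THRESHOLD = 38.0  # assumed threshold in degrees Celsius (not from the paper)


@dataclass(frozen=True)
class Observation:
    timestamp: float    # seconds since the start of the stream
    patient: str
    temperature: float  # degrees Celsius


def fever_alarms(stream: Iterable[Observation]) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, patient) whenever that patient's mean temperature
    over the last WINDOW_S seconds reaches FEVER_THRESHOLD."""
    window: Deque[Observation] = deque()
    for obs in stream:
        window.append(obs)
        # Evict observations that have fallen out of the time window.
        while window and window[0].timestamp <= obs.timestamp - WINDOW_S:
            window.popleft()
        # The current observation is always in the window, so temps is non-empty.
        temps = [o.temperature for o in window if o.patient == obs.patient]
        if sum(temps) / len(temps) >= FEVER_THRESHOLD:
            yield (obs.timestamp, obs.patient)
```

At the reported rate of 14 observations per second, a 5 s window holds about 70 observations, which is why a simple in-memory queue is enough for a sketch like this.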

Journal of Biomedical Semantics 15(1), 9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11154993/pdf/
Explanatory argumentation in natural language for correct and incorrect medical diagnoses.
IF 1.9 | Zone 3, Engineering & Technology | Q3 Mathematical & Computational Biology | Pub Date: 2024-05-30 | DOI: 10.1186/s13326-024-00306-1
Benjamin Molinet, Santiago Marro, Elena Cabrio, Serena Villata

Background: A large amount of research in Artificial Intelligence now proposes automated ways to analyse medical data with the aim of supporting doctors in delivering medical diagnoses. However, a major issue with these approaches is the lack of transparency and interpretability of the achieved results, which makes it hard to employ such methods for educational purposes. It is therefore necessary to develop new frameworks to enhance explainability in these solutions.

Results: In this paper, we present a novel full pipeline to automatically generate natural language explanations for medical diagnoses. The proposed solution starts from a clinical case description associated with a list of correct and incorrect diagnoses and, through the extraction of the relevant symptoms and findings, enriches the information contained in the description with verified medical knowledge from an ontology. Finally, the system returns a pattern-based explanation in natural language which elucidates why the correct (incorrect) diagnosis is the correct (incorrect) one. The main contribution of the paper is twofold: first, we propose two novel linguistic resources for the medical domain (i.e., a dataset of 314 clinical cases annotated with the medical entities from UMLS, and a database of biological boundaries for common findings), and second, a full Information Extraction pipeline to extract symptoms and findings from the clinical cases and match them with the terms in a medical ontology and with the biological boundaries. An extensive evaluation of the proposed approach shows that our method outperforms comparable approaches.

Conclusions: Our goal is to offer an AI-assisted educational support framework that trains clinical residents to formulate sound and exhaustive explanations of their diagnoses for patients.

Journal of Biomedical Semantics 15(1), 8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11138001/pdf/
Semantic units: organizing knowledge graphs into semantically meaningful units of representation.
IF 1.9 | Zone 3, Engineering & Technology | Q3 Mathematical & Computational Biology | Pub Date: 2024-05-27 | DOI: 10.1186/s13326-024-00310-5
Lars Vogt, Tobias Kuhn, Robert Hoehndorf

Background: In today's landscape of data management, knowledge graphs and ontologies are growing in importance as critical mechanisms aligned with the FAIR Guiding Principles, which ensure that data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full potential of FAIR knowledge graphs.

Results: We introduce "semantic units" as a conceptual solution, although currently exemplified only in a limited prototype. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs by adding another layer of triples on top of the conventional data layer. Semantic units and their subgraphs are represented by their own resource that instantiates a corresponding semantic unit class. We distinguish statement and compound units as basic categories of semantic units. A statement unit is the smallest, independent proposition that is semantically meaningful for a human reader. Depending on the relation of its underlying proposition, it consists of one or more triples. Organizing a knowledge graph into statement units results in a partition of the graph, with each triple belonging to exactly one statement unit. A compound unit, on the other hand, is a semantically meaningful collection of statement and compound units that form larger subgraphs. Some semantic units organize the graph into different levels of representational granularity, others orthogonally into different types of granularity trees or different frames of reference, structuring and organizing the knowledge graph into partially overlapping, partially enclosed subgraphs, each of which can be referenced by its own resource.

Conclusions: Semantic units, applicable in RDF/OWL and labeled property graphs, support making statements about statements and facilitate graph alignment, subgraph matching, knowledge graph profiling, and the management of access restrictions to sensitive data. Additionally, we argue that organizing the graph into semantic units promotes the differentiation of ontological and discursive information, and that it also supports the differentiation of multiple frames of reference within the graph.
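
The partition property described in the Results (each triple belongs to exactly one statement unit) can be illustrated with a small Python sketch. This is a hypothetical in-memory model, not the authors' prototype: the class names `StatementUnit` and `CompoundUnit`, the function `is_partition`, and the `ex:` identifiers used with it are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class StatementUnit:
    resource: str               # identifier of the unit itself
    triples: FrozenSet[Triple]  # one or more data-layer triples


@dataclass(frozen=True)
class CompoundUnit:
    resource: str
    members: FrozenSet[str]     # resources of statement or compound units


def is_partition(graph: Set[Triple], units: Iterable[StatementUnit]) -> bool:
    """Return True if every triple of the graph belongs to exactly one unit."""
    seen: Set[Triple] = set()
    for unit in units:
        if not unit.triples or unit.triples & seen:
            return False        # empty unit, or a triple claimed twice
        seen |= unit.triples
    return seen == graph
```

For example, a weight measurement that needs two triples (a quantity node plus its value) would form a single statement unit, while a type assertion forms another. A `CompoundUnit` then groups the two unit resources without duplicating any triples.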

Journal of Biomedical Semantics 15(1), 7. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11131308/pdf/
Leveraging logical definitions and lexical features to detect missing IS-A relations in biomedical terminologies
IF 1.9 | Zone 3, Engineering & Technology | Q3 Mathematical & Computational Biology | Pub Date: 2024-05-01 | DOI: 10.1186/s13326-024-00309-y
Rashmie Abeysinghe, Fengbo Zheng, Jay Shi, Samden D. Lhatoo, Licong Cui
Biomedical terminologies play a vital role in managing biomedical data. Missing IS-A relations in a biomedical terminology could be detrimental to its downstream uses. In this paper, we investigate an approach combining logical definitions and lexical features to discover missing IS-A relations in two biomedical terminologies: SNOMED CT and the National Cancer Institute (NCI) thesaurus. The method is applied to unrelated concept pairs within non-lattice subgraphs: graph fragments within a terminology likely to contain various inconsistencies. Our approach first checks whether the logical definition of one concept is more general than that of the other concept. Then, we check whether the lexical features of that concept are contained in those of the other concept. If both constraints are satisfied, we suggest a potentially missing IS-A relation between the two concepts. The method identified 982 potential missing IS-A relations for SNOMED CT and 100 for the NCI thesaurus. To assess the efficacy of our approach, a random sample of results belonging to the “Clinical Findings” and “Procedure” subhierarchies of SNOMED CT and results belonging to the “Drug, Food, Chemical or Biomedical Material” subhierarchy of the NCI thesaurus were evaluated by domain experts. The evaluation revealed that 118 out of 150 suggestions are valid for SNOMED CT and 17 out of 20 are valid for the NCI thesaurus.
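
The two-constraint check described above can be sketched in Python under strong simplifying assumptions. Here a logical definition is reduced to a flat set of (attribute, value) pairs, "more general" means a proper subset of those pairs, and lexical features are just the lowercase words of the concept name. Real SNOMED CT and NCI thesaurus definitions are richer than this, and the names `Concept`, `suggests_is_a` and `candidate_relations` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Concept:
    name: str
    definition: FrozenSet[Tuple[str, str]]  # simplified (attribute, value) pairs

    @property
    def words(self) -> FrozenSet[str]:
        # Lexical features approximated as the set of lowercase name tokens.
        return frozenset(self.name.lower().split())


def suggests_is_a(child: Concept, parent: Concept) -> bool:
    """True if parent is logically more general and lexically contained in child."""
    more_general = parent.definition < child.definition  # proper subset: fewer constraints
    lexically_contained = parent.words <= child.words
    return more_general and lexically_contained


def candidate_relations(pairs: Iterable[Tuple[Concept, Concept]]) -> List[Tuple[str, str]]:
    """For each unrelated pair, test both directions and return (child, parent) names."""
    suggestions = []
    for a, b in pairs:
        if suggests_is_a(a, b):
            suggestions.append((a.name, b.name))
        elif suggests_is_a(b, a):
            suggestions.append((b.name, a.name))
    return suggestions
```

With two made-up concepts, "open fracture of femur" (site, morphology, and an extra "open" constraint) and "fracture of femur" (site and morphology only), the sketch suggests that the former IS-A the latter. A suggestion like this is only a candidate; in the paper, domain experts judge which suggestions are valid.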
Elucidating the semantics-topology trade-off for knowledge inference-based pharmacological discovery
IF 1.9 | Zone 3, Engineering & Technology | Q3 Mathematical & Computational Biology | Pub Date: 2024-05-01 | DOI: 10.1186/s13326-024-00308-z
Daniel N. Sosa, Georgiana Neculae, Julien Fauqueur, Russ B. Altman
Leveraging AI for synthesizing the deluge of biomedical knowledge has great potential for pharmacological discovery, with applications including developing new therapeutics for untreated diseases and repurposing drugs as emergent pandemic treatments. Creating knowledge graph representations of interacting drugs, diseases, genes, and proteins enables discovery via embedding-based ML approaches and link prediction. Previously, it has been shown that these predictive methods are susceptible to biases from network structure, namely that they are driven not by a nuanced biological understanding of mechanisms but by high-degree hub nodes. In this work, we study the confounding effect of network topology on biological relation semantics by creating an experimental pipeline of knowledge graph semantic and topological perturbations. We show that the drop in drug repurposing performance from ablating meaningful semantics increases by 21% and 38% when mitigating topological bias in two networks. We demonstrate that new methods for representing knowledge and inferring new knowledge must be developed to make use of biomedical semantics for pharmacological innovation, and we suggest fruitful avenues for their development.
RecSOI: recommending research directions using statements of ignorance
IF 1.9 | Zone 3, Engineering & Technology | Q3 Mathematical & Computational Biology | Pub Date: 2024-04-22 | DOI: 10.1186/s13326-024-00304-3
Adrien Bibal, Nourah M. Salem, Rémi Cardon, Elizabeth K. White, Daniel E. Acuna, Robin Burke, Lawrence E. Hunter
The more science advances, the more questions are asked. This compounding growth can make it difficult to keep up with current research directions. Furthermore, this difficulty is exacerbated for junior researchers who enter fields with already large bases of potentially fruitful research avenues. In this paper, we propose a novel task and a recommender system for research directions, RecSOI, that draws from statements of ignorance (SOIs) found in the research literature. By building researchers’ profiles based on textual elements, RecSOI generates personalized recommendations of potential research directions tailored to their interests. In addition, RecSOI provides context for the recommended SOIs, so that users can quickly evaluate how relevant the research direction is for them. In this paper, we provide an overview of RecSOI’s functioning, implementation, and evaluation, demonstrating its effectiveness in guiding researchers through the vast landscape of potential research directions.
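
The idea of matching a text-based researcher profile against statements of ignorance can be sketched as follows. The abstract does not say how RecSOI represents text or scores matches, so the bag-of-words vectors and cosine similarity below are assumptions chosen for brevity, and the function names `bag_of_words`, `cosine` and `recommend` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphabetic tokens in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(profile_texts: List[str], sois: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate statements of ignorance by similarity to the researcher's texts."""
    profile = bag_of_words(" ".join(profile_texts))
    scored = [(soi, cosine(profile, bag_of_words(soi))) for soi in sois]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

A profile built from a text about protein folding would rank a statement such as "the mechanism of protein folding remains unknown" above an unrelated one. Returning the statement text together with its score leaves room for showing the surrounding context to the user, as the abstract describes.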
Enriching the FIDEO ontology with food-drug interactions from online knowledge sources.
IF 1.9 | Zone 3, Engineering & Technology | Q3 Mathematical & Computational Biology | Pub Date: 2024-03-04 | DOI: 10.1186/s13326-024-00302-5
Rabia Azzi, Georgeta Bordea, Romain Griffier, Jean Noël Nikiema, Fleur Mougin

The increasing number of articles on adverse interactions that may occur when specific foods are consumed with certain drugs makes it difficult to keep up with the latest findings. Conflicting information is available in the scientific literature and specialized knowledge bases because interactions are described in an unstructured or semi-structured format. The FIDEO ontology aims to integrate and represent information about food-drug interactions in a structured way. This article reports on the new version of this ontology in which more than 1700 interactions are integrated from two online resources: DrugBank and Hedrine. These food-drug interactions have been represented in FIDEO in the form of precompiled concepts, each of which specifies both the food and the drug involved. Additionally, competency questions that can be answered are reviewed, and avenues for further enrichment are discussed.

Citations: 0
The use of foundational ontologies in biomedical research
IF 1.9 Zone 3 (Engineering & Technology) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-12-11 DOI: 10.1186/s13326-023-00300-z
César H. Bernabé, Núria Queralt-Rosinach, Vítor E. Silva Souza, Luiz Olavo Bonino da Silva Santos, Barend Mons, Annika Jacobsen, Marco Roos
The FAIR principles recommend the use of controlled vocabularies, such as ontologies, to define data and metadata concepts. Ontologies are currently modelled following different approaches, sometimes describing conflicting definitions of the same concepts, which can affect interoperability. To cope with that, prior literature suggests organising ontologies in levels, where domain-specific (low-level) ontologies are grounded in domain-independent high-level ontologies (i.e., foundational ontologies). In this level-based organisation, foundational ontologies work as translators of intended meaning, thus improving interoperability. Despite their considerable acceptance in biomedical research, there are very few studies testing foundational ontologies. This paper describes a systematic literature mapping that was conducted to understand how foundational ontologies are used in biomedical research and to find empirical evidence supporting their claimed (dis)advantages. From a set of 79 selected papers, we identified that foundational ontologies are used for several purposes: ontology construction, repair, mapping, and ontology-based data analysis. Foundational ontologies are claimed to improve interoperability, enhance reasoning, speed up ontology development and facilitate maintainability. The complexity of using foundational ontologies is the most commonly cited downside. Although foundational ontologies are used for several purposes, hardly any experiments (1 paper) tested the claims for or against their use. In the subset of 49 papers that describe the development of an ontology, low adherence to formal methods was observed for both ontology construction (16 papers) and ontology evaluation (4 papers). Our findings have two main implications. First, the lack of empirical evidence about the use of foundational ontologies indicates a need for evaluating the use of such artefacts in biomedical research. Second, the low adherence to formal methods illustrates how the field could benefit from a more systematic approach when dealing with the development and evaluation of ontologies. The understanding of how foundational ontologies are used in the biomedical field can drive future research towards the improvement of ontologies and, consequently, data FAIRness. The adoption of formal methods can impact the quality and sustainability of ontologies, and reusing these methods from other fields is encouraged.
Citations: 0
BioBLP: a modular framework for learning on multimodal biomedical knowledge graphs
IF 1.9 Zone 3 (Engineering & Technology) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-12-08 DOI: 10.1186/s13326-023-00301-y
Daniel Daza, Dimitrios Alivanistos, Payal Mitra, Thom Pijnenburg, Michael Cochez, Paul Groth
Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. We aim to understand how to incorporate multimodal data into biomedical KG embeddings, and analyze the resulting performance in comparison with traditional methods. We propose a modular framework for learning embeddings in KGs with entity attributes that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. In the standard link prediction evaluation, the proposed method achieves competitive, yet lower, performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. Further analyses show that incorporating attribute data does outperform baselines over entities below a certain node degree, comprising approximately 75% of the diseases in the graph. We also observe that optimizing attribute encoders is a challenging task that increases optimization costs. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. BioBLP allows investigating different ways of incorporating multimodal biomedical data for learning representations in KGs. With a particular implementation, we find that incorporating attribute data does not consistently outperform baselines, but improvements are obtained on a comparatively large subset of entities below a specific node degree. Our results indicate a potential for improved performance in scientific discovery tasks where understudied areas of the KG would benefit from link prediction methods.
Citations: 0
Assessing resolvability, parsability, and consistency of RDF resources: a use case in rare diseases.
IF 1.6 Zone 3 (Engineering & Technology) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2023-12-05 DOI: 10.1186/s13326-023-00299-3
Shuxin Zhang, Nirupama Benis, Ronald Cornet

Introduction: Healthcare data and the knowledge gleaned from it play a key role in improving the health of current and future patients. These knowledge sources are regularly represented as 'linked' resources based on the Resource Description Framework (RDF). Making resources 'linkable' to facilitate their interoperability is especially important in the rare-disease domain, where health resources are scattered and scarce. However, to benefit from using RDF, resources need to be of good quality. Based on existing metrics, we aim to assess the quality of RDF resources related to rare diseases and provide recommendations for their improvement.

Methods: Sixteen resources of relevance for the rare-disease domain were selected: two schemas, three metadatasets, and eleven ontologies. These resources were tested on six objective metrics regarding resolvability, parsability, and consistency. Any URI that failed the test based on any of the six metrics was recorded as an error. The error count and percentage of each tested resource were recorded. The assessment results were represented in RDF, using the Data Quality Vocabulary schema.

Results: For three out of the six metrics, the assessment revealed quality issues. Eleven resources have non-resolvable URIs, with the proportion of non-resolvable URIs among all URIs ranging from 0.1% (6/6,712) in the Anatomical Therapeutic Chemical Classification to 13.7% (17/124) in the WikiPathways Ontology; seven resources have undefined URIs; and two resources have incorrectly used properties of the 'owl:ObjectProperty' type. Individual errors were examined to generate suggestions for the development of high-quality RDF resources, including the tested resources.

Conclusion: We assessed the resolvability, parsability, and consistency of RDF resources in the rare-disease domain, and determined the extent of these types of errors that potentially affect interoperability. The qualitative investigation of these errors reveals how they can be avoided. All findings serve as valuable input for the development of a guideline for creating high-quality RDF resources, thereby enhancing the interoperability of biomedical resources.
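
The percentages quoted in the Results follow directly from the stated error counts. The following is a minimal Python sketch of that arithmetic, not the authors' tooling: the helper name `error_percentage` is hypothetical, and rounding to one decimal place is an assumption inferred from the reported figures.

```python
def error_percentage(error_count: int, total_uris: int) -> float:
    """Share of URIs that failed a metric, as a percentage.

    Rounding to one decimal place is an assumption based on the
    figures quoted in the abstract.
    """
    if total_uris <= 0:
        raise ValueError("total_uris must be positive")
    return round(100.0 * error_count / total_uris, 1)


# Counts taken from the abstract: non-resolvable URIs / all URIs.
atc_share = error_percentage(6, 6712)           # Anatomical Therapeutic Chemical Classification
wikipathways_share = error_percentage(17, 124)  # WikiPathways Ontology

print(atc_share, wikipathways_share)
```

With the counts from the abstract, the two calls give 0.1 and 13.7, matching the reported range.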

Citations: 0