
Latest publications from the Journal of Biomedical Semantics

Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-11-28 | DOI: 10.1186/s13326-023-00298-4
Gollam Rabby, Jennifer D'Souza, Allard Oelen, Lucie Dvorackova, Vojtěch Svátek, Sören Auer

Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly document prediction task over categorized scholarly documents. We also introduce a new approach that enhances the document representation with a domain-independent knowledge graph to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation works better than the others. Of the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction when a domain-independent knowledge graph, specifically DBpedia, was used to enhance the document representation for predicting influential scholarly documents with categorized scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation. We also enhance the BOW document representation with direct types (RDF types) and unqualified relations from DBpedia. In this experiment, we did not find any impact of the enhanced document representation on scholarly document category classification, but we did find an effect on influential scholarly document prediction with categorized data.
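As a rough illustration of the kind of pipeline the abstract describes, the scikit-learn sketch below pairs a TF-IDF representation with logistic regression for category classification and a bag-of-words representation with a random forest for influence prediction. The toy documents and labels are invented, and plain token counts stand in for the DBpedia-enriched BOW features used in the paper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the WHO COVID-19 corpus: abstracts with a topical category
# and a binary "influential" flag (both invented for this example).
docs = [
    "SARS-CoV-2 spike protein structure and vaccine design",
    "Clinical outcomes of remdesivir treatment in hospitalized patients",
    "Epidemiological modelling of COVID-19 transmission dynamics",
    "Long COVID symptom prevalence in a population cohort",
]
categories = ["biology", "clinical", "modelling", "clinical"]
influential = [1, 0, 1, 0]

# Scholarly document category classification: TF-IDF features + logistic regression.
category_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
category_clf.fit(docs, categories)

# Influential document prediction: BOW features + random forest.
influence_clf = make_pipeline(CountVectorizer(), RandomForestClassifier(n_estimators=200))
influence_clf.fit(docs, influential)

query = ["Antibody response after mRNA vaccination"]
print(category_clf.predict(query), influence_clf.predict(query))
```

In the paper itself, the BOW vectors are additionally enriched with DBpedia direct types and unqualified relations before training; the sketch above omits that enrichment step.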

{"title":"Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph.","authors":"Gollam Rabby, Jennifer D'Souza, Allard Oelen, Lucie Dvorackova, Vojtěch Svátek, Sören Auer","doi":"10.1186/s13326-023-00298-4","DOIUrl":"10.1186/s13326-023-00298-4","url":null,"abstract":"<p><p>Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly document prediction task over categorized scholarly documents. We also introduce a new approach to enhance the document representation method with a domain-independent knowledge graph to find the influential scholarly document using categorized scholarly content. As the input collection, we use the WHO corpus with scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method works better than others. From various machine learning methods tested, logistic regression outperformed the other for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction, with the help of a domain-independent knowledge graph, specifically DBpedia, to enhance the document representation method for predicting influential scholarly documents with categorical scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. From this experiment, we did not find any impact of the enhanced document representation for the scholarly document category classification. We found an effect in the influential scholarly document prediction with categorical data.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"18"},"PeriodicalIF":1.9,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683290/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138451554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data management plans as linked open data: exploiting ARGOS FAIR and machine actionable outputs in the OpenAIRE research graph.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-11-02 | DOI: 10.1186/s13326-023-00297-5
Elli Papadopoulou, Alessia Bardi, George Kakaletris, Diamadis Tziotzios, Paolo Manghi, Natalia Manola

Background: Open Science Graphs (OSGs) are scientific knowledge graphs representing different entities of the research lifecycle (e.g. projects, people, research outcomes, institutions) and the relationships among them. They present a contextualized view of current research that supports discovery, re-use, reproducibility, monitoring, transparency and omni-comprehensive assessment. A Data Management Plan (DMP) contains information concerning both the research processes and the data collected, generated and/or re-used during a project's lifetime. Automated solutions and workflows that connect DMPs with the actual data and other contextual information (e.g., publications, funding) are missing from the landscape. DMPs being submitted as deliverables also limits their findability. In an open and FAIR-enabling research ecosystem, linking information between research processes and research outputs is essential. The ARGOS tool for FAIR data management contributes to the OpenAIRE Research Graph (RG) and utilises its underlying services and trusted sources to progressively automate the validation and automation of Research Data Management (RDM) practices.

Results: A comparative analysis was conducted between the data models of ARGOS and the OpenAIRE Research Graph against the DMP Common Standard. Following this, we extended ARGOS with export format converters and semantic tagging, and the OpenAIRE RG with a DMP entity and semantics between existing entities and relationships. This enabled the integration of ARGOS machine-actionable DMPs (ma-DMPs) into the OpenAIRE OSG, enriching and exposing DMPs as FAIR outputs.

Conclusions: This paper, to our knowledge, is the first to introduce exposing ma-DMPs in OSGs and to make the link between OSGs and DMPs, introducing the latter as entities in the research lifecycle. Further, it provides insight into ARGOS DMP service interoperability practices and integrations that populate the OpenAIRE Research Graph with DMP entities and relationships and strengthen both the FAIRness of outputs and information exchange in a standard way.
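As a loose sketch of what "DMPs as linked open data" can look like, the snippet below serializes a hypothetical DMP record as RDF with rdflib. The namespace, class, and property names are invented for illustration and are not the ARGOS or OpenAIRE data model, nor an authoritative rendering of the DMP Common Standard.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF, XSD

# Invented namespace and terms, for illustration only.
EX = Namespace("https://example.org/dmp/")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("ex", EX)

dmp = EX["dmp-0001"]
dataset = EX["dataset-0001"]
project = EX["project-0001"]

g.add((dmp, RDF.type, EX.DataManagementPlan))
g.add((dmp, DCTERMS.title, Literal("DMP for a hypothetical imaging study")))
g.add((dmp, DCTERMS.issued, Literal("2023-11-02", datatype=XSD.date)))
g.add((dmp, EX.isOutputOf, project))        # link the DMP to its funding project
g.add((dmp, EX.describesDataset, dataset))  # link the DMP to the data it documents
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))

print(g.serialize(format="turtle"))
```

Representing the plan as triples is what lets it be connected to projects, datasets, and publications already present in a research graph, rather than sitting in a PDF deliverable.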

{"title":"Data management plans as linked open data: exploiting ARGOS FAIR and machine actionable outputs in the OpenAIRE research graph.","authors":"Elli Papadopoulou,&nbsp;Alessia Bardi,&nbsp;George Kakaletris,&nbsp;Diamadis Tziotzios,&nbsp;Paolo Manghi,&nbsp;Natalia Manola","doi":"10.1186/s13326-023-00297-5","DOIUrl":"10.1186/s13326-023-00297-5","url":null,"abstract":"<p><strong>Background: </strong>Open Science Graphs (OSGs) are scientific knowledge graphs representing different entities of the research lifecycle (e.g. projects, people, research outcomes, institutions) and the relationships among them. They present a contextualized view of current research that supports discovery, re-use, reproducibility, monitoring, transparency and omni-comprehensive assessment. A Data Management Plan (DMP) contains information concerning both the research processes and the data collected, generated and/or re-used during a project's lifetime. Automated solutions and workflows that connect DMPs with the actual data and other contextual information (e.g., publications, fundings) are missing from the landscape. DMPs being submitted as deliverables also limit their findability. In an open and FAIR-enabling research ecosystem information linking between research processes and research outputs is essential. ARGOS tool for FAIR data management contributes to the OpenAIRE Research Graph (RG) and utilises its underlying services and trusted sources to progressively automate validation and automations of Research Data Management (RDM) practices.</p><p><strong>Results: </strong>A comparative analysis was conducted between the data models of ARGOS and OpenAIRE Research Graph against the DMP Common Standard. Following this, we extended ARGOS with export format converters and semantic tagging, and the OpenAIRE RG with a DMP entity and semantics between existing entities and relationships. This enabled the integration of ARGOS machine actionable DMPs (ma-DMPs) to the OpenAIRE OSG, enriching and exposing DMPs as FAIR outputs.</p><p><strong>Conclusions: </strong>This paper, to our knowledge, is the first to introduce exposing ma-DMPs in OSGs and making the link between OSGs and DMPs, introducing the latter as entities in the research lifecycle. Further, it provides insight to ARGOS DMP service interoperability practices and integrations to populate the OpenAIRE Research Graph with DMP entities and relationships and strengthen both FAIRness of outputs as well as information exchange in a standard way.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"17"},"PeriodicalIF":1.9,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10621150/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71423853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Context-based refinement of mappings in evolving life science ontologies.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-10-19 | DOI: 10.1186/s13326-023-00294-8
Victor Eiti Yamamoto, Juliana Medeiros Destro, Julio Cesar Dos Reis

Background: Biomedical computational systems benefit from ontologies and their associated mappings. Indeed, aligned ontologies in life sciences play a central role in several semantic-enabled tasks, especially in data exchange. It is crucial to keep alignments up to date with the new knowledge introduced in novel ontology releases. Refining ontology mappings in place, based on added concepts, demands further research.

Results: This article studies the mapping refinement phenomenon by proposing techniques to refine a set of established mappings based on the evolution of biomedical ontologies. In our first analysis, we investigate ways of suggesting correspondences with the new ontology version without applying a matching operation to the whole set of ontology entities. In the second analysis, the refinement technique enables deriving new mappings and updating the semantic type of the mapping beyond equivalence. Our study explores the neighborhood of concepts in the alignment process to refine mapping sets.

Conclusion: Experimental evaluations were conducted with several versions of aligned biomedical ontologies. Those experiments demonstrated the usefulness of ontology evolution changes for supporting the process of mapping refinement. Furthermore, exploiting the context of ontological concepts proved effective in our techniques.
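The following simplified sketch illustrates the general idea of neighborhood-based refinement, not the authors' algorithm: when a concept is added in a new release of the source ontology, the existing mappings of its neighboring concepts yield a small set of candidate targets, which are then filtered by label similarity. All identifiers, labels, and the similarity threshold are hypothetical.

```python
from difflib import SequenceMatcher

def label_similarity(a: str, b: str) -> float:
    """Crude string similarity as a stand-in for a real lexical matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def propose_mappings(new_concepts, neighbors, existing_mappings, target_labels, threshold=0.7):
    """For each concept added in the new source-ontology version, collect the targets
    already mapped from its neighbors (e.g. parents/siblings) and keep the most
    lexically similar one as a candidate correspondence."""
    proposals = {}
    for concept, label in new_concepts.items():
        candidate_targets = {
            existing_mappings[n] for n in neighbors.get(concept, []) if n in existing_mappings
        }
        scored = [
            (t, label_similarity(label, target_labels[t]))
            for t in candidate_targets if t in target_labels
        ]
        best = max(scored, key=lambda x: x[1], default=None)
        if best and best[1] >= threshold:
            proposals[concept] = best[0]
    return proposals

# Tiny hypothetical example: one concept added in a new release of the source ontology.
new_concepts = {"SRC:0002": "Myocardial infarction, acute"}
neighbors = {"SRC:0002": ["SRC:0001"]}            # its parent already existed in v1
existing_mappings = {"SRC:0001": "TGT:9001"}      # and the parent is already mapped
target_labels = {"TGT:9001": "Acute myocardial infarction"}

print(propose_mappings(new_concepts, neighbors, existing_mappings, target_labels))
```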

{"title":"Context-based refinement of mappings in evolving life science ontologies.","authors":"Victor Eiti Yamamoto, Juliana Medeiros Destro, Julio Cesar Dos Reis","doi":"10.1186/s13326-023-00294-8","DOIUrl":"10.1186/s13326-023-00294-8","url":null,"abstract":"<p><strong>Background: </strong>Biomedical computational systems benefit from ontologies and their associated mappings. Indeed, aligned ontologies in life sciences play a central role in several semantic-enabled tasks, especially in data exchange. It is crucial to maintain up-to-date alignments according to new knowledge inserted in novel ontology releases. Refining ontology mappings in place, based on adding concepts, demands further research.</p><p><strong>Results: </strong>This article studies the mapping refinement phenomenon by proposing techniques to refine a set of established mappings based on the evolution of biomedical ontologies. In our first analysis, we investigate ways of suggesting correspondences with the new ontology version without applying a matching operation to the whole set of ontology entities. In the second analysis, the refinement technique enables deriving new mappings and updating the semantic type of the mapping beyond equivalence. Our study explores the neighborhood of concepts in the alignment process to refine mapping sets.</p><p><strong>Conclusion: </strong>Experimental evaluations with several versions of aligned biomedical ontologies were conducted. Those experiments demonstrated the usefulness of ontology evolution changes to support the process of mapping refinement. Furthermore, using context in ontological concepts was effective in our techniques.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"16"},"PeriodicalIF":1.9,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10585791/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49677735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analysis and implementation of the DynDiff tool when comparing versions of ontology.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-09-28 | DOI: 10.1186/s13326-023-00295-7
Sara Diaz Benavides, Silvio D Cardoso, Marcos Da Silveira, Cédric Pruski

Background: Ontologies play a key role in the management of medical knowledge because they have the properties to support a wide range of knowledge-intensive tasks. The dynamic nature of knowledge requires frequent changes to the ontologies to keep them up-to-date. The challenge is to understand and manage these changes and their impact on dependent systems well, in order to handle the growing volume of data annotated with ontologies and the limited documentation describing the changes.

Methods: We present a method to detect and characterize the changes occurring between different versions of an ontology, together with an ontology of changes entitled DynDiffOnto, designed according to Semantic Web best practices and FAIR principles. We further describe the implementation of the method and the evaluation of the tool with different ontologies from the biomedical domain (i.e. ICD9-CM, MeSH, NCIt, SNOMEDCT, GO, IOBC and CIDO), showing its performance in terms of execution time and capacity to classify ontological changes, compared with other state-of-the-art approaches.

Results: The experiments show a top-level performance of DynDiff for large ontologies and a good performance for smaller ones, with respect to execution time and the capability to identify complex changes. In this paper, we further highlight the impact of ontology matchers on the diff computation and the possibility of parameterizing the matcher in DynDiff, so that it can benefit from state-of-the-art matchers.

Conclusion: DynDiff is an efficient tool to compute differences between ontology versions and classify these differences according to DynDiffOnto concepts. This work also contributes to a better understanding of ontological changes through DynDiffOnto, which was designed to express the semantics of the changes between versions of an ontology and can be used to document the evolution of an ontology.
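For orientation, a minimal class-level diff between two ontology releases can be computed with rdflib as below; DynDiff itself goes much further, classifying complex changes with the help of ontology matchers. The file names are hypothetical and the sketch assumes RDF/XML releases.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL, RDF

def named_classes(path):
    """Collect IRIs of named owl:Class declarations, skipping anonymous class expressions."""
    g = Graph()
    g.parse(path, format="xml")  # assumes the releases are published as RDF/XML
    return {c for c in g.subjects(RDF.type, OWL.Class) if isinstance(c, URIRef)}

# Hypothetical local copies of two releases of the same ontology.
old_classes = named_classes("ontology_v1.owl")
new_classes = named_classes("ontology_v2.owl")

added = new_classes - old_classes    # candidate "class added" changes
removed = old_classes - new_classes  # candidate "class removed" (or renamed/moved) changes

print(f"{len(added)} classes added, {len(removed)} classes removed")
```

A set difference like this only surfaces atomic additions and deletions; distinguishing renames, moves, merges, and splits is exactly where a change ontology such as DynDiffOnto and the configurable matchers come in.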

{"title":"Analysis and implementation of the DynDiff tool when comparing versions of ontology.","authors":"Sara Diaz Benavides, Silvio D Cardoso, Marcos Da Silveira, Cédric Pruski","doi":"10.1186/s13326-023-00295-7","DOIUrl":"10.1186/s13326-023-00295-7","url":null,"abstract":"<p><strong>Background: </strong>Ontologies play a key role in the management of medical knowledge because they have the properties to support a wide range of knowledge-intensive tasks. The dynamic nature of knowledge requires frequent changes to the ontologies to keep them up-to-date. The challenge is to understand and manage these changes and their impact on depending systems well in order to handle the growing volume of data annotated with ontologies and the limited documentation describing the changes.</p><p><strong>Methods: </strong>We present a method to detect and characterize the changes occurring between different versions of an ontology together with an ontology of changes entitled DynDiffOnto, designed according to Semantic Web best practices and FAIR principles. We further describe the implementation of the method and the evaluation of the tool with different ontologies from the biomedical domain (i.e. ICD9-CM, MeSH, NCIt, SNOMEDCT, GO, IOBC and CIDO), showing its performance in terms of time execution and capacity to classify ontological changes, compared with other state-of-the-art approaches.</p><p><strong>Results: </strong>The experiments show a top-level performance of DynDiff for large ontologies and a good performance for smaller ones, with respect to execution time and capability to identify complex changes. In this paper, we further highlight the impact of ontology matchers on the diff computation and the possibility to parameterize the matcher in DynDiff, enabling the possibility of benefits from state-of-the-art matchers.</p><p><strong>Conclusion: </strong>DynDiff is an efficient tool to compute differences between ontology versions and classify these differences according to DynDiffOnto concepts. This work also contributes to a better understanding of ontological changes through DynDiffOnto, which was designed to express the semantics of the changes between versions of an ontology and can be used to document the evolution of an ontology.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"15"},"PeriodicalIF":1.9,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10537977/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41114733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Development and validation of the early warning system scores ontology.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-09-20 | DOI: 10.1186/s13326-023-00296-6
Cilia E Zayas, Justin M Whorton, Kevin W Sexton, Charles D Mabry, S Clint Dowland, Mathias Brochhausen

Background: Clinical early warning scoring systems have improved patient outcomes in a range of specializations and global contexts. These systems are used to predict patient deterioration. A multitude of patient-level physiological decompensation data has been made available through the widespread integration of early warning scoring systems within EHRs across national and international health care organizations. These data can be used to promote secondary research. The diversity of early warning scoring systems and EHR systems is one barrier to secondary analysis of early warning score data. Because early warning score parameters vary, it is difficult to query across providers and EHR systems. Moreover, mapping and merging the parameters is challenging. We develop and validate the Early Warning System Scores Ontology (EWSSO), representing three commonly used early warning scores: the National Early Warning Score (NEWS), the six-item modified Early Warning Score (MEWS), and the quick Sequential Organ Failure Assessment (qSOFA), to overcome these problems.

Methods: We apply the Software Development Lifecycle Framework, conceived by Winston Royce in 1970, to model the activities involved in organizing, producing, and evaluating the EWSSO. We also follow OBO Foundry Principles and the principles of best practice for domain ontology design, terms, definitions, and classifications to meet BFO requirements for ontology building.

Results: We developed twenty-nine new classes and reused four classes and four object properties to create the EWSSO. When we queried the data, our ontology-based process differentiated between necessary and unnecessary features for score calculation 100% of the time. Further, our process applied the proper temperature conversions for the early warning score calculator 100% of the time.

Conclusions: Using synthetic datasets, we demonstrate that the EWSSO can be used to generate and query health system data on vital signs and provide input to calculate the NEWS, six-item MEWS, and qSOFA. Future work includes extending the EWSSO by introducing additional early warning scores for adult and pediatric patient populations and creating patient profiles that contain clinical, demographic, and outcomes data regarding the patient.
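To make the scoring side concrete, here is a plain-Python rendering of one of the three scores, qSOFA, using its commonly published criteria (one point each for respiratory rate >= 22/min, systolic blood pressure <= 100 mmHg, and altered mentation). It illustrates the calculation the ontology is meant to feed with vital-sign data; it is not part of the EWSSO itself.

```python
def qsofa_score(respiratory_rate: float, systolic_bp: float, gcs: float) -> int:
    """quick SOFA, as commonly defined: one point each for respiratory rate >= 22/min,
    systolic blood pressure <= 100 mmHg, and altered mentation (GCS < 15)."""
    score = 0
    if respiratory_rate >= 22:
        score += 1
    if systolic_bp <= 100:
        score += 1
    if gcs < 15:
        score += 1
    return score

# Example vital-sign record, e.g. retrieved from an EHR via the ontology's vital-sign classes.
score = qsofa_score(respiratory_rate=24, systolic_bp=95, gcs=15)
print(score, "flag for possible deterioration" if score >= 2 else "below alert threshold")
```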

{"title":"Development and validation of the early warning system scores ontology.","authors":"Cilia E Zayas, Justin M Whorton, Kevin W Sexton, Charles D Mabry, S Clint Dowland, Mathias Brochhausen","doi":"10.1186/s13326-023-00296-6","DOIUrl":"10.1186/s13326-023-00296-6","url":null,"abstract":"<p><strong>Background: </strong>Clinical early warning scoring systems, have improved patient outcomes in a range of specializations and global contexts. These systems are used to predict patient deterioration. A multitude of patient-level physiological decompensation data has been made available through the widespread integration of early warning scoring systems within EHRs across national and international health care organizations. These data can be used to promote secondary research. The diversity of early warning scoring systems and various EHR systems is one barrier to secondary analysis of early warning score data. Given that early warning score parameters are varied, this makes it difficult to query across providers and EHR systems. Moreover, mapping and merging the parameters is challenging. We develop and validate the Early Warning System Scores Ontology (EWSSO), representing three commonly used early warning scores: the National Early Warning Score (NEWS), the six-item modified Early Warning Score (MEWS), and the quick Sequential Organ Failure Assessment (qSOFA) to overcome these problems.</p><p><strong>Methods: </strong>We apply the Software Development Lifecycle Framework-conceived by Winston Boyce in 1970-to model the activities involved in organizing, producing, and evaluating the EWSSO. We also follow OBO Foundry Principles and the principles of best practice for domain ontology design, terms, definitions, and classifications to meet BFO requirements for ontology building.</p><p><strong>Results: </strong>We developed twenty-nine new classes, reused four classes and four object properties to create the EWSSO. When we queried the data our ontology-based process could differentiate between necessary and unnecessary features for score calculation 100% of the time. Further, our process applied the proper temperature conversions for the early warning score calculator 100% of the time.</p><p><strong>Conclusions: </strong>Using synthetic datasets, we demonstrate the EWSSO can be used to generate and query health system data on vital signs and provide input to calculate the NEWS, six-item MEWS, and qSOFA. Future work includes extending the EWSSO by introducing additional early warning scores for adult and pediatric patient populations and creating patient profiles that contain clinical, demographic, and outcomes data regarding the patient.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"14"},"PeriodicalIF":1.9,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10510162/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41123049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic classification of experimental models in biomedical literature to support searching for alternative methods to animal experiments.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-09-01 | DOI: 10.1186/s13326-023-00292-w
Mariana Neves, Antonina Klippert, Fanny Knöspel, Juliane Rudeck, Ailine Stolz, Zsofia Ban, Markus Becker, Kai Diederich, Barbara Grune, Pia Kahnau, Nils Ohnesorge, Johannes Pucher, Gilbert Schönfelder, Bettina Bert, Daniel Butzke

Current animal protection laws require replacement of animal experiments with alternative methods whenever such methods are suitable to reach the intended scientific objective. However, searching for alternative methods in the scientific literature is a time-consuming task that requires careful screening of an enormously large number of experimental biomedical publications. The identification of potentially relevant methods, e.g. organ or cell culture models, or computer simulations, can be supported with text mining tools specifically built for this purpose. Such tools are trained (or fine-tuned) on relevant data sets labeled by human experts. We developed the GoldHamster corpus, composed of 1,600 PubMed (Medline) articles (titles and abstracts), in which we manually identified the experimental model used according to a set of eight labels, namely: "in vivo", "organs", "primary cells", "immortal cell lines", "invertebrates", "humans", "in silico" and "other" (models). We recruited 13 annotators with expertise in the biomedical domain and assigned each article to two individuals. Four additional rounds of annotation aimed at improving the quality of the annotations with disagreements in the first round. Furthermore, we conducted various machine learning experiments based on supervised learning to evaluate the corpus for our classification task. We obtained more than 7,000 document-level annotations for the above labels. After the first round of annotation, the inter-annotator agreement (kappa coefficient) varied among labels, ranging from 0.42 (for "other") to 0.82 (for "invertebrates"), with an overall score of 0.62. All disagreements were resolved in the subsequent rounds of annotation. The best-performing machine learning experiment used the PubMedBERT pre-trained model fine-tuned on our corpus, which achieved an overall f-score of 0.83. We obtained a corpus with high agreement for all labels, and our evaluation demonstrated that it is suitable for training reliable predictive models for the automatic classification of biomedical literature according to the experimental models used. Our SMAFIRA - "Smart feature-based interactive" - search tool ( https://smafira.bf3r.de ) will employ this classifier to support the retrieval of alternative methods to animal experiments. The corpus is available for download ( https://doi.org/10.5281/zenodo.7152295 ), as well as the source code ( https://github.com/mariananeves/goldhamster ) and the model ( https://huggingface.co/SMAFIRA/goldhamster ).
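A minimal sketch of the fine-tuning setup described above, assuming a multi-label formulation over the eight labels and using the Hugging Face transformers API; the checkpoint name, example text, and label assignment are assumptions for illustration, not details taken from the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"  # assumed checkpoint name
LABELS = ["in vivo", "organs", "primary cells", "immortal cell lines",
          "invertebrates", "humans", "in silico", "other"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # an article may describe several model types
)

# One invented training example with a multi-hot target ("primary cells" and "in vivo").
texts = ["Hepatotoxicity was assessed in primary hepatocytes and in a mouse model."]
labels = torch.zeros((1, len(LABELS)))
labels[0, LABELS.index("primary cells")] = 1.0
labels[0, LABELS.index("in vivo")] = 1.0

batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
outputs = model(**batch, labels=labels)  # BCE-with-logits loss for the multi-label setup
outputs.loss.backward()                  # a full run would loop over the corpus with an optimizer

predicted = [LABELS[i] for i, p in enumerate(torch.sigmoid(outputs.logits)[0]) if p > 0.5]
print(outputs.loss.item(), predicted)
```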

{"title":"Automatic classification of experimental models in biomedical literature to support searching for alternative methods to animal experiments.","authors":"Mariana Neves, Antonina Klippert, Fanny Knöspel, Juliane Rudeck, Ailine Stolz, Zsofia Ban, Markus Becker, Kai Diederich, Barbara Grune, Pia Kahnau, Nils Ohnesorge, Johannes Pucher, Gilbert Schönfelder, Bettina Bert, Daniel Butzke","doi":"10.1186/s13326-023-00292-w","DOIUrl":"10.1186/s13326-023-00292-w","url":null,"abstract":"<p><p>Current animal protection laws require replacement of animal experiments with alternative methods, whenever such methods are suitable to reach the intended scientific objective. However, searching for alternative methods in the scientific literature is a time-consuming task that requires careful screening of an enormously large number of experimental biomedical publications. The identification of potentially relevant methods, e.g. organ or cell culture models, or computer simulations, can be supported with text mining tools specifically built for this purpose. Such tools are trained (or fine tuned) on relevant data sets labeled by human experts. We developed the GoldHamster corpus, composed of 1,600 PubMed (Medline) articles (titles and abstracts), in which we manually identified the used experimental model according to a set of eight labels, namely: \"in vivo\", \"organs\", \"primary cells\", \"immortal cell lines\", \"invertebrates\", \"humans\", \"in silico\" and \"other\" (models). We recruited 13 annotators with expertise in the biomedical domain and assigned each article to two individuals. Four additional rounds of annotation aimed at improving the quality of the annotations with disagreements in the first round. Furthermore, we conducted various machine learning experiments based on supervised learning to evaluate the corpus for our classification task. We obtained more than 7,000 document-level annotations for the above labels. After the first round of annotation, the inter-annotator agreement (kappa coefficient) varied among labels, and ranged from 0.42 (for \"others\") to 0.82 (for \"invertebrates\"), with an overall score of 0.62. All disagreements were resolved in the subsequent rounds of annotation. The best-performing machine learning experiment used the PubMedBERT pre-trained model with fine-tuning to our corpus, which gained an overall f-score of 0.83. We obtained a corpus with high agreement for all labels, and our evaluation demonstrated that our corpus is suitable for training reliable predictive models for automatic classification of biomedical literature according to the used experimental models. Our SMAFIRA - \"Smart feature-based interactive\" - search tool ( https://smafira.bf3r.de ) will employ this classifier for supporting the retrieval of alternative methods to animal experiments. 
The corpus is available for download ( https://doi.org/10.5281/zenodo.7152295 ), as well as the source code ( https://github.com/mariananeves/goldhamster ) and the model ( https://huggingface.co/SMAFIRA/goldhamster ).</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"13"},"PeriodicalIF":1.9,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10472567/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10178765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic transparency evaluation for open knowledge extraction systems.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-08-31 | DOI: 10.1186/s13326-023-00293-9
Maryam Basereh, Annalina Caputo, Rob Brennan
Background: This paper proposes Cyrus, a new transparency evaluation framework for Open Knowledge Extraction (OKE) systems. Cyrus is based on state-of-the-art transparency models and linked data quality assessment dimensions, and it brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is performed automatically using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent, meaning that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of a system, and suggests ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities.

Results: In Cyrus, data transparency includes ten dimensions, grouped in two categories. In this paper, six of these dimensions, i.e., provenance, interpretability, understandability, licensing, availability, and interlinking, have been evaluated automatically for three state-of-the-art OKE systems, using state-of-the-art metrics and tools. Covid-on-the-Web is identified as having the highest mean transparency.

Conclusions: This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates, for the first time, how to perform automated transparency evaluation that combines existing FAIRness and linked data quality assessment tools. We show that state-of-the-art OKE systems vary in the transparency of the linked data they generate and that these differences can be automatically quantified, leading to potential applications in trustworthy AI, compliance, data protection, data governance, and the design and testing of future OKE systems.
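To make "mean transparency" concrete, the sketch below simply averages per-dimension scores, assumed to be normalized to [0, 1], over the six automatically evaluated dimensions. Only the dimension names come from the abstract; the dataset names and numbers are invented placeholders, not results from the paper.

```python
from statistics import mean

DIMENSIONS = ["provenance", "interpretability", "understandability",
              "licensing", "availability", "interlinking"]

# Placeholder per-dimension scores in [0, 1] for two hypothetical OKE-produced datasets.
scores = {
    "dataset_A": {"provenance": 0.9, "interpretability": 0.7, "understandability": 0.8,
                  "licensing": 1.0, "availability": 0.6, "interlinking": 0.5},
    "dataset_B": {"provenance": 0.4, "interpretability": 0.6, "understandability": 0.5,
                  "licensing": 0.0, "availability": 0.9, "interlinking": 0.3},
}

mean_transparency = {name: mean(s[d] for d in DIMENSIONS) for name, s in scores.items()}
best = max(mean_transparency, key=mean_transparency.get)
print(mean_transparency, "highest mean transparency:", best)
```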
{"title":"Automatic transparency evaluation for open knowledge extraction systems.","authors":"Maryam Basereh, Annalina Caputo, Rob Brennan","doi":"10.1186/s13326-023-00293-9","DOIUrl":"10.1186/s13326-023-00293-9","url":null,"abstract":"&lt;p&gt;&lt;strong&gt;Background: &lt;/strong&gt;This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is automatically performed using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework, called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent. This means that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of the system, and ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Results: &lt;/strong&gt;In Cyrus, data transparency includes ten dimensions which are grouped in two categories. In this paper, six of these dimensions, i.e., provenance, interpretability, understandability, licensing, availability, interlinking have been evaluated automatically for three state-of-the-art OKE systems, using the state-of-the-art metrics and tools. Covid-on-the-Web is identified to have the highest mean transparency.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Conclusions: &lt;/strong&gt;This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates how to perform automated transparency evaluation that combines existing FAIRness and linked data quality assessment tools for the first time. 
We show that state-of-the-art OKE systems vary in the transparency of the linked data generated and that these differences can be automatically quantified leading to potential","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"12"},"PeriodicalIF":1.9,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468861/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10549601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-domain knowledge graph embeddings for gene-disease association prediction.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-08-14 | DOI: 10.1186/s13326-023-00291-x
Susana Nunes, Rita T Sousa, Catia Pesquita

Background: Predicting gene-disease associations typically requires exploring diverse sources of information as well as sophisticated computational approaches. Knowledge graph embeddings can help tackle these challenges by creating representations of genes and diseases based on the scientific knowledge described in ontologies, which can then be explored by machine learning algorithms. However, state-of-the-art knowledge graph embeddings are produced over a single ontology or multiple but disconnected ones, ignoring the impact that considering multiple interconnected domains can have on complex tasks such as gene-disease association prediction.

Results: We propose a novel approach to predict gene-disease associations using rich semantic representations based on knowledge graph embeddings over multiple ontologies linked by logical definitions and compound ontology mappings. The experiments showed that considering richer knowledge graphs significantly improves gene-disease prediction and that different knowledge graph embedding methods benefit more from distinct types of semantic richness.

Conclusions: This work demonstrated the potential for knowledge graph embeddings across multiple and interconnected biomedical ontologies to support gene-disease prediction. It also paved the way for considering other ontologies or tackling other tasks where multiple perspectives over the data can be beneficial. All software and data are freely available.
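The sketch below shows the generic pattern such work builds on, not the authors' pipeline: precomputed knowledge graph embeddings for genes and diseases are concatenated into pair features and fed to a supervised classifier. Here the embeddings and association labels are random stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Random stand-ins for embeddings produced by a knowledge graph embedding method
# (e.g. TransE or RDF2Vec) over ontology-annotated genes and diseases.
genes = {f"GENE:{i}": rng.normal(size=64) for i in range(100)}
diseases = {f"DIS:{i}": rng.normal(size=64) for i in range(40)}

# Toy association labels; real ones would come from a curated gold standard.
pairs = [(g, d) for g in genes for d in diseases]
y = rng.integers(0, 2, size=len(pairs))

# Represent each gene-disease pair as the concatenation of the two entity embeddings.
X = np.array([np.concatenate([genes[g], diseases[d]]) for g, d in pairs])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

With random vectors the classifier cannot do better than chance; the paper's point is precisely that embeddings computed over richer, interconnected ontologies make these pair features informative.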

{"title":"Multi-domain knowledge graph embeddings for gene-disease association prediction.","authors":"Susana Nunes, Rita T Sousa, Catia Pesquita","doi":"10.1186/s13326-023-00291-x","DOIUrl":"10.1186/s13326-023-00291-x","url":null,"abstract":"<p><strong>Background: </strong>Predicting gene-disease associations typically requires exploring diverse sources of information as well as sophisticated computational approaches. Knowledge graph embeddings can help tackle these challenges by creating representations of genes and diseases based on the scientific knowledge described in ontologies, which can then be explored by machine learning algorithms. However, state-of-the-art knowledge graph embeddings are produced over a single ontology or multiple but disconnected ones, ignoring the impact that considering multiple interconnected domains can have on complex tasks such as gene-disease association prediction.</p><p><strong>Results: </strong>We propose a novel approach to predict gene-disease associations using rich semantic representations based on knowledge graph embeddings over multiple ontologies linked by logical definitions and compound ontology mappings. The experiments showed that considering richer knowledge graphs significantly improves gene-disease prediction and that different knowledge graph embeddings methods benefit more from distinct types of semantic richness.</p><p><strong>Conclusions: </strong>This work demonstrated the potential for knowledge graph embeddings across multiple and interconnected biomedical ontologies to support gene-disease prediction. It also paved the way for considering other ontologies or tackling other tasks where multiple perspectives over the data can be beneficial. All software and data are freely available.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"11"},"PeriodicalIF":1.9,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10426189/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10003461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An extension of the BioAssay Ontology to include pharmacokinetic/pharmacodynamic terminology for the enrichment of scientific workflows.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-08-11 | DOI: 10.1186/s13326-023-00288-6
Steve Penn, Jane Lomax, Anneli Karlsson, Vincent Antonucci, Carl-Dieter Zachmann, Samantha Kanza, Stephan Schurer, John Turner

With the capacity to produce and record data electronically, scientific research and the data associated with it have grown at an unprecedented rate. However, despite a substantial amount of data now existing in electronic form, it is still common for scientific research to be recorded in an unstructured text format with inconsistent context (vocabularies), which vastly reduces the potential for direct intelligent analysis. Research has demonstrated that the use of semantic technologies such as ontologies to structure and enrich scientific data can greatly improve this potential. However, whilst there are many ontologies that can be used for this purpose, there is still a vast quantity of scientific terminology that does not have adequate semantic representation. A key area for expansion identified by the authors was the pharmacokinetic/pharmacodynamic (PK/PD) domain, due to its high usage across many areas of Pharma. As such, we have produced a set of these terms and other bioassay-related terms to be incorporated into the BioAssay Ontology (BAO), which was identified as the most relevant ontology for this work. A number of use cases developed by experts in the field were used to demonstrate how these new ontology terms can be used, and to set the scene for the continuation of this work, with a view to expanding it into further relevant domains. The work described in this paper was part of Phase 1 of the SEED project (Semantically Enriching electronic laboratory notebook (eLN) Data).

{"title":"An extension of the BioAssay Ontology to include pharmacokinetic/pharmacodynamic terminology for the enrichment of scientific workflows.","authors":"Steve Penn, Jane Lomax, Anneli Karlsson, Vincent Antonucci, Carl-Dieter Zachmann, Samantha Kanza, Stephan Schurer, John Turner","doi":"10.1186/s13326-023-00288-6","DOIUrl":"10.1186/s13326-023-00288-6","url":null,"abstract":"<p><p>With the capacity to produce and record data electronically, Scientific research and the data associated with it have grown at an unprecedented rate. However, despite a decent amount of data now existing in an electronic form, it is still common for scientific research to be recorded in an unstructured text format with inconsistent context (vocabularies) which vastly reduces the potential for direct intelligent analysis. Research has demonstrated that the use of semantic technologies such as ontologies to structure and enrich scientific data can greatly improve this potential. However, whilst there are many ontologies that can be used for this purpose, there is still a vast quantity of scientific terminology that does not have adequate semantic representation. A key area for expansion identified by the authors was the pharmacokinetic/pharmacodynamic (PK/PD) domain due to its high usage across many areas of Pharma. As such we have produced a set of these terms and other bioassay related terms to be incorporated into the BioAssay Ontology (BAO), which was identified as the most relevant ontology for this work. A number of use cases developed by experts in the field were used to demonstrate how these new ontology terms can be used, and to set the scene for the continuation of this work with a look to expanding this work out into further relevant domains. The work done in this paper was part of Phase 1 of the SEED project (Semantically Enriching electronic laboratory notebook (eLN) Data).</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"10"},"PeriodicalIF":1.9,"publicationDate":"2023-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10416407/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9997460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improving the classification of cardinality phenotypes using collections.
IF 1.9 | CAS Tier 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-08-07 | DOI: 10.1186/s13326-023-00290-y
Sarah M Alghamdi, Robert Hoehndorf

Motivation: Phenotypes are observable characteristics of an organism, and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease, and is also collected in model organisms and stored in model organism databases, where it is used to understand gene functions. Phenotype data is also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to formally and precisely describe phenotypes. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena.

Results: We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotypes ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.

{"title":"Improving the classification of cardinality phenotypes using collections.","authors":"Sarah M Alghamdi, Robert Hoehndorf","doi":"10.1186/s13326-023-00290-y","DOIUrl":"10.1186/s13326-023-00290-y","url":null,"abstract":"<p><strong>Motivation: </strong>Phenotypes are observable characteristics of an organism and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease, and is also collected in model organisms and stored in model organism databases where they are used to understand gene functions. Phenotype data is also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to formally and precisely describe phenotypes. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena.</p><p><strong>Results: </strong>We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotypes ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"9"},"PeriodicalIF":1.9,"publicationDate":"2023-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10405428/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9959650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0