Mapping vaccine names in clinical trials to vaccine ontology using cascaded fine-tuned domain-specific language models
Pub Date: 2024-08-10 | DOI: 10.1186/s13326-024-00318-x
Jianfu Li, Yiming Li, Yuanyi Pan, Jinjing Guo, Zenan Sun, Fang Li, Yongqun He, Cui Tao
Background: Vaccines have revolutionized public health by providing protection against infectious diseases. They stimulate the immune system and generate memory cells to defend against targeted diseases. Clinical trials evaluate vaccine performance, including dosage, administration routes, and potential side effects.
ClinicalTrials.gov is a valuable repository of clinical trial information, but its vaccine data lack standardization, leading to challenges in automatic concept mapping, vaccine-related knowledge development, evidence-based decision-making, and vaccine surveillance.
Results: In this study, we developed a cascaded framework that capitalizes on multiple domain knowledge sources, including clinical trials, the Unified Medical Language System (UMLS), and the Vaccine Ontology (VO), to enhance the performance of domain-specific language models for automatically mapping vaccine names in clinical trials to the VO. The VO is a community-based ontology developed to promote vaccine data standardization, integration, and computer-assisted reasoning. Our methodology involved extracting and annotating data from various sources. We then further pre-trained the PubMedBERT model on the collected corpus, yielding CTPubMedBERT. Subsequently, we enhanced CTPubMedBERT by incorporating SAPBERT, which was pretrained using the UMLS, resulting in CTPubMedBERT + SAPBERT. Further refinement was accomplished through fine-tuning on the Vaccine Ontology corpus and vaccine data from clinical trials, yielding the CTPubMedBERT + SAPBERT + VO model. Finally, we combined a collection of pre-trained models with a weighted rule-based ensemble to normalize the vaccine corpus and improve accuracy. The ranking step in concept normalization prioritizes and orders candidate concepts to identify the most suitable match for a given mention. Ranking the top 10 candidate concepts, our experimental results demonstrate that the proposed cascaded framework consistently outperformed strong existing baselines on vaccine mapping, achieving 71.8% top-1 accuracy and 90.0% top-10 accuracy.
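The abstract describes the ensemble ranking step only at a high level. As a purely illustrative sketch (not the authors' code), the following shows how a weighted ensemble of encoders could rank VO candidate concepts by cosine similarity; the seeded random "encoders" are stand-ins for models such as CTPubMedBERT.

```python
# Hypothetical sketch of weighted ensemble ranking for vaccine concept
# normalization. Each "encoder" fakes a fine-tuned BERT model with
# hash-seeded random vectors so the example is self-contained and runnable
# (hash() varies across runs; that is fine for a sketch).
import numpy as np

def fake_encoder(seed_offset):
    """Stand-in for a fine-tuned BERT encoder returning a unit text embedding."""
    def encode(text):
        rng = np.random.default_rng(abs(hash((text, seed_offset))) % (2**32))
        v = rng.normal(size=128)
        return v / np.linalg.norm(v)
    return encode

def rank_candidates(mention, vo_labels, encoders, weights, k=10):
    """Rank VO concept labels for a mention by weighted cosine similarity."""
    scores = np.zeros(len(vo_labels))
    for encode, w in zip(encoders, weights):
        m = encode(mention)
        for i, label in enumerate(vo_labels):
            scores[i] += w * float(m @ encode(label))  # cosine of unit vectors
    order = np.argsort(-scores)[:k]
    return [(vo_labels[i], scores[i]) for i in order]

vo_labels = ["influenza virus vaccine", "BCG vaccine", "rabies vaccine"]
encoders = [fake_encoder(0), fake_encoder(1), fake_encoder(2)]
print(rank_candidates("flu shot", vo_labels, encoders, weights=[0.5, 0.3, 0.2], k=3))
```

Top-1 and top-10 accuracy then simply measure how often the gold concept appears at rank 1 or anywhere within the first 10 candidates.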
Conclusion: This study provides detailed insight into a cascaded framework of fine-tuned domain-specific language models for mapping vaccine names in clinical trials to the VO. By effectively leveraging domain-specific information and applying weighted rule-based ensembles of different pre-trained BERT models, our framework significantly enhances this mapping.
{"title":"Mapping vaccine names in clinical trials to vaccine ontology using cascaded fine-tuned domain-specific language models.","authors":"Jianfu Li, Yiming Li, Yuanyi Pan, Jinjing Guo, Zenan Sun, Fang Li, Yongqun He, Cui Tao","doi":"10.1186/s13326-024-00318-x","DOIUrl":"10.1186/s13326-024-00318-x","url":null,"abstract":"<p><strong>Background: </strong>Vaccines have revolutionized public health by providing protection against infectious diseases. They stimulate the immune system and generate memory cells to defend against targeted diseases. Clinical trials evaluate vaccine performance, including dosage, administration routes, and potential side effects.</p><p><strong>Clinicaltrials: </strong>gov is a valuable repository of clinical trial information, but the vaccine data in them lacks standardization, leading to challenges in automatic concept mapping, vaccine-related knowledge development, evidence-based decision-making, and vaccine surveillance.</p><p><strong>Results: </strong>In this study, we developed a cascaded framework that capitalized on multiple domain knowledge sources, including clinical trials, the Unified Medical Language System (UMLS), and the Vaccine Ontology (VO), to enhance the performance of domain-specific language models for automated mapping of VO from clinical trials. The Vaccine Ontology (VO) is a community-based ontology that was developed to promote vaccine data standardization, integration, and computer-assisted reasoning. Our methodology involved extracting and annotating data from various sources. We then performed pre-training on the PubMedBERT model, leading to the development of CTPubMedBERT. Subsequently, we enhanced CTPubMedBERT by incorporating SAPBERT, which was pretrained using the UMLS, resulting in CTPubMedBERT + SAPBERT. Further refinement was accomplished through fine-tuning using the Vaccine Ontology corpus and vaccine data from clinical trials, yielding the CTPubMedBERT + SAPBERT + VO model. Finally, we utilized a collection of pre-trained models, along with the weighted rule-based ensemble approach, to normalize the vaccine corpus and improve the accuracy of the process. The ranking process in concept normalization involves prioritizing and ordering potential concepts to identify the most suitable match for a given context. We conducted a ranking of the Top 10 concepts, and our experimental results demonstrate that our proposed cascaded framework consistently outperformed existing effective baselines on vaccine mapping, achieving 71.8% on top 1 candidate's accuracy and 90.0% on top 10 candidate's accuracy.</p><p><strong>Conclusion: </strong>This study provides a detailed insight into a cascaded framework of fine-tuned domain-specific language models improving mapping of VO from clinical trials. 
By effectively leveraging domain-specific information and applying weighted rule-based ensembles of different pre-trained BERT models, our framework can significantly enhance the mapping of VO from clinical trials.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"14"},"PeriodicalIF":2.0,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11316402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141912790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chemical entity normalization for successful translational development of Alzheimer's disease and dementia therapeutics
Pub Date: 2024-07-31 | DOI: 10.1186/s13326-024-00314-1
Sarah Mullin, Robert McDougal, Kei-Hoi Cheung, Halil Kilicoglu, Amanda Beck, Caroline J Zeiss
Background: Identifying chemical mentions within the Alzheimer's and dementia literature can provide a powerful tool to further therapeutic research. Leveraging the Chemical Entities of Biological Interest (ChEBI) ontology, which is rich in hierarchical and other relationship types, for entity normalization can provide an advantage for future downstream applications. We provide a reproducible hybrid approach that combines an ontology-enhanced PubMedBERT model for disambiguation with a dictionary-based method for candidate selection.
Results: There were 56,553 chemical mentions in the titles of 44,812 unique PubMed article abstracts. Based on our gold standard, our method of disambiguation improved entity normalization by 25.3 percentage points compared to using only the dictionary-based approach with fuzzy-string matching for disambiguation. For the CRAFT corpus, our method outperformed baselines (maximum 78.4%) with a 91.17% accuracy. For our Alzheimer's and dementia cohort, we were able to add 47.1% more potential mappings between MeSH and ChEBI when compared to BioPortal.
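As a hedged illustration of the dictionary-based baseline with fuzzy string matching (not the authors' pipeline), a candidate selector can be sketched with Python's standard library; the ChEBI synonym entries below are small hand-picked examples.

```python
# Illustrative sketch: dictionary-based candidate selection with fuzzy
# string matching. The dictionary maps ChEBI IDs to synonym lists.
from difflib import SequenceMatcher

chebi_dictionary = {
    "CHEBI:28177": ["theophylline"],
    "CHEBI:27732": ["caffeine", "1,3,7-trimethylxanthine"],
    "CHEBI:15365": ["acetylsalicylic acid", "aspirin"],
}

def fuzzy_candidates(mention, dictionary, threshold=0.8):
    """Return (chebi_id, synonym, score) hits whose similarity passes the threshold."""
    hits = []
    for chebi_id, synonyms in dictionary.items():
        for syn in synonyms:
            score = SequenceMatcher(None, mention.lower(), syn.lower()).ratio()
            if score >= threshold:
                hits.append((chebi_id, syn, round(score, 3)))
    return sorted(hits, key=lambda h: -h[2])

print(fuzzy_candidates("asprin", chebi_dictionary))  # the typo still matches aspirin
```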
Conclusion: Natural language models such as PubMedBERT, combined with resources such as ChEBI and PubChem, provide a beneficial way to link entity mentions to ontology terms, while further supporting downstream tasks like filtering ChEBI mentions based on roles and assertions to find beneficial therapies for Alzheimer's and dementia.
{"title":"Chemical entity normalization for successful translational development of Alzheimer's disease and dementia therapeutics.","authors":"Sarah Mullin, Robert McDougal, Kei-Hoi Cheung, Halil Kilicoglu, Amanda Beck, Caroline J Zeiss","doi":"10.1186/s13326-024-00314-1","DOIUrl":"10.1186/s13326-024-00314-1","url":null,"abstract":"<p><strong>Background: </strong>Identifying chemical mentions within the Alzheimer's and dementia literature can provide a powerful tool to further therapeutic research. Leveraging the Chemical Entities of Biological Interest (ChEBI) ontology, which is rich in hierarchical and other relationship types, for entity normalization can provide an advantage for future downstream applications. We provide a reproducible hybrid approach that combines an ontology-enhanced PubMedBERT model for disambiguation with a dictionary-based method for candidate selection.</p><p><strong>Results: </strong>There were 56,553 chemical mentions in the titles of 44,812 unique PubMed article abstracts. Based on our gold standard, our method of disambiguation improved entity normalization by 25.3 percentage points compared to using only the dictionary-based approach with fuzzy-string matching for disambiguation. For the CRAFT corpus, our method outperformed baselines (maximum 78.4%) with a 91.17% accuracy. For our Alzheimer's and dementia cohort, we were able to add 47.1% more potential mappings between MeSH and ChEBI when compared to BioPortal.</p><p><strong>Conclusion: </strong>Use of natural language models like PubMedBERT and resources such as ChEBI and PubChem provide a beneficial way to link entity mentions to ontology terms, while further supporting downstream tasks like filtering ChEBI mentions based on roles and assertions to find beneficial therapies for Alzheimer's and dementia.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"13"},"PeriodicalIF":2.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11290083/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141855609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Empowering standardization of cancer vaccines through ontology: enhanced modeling and data analysis
Pub Date: 2024-06-19 | DOI: 10.1186/s13326-024-00312-3
Jie Zheng, Xingxian Li, Anna Maria Masci, Hayleigh Kahn, Anthony Huffman, Eliyas Asfaw, Yuanyi Pan, Jinjing Guo, Virginia He, Justin Song, Andrey I Seleznev, Asiyah Yu Lin, Yongqun He
Background: The exploration of cancer vaccines has yielded a multitude of studies, resulting in a diverse collection of information. The heterogeneity of cancer vaccine data significantly impedes effective integration and analysis. While CanVaxKB serves as a pioneering database for over 670 manually annotated cancer vaccines, it is important to distinguish that a database, on its own, does not offer the structured relationships and standardized definitions found in an ontology. Recognizing this, we expanded the Vaccine Ontology (VO) to include those cancer vaccines present in CanVaxKB that were not initially covered, enhancing VO's capacity to systematically define and interrelate cancer vaccines.
Results: An ontology design pattern (ODP) was first developed and applied to semantically represent various cancer vaccines, capturing their associated entities and relations. By applying the ODP, we generated a cancer vaccine template in a tabular format and converted it into RDF/OWL to generate cancer vaccine terms in the VO. The '12MP vaccine' was used as an example cancer vaccine to demonstrate the application of the ODP. The VO also reuses reference ontology terms to represent entities such as cancer diseases and vaccine hosts. Description Logic (DL) and SPARQL query scripts were developed and used to query for cancer vaccines based on different vaccine features and to demonstrate the versatility of the VO representation. Additionally, ontological modeling was applied to illustrate cancer vaccine-related concepts and studies for in-depth cancer vaccine analysis. A cancer vaccine-specific VO view, referred to as "CVO," was generated; it contains 928 classes, including 704 cancer vaccines. The CVO OWL file is publicly available at http://purl.obolibrary.org/obo/vo/cvo.owl for sharing and applications.
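Since the CVO OWL file is public, a query of the kind described above can be sketched with rdflib. This is a minimal, assumption-laden example, not one of the paper's query scripts; in particular, the root class obo:VO_0000001 ("vaccine") and the "melanoma" label filter are our own choices.

```python
# Minimal sketch: load the public CVO file and query it with SPARQL 1.1
# property paths via rdflib (pip install rdflib). Requires network access.
from rdflib import Graph

g = Graph()
g.parse("http://purl.obolibrary.org/obo/vo/cvo.owl", format="xml")

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo:  <http://purl.obolibrary.org/obo/>
SELECT ?vaccine ?label WHERE {
    ?vaccine rdfs:subClassOf* obo:VO_0000001 ;   # any (indirect) subclass of 'vaccine'
             rdfs:label ?label .
    FILTER(CONTAINS(LCASE(STR(?label)), "melanoma"))
}
LIMIT 10
"""
for row in g.query(query):
    print(row.vaccine, row.label)
```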
Conclusion: To facilitate the standardization, integration, and analysis of cancer vaccine data, we expanded the Vaccine Ontology (VO) to systematically model and represent cancer vaccines. We also developed a pipeline to automate the inclusion of cancer vaccines and associated terms in the VO. This not only enriches the data's standardization and integration, but also leverages ontological modeling to deepen the analysis of cancer vaccine information, maximizing benefits for researchers and clinicians.
Availability: The VO-cancer GitHub repository is available at https://github.com/vaccineontology/VO/tree/master/CVO.
{"title":"Empowering standardization of cancer vaccines through ontology: enhanced modeling and data analysis.","authors":"Jie Zheng, Xingxian Li, Anna Maria Masci, Hayleigh Kahn, Anthony Huffman, Eliyas Asfaw, Yuanyi Pan, Jinjing Guo, Virginia He, Justin Song, Andrey I Seleznev, Asiyah Yu Lin, Yongqun He","doi":"10.1186/s13326-024-00312-3","DOIUrl":"10.1186/s13326-024-00312-3","url":null,"abstract":"<p><strong>Background: </strong>The exploration of cancer vaccines has yielded a multitude of studies, resulting in a diverse collection of information. The heterogeneity of cancer vaccine data significantly impedes effective integration and analysis. While CanVaxKB serves as a pioneering database for over 670 manually annotated cancer vaccines, it is important to distinguish that a database, on its own, does not offer the structured relationships and standardized definitions found in an ontology. Recognizing this, we expanded the Vaccine Ontology (VO) to include those cancer vaccines present in CanVaxKB that were not initially covered, enhancing VO's capacity to systematically define and interrelate cancer vaccines.</p><p><strong>Results: </strong>An ontology design pattern (ODP) was first developed and applied to semantically represent various cancer vaccines, capturing their associated entities and relations. By applying the ODP, we generated a cancer vaccine template in a tabular format and converted it into the RDF/OWL format for generation of cancer vaccine terms in the VO. '12MP vaccine' was used as an example of cancer vaccines to demonstrate the application of the ODP. VO also reuses reference ontology terms to represent entities such as cancer diseases and vaccine hosts. Description Logic (DL) and SPARQL query scripts were developed and used to query for cancer vaccines based on different vaccine's features and to demonstrate the versatility of the VO representation. Additionally, ontological modeling was applied to illustrate cancer vaccine related concepts and studies for in-depth cancer vaccine analysis. A cancer vaccine-specific VO view, referred to as \"CVO,\" was generated, and it contains 928 classes including 704 cancer vaccines. The CVO OWL file is publicly available on: http://purl.obolibrary.org/obo/vo/cvo.owl , for sharing and applications.</p><p><strong>Conclusion: </strong>To facilitate the standardization, integration, and analysis of cancer vaccine data, we expanded the Vaccine Ontology (VO) to systematically model and represent cancer vaccines. We also developed a pipeline to automate the inclusion of cancer vaccines and associated terms in the VO. 
This not only enriches the data's standardization and integration, but also leverages ontological modeling to deepen the analysis of cancer vaccine information, maximizing benefits for researchers and clinicians.</p><p><strong>Availability: </strong>The VO-cancer GitHub website is: https://github.com/vaccineontology/VO/tree/master/CVO .</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"12"},"PeriodicalIF":2.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11186274/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141419260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-task transfer learning for the prediction of entity modifiers in clinical text: application to opioid use disorder case detection
Pub Date: 2024-06-07 | DOI: 10.1186/s13326-024-00311-4
Abdullateef I Almudaifer, Whitney Covington, JaMor Hairston, Zachary Deitch, Ankit Anand, Caleb M Carroll, Estera Crisan, William Bradford, Lauren A Walter, Ellen F Eaton, Sue S Feldman, John D Osborne
Background: The semantics of entities extracted from clinical text can be dramatically altered by modifiers, including entity negation, uncertainty, conditionality, severity, and subject. Existing models for determining modifiers of clinical entities rely on regular expressions or feature weights that are trained independently for each modifier.
Methods: We develop and evaluate a multi-task transformer architecture design where modifiers are learned and predicted jointly using the publicly available SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that contains modifiers shared with SemEval as well as novel modifiers specific for OUD. We evaluate the effectiveness of our multi-task learning approach versus previously published systems and assess the feasibility of transfer learning for clinical entity modifiers when only a portion of clinical modifiers are shared.
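To make the joint-learning idea concrete, here is a minimal sketch, under our own assumptions, of a shared encoder with one classification head per modifier; a toy embedding layer stands in for the pretrained transformer, and the modifier label sets are hypothetical.

```python
# Hedged sketch of multi-task modifier prediction (not the authors'
# architecture): one shared encoder, one linear head per modifier,
# trained jointly with a summed cross-entropy loss.
import torch
import torch.nn as nn

MODIFIERS = {"negation": 2, "uncertainty": 2, "severity": 4, "subject": 3}  # hypothetical

class MultiTaskModifierModel(nn.Module):
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, hidden)  # stand-in for a BERT encoder
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n_classes) for name, n_classes in MODIFIERS.items()}
        )

    def forward(self, token_ids):
        h = self.encoder(token_ids).mean(dim=1)          # pooled entity-span representation
        return {name: head(h) for name, head in self.heads.items()}

model = MultiTaskModifierModel()
logits = model(torch.randint(0, 1000, (8, 16)))          # batch of 8 entity spans
targets = {n: torch.randint(0, c, (8,)) for n, c in MODIFIERS.items()}
# Joint loss: per-modifier cross-entropies backpropagate through shared parameters.
loss = sum(nn.functional.cross_entropy(logits[n], targets[n]) for n in MODIFIERS)
loss.backward()
```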
Results: Our approach achieved state-of-the-art results on the ShARe corpus from SemEval 2015 Task 14, showing an increase of 1.1% on weighted accuracy, 1.7% on unweighted accuracy, and 10% on micro F1 scores.
Conclusions: We show that learned weights from our shared model can be effectively transferred to a new partially matched data set, validating the use of transfer learning for clinical text modifiers.
{"title":"Multi-task transfer learning for the prediction of entity modifiers in clinical text: application to opioid use disorder case detection.","authors":"Abdullateef I Almudaifer, Whitney Covington, JaMor Hairston, Zachary Deitch, Ankit Anand, Caleb M Carroll, Estera Crisan, William Bradford, Lauren A Walter, Ellen F Eaton, Sue S Feldman, John D Osborne","doi":"10.1186/s13326-024-00311-4","DOIUrl":"10.1186/s13326-024-00311-4","url":null,"abstract":"<p><strong>Background: </strong>The semantics of entities extracted from a clinical text can be dramatically altered by modifiers, including entity negation, uncertainty, conditionality, severity, and subject. Existing models for determining modifiers of clinical entities involve regular expression or features weights that are trained independently for each modifier.</p><p><strong>Methods: </strong>We develop and evaluate a multi-task transformer architecture design where modifiers are learned and predicted jointly using the publicly available SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that contains modifiers shared with SemEval as well as novel modifiers specific for OUD. We evaluate the effectiveness of our multi-task learning approach versus previously published systems and assess the feasibility of transfer learning for clinical entity modifiers when only a portion of clinical modifiers are shared.</p><p><strong>Results: </strong>Our approach achieved state-of-the-art results on the ShARe corpus from SemEval 2015 Task 14, showing an increase of 1.1% on weighted accuracy, 1.7% on unweighted accuracy, and 10% on micro F1 scores.</p><p><strong>Conclusions: </strong>We show that learned weights from our shared model can be effectively transferred to a new partially matched data set, validating the use of transfer learning for clinical text modifiers.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"11"},"PeriodicalIF":2.0,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157899/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141288154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correction to: Semantic units: organizing knowledge graphs into semantically meaningful units of representation
Pub Date: 2024-06-06 | DOI: 10.1186/s13326-024-00313-2
Lars Vogt, Tobias Kuhn, Robert Hoehndorf
{"title":"Correction to: Semantic units: organizing knowledge graphs into semantically meaningful units of representation.","authors":"Lars Vogt, Tobias Kuhn, Robert Hoehndorf","doi":"10.1186/s13326-024-00313-2","DOIUrl":"10.1186/s13326-024-00313-2","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"10"},"PeriodicalIF":2.0,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11154969/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141283785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimized continuous homecare provisioning through distributed data-driven semantic services and cross-organizational workflows
Pub Date: 2024-06-06 | DOI: 10.1186/s13326-024-00303-4
Mathias De Brouwer, Pieter Bonte, Dörthe Arndt, Miel Vander Sande, Anastasia Dimou, Ruben Verborgh, Filip De Turck, Femke Ongenae
Background: In healthcare, caregivers increasingly collaborate, especially given the shift to homecare. To provide optimal patient care, efficient coordination of data and workflows between these different stakeholders is required. To achieve this, data should be exposed in a machine-interpretable, reusable manner. In addition, there is a need for smart, dynamic, personalized, and performant services provided on top of this data. Flexible workflows should be defined that realize the desired functionality, adhere to use-case-specific quality constraints, and improve coordination across stakeholders. User interfaces should allow configuring all of this in an easy, user-friendly way.
Methods: A distributed, generic, cascading reasoning reference architecture can solve the presented challenges. It can be instantiated with existing tools built upon Semantic Web technologies that provide data-driven semantic services and construct cross-organizational workflows. These tools include RMLStreamer to generate Linked Data, DIVIDE to adaptively manage contextually relevant local queries, Streaming MASSIF to deploy reusable services, AMADEUS to compose semantic workflows, and RMLEditor and Matey to configure the rules that generate Linked Data.
Results: A use case demonstrator is built on a scenario that focuses on personalized smart monitoring and cross-organizational treatment planning. The performance and usability of the demonstrator's implementation are evaluated. The former shows that the monitoring pipeline efficiently processes a stream of 14 observations per second: RMLStreamer maps JSON observations to RDF in 13.5 ms, a C-SPARQL query to generate fever alarms is executed on a 5 s window in 26.4 ms, and Streaming MASSIF generates a smart notification for fever alarms based on severity and urgency in 1539.5 ms. DIVIDE derives the C-SPARQL queries in 7249.5 ms, while AMADEUS constructs a colon cancer treatment plan and performs conflict detection on it in 190.8 ms and 1335.7 ms, respectively.
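For intuition only, the effect of the 5-second windowed C-SPARQL fever-alarm query can be re-expressed as a plain-Python tumbling window; the 38.0 °C threshold and the observation format are our assumptions, not details from the paper.

```python
# Toy re-expression of a windowed fever-alarm query: group a stream of
# (timestamp_s, patient, temperature) observations into 5 s tumbling
# windows and emit an alarm per feverish observation.
from collections import defaultdict

FEVER_THRESHOLD_C = 38.0  # assumed threshold

def fever_alarms(observations, window_s=5.0):
    """Group observations into tumbling windows and yield per-patient alarms."""
    windows = defaultdict(list)
    for ts, patient, temp in observations:
        windows[int(ts // window_s)].append((patient, temp))
    for w in sorted(windows):
        for patient, temp in windows[w]:
            if temp >= FEVER_THRESHOLD_C:
                yield {"window": w, "patient": patient, "temperature": temp}

stream = [(0.4, "p1", 37.1), (2.9, "p2", 38.6), (6.1, "p1", 39.2)]
for alarm in fever_alarms(stream):
    print(alarm)
```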
Conclusions: Existing tools built upon Semantic Web technologies can be leveraged to optimize continuous care provisioning. The evaluation of the building blocks on a realistic homecare monitoring use case demonstrates their applicability, usability and good performance. Further extending the available user interfaces for some tools is required to increase their adoption.
{"title":"Optimized continuous homecare provisioning through distributed data-driven semantic services and cross-organizational workflows.","authors":"Mathias De Brouwer, Pieter Bonte, Dörthe Arndt, Miel Vander Sande, Anastasia Dimou, Ruben Verborgh, Filip De Turck, Femke Ongenae","doi":"10.1186/s13326-024-00303-4","DOIUrl":"10.1186/s13326-024-00303-4","url":null,"abstract":"<p><strong>Background: </strong>In healthcare, an increasing collaboration can be noticed between different caregivers, especially considering the shift to homecare. To provide optimal patient care, efficient coordination of data and workflows between these different stakeholders is required. To achieve this, data should be exposed in a machine-interpretable, reusable manner. In addition, there is a need for smart, dynamic, personalized and performant services provided on top of this data. Flexible workflows should be defined that realize their desired functionality, adhere to use case specific quality constraints and improve coordination across stakeholders. User interfaces should allow configuring all of this in an easy, user-friendly way.</p><p><strong>Methods: </strong>A distributed, generic, cascading reasoning reference architecture can solve the presented challenges. It can be instantiated with existing tools built upon Semantic Web technologies that provide data-driven semantic services and constructing cross-organizational workflows. These tools include RMLStreamer to generate Linked Data, DIVIDE to adaptively manage contextually relevant local queries, Streaming MASSIF to deploy reusable services, AMADEUS to compose semantic workflows, and RMLEditor and Matey to configure rules to generate Linked Data.</p><p><strong>Results: </strong>A use case demonstrator is built on a scenario that focuses on personalized smart monitoring and cross-organizational treatment planning. The performance and usability of the demonstrator's implementation is evaluated. The former shows that the monitoring pipeline efficiently processes a stream of 14 observations per second: RMLStreamer maps JSON observations to RDF in 13.5 ms, a C-SPARQL query to generate fever alarms is executed on a window of 5 s in 26.4 ms, and Streaming MASSIF generates a smart notification for fever alarms based on severity and urgency in 1539.5 ms. DIVIDE derives the C-SPARQL queries in 7249.5 ms, while AMADEUS constructs a colon cancer treatment plan and performs conflict detection with it in 190.8 ms and 1335.7 ms, respectively.</p><p><strong>Conclusions: </strong>Existing tools built upon Semantic Web technologies can be leveraged to optimize continuous care provisioning. The evaluation of the building blocks on a realistic homecare monitoring use case demonstrates their applicability, usability and good performance. 
Further extending the available user interfaces for some tools is required to increase their adoption.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"9"},"PeriodicalIF":2.0,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11154993/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141283810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explanatory argumentation in natural language for correct and incorrect medical diagnoses
Pub Date: 2024-05-30 | DOI: 10.1186/s13326-024-00306-1
Benjamin Molinet, Santiago Marro, Elena Cabrio, Serena Villata
Background: A large amount of research is carried out nowadays in Artificial Intelligence to propose automated ways of analysing medical data, with the aim of supporting doctors in delivering medical diagnoses. However, a main issue of these approaches is the lack of transparency and interpretability of the achieved results, making it hard to employ such methods for educational purposes. It is therefore necessary to develop new frameworks to enhance explainability in these solutions.
Results: In this paper, we present a novel full pipeline to automatically generate natural language explanations for medical diagnoses. The proposed solution starts from a clinical case description associated with a list of correct and incorrect diagnoses and, through the extraction of the relevant symptoms and findings, enriches the information contained in the description with verified medical knowledge from an ontology. Finally, the system returns a pattern-based explanation in natural language that elucidates why the correct (incorrect) diagnosis is the correct (incorrect) one. The contribution of the paper is twofold: first, we propose two novel linguistic resources for the medical domain (i.e., a dataset of 314 clinical cases annotated with medical entities from UMLS, and a database of biological boundaries for common findings); second, we propose a full Information Extraction pipeline to extract symptoms and findings from the clinical cases and match them with the terms of a medical ontology and with the biological boundaries. An extensive evaluation shows that our method outperforms comparable approaches.
Conclusions: Our goal is to offer an AI-assisted educational support framework that trains clinical residents to formulate sound and exhaustive explanations of their diagnoses to patients.
{"title":"Explanatory argumentation in natural language for correct and incorrect medical diagnoses.","authors":"Benjamin Molinet, Santiago Marro, Elena Cabrio, Serena Villata","doi":"10.1186/s13326-024-00306-1","DOIUrl":"10.1186/s13326-024-00306-1","url":null,"abstract":"<p><strong>Background: </strong>A huge amount of research is carried out nowadays in Artificial Intelligence to propose automated ways to analyse medical data with the aim to support doctors in delivering medical diagnoses. However, a main issue of these approaches is the lack of transparency and interpretability of the achieved results, making it hard to employ such methods for educational purposes. It is therefore necessary to develop new frameworks to enhance explainability in these solutions.</p><p><strong>Results: </strong>In this paper, we present a novel full pipeline to generate automatically natural language explanations for medical diagnoses. The proposed solution starts from a clinical case description associated with a list of correct and incorrect diagnoses and, through the extraction of the relevant symptoms and findings, enriches the information contained in the description with verified medical knowledge from an ontology. Finally, the system returns a pattern-based explanation in natural language which elucidates why the correct (incorrect) diagnosis is the correct (incorrect) one. The main contribution of the paper is twofold: first, we propose two novel linguistic resources for the medical domain (i.e, a dataset of 314 clinical cases annotated with the medical entities from UMLS, and a database of biological boundaries for common findings), and second, a full Information Extraction pipeline to extract symptoms and findings from the clinical cases and match them with the terms in a medical ontology and to the biological boundaries. An extensive evaluation of the proposed approach shows the our method outperforms comparable approaches.</p><p><strong>Conclusions: </strong>Our goal is to offer AI-assisted educational support framework to form clinical residents to formulate sound and exhaustive explanations for their diagnoses to patients.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"8"},"PeriodicalIF":2.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11138001/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141179661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic units: organizing knowledge graphs into semantically meaningful units of representation
Pub Date: 2024-05-27 | DOI: 10.1186/s13326-024-00310-5
Lars Vogt, Tobias Kuhn, Robert Hoehndorf
Background: In today's landscape of data management, the importance of knowledge graphs and ontologies is escalating, as they are critical mechanisms aligned with the FAIR Guiding Principles, which ensure data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full potential of FAIR knowledge graphs.
Results: We introduce "semantic units" as a conceptual solution, although currently exemplified only in a limited prototype. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs by adding another layer of triples on top of the conventional data layer. Semantic units and their subgraphs are represented by their own resource that instantiates a corresponding semantic unit class. We distinguish statement and compound units as basic categories of semantic units. A statement unit is the smallest, independent proposition that is semantically meaningful for a human reader. Depending on the relation of its underlying proposition, it consists of one or more triples. Organizing a knowledge graph into statement units results in a partition of the graph, with each triple belonging to exactly one statement unit. A compound unit, on the other hand, is a semantically meaningful collection of statement and compound units that form larger subgraphs. Some semantic units organize the graph into different levels of representational granularity, others orthogonally into different types of granularity trees or different frames of reference, structuring and organizing the knowledge graph into partially overlapping, partially enclosed subgraphs, each of which can be referenced by its own resource.
Conclusions: Semantic units, applicable in RDF/OWL and labeled property graphs, support making statements about statements and facilitate graph alignment, subgraph matching, knowledge graph profiling, and the management of access restrictions to sensitive data. Additionally, we argue that organizing the graph into semantic units promotes the differentiation of ontological and discursive information and supports the differentiation of multiple frames of reference within the graph.
{"title":"Semantic units: organizing knowledge graphs into semantically meaningful units of representation.","authors":"Lars Vogt, Tobias Kuhn, Robert Hoehndorf","doi":"10.1186/s13326-024-00310-5","DOIUrl":"10.1186/s13326-024-00310-5","url":null,"abstract":"<p><strong>Background: </strong>In today's landscape of data management, the importance of knowledge graphs and ontologies is escalating as critical mechanisms aligned with the FAIR Guiding Principles-ensuring data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full potential of FAIR knowledge graphs.</p><p><strong>Results: </strong>We introduce \"semantic units\" as a conceptual solution, although currently exemplified only in a limited prototype. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs by adding another layer of triples on top of the conventional data layer. Semantic units and their subgraphs are represented by their own resource that instantiates a corresponding semantic unit class. We distinguish statement and compound units as basic categories of semantic units. A statement unit is the smallest, independent proposition that is semantically meaningful for a human reader. Depending on the relation of its underlying proposition, it consists of one or more triples. Organizing a knowledge graph into statement units results in a partition of the graph, with each triple belonging to exactly one statement unit. A compound unit, on the other hand, is a semantically meaningful collection of statement and compound units that form larger subgraphs. Some semantic units organize the graph into different levels of representational granularity, others orthogonally into different types of granularity trees or different frames of reference, structuring and organizing the knowledge graph into partially overlapping, partially enclosed subgraphs, each of which can be referenced by its own resource.</p><p><strong>Conclusions: </strong>Semantic units, applicable in RDF/OWL and labeled property graphs, offer support for making statements about statements and facilitate graph-alignment, subgraph-matching, knowledge graph profiling, and for management of access restrictions to sensitive data. Additionally, we argue that organizing the graph into semantic units promotes the differentiation of ontological and discursive information, and that it also supports the differentiation of multiple frames of reference within the graph.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":"7"},"PeriodicalIF":2.0,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11131308/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141157997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging logical definitions and lexical features to detect missing IS-A relations in biomedical terminologies
Pub Date: 2024-05-01 | DOI: 10.1186/s13326-024-00309-y
Rashmie Abeysinghe, Fengbo Zheng, Jay Shi, Samden D. Lhatoo, Licong Cui
Biomedical terminologies play a vital role in managing biomedical data. Missing IS-A relations in a biomedical terminology can be detrimental to its downstream uses. In this paper, we investigate an approach combining logical definitions and lexical features to discover missing IS-A relations in two biomedical terminologies: SNOMED CT and the National Cancer Institute (NCI) thesaurus. The method is applied to unrelated concept pairs within non-lattice subgraphs: graph fragments within a terminology likely to contain various inconsistencies. Our approach first checks whether the logical definition of one concept is more general than that of the other; it then checks whether that concept's lexical features are contained in those of the other concept. If both constraints are satisfied, we suggest a potentially missing IS-A relation between the two concepts. The method identified 982 potential missing IS-A relations for SNOMED CT and 100 for the NCI thesaurus. To assess the efficacy of our approach, domain experts evaluated a random sample of results belonging to the “Clinical Findings” and “Procedure” subhierarchies of SNOMED CT and results belonging to the “Drug, Food, Chemical or Biomedical Material” subhierarchy of the NCI thesaurus. The evaluation revealed that 118 out of 150 suggestions are valid for SNOMED CT and 17 out of 20 for the NCI thesaurus.
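The two checks can be illustrated with a deliberately simplified sketch (not the authors' implementation); the toy concepts, "definitions" modeled as attribute sets, and word-bag lexical features below are our own assumptions.

```python
# Toy sketch of the two constraints: concept B is suggested as a parent of
# concept A ("A IS-A B") if B's logical definition is more general than A's
# and B's lexical features are contained in A's.
def lexical_features(term):
    """Bag of lowercase words, a crude stand-in for the paper's lexical features."""
    return set(term.lower().replace("-", " ").split())

def definition_more_general(def_a, def_b):
    """A definition modeled as a set of attribute constraints is more general
    when its constraints are a subset of the other definition's constraints."""
    return def_b <= def_a

def suggest_is_a(concept_a, concept_b):
    lex_ok = lexical_features(concept_b["name"]) <= lexical_features(concept_a["name"])
    logic_ok = definition_more_general(concept_a["definition"], concept_b["definition"])
    return lex_ok and logic_ok

a = {"name": "chronic pulmonary edema",
     "definition": {("finding-site", "lung"), ("clinical-course", "chronic")}}
b = {"name": "pulmonary edema",
     "definition": {("finding-site", "lung")}}
print(suggest_is_a(a, b))  # True: 'a IS-A b' is a candidate missing relation
```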
{"title":"Leveraging logical definitions and lexical features to detect missing IS-A relations in biomedical terminologies","authors":"Rashmie Abeysinghe, Fengbo Zheng, Jay Shi, Samden D. Lhatoo, Licong Cui","doi":"10.1186/s13326-024-00309-y","DOIUrl":"https://doi.org/10.1186/s13326-024-00309-y","url":null,"abstract":"Biomedical terminologies play a vital role in managing biomedical data. Missing IS-A relations in a biomedical terminology could be detrimental to its downstream usages. In this paper, we investigate an approach combining logical definitions and lexical features to discover missing IS-A relations in two biomedical terminologies: SNOMED CT and the National Cancer Institute (NCI) thesaurus. The method is applied to unrelated concept-pairs within non-lattice subgraphs: graph fragments within a terminology likely to contain various inconsistencies. Our approach first compares whether the logical definition of a concept is more general than that of the other concept. Then, we check whether the lexical features of the concept are contained in those of the other concept. If both constraints are satisfied, we suggest a potentially missing IS-A relation between the two concepts. The method identified 982 potential missing IS-A relations for SNOMED CT and 100 for NCI thesaurus. In order to assess the efficacy of our approach, a random sample of results belonging to the “Clinical Findings” and “Procedure” subhierarchies of SNOMED CT and results belonging to the “Drug, Food, Chemical or Biomedical Material” subhierarchy of the NCI thesaurus were evaluated by domain experts. The evaluation results revealed that 118 out of 150 suggestions are valid for SNOMED CT and 17 out of 20 are valid for NCI thesaurus.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"15 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elucidating the semantics-topology trade-off for knowledge inference-based pharmacological discovery
Pub Date: 2024-05-01 | DOI: 10.1186/s13326-024-00308-z
Daniel N. Sosa, Georgiana Neculae, Julien Fauqueur, Russ B. Altman
Leveraging AI to synthesize the deluge of biomedical knowledge has great potential for pharmacological discovery, with applications including developing new therapeutics for untreated diseases and repurposing drugs as emergent pandemic treatments. Creating knowledge graph representations of interacting drugs, diseases, genes, and proteins enables discovery via embedding-based ML approaches and link prediction. It has previously been shown that these predictive methods are susceptible to biases from network structure: they are driven not by nuanced biological understanding of mechanisms but by high-degree hub nodes. In this work, we study the confounding effect of network topology on biological relation semantics by creating an experimental pipeline of knowledge graph semantic and topological perturbations. We show that the drop in drug repurposing performance from ablating meaningful semantics increases by 21% and 38% when mitigating topological bias in two networks. We demonstrate that new methods for representing and inferring knowledge must be developed to make use of biomedical semantics for pharmacological innovation, and we suggest fruitful avenues for their development.
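The contrast between semantic and topological perturbation can be illustrated with a toy sketch (our own encoding, not the authors' pipeline): shuffling relation labels ablates semantics while leaving the graph's wiring intact, whereas permuting edge endpoints rewires the topology while preserving each node's degree.

```python
# Toy knowledge graph perturbations on (head, relation, tail) triples.
import random

random.seed(0)
edges = [("drugA", "treats", "disease1"),
         ("drugA", "binds", "protein1"),
         ("drugB", "treats", "disease1"),
         ("protein1", "associated_with", "disease1")]

def ablate_semantics(kg):
    """Permute relation labels across edges; the wiring (topology) is unchanged."""
    relations = [r for _, r, _ in kg]
    random.shuffle(relations)
    return [(h, r, t) for (h, _, t), r in zip(kg, relations)]

def perturb_topology(kg):
    """Degree-preserving rewiring: permute tails across edges, so the same
    hubs remain but which nodes they connect to changes."""
    tails = [t for _, _, t in kg]
    random.shuffle(tails)
    return [(h, r, t) for (h, r, _), t in zip(kg, tails)]

print(ablate_semantics(edges))
print(perturb_topology(edges))
```

Comparing a link predictor's performance drop under each perturbation separates what it learns from relation semantics versus what it merely reads off hub structure.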
{"title":"Elucidating the semantics-topology trade-off for knowledge inference-based pharmacological discovery","authors":"Daniel N. Sosa, Georgiana Neculae, Julien Fauqueur, Russ B. Altman","doi":"10.1186/s13326-024-00308-z","DOIUrl":"https://doi.org/10.1186/s13326-024-00308-z","url":null,"abstract":"Leveraging AI for synthesizing the deluge of biomedical knowledge has great potential for pharmacological discovery with applications including developing new therapeutics for untreated diseases and repurposing drugs as emergent pandemic treatments. Creating knowledge graph representations of interacting drugs, diseases, genes, and proteins enables discovery via embedding-based ML approaches and link prediction. Previously, it has been shown that these predictive methods are susceptible to biases from network structure, namely that they are driven not by discovering nuanced biological understanding of mechanisms, but based on high-degree hub nodes. In this work, we study the confounding effect of network topology on biological relation semantics by creating an experimental pipeline of knowledge graph semantic and topological perturbations. We show that the drop in drug repurposing performance from ablating meaningful semantics increases by 21% and 38% when mitigating topological bias in two networks. We demonstrate that new methods for representing knowledge and inferring new knowledge must be developed for making use of biomedical semantics for pharmacological innovation, and we suggest fruitful avenues for their development.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"61 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}