A step towards quantifying, modelling and exploring uncertainty in biomedical knowledge graphs
Authors: Adil Bahaj, Mounir Ghogho
Journal: Computers in Biology and Medicine, volume 184, Article 109355
Publication date: 2024-11-14
DOI: 10.1016/j.compbiomed.2024.109355
URL: https://www.sciencedirect.com/science/article/pii/S0010482524014409
Citation count: 0
Abstract
Objective:
This study aims to automatically quantify and model the uncertainty of facts in biomedical knowledge graphs (BKGs) from their supporting textual evidence, using deep learning techniques.
Materials and Methods:
A sentence transformer extracts deep features from sentences, and a naive Bayes classifier uses these features to classify sentence factuality. For each fact in a source KG, the feature extractor and the classifier are applied to every supporting sentence; the predicted factuality of each sentence is then mapped to a numerical value in [0, 1], and these values are averaged to obtain the confidence score of the fact.
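The final aggregation step can be illustrated with a minimal sketch. It assumes the classifier has already assigned a factuality class to each supporting sentence; the class names and the numeric values in `FACTUALITY_TO_SCORE`, as well as the function name `fact_confidence`, are hypothetical and chosen only for illustration. They are not taken from the paper.

```python
from statistics import mean

# Hypothetical mapping from factuality class to a value in [0, 1].
# Both the class names and the numbers are assumptions for illustration.
FACTUALITY_TO_SCORE = {
    "fact": 1.0,
    "probable": 0.75,
    "possible": 0.5,
    "doubtful": 0.25,
    "counterfact": 0.0,
}


def fact_confidence(sentence_labels):
    """Average the numeric factuality of a fact's supporting sentences."""
    if not sentence_labels:
        raise ValueError("a fact needs at least one supporting sentence")
    return mean(FACTUALITY_TO_SCORE[label] for label in sentence_labels)


# Hypothetical classifier output for three sentences supporting one fact.
labels = ["fact", "probable", "possible"]
print(fact_confidence(labels))  # 0.75
```

Under this assumed mapping, a fact supported mostly by hedged sentences receives a lower confidence score than one supported by plainly asserted sentences.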
Results:
The fact classification feature extractor enhances the separability of classes in the embedding space. This helped the fact classification model outperform existing factuality classifiers that rely on hand-crafted features. Uncertainty quantification and modelling were demonstrated on SemMedDB by creating USemMedDB, showing KGB2U’s ability to process large BKGs. A subset of USemMedDB facts is modelled to demonstrate the correlation between the structure of the uncertain BKG and the confidence scores. The best-trained model is used to predict confidence scores of existing and unseen facts. The top-ranked unseen facts were grounded in scientific evidence, showing KGB2U’s ability to discover new knowledge.
Conclusion:
The literature supporting BKG facts can be used to automatically quantify their uncertainty. Additionally, the resulting uncertain biomedical KGs can be used for knowledge discovery. The BKG2U interface and source code are available at http://biofunk.datanets.org/ and https://github.com/BahajAdil/KBG2U, respectively.
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.