A step towards quantifying, modelling and exploring uncertainty in biomedical knowledge graphs

IF 7, Zone 2 (Medicine), JCR Q1 (Biology), Computers in Biology and Medicine, Pub Date: 2024-11-14, DOI: 10.1016/j.compbiomed.2024.109355
Adil Bahaj, Mounir Ghogho
Computers in Biology and Medicine, vol. 184, Article 109355. https://www.sciencedirect.com/science/article/pii/S0010482524014409
Citations: 0

Abstract

Objective:

This study aims to automatically quantify and model the uncertainty of facts in biomedical knowledge graphs (BKGs), based on their textual supporting evidence, using deep learning techniques.

Materials and Methods:

A sentence transformer is employed to extract deep features of sentences, which a naive Bayes classifier then uses to classify sentence factuality. For each fact and its supporting evidence in a source KG, the deep feature extractor and the classifier quantify the factuality of each supporting sentence; these factuality values are then mapped to numerical values in [0,1] and averaged to obtain the confidence score of the fact.
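The final aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not list the factuality classes or the numeric value assigned to each, so the label names and the values in `LABEL_TO_SCORE`, as well as the function name `fact_confidence`, are hypothetical. The sentence transformer and the naive Bayes classifier are not reproduced here; the sketch starts from labels that such a classifier is assumed to have already predicted.

```python
from statistics import mean

# Hypothetical factuality labels and their numeric values in [0, 1].
# The abstract does not specify the actual classes or the mapping.
LABEL_TO_SCORE = {
    "fact": 1.0,
    "probable": 0.75,
    "possible": 0.5,
    "doubtful": 0.25,
    "counterfact": 0.0,
}


def fact_confidence(sentence_labels):
    """Average the numeric factuality values of a fact's supporting sentences."""
    if not sentence_labels:
        raise ValueError("a fact needs at least one supporting sentence")
    return mean(LABEL_TO_SCORE[label] for label in sentence_labels)


# Toy example: predicted labels for three sentences supporting one fact.
labels = ["fact", "probable", "possible"]
confidence = fact_confidence(labels)
print(confidence)
```

With this mapping, a fact supported only by sentences labelled as certain gets a confidence of 1, and hedged or negated evidence pulls the average down.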

Results:

The feature extractor used for fact classification enhances the separability of classes in the embedding space. This helped the fact classification model outperform existing factuality classifiers that rely on hand-crafted features. Uncertainty quantification and modelling were demonstrated on SemMedDB by creating USemMedDB, showing KGB2U’s ability to process large BKGs. A subset of USemMedDB facts was modelled to demonstrate the correlation between the structure of the uncertain BKG and the confidence scores. The best-trained model was used to predict confidence scores of existing and unseen facts. The top-ranked unseen facts were grounded in scientific evidence, showing KGB2U’s ability to discover new knowledge.

Conclusion:

The supporting literature of BKG facts can be used to automatically quantify their uncertainty. Additionally, the resulting uncertain biomedical KGs can be used for knowledge discovery. The BKG2U interface and source code are available at http://biofunk.datanets.org/ and https://github.com/BahajAdil/KBG2U, respectively.
Source journal: Computers in Biology and Medicine (Engineering & Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.