Understanding the performance of knowledge graph embeddings in drug discovery

Stephen Bonner, Ian P. Barrett, Cheng Ye, Rowan Swiers, Ola Engkvist, Charles Tapley Hoyt, William L. Hamilton
Published: 2022-12-01
DOI: 10.1016/j.ailsci.2022.100036
URL: https://www.sciencedirect.com/science/article/pii/S2667318522000071
Citations: 31

Abstract

Knowledge Graphs (KGs) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist with key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process that can result in lab-based experiments being performed, or can affect other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding is required not only of performance, but also of the various factors that determine it.

In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to identify the best overall model or configuration; instead, we take a deeper look at how performance is affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have a significant impact on performance and can even affect the ranking of models. Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons in future work, and we argue that this is critical for the acceptance, use and impact of KGEs in a biomedical setting.
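
The point about seeds affecting model rankings can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the model names are placeholders, the scores are invented numbers for some higher-is-better metric, and the helper functions are hypothetical. None of it comes from the study's experiments or code.

```python
from statistics import mean, stdev

# Hypothetical scores for placeholder models across three random seeds.
# These numbers are invented for illustration and are NOT results from
# the study. Higher is assumed to be better.
scores = {
    "model_a": [0.30, 0.22, 0.31],
    "model_b": [0.28, 0.29, 0.27],
    "model_c": [0.10, 0.12, 0.11],
}


def summarise(scores):
    """Return {model: (mean, sample standard deviation)} over seeds."""
    return {m: (mean(v), stdev(v)) for m, v in scores.items()}


def ranking_per_seed(scores):
    """Return, for each seed index, the models ordered from best to worst."""
    n_seeds = len(next(iter(scores.values())))
    return [
        sorted(scores, key=lambda m: scores[m][i], reverse=True)
        for i in range(n_seeds)
    ]


def ranking_is_stable(scores):
    """True if every seed produces the same model ordering."""
    rankings = ranking_per_seed(scores)
    return all(r == rankings[0] for r in rankings)


summary = summarise(scores)
rankings = ranking_per_seed(scores)
stable = ranking_is_stable(scores)

for model, (mu, sd) in summary.items():
    print(f"{model}: mean={mu:.3f} sd={sd:.3f}")
print("per-seed rankings:", rankings)
print("ranking stable across seeds:", stable)
```

With these made-up numbers, model_a beats model_b on two of the three seeds while model_b has the higher mean, so a comparison based on a single seed could point either way. Reporting the spread over seeds and splits alongside the mean makes that kind of instability visible.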

Source journal: Artificial Intelligence in the Life Sciences
Subject areas: Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 0
Review time: 15 days