Evaluating point-prediction uncertainties in neural networks for protein-ligand binding prediction

Ya Ju Fan, Jonathan E. Allen, Kevin S. McLoughlin, Da Shi, Brian J. Bennion, Xiaohua Zhang, Felice C. Lightstone
Artificial Intelligence Chemistry, published 2023-06-01. DOI: 10.1016/j.aichem.2023.100004
Article: https://www.sciencedirect.com/science/article/pii/S2949747723000040
Author manuscript (PMC10426331): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/9f/25/nihms-1912151.PMC10426331.pdf

Abstract

Neural network (NN) models have the potential to speed up the drug discovery process and reduce its failure rates. Because drug discovery explores chemical space beyond the training data distribution, the success of NN models depends on uncertainty quantification (UQ). Standard NN models do not provide uncertainty information, and some UQ methods require changing the NN architecture or training procedure, which limits the choice of NN models. Moreover, predictive uncertainty can come from different sources. It is important to be able to model each type of predictive uncertainty separately, because different actions may be appropriate depending on the source of uncertainty. In this paper, we examine UQ methods that estimate different sources of predictive uncertainty for NN models for protein-ligand binding prediction. We use our prior knowledge of chemical compounds to design the experiments: using a visualization method, we create non-overlapping, chemically diverse partitions of a collection of chemical compounds. These partitions serve as training and test set splits for exploring NN model uncertainty. We demonstrate how the uncertainties estimated by the selected methods capture different sources of uncertainty under different partitions and featurization schemes, and how they relate to prediction error.
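The abstract does not name the UQ methods that were selected. Purely as an illustration of what "separately modeling different types of predictive uncertainty" can mean, the sketch below assumes a deep ensemble in which each member predicts a Gaussian mean and variance per compound, and splits the total predictive variance into an aleatoric part (average predicted noise) and an epistemic part (disagreement between members) via the law of total variance. This is a swapped-in, commonly used technique, not necessarily the authors' method. It also computes a Spearman rank correlation between uncertainty and absolute error as one hypothetical way to examine the "relationship to prediction error". The function names and the synthetic data are hypothetical.

```python
import numpy as np


def decompose_uncertainty(means: np.ndarray, variances: np.ndarray):
    """Split ensemble predictive variance into aleatoric and epistemic parts.

    Assumption (illustrative only): means and variances have shape
    (n_members, n_compounds), and member m predicts a Gaussian
    N(means[m, i], variances[m, i]) for compound i.
    Law of total variance: total = E_m[variance_m] + Var_m[mean_m].
    """
    aleatoric = variances.mean(axis=0)  # average predicted noise variance
    epistemic = means.var(axis=0)       # spread of member means (ddof=0)
    return aleatoric, epistemic, aleatoric + epistemic


def spearman_rho(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation, assuming no tied values in x or y."""
    rx = np.argsort(np.argsort(x)).astype(float)  # rank of each element
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_members, n_compounds = 5, 200
    y_true = rng.normal(size=n_compounds)
    # Hypothetical ensemble outputs: members disagree more on some compounds.
    spread = rng.uniform(0.05, 1.0, size=n_compounds)
    means = y_true + rng.normal(size=(n_members, n_compounds)) * spread
    variances = np.full((n_members, n_compounds), 0.1)

    alea, epi, total = decompose_uncertainty(means, variances)
    abs_err = np.abs(means.mean(axis=0) - y_true)
    print("Spearman(total uncertainty, |error|):",
          round(spearman_rho(total, abs_err), 3))
```

A high epistemic share flags compounds unlike the training data (where more data could help), while a high aleatoric share points to noise in the measurements themselves; held-out, chemically distinct partitions are where the epistemic term would be expected to grow.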

Source journal: Artificial Intelligence Chemistry (Chemistry, General)