Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data.

Impact factor 6.2 · CAS Zone 2 (Chemistry) · JCR Q1 (Chemistry, Multidisciplinary) · Communications Chemistry · Pub Date: 2025-02-08 · DOI: 10.1038/s42004-025-01428-y
Ísak Valsson, Matthew T Warren, Charlotte M Deane, Aniket Magarkar, Garrett M Morris, Philip C Biggin
Communications Chemistry 8(1), 41 (2025). Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11807228/pdf/
Citations: 0

Abstract

Machine learning offers great promise for fast and accurate binding affinity predictions. However, current models lack robust evaluation and fail on tasks encountered in (hit-to-)lead optimisation, such as ranking the binding affinity of a congeneric series of ligands, thereby limiting their application in drug discovery. Here, we address these issues by first introducing a novel attention-based graph neural network model called AEV-PLIG (atomic environment vector-protein ligand interaction graph). Second, we introduce a new and more realistic out-of-distribution test set called the OOD Test. We benchmark our model on this set, on CASF-2016, and on a test set used for free energy perturbation (FEP) calculations. This benchmarking not only highlights the competitive performance of AEV-PLIG, but also provides a realistic assessment of machine learning models against rigorous physics-based approaches. Moreover, we demonstrate how leveraging augmented data (generated using template-based modelling or molecular docking) can significantly improve binding affinity prediction correlation and ranking on the FEP benchmark (weighted mean PCC and Kendall's τ increase from 0.41 and 0.26 to 0.59 and 0.42, respectively). Together, these strategies are closing the performance gap with FEP calculations (FEP+ achieves a weighted mean PCC and Kendall's τ of 0.68 and 0.49 on the FEP benchmark) while being ~400,000 times faster.
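To make the reported metrics concrete, the following is a minimal sketch of how a weighted mean Pearson correlation coefficient (PCC) and Kendall's τ could be computed over several congeneric series, and how the fraction of the gap to FEP+ that has been closed follows from the abstract's numbers. The abstract does not state the weighting scheme, so weighting each series by its number of ligands is an assumption here. The function names, the tau-a variant of Kendall's τ, and the toy affinities are likewise illustrative and not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs contribute zero to the numerator.
    """
    score = 0
    pairs = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)
        pairs += 1
    return score / pairs


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def benchmark(series):
    """Return (weighted mean PCC, weighted mean tau).

    `series` maps a series name to (experimental, predicted) affinities.
    Assumption: each series is weighted by its number of ligands.
    """
    sizes = [len(exp) for exp, _ in series.values()]
    pccs = [pearson(exp, pred) for exp, pred in series.values()]
    taus = [kendall_tau(exp, pred) for exp, pred in series.values()]
    return weighted_mean(pccs, sizes), weighted_mean(taus, sizes)


def gap_closed(baseline, improved, reference):
    """Fraction of the baseline-to-reference gap recovered by `improved`."""
    return (improved - baseline) / (reference - baseline)


# Hypothetical toy data: two congeneric series with made-up affinities.
toy = {
    "series_a": ([6.0, 7.0, 8.0, 9.0], [6.1, 6.9, 8.2, 8.8]),
    "series_b": ([5.0, 6.0, 7.0], [7.0, 6.0, 5.0]),
}
weighted_pcc, weighted_tau = benchmark(toy)

# Values quoted in the abstract: before augmentation, after, and FEP+.
pcc_gap = gap_closed(0.41, 0.59, 0.68)
tau_gap = gap_closed(0.26, 0.42, 0.49)
print(f"toy weighted PCC={weighted_pcc:.3f}, tau={weighted_tau:.3f}")
print(f"gap closed: PCC {pcc_gap:.0%}, tau {tau_gap:.0%}")
```

Under this reading, the augmented data recovers roughly two thirds of the PCC gap and about 70% of the Kendall's τ gap between the baseline model and FEP+. Per-series weighting matters because a small series with a poor ranking (such as `series_b` above) would otherwise pull an unweighted mean down as much as a large one.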

Source journal
Communications Chemistry (Chemistry: General Chemistry)
CiteScore: 7.70
Self-citation rate: 1.70%
Articles published: 146
Review time: 13 weeks
About the journal: Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.