Ísak Valsson, Matthew T Warren, Charlotte M Deane, Aniket Magarkar, Garrett M Morris, Philip C Biggin
Title: Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data
Journal: Communications Chemistry, volume 8, issue 1, article 41
Published: 2025-02-08
DOI: https://doi.org/10.1038/s42004-025-01428-y
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11807228/pdf/
Abstract
Machine learning offers great promise for fast and accurate binding affinity predictions. However, current models lack robust evaluation and fail on tasks encountered in (hit-to-)lead optimisation, such as ranking the binding affinities of a congeneric series of ligands, which limits their application in drug discovery. Here, we address these issues by first introducing a novel attention-based graph neural network model called AEV-PLIG (atomic environment vector-protein ligand interaction graph). Second, we introduce a new and more realistic out-of-distribution test set called the OOD Test. We benchmark our model on this set, on CASF-2016, and on a test set used for free energy perturbation (FEP) calculations. This benchmarking not only highlights the competitive performance of AEV-PLIG but also provides a realistic assessment of machine learning models against rigorous physics-based approaches. Moreover, we demonstrate how leveraging augmented data (generated using template-based modelling or molecular docking) can significantly improve binding affinity prediction correlation and ranking on the FEP benchmark (weighted mean Pearson correlation coefficient (PCC) and Kendall's τ increase from 0.41 and 0.26 to 0.59 and 0.42, respectively). Together, these strategies are closing the performance gap with FEP calculations (FEP+ achieves weighted mean PCC and Kendall's τ of 0.68 and 0.49 on the FEP benchmark) while being ~400,000 times faster.
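The abstract reports weighted mean PCC and Kendall's τ across the FEP benchmark without defining the weighting. The sketch below shows one plausible reading, assuming each congeneric series is scored separately and the per-series values are averaged with weights equal to the number of ligands in the series. The weighting scheme, the choice of the tau-b variant, the function names, and the toy affinities are all illustrative assumptions, not the authors' implementation or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC); assumes non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs with a tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counted in neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))


def weighted_mean_metrics(series):
    """Average per-series PCC and tau, weighting each series by its ligand count.

    `series` maps a series name to (experimental, predicted) affinity lists.
    Weighting by ligand count is an assumption made for this sketch.
    """
    total = sum(len(exp) for exp, _ in series.values())
    pcc = sum(len(exp) * pearson(exp, pred) for exp, pred in series.values()) / total
    tau = sum(len(exp) * kendall_tau(exp, pred) for exp, pred in series.values()) / total
    return pcc, tau


# Hypothetical toy data (pK-like units), not taken from the paper.
toy_series = {
    "series_a": ([6.0, 7.0, 8.0, 9.0], [6.5, 7.0, 7.5, 8.0]),  # ranked correctly
    "series_b": ([5.0, 6.0], [7.0, 6.0]),                      # ranked in reverse
}

pcc, tau = weighted_mean_metrics(toy_series)
print(f"weighted mean PCC = {pcc:.2f}, weighted mean tau = {tau:.2f}")
# prints: weighted mean PCC = 0.33, weighted mean tau = 0.33
```

Scoring each series separately and then averaging keeps the metric focused on ranking ligands within a congeneric series, which is the lead-optimisation task the abstract emphasises. Pooling all ligands across targets would instead reward a model for distinguishing targets rather than ligands.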
About the journal:
Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.