Achieving well-informed decision-making in drug discovery: a comprehensive calibration study using neural network-based structure-activity models

Journal of Cheminformatics (IF 5.7; Zone 2, Chemistry; JCR Q1, Chemistry, Multidisciplinary) Pub Date: 2025-03-05 DOI: 10.1186/s13321-025-00964-y

Hannah Rosa Friesacher, Ola Engkvist, Lewis Mervin, Yves Moreau, Adam Arany



Abstract

In the drug discovery process, where experiments can be costly and time-consuming, computational models that predict drug-target interactions are valuable tools to accelerate the development of new therapeutic agents. Estimating the uncertainty inherent in these neural network predictions provides valuable information that facilitates optimal decision-making when risk assessment is crucial. However, such models can be poorly calibrated, which results in unreliable uncertainty estimates that do not reflect the true predictive uncertainty. In this study, we compare different metrics, including accuracy and calibration scores, used for model hyperparameter tuning to investigate which model selection strategy achieves well-calibrated models. Furthermore, we propose to use a computationally efficient Bayesian uncertainty estimation method named HMC Bayesian Last Layer (HBLL), which generates Hamiltonian Monte Carlo (HMC) trajectories to obtain samples for the parameters of a Bayesian logistic regression fitted to the hidden layer of the baseline neural network. We report that this approach improves model calibration and matches the performance of common uncertainty quantification methods by combining the benefits of uncertainty estimation and probability calibration methods. Finally, we show that combining post hoc calibration methods with well-performing uncertainty quantification approaches can boost model accuracy and calibration.
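The last-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an isotropic Gaussian prior, a plain leapfrog HMC sampler with fixed step size, and a feature matrix `H` standing in for the hidden-layer activations of an already trained baseline network. The function names (`hmc_last_layer`, `predictive_mean`) and all default settings are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def log_posterior(w, H, y, prior_var):
    # Bernoulli log-likelihood plus an isotropic Gaussian log-prior
    # (up to an additive constant). The prior is an assumption of this sketch.
    logits = H @ w
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    return log_lik - 0.5 * np.dot(w, w) / prior_var


def grad_log_posterior(w, H, y, prior_var):
    # Gradient of log_posterior with respect to the weights w.
    return H.T @ (y - sigmoid(H @ w)) - w / prior_var


def hmc_last_layer(H, y, n_samples=300, step_size=0.05, n_leapfrog=10,
                   prior_var=10.0, seed=0):
    """Draw HMC samples of logistic-regression weights fitted on features H.

    H : (n, d) array, assumed to hold hidden-layer activations (hypothetical).
    y : (n,) array of 0/1 labels stored as floats.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(H.shape[1])
    samples = np.empty((n_samples, H.shape[1]))
    for s in range(n_samples):
        p0 = rng.standard_normal(w.shape)      # resample momentum
        w_new, p = w.copy(), p0.copy()
        # Leapfrog integration of the Hamiltonian dynamics.
        p += 0.5 * step_size * grad_log_posterior(w_new, H, y, prior_var)
        for i in range(n_leapfrog):
            w_new += step_size * p
            if i < n_leapfrog - 1:
                p += step_size * grad_log_posterior(w_new, H, y, prior_var)
        p += 0.5 * step_size * grad_log_posterior(w_new, H, y, prior_var)
        # Metropolis correction: accept or reject the proposed trajectory end.
        h_old = -log_posterior(w, H, y, prior_var) + 0.5 * np.dot(p0, p0)
        h_new = -log_posterior(w_new, H, y, prior_var) + 0.5 * np.dot(p, p)
        if np.log(rng.uniform()) < h_old - h_new:
            w = w_new
        samples[s] = w
    return samples


def predictive_mean(samples, H):
    # Average the predicted probability over the posterior weight samples.
    return sigmoid(H @ samples.T).mean(axis=1)
```

In use, one would discard an initial portion of `samples` as burn-in and pass the remainder to `predictive_mean` together with the hidden-layer features of new compounds. Because only the last layer is sampled, the cost scales with the width of that layer, not with the size of the full network.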


In this work, we provide a comprehensive probability calibration study on neural networks for drug-target interaction prediction. We report a significant impact of the hyperparameter selection strategy, and of uncertainty estimation and probability calibration methods, on the reliability of uncertainty estimates, which is crucial for an efficient drug discovery process.
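To make "calibration" concrete, the sketch below computes one widely used calibration score, the expected calibration error (ECE), for a binary classifier. The text here does not state which calibration scores the study uses, so the choice of ECE, the equal-width binning, and the function name `expected_calibration_error` are assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE on the predicted positive-class probability.

    Predictions are grouped into equal-width probability bins; the score is
    the sample-weighted mean gap between the average predicted probability
    and the observed positive rate in each bin.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give bin indices 0..n_bins-1 (a probability of 1.0
    # falls into the last bin).
    bin_ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        confidence = y_prob[mask].mean()
        frequency = y_true[mask].mean()
        ece += mask.mean() * abs(frequency - confidence)
    return ece
```

A model that predicts 0.9 for every compound while only half of them are active would score an ECE of about 0.4 under this definition, whereas a model whose predicted probabilities match the observed frequencies in every bin would score 0.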


Source journal

Journal of Cheminformatics (Chemistry, Multidisciplinary; Computer Science, Information Systems)

CiteScore: 14.10

Self-citation rate: 7.00%

Review time: 3 months

Journal description: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.