GR-pKa: a message-passing neural network with retention mechanism for pKa prediction.

Impact factor: 6.8 | Zone 2 (Biology) | JCR Q1 (Biochemical Research Methods) | Briefings in Bioinformatics | Publication date: 2024-07-25 | DOI: 10.1093/bib/bbae408

Runyu Miao, Danlin Liu, Liyun Mao, Xingyu Chen, Leihao Zhang, Zhen Yuan, Shanshan Shi, Honglin Li, Shiliang Li


Citation count: 0

Abstract

In drug discovery and design, the acid-base dissociation constant (pKa) of a molecule receives close attention because it strongly influences ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties and biological activity. However, determining pKa values experimentally is often laborious and complex. Existing prediction methods are limited both by the quantity and quality of their training data and by their capacity to handle the complex structural and physicochemical properties of compounds, which impedes accuracy and generalization. A method that predicts molecular pKa values quickly and accurately would therefore help guide the structural modification of molecules and, in turn, support the development of new drugs. In this study, we developed a pKa prediction model named GR-pKa (Graph Retention pKa), which combines a message-passing neural network with a multi-fidelity learning strategy to predict molecular pKa values. The model uses five quantum mechanical properties related to molecular thermodynamics and dynamics as key features to characterize molecules. Notably, we introduce the retention mechanism into the message-passing phase, a novel step that significantly improves the model's ability to capture and update molecular information. GR-pKa outperforms several state-of-the-art models in predicting macro-pKa values, achieving a mean absolute error of 0.490, a root mean square error of 0.588, and an R² of 0.937 on the SAMPL7 dataset.
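
The three evaluation metrics quoted in the abstract (mean absolute error, root mean square error, and R²) can be computed as in the minimal sketch below. It is an illustration, not the authors' evaluation code: the function name `regression_metrics` and the sample values are hypothetical, and R² is taken to be the coefficient of determination, which is an assumption, since the abstract does not say whether the squared Pearson correlation is used instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired experimental and predicted values.

    R^2 is computed as the coefficient of determination,
    1 - SS_res / SS_tot (an assumption; see the text above).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the prediction errors.
    mae = sum(abs(e) for e in errors) / n

    # Root mean square error: penalizes large errors more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: share of variance explained by the model.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


# Made-up pKa values for illustration only; these are not data from the paper.
experimental = [4.0, 7.0, 9.0, 10.0]
predicted = [4.5, 6.5, 9.5, 9.5]

mae, rmse, r2 = regression_metrics(experimental, predicted)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

RMSE is never smaller than MAE on the same predictions, which is consistent with the reported pair of 0.588 and 0.490.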




Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.