AttenGpKa: A Universal Predictor of Solvation Acidity Using Graph Neural Network and Molecular Topology.

Journal of Chemical Information and Modeling (IF 5.6; CAS Zone 2, Chemistry; JCR Q1, Chemistry, Medicinal). Pub Date: 2024-07-22. Epub Date: 2024-07-09. DOI: 10.1021/acs.jcim.4c00449
Hongle An, Xuyang Liu, Wensheng Cai, Xueguang Shao
Citations: 0

Abstract

Rapid and accurate calculation of the acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging because experimental data are limited. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat solutes and solvents equally from the perspective of molecular topology and propose a highly universal framework, AttenGpKa, for predicting solvation acidity. AttenGpKa is trained on 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous solvents, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, the encodings of solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. Interpretability studies further indicate that the model has effectively learnt electronic and solvent effects. Free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
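
The abstract states that solute and solvent encodings are "adaptively fused" but gives no formula. As a rough illustration only, the sketch below shows one common way such a fusion can be written: an element-wise sigmoid gate that blends the two embeddings, followed by a linear readout. The gating form, the function names (`gated_fusion`, `predict_pka`, `random_params`), the embedding size, and the random weights are all assumptions made for this example; they are not taken from the paper or from the AttenGpKa software, and the embeddings here stand in for what a graph neural network would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gated_fusion(solute_vec, solvent_vec, gate_w, gate_b):
    """Blend two equal-length embeddings with an element-wise gate.

    Hypothetical form: gate = sigmoid(W [solute; solvent] + b),
    fused = gate * solute + (1 - gate) * solvent.
    """
    gate = [sigmoid(z)
            for z in linear(gate_w, gate_b, solute_vec + solvent_vec)]
    return [g * u + (1.0 - g) * v
            for g, u, v in zip(gate, solute_vec, solvent_vec)]


def predict_pka(solute_vec, solvent_vec, params):
    """Linear readout on the fused vector (illustrative, untrained)."""
    fused = gated_fusion(solute_vec, solvent_vec,
                         params["gate_w"], params["gate_b"])
    return sum(w * f for w, f in zip(params["out_w"], fused)) + params["out_b"]


def random_params(dim, seed=0):
    """Random placeholder weights; a real model would learn these."""
    rng = random.Random(seed)
    return {
        "gate_w": [[rng.uniform(-1, 1) for _ in range(2 * dim)]
                   for _ in range(dim)],
        "gate_b": [0.0] * dim,
        "out_w": [rng.uniform(-1, 1) for _ in range(dim)],
        "out_b": 0.0,
    }
```

Because the gate depends on both inputs, the same solute embedding paired with different solvent embeddings yields different fused vectors, which is the property that lets a single model cover many solvents. With untrained weights the output of `predict_pka` carries no chemical meaning.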

Source journal
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
Journal description: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.