Graph neural networks for identifying protein-reactive compounds†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY Digital discovery Pub Date : 2024-07-25 DOI:10.1039/D4DD00038B
Victor Hugo Cano Gil and Christopher N. Rowley
{"title":"Graph neural networks for identifying protein-reactive compounds†","authors":"Victor Hugo Cano Gil and Christopher N. Rowley","doi":"10.1039/D4DD00038B","DOIUrl":null,"url":null,"abstract":"<p >The identification of protein-reactive electrophilic compounds is critical to the design of new covalent modifier drugs, screening for toxic compounds, and the exclusion of reactive compounds from high throughput screening. In this work, we employ traditional and graph machine learning (ML) algorithms to classify molecules being reactive towards proteins or nonreactive. For training data, we built a new dataset, ProteinReactiveDB, composed primarily of covalent and noncovalent inhibitors from the DrugBank, BindingDB, and CovalentInDB databases. To assess the transferability of the trained models, we created a custom set of covalent and noncovalent inhibitors, which was constructed from the recent literature. Baseline models were developed using Morgan fingerprints as training inputs, but they performed poorly when applied to compounds outside the training set. We then trained various Graph Neural Networks (GNNs), with the best GNN model achieving an Area Under the Receiver Operator Characteristic (AUROC) curve of 0.80, precision of 0.89, and recall of 0.72. We also explore the interpretability of these GNNs using Gradient Activation Mapping (GradCAM), which shows regions of the molecules GNNs deem most relevant when making a prediction. These maps indicated that our trained models can identify electrophilic functional groups in a molecule and classify molecules as protein-reactive based on their presence. We demonstrate the use of these models by comparing their performance against common chemical filters, identifying covalent modifiers in the ChEMBL database and generating a putative covalent inhibitor based on an established noncovalent inhibitor.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 1776-1792"},"PeriodicalIF":6.2000,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00038b?page=search","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital discovery","FirstCategoryId":"1085","ListUrlMain":"https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00038b","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

The identification of protein-reactive electrophilic compounds is critical to the design of new covalent modifier drugs, screening for toxic compounds, and the exclusion of reactive compounds from high throughput screening. In this work, we employ traditional and graph machine learning (ML) algorithms to classify molecules being reactive towards proteins or nonreactive. For training data, we built a new dataset, ProteinReactiveDB, composed primarily of covalent and noncovalent inhibitors from the DrugBank, BindingDB, and CovalentInDB databases. To assess the transferability of the trained models, we created a custom set of covalent and noncovalent inhibitors, which was constructed from the recent literature. Baseline models were developed using Morgan fingerprints as training inputs, but they performed poorly when applied to compounds outside the training set. We then trained various Graph Neural Networks (GNNs), with the best GNN model achieving an Area Under the Receiver Operator Characteristic (AUROC) curve of 0.80, precision of 0.89, and recall of 0.72. We also explore the interpretability of these GNNs using Gradient Activation Mapping (GradCAM), which shows regions of the molecules GNNs deem most relevant when making a prediction. These maps indicated that our trained models can identify electrophilic functional groups in a molecule and classify molecules as protein-reactive based on their presence. We demonstrate the use of these models by comparing their performance against common chemical filters, identifying covalent modifiers in the ChEMBL database and generating a putative covalent inhibitor based on an established noncovalent inhibitor.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
识别蛋白质活性化合物的图神经网络
蛋白质反应性亲电化合物的鉴定对于设计新的共价修饰药物、筛选有毒化合物以及将反应性化合物排除在高通量筛选之外至关重要。在这项工作中,我们采用了传统的图式机器学习(ML)算法来分类对蛋白质有反应或无反应的分子。作为训练数据,我们建立了一个新的数据集 ProteinReactiveDB,主要由 DrugBank、BindingDB 和 CovalentInDB 数据库中的共价和非共价抑制剂组成。为了评估训练模型的可移植性,我们创建了一套定制的共价和非共价抑制剂,这套抑制剂是根据最近的文献构建的。我们使用摩根指纹作为训练输入开发了基准模型,但当这些模型应用于训练集之外的化合物时,表现不佳。我们随后训练了各种图神经网络 (GNN),其中最佳的 GNN 模型的接收者运算特性曲线下面积 (AUROC) 为 0.80,精确度为 0.89,召回率为 0.72。我们还使用梯度激活图谱 (GradCAM) 探索了这些 GNN 的可解释性,该图谱显示了 GNN 在进行预测时认为最相关的分子区域。这些图谱表明,我们训练有素的模型可以识别分子中的亲电官能团,并根据它们的存在将分子划分为对蛋白质有反应的分子。我们通过比较这些模型与常见化学过滤器的性能、识别 ChEMBL 数据库中的共价修饰物以及根据已确定的非共价抑制剂生成推定共价抑制剂,展示了这些模型的用途。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
2.80
自引率
0.00%
发文量
0
期刊最新文献
Back cover ArcaNN: automated enhanced sampling generation of training sets for chemically reactive machine learning interatomic potentials. Sorting polyolefins with near-infrared spectroscopy: identification of optimal data analysis pipelines and machine learning classifiers†‡ High accuracy uncertainty-aware interatomic force modeling with equivariant Bayesian neural networks† Correction: A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1