CovCysPredictor: Predicting Selective Covalently Modifiable Cysteines Using Protein Structure and Interpretable Machine Learning.

Journal of Chemical Information and Modeling (IF 5.6; Zone 2, Chemistry; JCR Q1, Chemistry, Medicinal). Pub Date: 2025-01-27. Epub Date: 2025-01-08. DOI: 10.1021/acs.jcim.4c01281
Bryn Marie Reimer, Ernest Awoonor-Williams, Andrei A Golosov, Viktor Hornak
Pages: 544-553. Citations: 0.

Abstract

Targeted covalent inhibition is a powerful therapeutic modality in the drug discoverer's toolbox. Recent advances in covalent drug discovery, particularly in targeting cysteines, have led to significant breakthroughs for traditionally challenging targets such as mutant KRAS, which is implicated in diverse human cancers. However, identifying cysteines for targeted covalent inhibition is difficult, as both experimental and in silico tools have shown limited accuracy. Using the recently released CovPDB and CovBinderInPDB databases, we trained and tested interpretable machine learning (ML) models to identify cysteines that are liable to be covalently modified (i.e., "ligandable" cysteines). We explored numerous physicochemical features (pKa, solvent exposure, residue electrostatics, etc.) and protein-ligand pocket descriptors in our ML models. Our final logistic regression model achieved a median F1 score of 0.73 on held-out test sets. When tested on a small sample of holo proteins, our model also performed reasonably, correctly identifying the most ligandable cysteine in most cases. Taken together, these results indicate that we can accurately predict potential ligandable cysteines for targeted covalent drug discovery, privileging cysteines that are more likely to be selective rather than purely reactive. We release this tool to the scientific community as CovCysPredictor.
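The abstract names the main ingredients: per-cysteine physicochemical features, a logistic regression classifier, a median F1 score over held-out test sets, and ranking cysteines to pick the most ligandable one. It does not describe the implementation. The following is a minimal, stdlib-only Python sketch of that general workflow under stated assumptions. The three feature names, the synthetic data and its arbitrary labelling rule, the random split scheme, and every function name (`make_toy_data`, `fit_logistic`, `median_f1_over_splits`, `rank_cysteines`, etc.) are hypothetical. None of them come from CovCysPredictor, CovPDB, or CovBinderInPDB, and the sketch does not reproduce the reported 0.73 median F1.

```python
import math
import random
from statistics import median

# Hypothetical per-cysteine features (illustration only, NOT the paper's feature set):
# [predicted pKa, relative solvent exposure in [0, 1], local electrostatic term]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(rows):
    """Return per-column means and standard deviations (computed on training data only)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [max(1e-8, math.sqrt(sum((v - m) ** 2 for v in c) / len(c)))
            for c, m in zip(cols, means)]
    return means, stds


def apply_scaling(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fit_logistic(X, y, lr=0.1, epochs=300, l2=0.01):
    """Batch gradient descent on the L2-regularized logistic (log) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive (ligandable) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_toy_data(n, rng):
    """Synthetic cysteines with an ARBITRARY labelling rule (not real chemistry, not from the paper)."""
    X, y = [], []
    for _ in range(n):
        pka, exposure, electro = rng.gauss(8.5, 1.0), rng.random(), rng.gauss(0.0, 1.0)
        score = -1.5 * (pka - 8.5) + 2.0 * (exposure - 0.5) + electro + rng.gauss(0.0, 0.5)
        X.append([pka, exposure, electro])
        y.append(1 if score > 0 else 0)
    return X, y


def median_f1_over_splits(X, y, n_splits=5, test_frac=0.2, seed=0):
    """Median F1 over repeated random train/test splits (random residues, not whole proteins)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        means, stds = standardize([X[i] for i in train])
        w, b = fit_logistic(apply_scaling([X[i] for i in train], means, stds),
                            [y[i] for i in train])
        probs = predict_proba(apply_scaling([X[i] for i in test], means, stds), w, b)
        scores.append(f1_score([y[i] for i in test], [1 if p >= 0.5 else 0 for p in probs]))
    return median(scores)


def rank_cysteines(names, X_scaled, w, b):
    """Sort one protein's cysteines by predicted probability of being ligandable, highest first."""
    return sorted(zip(names, predict_proba(X_scaled, w, b)), key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    X, y = make_toy_data(300, random.Random(42))
    print("median F1 on toy splits:", round(median_f1_over_splits(X, y), 3))
```

Because the features are standardized on each training split, each learned weight can be read as the change in log-odds of "ligandable" per one standard deviation of that feature. That direct readout is one reason logistic regression counts as an interpretable model. A more careful evaluation would typically hold out whole proteins rather than random residues, so that near-identical sites do not leak between training and test sets. This sketch uses random splits only for brevity.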
