iScore: A ML-Based Scoring Function for De Novo Drug Discovery.

IF 5.3 · Zone 2 (Chemistry) · JCR Q1 (Chemistry, Medicinal) · Journal of Chemical Information and Modeling · Pub Date: 2025-03-24 · Epub Date: 2025-03-04 · DOI: 10.1021/acs.jcim.4c02192

Sayyed Jalil Mahdizadeh, Leif A Eriksson

Journal of Chemical Information and Modeling, pp. 2759-2772. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11938276/pdf/

Citations: 0

Abstract

In the quest to accelerate de novo drug discovery, developing efficient and accurate scoring functions remains a fundamental challenge. This study introduces iScore, a novel machine learning (ML)-based scoring function designed to predict the binding affinity of protein-ligand complexes quickly and accurately. Unlike conventional scoring functions, iScore does not rely on explicit knowledge of protein-ligand interactions or a full picture of atomic contacts; instead, it uses a set of ligand and binding-pocket descriptors to evaluate binding affinity directly. This allows the slow and inefficient conformational sampling stage to be skipped, enabling rapid screening of ultrahuge molecular libraries, a crucial advance given the practically infinite size of chemical space. iScore was trained and validated on the PDBbind 2020 refined set, CASF 2016, CSAR NRC-HiQ Set1/2, DUD-E, and target fishing data sets using three ML methods: a deep neural network (iScore-DNN), random forest (iScore-RF), and eXtreme gradient boosting (iScore-XGB). A hybrid model, iScore-Hybrid, was then developed to combine the strengths of these base learners. In cross-validation, the hybrid model achieved a Pearson correlation coefficient (R) of 0.78 and a root-mean-square error (RMSE) of 1.23, outperforming the individual base learners, and it set new benchmarks for scoring power (R = 0.814, RMSE = 1.34), ranking power (ρ = 0.705), and screening power (success rate at top 10% = 73.7%). iScore-Hybrid also performed well in the target fishing benchmark.

Source journal

Journal of Chemical Information and Modeling (Chemistry: Multidisciplinary)

CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months

Journal description: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.