用于药物发现的共识整体虚拟筛选:一种新型机器学习模型方法。

IF 7.1 2区 化学 Q1 CHEMISTRY, MULTIDISCIPLINARY Journal of Cheminformatics Pub Date : 2024-05-28 DOI:10.1186/s13321-024-00855-8
Said Moshawih, Zhen Hui Bu, Hui Poh Goh, Nurolaini Kifli, Lam Hong Lee, Khang Wen Goh, Long Chiau Ming
{"title":"用于药物发现的共识整体虚拟筛选:一种新型机器学习模型方法。","authors":"Said Moshawih,&nbsp;Zhen Hui Bu,&nbsp;Hui Poh Goh,&nbsp;Nurolaini Kifli,&nbsp;Lam Hong Lee,&nbsp;Khang Wen Goh,&nbsp;Long Chiau Ming","doi":"10.1186/s13321-024-00855-8","DOIUrl":null,"url":null,"abstract":"<div><p>In drug discovery, virtual screening is crucial for identifying potential hit compounds. This study aims to present a novel pipeline that employs machine learning models that amalgamates various conventional screening methods. A diverse array of protein targets was selected, and their corresponding datasets were subjected to active/decoy distribution analysis prior to scoring using four distinct methods: QSAR, Pharmacophore, docking, and 2D shape similarity, which were ultimately integrated into a single consensus score. The fine-tuned machine learning models were ranked using the novel formula “w_new”, consensus scores were calculated, and an enrichment study was performed for each target. Distinctively, consensus scoring outperformed other methods in specific protein targets such as PPARG and DPP4, achieving AUC values of 0.90 and 0.84, respectively. Remarkably, this approach consistently prioritized compounds with higher experimental PIC<sub>50</sub> values compared to all other screening methodologies. Moreover, the models demonstrated a range of moderate to high performance in terms of R<sup>2</sup> values during external validation. In conclusion, this novel workflow consistently delivered superior results, emphasizing the significance of a holistic approach in drug discovery, where both quantitative metrics and active enrichment play pivotal roles in identifying the best virtual screening methodology.</p><p><b>Scientific contribution</b></p><p>We presented a novel consensus scoring workflow in virtual screening, merging diverse methods for enhanced compound selection. We also introduced ‘w_new’, a groundbreaking metric that intricately refines machine learning model rankings by weighing various model-specific parameters, revolutionizing their efficacy in drug discovery in addition to other domains.</p><h3>Graphical Abstract</h3>\n<div><figure><div><div><picture><source><img></source></picture></div></div></figure></div></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"16 1","pages":""},"PeriodicalIF":7.1000,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00855-8","citationCount":"0","resultStr":"{\"title\":\"Consensus holistic virtual screening for drug discovery: a novel machine learning model approach\",\"authors\":\"Said Moshawih,&nbsp;Zhen Hui Bu,&nbsp;Hui Poh Goh,&nbsp;Nurolaini Kifli,&nbsp;Lam Hong Lee,&nbsp;Khang Wen Goh,&nbsp;Long Chiau Ming\",\"doi\":\"10.1186/s13321-024-00855-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>In drug discovery, virtual screening is crucial for identifying potential hit compounds. This study aims to present a novel pipeline that employs machine learning models that amalgamates various conventional screening methods. A diverse array of protein targets was selected, and their corresponding datasets were subjected to active/decoy distribution analysis prior to scoring using four distinct methods: QSAR, Pharmacophore, docking, and 2D shape similarity, which were ultimately integrated into a single consensus score. The fine-tuned machine learning models were ranked using the novel formula “w_new”, consensus scores were calculated, and an enrichment study was performed for each target. Distinctively, consensus scoring outperformed other methods in specific protein targets such as PPARG and DPP4, achieving AUC values of 0.90 and 0.84, respectively. Remarkably, this approach consistently prioritized compounds with higher experimental PIC<sub>50</sub> values compared to all other screening methodologies. Moreover, the models demonstrated a range of moderate to high performance in terms of R<sup>2</sup> values during external validation. In conclusion, this novel workflow consistently delivered superior results, emphasizing the significance of a holistic approach in drug discovery, where both quantitative metrics and active enrichment play pivotal roles in identifying the best virtual screening methodology.</p><p><b>Scientific contribution</b></p><p>We presented a novel consensus scoring workflow in virtual screening, merging diverse methods for enhanced compound selection. We also introduced ‘w_new’, a groundbreaking metric that intricately refines machine learning model rankings by weighing various model-specific parameters, revolutionizing their efficacy in drug discovery in addition to other domains.</p><h3>Graphical Abstract</h3>\\n<div><figure><div><div><picture><source><img></source></picture></div></div></figure></div></div>\",\"PeriodicalId\":617,\"journal\":{\"name\":\"Journal of Cheminformatics\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":7.1000,\"publicationDate\":\"2024-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00855-8\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cheminformatics\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://link.springer.com/article/10.1186/s13321-024-00855-8\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-024-00855-8","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

在药物发现过程中,虚拟筛选对于确定潜在的热门化合物至关重要。本研究旨在介绍一种采用机器学习模型的新型筛选方法,该方法综合了各种传统筛选方法。研究人员选择了一系列不同的蛋白质靶点,并对其相应的数据集进行了活性/诱饵分布分析,然后使用四种不同的方法进行评分:在使用四种不同的方法(QSAR、Pharmacophore、Docking 和 2D 形状相似性)进行评分之前,先对其相应的数据集进行活性/诱饵分布分析,最终将其整合为一个单一的共识得分。使用新公式 "w_new "对微调后的机器学习模型进行排序,计算共识得分,并对每个靶点进行富集研究。在 PPARG 和 DPP4 等特定蛋白质靶标方面,共识评分的效果明显优于其他方法,AUC 值分别达到 0.90 和 0.84。值得注意的是,与所有其他筛选方法相比,这种方法始终优先选择实验 PIC50 值更高的化合物。此外,在外部验证过程中,这些模型的 R2 值表现出了中等到高等的性能。总之,这种新颖的工作流程始终能带来卓越的结果,强调了整体方法在药物发现中的重要性,其中定量指标和活性富集在确定最佳虚拟筛选方法中发挥了关键作用。我们还介绍了 "w_new",这是一种开创性的指标,它通过权衡各种特定模型参数来复杂地完善机器学习模型的排名,彻底改变了它们在药物发现和其他领域的功效。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Consensus holistic virtual screening for drug discovery: a novel machine learning model approach

In drug discovery, virtual screening is crucial for identifying potential hit compounds. This study aims to present a novel pipeline that employs machine learning models that amalgamates various conventional screening methods. A diverse array of protein targets was selected, and their corresponding datasets were subjected to active/decoy distribution analysis prior to scoring using four distinct methods: QSAR, Pharmacophore, docking, and 2D shape similarity, which were ultimately integrated into a single consensus score. The fine-tuned machine learning models were ranked using the novel formula “w_new”, consensus scores were calculated, and an enrichment study was performed for each target. Distinctively, consensus scoring outperformed other methods in specific protein targets such as PPARG and DPP4, achieving AUC values of 0.90 and 0.84, respectively. Remarkably, this approach consistently prioritized compounds with higher experimental PIC50 values compared to all other screening methodologies. Moreover, the models demonstrated a range of moderate to high performance in terms of R2 values during external validation. In conclusion, this novel workflow consistently delivered superior results, emphasizing the significance of a holistic approach in drug discovery, where both quantitative metrics and active enrichment play pivotal roles in identifying the best virtual screening methodology.

Scientific contribution

We presented a novel consensus scoring workflow in virtual screening, merging diverse methods for enhanced compound selection. We also introduced ‘w_new’, a groundbreaking metric that intricately refines machine learning model rankings by weighing various model-specific parameters, revolutionizing their efficacy in drug discovery in addition to other domains.

Graphical Abstract

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Cheminformatics
Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore
14.10
自引率
7.00%
发文量
82
审稿时长
3 months
期刊介绍: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.
期刊最新文献
One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening Chemical space as a unifying theme for chemistry Context-dependent similarity analysis of analogue series for structure–activity relationship transfer based on a concept from natural language processing Fragmenstein: predicting protein–ligand structures of compounds derived from known crystallographic fragment hits using a strict conserved-binding–based methodology ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1