WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking

ArXiv Pub Date: 2024-11-14
Yunchao Lance Liu, Ha Dong, Xin Wang, Rocco Moretti, Yu Wang, Zhaoqian Su, Jiawei Gu, Bobby Bodenheimer, Charles David Weaver, Jens Meiler, Tyler Derr
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601797/pdf/
Citations: 0

Abstract

While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, which slows the progress and transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate. Specifically, our contributions are threefold: WelQrate Dataset Collection - we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure high data quality; WelQrate Evaluation Framework - we propose a standardized model evaluation framework that considers high-quality datasets, featurization, 3D conformation generation, evaluation metrics, and data splits, and that provides reliable benchmarking for drug discovery experts conducting real-world virtual screening; Benchmarking - we evaluate model performance through various research questions using the WelQrate dataset collection, exploring the effects of different models, dataset quality, featurization methods, and data splitting strategies on the results. In summary, we recommend adopting our proposed WelQrate as the gold standard in small molecule drug discovery benchmarking. The WelQrate dataset collection, curation code, and experimental scripts are all publicly available at WelQrate.org.
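To make the idea of hierarchical curation concrete, the following is a minimal sketch of how labels might be assigned when confirmatory screens, counter screens, and a PAINS flag are combined. It is not the authors' pipeline: the `Compound` record, its field names, the `curate` function, and the rule that unconfirmed primary hits are labeled inactive are all assumptions made for illustration, and the PAINS flag is taken as precomputed rather than derived from structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record for one screened compound (all field names are assumptions)."""
    cid: str
    primary_hit: bool         # active in the primary high-throughput screen
    confirmed: bool           # activity reproduced in a confirmatory screen
    counter_screen_hit: bool  # active in a counter screen (likely artifact or off-target signal)
    pains_flag: bool          # matched a PAINS alert, assumed to be computed elsewhere


def curate(compounds):
    """Split compounds into (actives, inactives) using layered screening evidence.

    Assumed rules for this sketch only:
      * PAINS-flagged compounds are dropped from the dataset entirely.
      * A compound is active only if it hit in the primary screen, was
        confirmed, and did not hit in the counter screen.
      * Every other remaining compound is labeled inactive.
    """
    actives, inactives = [], []
    for c in compounds:
        if c.pains_flag:
            continue  # removed: interference-prone compounds are excluded
        is_active = c.primary_hit and c.confirmed and not c.counter_screen_hit
        (actives if is_active else inactives).append(c.cid)
    return actives, inactives


# Toy library covering each branch of the rules above.
library = [
    Compound("A", True, True, False, False),   # confirmed, clean hit
    Compound("B", True, False, False, False),  # primary hit that failed confirmation
    Compound("C", True, True, True, False),    # also active in the counter screen
    Compound("D", True, True, False, True),    # PAINS-flagged, dropped
    Compound("E", False, False, False, False), # never active
]
actives, inactives = curate(library)
print(actives, inactives)
```

Under these assumed rules, relying on the primary screen alone would label four of the five toy compounds as active, while the layered evidence keeps only one, which illustrates why the choice of curation steps can change a benchmark's labels.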
