A call for an industry-led initiative to critically assess machine learning for real-world drug discovery

Nature Machine Intelligence. Published 2024-10-04. DOI: 10.1038/s42256-024-00911-w. Impact factor 18.8; CAS Zone 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence).
Cas Wognum, Jeremy R. Ash, Matteo Aldeghi, Raquel Rodríguez-Pérez, Cheng Fang, Alan C. Cheng, Daniel J. Price, Djork-Arné Clevert, Ola Engkvist, W. Patrick Walters

Abstract

Machine learning (ML) is driving exciting innovations in drug discovery, but we need to be mindful of the circumstances that set this application apart. Unlike other fields with fit-for-purpose datasets consisting of millions of examples, published datasets in drug discovery are classically heterogeneous, imbalanced, noisy and expensive to generate¹. Furthermore, the applications of ML in drug discovery are numerous, require familiarity with several scientific disciplines, and inform high-stakes decisions, such as whether to run expensive or time-consuming experiments. The absence of standardized, domain-appropriate datasets, guidelines and tools for evaluating and comparing methods has led to a growing gap between perceived progress and real-world impact, which is delaying the adoption of ML in drug discovery. To bridge this gap, we believe that the unique expertise of industry scientists, who operate in real-world contexts, will be essential in developing benchmarking protocols tailored to drug discovery. To that end, we have already formed a unique collaboration between representatives from ten biotech and pharmaceutical companies, but we believe that an open-science, cross-industry and interdisciplinary effort is needed to tackle such grand challenges.

Fit-for-purpose benchmarks are powerful instruments for directing the ML community towards more impactful research and for unlocking breakthrough results. The gold standard for unbiased evaluation is a blind, prospective benchmark, in which different methods are evaluated on a newly generated test set that is disclosed only after the results have been announced. A popular example in drug discovery is CASP (Critical Assessment of protein Structure Prediction)², which enabled a revolution in protein structure prediction by systematically identifying valuable innovations in the community³. However, data acquisition in drug discovery is expensive and time-consuming, which limits the availability of blind benchmarks and their accessibility to the general research community.
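The blind, prospective protocol described above hinges on an ordering of events, which a small sketch can make explicit. This is an illustration under stated assumptions, not the procedure used by CASP or by the authors' initiative: the class `BlindBenchmark`, its method names, the compound identifiers and the choice of mean absolute error as the scoring metric are all hypothetical. What it encodes is the sequence that makes a benchmark blind: predictions are locked in before the test data are generated, scores are announced next, and the test set is disclosed last.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, Optional


@dataclass
class BlindBenchmark:
    """Hypothetical organizer-side sketch of a blind, prospective benchmark."""

    # team name -> {compound id -> predicted value}
    submissions: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # Newly generated experimental values; None until they have been measured.
    hidden_labels: Optional[Dict[str, float]] = None
    announced: bool = False

    def submit(self, team: str, predictions: Dict[str, float]) -> None:
        # Predictions are accepted only while the test data do not yet exist.
        if self.hidden_labels is not None:
            raise RuntimeError("submission window is closed")
        self.submissions[team] = dict(predictions)

    def record_measurements(self, labels: Dict[str, float]) -> None:
        # Recording the new experimental values closes the submission window.
        self.hidden_labels = dict(labels)

    def announce_results(self) -> Dict[str, float]:
        # Score each team by mean absolute error (an assumed metric) on the hidden labels.
        if self.hidden_labels is None:
            raise RuntimeError("test set has not been measured yet")
        scores = {
            team: mean(abs(preds[cid] - y) for cid, y in self.hidden_labels.items())
            for team, preds in self.submissions.items()
        }
        self.announced = True
        return scores

    def disclose_test_set(self) -> Dict[str, float]:
        # The test set becomes public only after the results have been announced.
        if not self.announced:
            raise RuntimeError("results must be announced before disclosure")
        return dict(self.hidden_labels)


# Usage with made-up compound identifiers and values.
bench = BlindBenchmark()
bench.submit("team_a", {"cmpd_1": 6.0, "cmpd_2": 7.5})
bench.submit("team_b", {"cmpd_1": 5.0, "cmpd_2": 8.0})
bench.record_measurements({"cmpd_1": 6.5, "cmpd_2": 7.0})
print(bench.announce_results())  # {'team_a': 0.5, 'team_b': 1.25}
print(bench.disclose_test_set())
```

The design choice worth noting is that late submissions and early disclosure both raise errors. Retrospective benchmarks lack exactly this property, because there the test set is public from the start.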
