Machine Learning Assisted Hit Prioritization for High Throughput Screening in Drug Discovery

ACS Central Science (IF 10.4; JCR Q1, Chemistry, Multidisciplinary; Zone 1, Chemistry). Pub Date: 2024-03-15. DOI: 10.1021/acscentsci.3c01517

Davide Boldini, Lukas Friedrich, Daniel Kuhn, and Stephan A. Sieber*

ACS Central Science, volume 10, issue 4, pages 823–832. https://pubs.acs.org/doi/10.1021/acscentsci.3c01517


Abstract

Efficient prioritization of bioactive compounds from high throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during training of a gradient boosting model on noisy high throughput screening data using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts. Therefore, our method enables false positive and true positive detection without relying on prior screens or assay interference mechanisms, making it applicable to any high throughput screening campaign. We demonstrate that our approach consistently excludes assay interferents with different mechanisms and prioritizes biologically relevant compounds more efficiently than all tested baselines, including a retrospective case study simulating its use in a real drug discovery campaign. Finally, our tool is extremely computationally efficient, requiring less than 30 s per assay on low-resource hardware. As such, our findings show that our method is an ideal addition to existing false positive detection tools and can be used to guide further pharmacological optimization after high throughput screening campaigns.

Minimum variance sampling analysis (MVS-A) is a fast machine-learning approach enabling the identification of both true bioactive compounds and false positives in high throughput screening data.
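The abstract describes scoring compounds by how they behave while a gradient boosting model trains on noisy screening labels. As a rough illustration of that general idea only, the sketch below boosts decision stumps on a single made-up descriptor and records, for each compound, the average absolute log-loss gradient it sees across boosting rounds. This is not the authors' MVS-A formulation: the scoring rule, the toy data, and the names `fit_stump` and `influence_scores` are all assumptions made for the example. The intuition it tries to show is that a labeled active sitting among inactives stays hard to fit and so keeps a large gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: (threshold, left_value, right_value)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1], best[2], best[3]


def influence_scores(xs, ys, n_rounds=10, lr=0.5):
    """Mean absolute log-loss gradient per sample over the boosting rounds.

    Illustrative stand-in for a sample-influence score (an assumption,
    not the formulation used in the paper).
    """
    raw = [0.0] * len(xs)      # current raw (log-odds) prediction per sample
    scores = [0.0] * len(xs)
    for _ in range(n_rounds):
        grads = [y - sigmoid(f) for y, f in zip(ys, raw)]
        for i, g in enumerate(grads):
            scores[i] += abs(g)
        thr, left_val, right_val = fit_stump(xs, grads)
        raw = [f + lr * (left_val if x <= thr else right_val)
               for x, f in zip(xs, raw)]
    return [s / n_rounds for s in scores]


# Toy screen: one hypothetical descriptor per compound.
# Index 3 is labeled active but sits among inactives (a planted false positive).
xs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.75, 0.80, 0.85, 0.90, 0.95]
ys = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1]

scores = influence_scores(xs, ys)
actives = [i for i, y in enumerate(ys) if y == 1]
# Labeled actives ordered from hardest to easiest to fit.
ranked = sorted(actives, key=lambda i: scores[i], reverse=True)
print(ranked)
```

In this toy setting the planted compound heads the ranking, while the clustered actives are fitted quickly and end up with lower scores. A real campaign would use many descriptors and a production gradient boosting library, and the score would need validating against known interferents.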




Source journal

ACS Central Science (Chemical Engineering: General Chemical Engineering)

CiteScore: 25.50
Self-citation rate: 0.50%
Articles published: 194
Review time: 10 weeks

About the journal: ACS Central Science publishes significant primary reports on research in chemistry and allied fields where chemical approaches are pivotal. As the first fully open-access journal by the American Chemical Society, it covers compelling and important contributions to the broad chemistry and scientific community. "Central science," a term popularized nearly 40 years ago, emphasizes chemistry's central role in connecting physical and life sciences, and fundamental sciences with applied disciplines like medicine and engineering. The journal focuses on exceptional quality articles, addressing advances in fundamental chemistry and interdisciplinary research.
