Moreno Wichert, Laura Guasch, and Raphael M. Franzini*
Chemical Reviews, vol. 124, no. 22, pp. 12551–12572. Journal Article. Published 2024-11-07.
DOI: 10.1021/acs.chemrev.4c00284
https://pubs.acs.org/doi/10.1021/acs.chemrev.4c00284
Impact factor: 51.4. JCR: Q1 (Chemistry, Multidisciplinary). Open access: no.
Citations: 0
Abstract
Challenges and Prospects of DNA-Encoded Library Data Interpretation
DNA-encoded library (DEL) technology is a powerful platform for the efficient identification of novel chemical matter in the early drug discovery process enabled by parallel screening of vast libraries of encoded small molecules through affinity selection and deep sequencing. While DEL selections provide rich data sets for computational drug discovery, the underlying technical factors influencing DEL data remain incompletely understood. This review systematically examines the key parameters affecting the chemical information in DEL data and their impact on hit triaging and machine learning integration. The need for rigorous data handling and interpretation is emphasized, with standardized methods being critical for the success of DEL-based approaches. Major challenges include the relationship between sequence counts and binding affinities, frequent hitters, and the influence of factors such as inhomogeneous library composition, DNA damage, and linkers on binding modes. Experimental artifacts, such as those caused by protein immobilization and screening matrix effects, further complicate data interpretation. Recent advancements in using machine learning to denoise DEL data and predict drug candidates are highlighted. This review offers practical guidance on adopting best practices for integrating robust methodologies, comprehensive data analysis, and computational tools to improve the accuracy and efficacy of DEL-driven hit discovery.
About the journal:
Chemical Reviews is a highly regarded and highest-ranked journal covering the general topic of chemistry. Its mission is to provide comprehensive, authoritative, critical, and readable reviews of important recent research in organic, inorganic, physical, analytical, theoretical, and biological chemistry.
Since 1985, Chemical Reviews has also published periodic thematic issues that focus on a single theme or direction of emerging research.