Lessons learned during the journey of data: from experiment to model for predicting kinase affinity, selectivity, polypharmacology, and resistance

bioRxiv (Biophysics), published 2024-09-10. DOI: 10.1101/2024.09.10.612176

Raquel Lopez-Rios de Castro, Jaime Rodriguez-Guerra, David Schaller, Talia B. Kimber, Corey Taylor, Jessica B White, Michael Backenkohler, Alexander Payne, Ben Kaminow, Ivan Pulido, Sukrit Singh, Paula Linh Krammer, Guillermo Perez-Hernandez, Andrea Volkamer, John D. Chodera

Abstract

Recent advances in machine learning (ML) are reshaping drug discovery. Structure-based ML methods use physically inspired models to predict binding affinities from protein:ligand complexes. These methods promise to integrate data across many related targets, which mitigates the data scarcity that affects individual targets and could enable generalizable predictions for a broad range of targets, including mutants. In this work, we report our experiences building KinoML, a novel framework for ML in target-based small-molecule drug discovery with an emphasis on structure-enabled methods. KinoML currently focuses on kinases because the relative structural conservation of this protein superfamily, particularly within the kinase domain, makes it possible to leverage data from the entire superfamily to make structure-informed predictions about binding affinities, selectivities, and drug resistance. Key lessons learned in building KinoML include the importance of reproducible data collection and deposition, the harmonization of molecular data and featurization, and the choice of the right data format to ensure the reusability and reproducibility of ML models. As a result, KinoML allows users to easily perform three tasks: accessing and curating molecular data; featurizing these data with representations suitable for ML applications; and running reproducible ML experiments that require access to ligand, protein, and assay information to predict ligand affinity. Although KinoML focuses on kinases, the framework can be applied to other proteins. The lessons reported here can help guide the development of platforms for structure-enabled ML in other areas of drug discovery.
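
The abstract describes experiments that combine ligand, protein, and assay information and stresses harmonized data and reproducibility. The following is a minimal, hypothetical Python sketch of those ideas, not KinoML's actual API: it models one ligand/protein/assay measurement, converts reported potencies in nanomolar onto a common -log10(molar) scale (pIC50, pKi, pKd), and makes a seeded train/test split that does not depend on input order. The names `Measurement`, `to_p_activity`, `harmonize`, and `reproducible_split`, the `"WT"` wild-type label, and the choice to average replicate p-values are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical record type (NOT KinoML's data model): one measured
# interaction between a ligand and a protein, under a given assay type.
@dataclass(frozen=True)
class Measurement:
    smiles: str       # ligand structure as a SMILES string
    uniprot_id: str   # protein target identifier
    mutation: str     # "WT" for wild type, else a label such as "T790M"
    assay_type: str   # e.g. "IC50", "Ki", "Kd"; kept separate, never mixed
    value_nM: float   # reported value in nanomolar

Key = Tuple[str, str, str, str]

def to_p_activity(value_nM: float) -> float:
    """Return -log10(concentration in M) for a value given in nM.

    -log10(v * 1e-9) simplifies to 9 - log10(v).
    """
    if value_nM <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(value_nM)

def harmonize(records: Iterable[Measurement]) -> Dict[Key, float]:
    """Group replicates by (ligand, target, mutation, assay type) and
    average them on the log scale (an assumed, simple aggregation rule)."""
    groups: Dict[Key, List[float]] = {}
    for r in records:
        key = (r.smiles, r.uniprot_id, r.mutation, r.assay_type)
        groups.setdefault(key, []).append(to_p_activity(r.value_nM))
    return {k: sum(v) / len(v) for k, v in groups.items()}

def reproducible_split(keys: Iterable[Key], test_fraction: float = 0.2,
                       seed: int = 42) -> Tuple[List[Key], List[Key]]:
    """Seeded train/test split that is independent of input order."""
    ordered = sorted(keys)        # canonical order before shuffling
    rng = random.Random(seed)     # private RNG, no global state touched
    rng.shuffle(ordered)
    n_test = round(len(ordered) * test_fraction)
    return ordered[n_test:], ordered[:n_test]

# Usage with placeholder data (benzene SMILES; P00533 is human EGFR).
records = [
    Measurement("c1ccccc1", "P00533", "WT", "IC50", 10.0),
    Measurement("c1ccccc1", "P00533", "WT", "IC50", 1000.0),
    Measurement("c1ccccc1", "P00533", "T790M", "IC50", 100.0),
]
table = harmonize(records)
train, test = reproducible_split(table.keys(), test_fraction=0.5, seed=0)
```

Keeping the mutation and assay type in the grouping key avoids silently merging wild-type with mutant data or IC50 with Ki values. Sorting before the seeded shuffle means the same seed yields the same split even if the records arrive in a different order.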
