
Molecular Informatics: Latest Articles

Chemoinformatic regression methods and their applicability domain.
IF 2.8 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-01 Epub Date: 2024-05-28 DOI: 10.1002/minf.202400018
Thomas-Martin Dutschmann, Valerie Schlenker, Knut Baumann

The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
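
One generic way to define an applicability domain is by distance to the training set. The sketch below is only an illustration of that idea and is not taken from the review: it assumes a mean k-nearest-neighbour Euclidean distance as the score and a training-set quantile as the cut-off. The function names, the toy descriptors, and the parameters `k` and `quantile` are all hypothetical.

```python
import math

def knn_distance(x, train, k=3):
    """Mean Euclidean distance from x to its k nearest training points."""
    dists = sorted(math.dist(x, t) for t in train)
    return sum(dists[:k]) / k

def fit_threshold(train, k=3, quantile=0.95):
    """Cut-off taken as a quantile of leave-one-out kNN distances in the training set."""
    scores = []
    for i, x in enumerate(train):
        others = train[:i] + train[i + 1:]  # leave the point itself out
        scores.append(knn_distance(x, others, k))
    scores.sort()
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]

def in_domain(x, train, threshold, k=3):
    """A query counts as inside the domain if it is no farther than the cut-off."""
    return knn_distance(x, train, k) <= threshold

# Toy two-descriptor training set (made-up values).
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.25), (0.25, 0.05)]
threshold = fit_threshold(train, k=2, quantile=0.95)

print(in_domain((0.2, 0.2), train, threshold, k=2))  # query close to the training data
print(in_domain((5.0, 5.0), train, threshold, k=2))  # query far from the training data
```

Predictions for queries flagged as outside the domain would be treated as less reliable; model-specific uncertainty estimates, as mentioned in the abstract, are an alternative to this generic distance check.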

Citations: 0
Structural analysis of neomycin B and kanamycin A binding Aminoglycosides Modifying Enzymes (AME) and bacterial ribosomal RNA.
IF 2.8 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-01 Epub Date: 2024-06-10 DOI: 10.1002/minf.202300339
Julia Revillo Imbernon, Jean-Marc Weibel, Eric Ennifar, Gilles Prévost, Esther Kellenberger

Aminoglycosides are crucial antibiotics facing challenges from bacterial resistance. This study addresses the importance of aminoglycoside modifying enzymes in the context of escalating resistance. Drawing upon over two decades of structural data in the Protein Data Bank, we focused on two key antibiotics, neomycin B and kanamycin A, to explore how the aminoglycoside structure is exploited by this family of enzymes. A systematic comparison across diverse enzymes and the RNA A-site target identified common characteristics in the recognition mode, while assessing the adaptability of neomycin B and kanamycin A in various environments.

Citations: 0
Comparing search algorithms on the retrosynthesis problem.
IF 2.8 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-01 Epub Date: 2024-06-12 DOI: 10.1002/minf.202300259
Milo Roucairol, Tristan Cazenave

In this article we try different algorithms, namely Nested Monte Carlo Search and Greedy Best First Search, on AstraZeneca's open source retrosynthetic tool, AiZynthFinder. We compare these algorithms to AiZynthFinder's base Monte Carlo Tree Search on a benchmark selected from the PubChem database and by Bayer's chemists. We show that both Nested Monte Carlo Search and Greedy Best First Search outperform AstraZeneca's Monte Carlo Tree Search, with a slight advantage for Nested Monte Carlo Search while experimenting on a playout heuristic. We also show how the search algorithms are bounded by the quality of the policy network; to improve our results, the next step is to improve the policy network.
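
For readers unfamiliar with Greedy Best First Search, the following is a minimal, generic sketch on a toy graph. It does not use the AiZynthFinder API and is not the authors' implementation: the node names, the `graph` dictionary, and the heuristic values in `h` are invented for illustration. A real retrosynthesis search expands molecules through reaction templates scored by a policy network rather than following fixed edges.

```python
import heapq

def greedy_best_first(start, is_goal, expand, heuristic):
    """Always expand the open node with the lowest heuristic value."""
    frontier = [(heuristic(start), start)]
    parent = {start: None}  # also serves as the visited set
    while frontier:
        _, node = heapq.heappop(frontier)
        if is_goal(node):
            # Walk back through the parents to rebuild the route.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in expand(node):
            if child not in parent:
                parent[child] = node
                heapq.heappush(frontier, (heuristic(child), child))
    return None  # frontier exhausted without reaching a goal

# Toy search space: "stock" stands in for a purchasable precursor.
graph = {"target": ["a", "b"], "a": ["c"], "b": ["stock"], "c": [], "stock": []}
h = {"target": 3, "a": 1, "b": 2, "c": 5, "stock": 0}

route = greedy_best_first(
    "target",
    is_goal=lambda n: n == "stock",
    expand=lambda n: graph[n],
    heuristic=lambda n: h[n],
)
print(route)  # ['target', 'b', 'stock']
```

The search first follows the more promising-looking node "a", reaches a dead end, and then falls back to "b". This mirrors the abstract's remark that result quality is bounded by the quality of whatever scores the candidates.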

Citations: 0
Best of both worlds: An expansion of the state of the art pKa model with data from three industrial partners
IF 3.6 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-21 DOI: 10.1002/minf.202400088
Robert Fraczkiewicz, Huy Quoc Nguyen, Newton Wu, Nina Kausch‐Busies, Sergio Grimbs, Kai Sommer, Antonius ter Laak, Judith Günther, Björn Wagner, Michael Reutlinger
In a unique collaboration between Simulations Plus and several industrial partners, we were able to develop a new version 11.0 of the previously published in silico pKa model, S+pKa, with considerably improved prediction accuracy. The model's training set was vastly expanded by large amounts of experimental data obtained from F. Hoffmann‐La Roche AG, Genentech Inc., and the Crop Science division of Bayer AG. The previous v7.0 of S+pKa was trained on data from public sources and the Pharmaceutical division of Bayer AG. The model has shown dramatic improvements in predictive accuracy when externally validated on three new contributor compound sets. Less expected was v11.0’s improvement in prediction on new compounds developed at Bayer Pharma after v7.0 was released (2013–2023), even without contributing additional data to v11.0. We illustrate chemical space coverage by chemistries encountered in the five domains, public and industrial, outline model construction, and discuss factors contributing to the model's success.
Citations: 0
Exploring drug repositioning possibilities of kinase inhibitors via molecular simulation
IF 3.6 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-21 DOI: 10.1002/minf.202300336
Qing‐Xin Wang, Jiao Cai, Zi‐Jun Chen, Jia‐Chuan Liu, Jing‐Jing Wang, Hai Zhou, Qing‐Qing Li, Zi‐Xuan Wang, Yi‐Bo Wang, Zhen‐Jiang Tong, Jin Yang, Tian‐Hua Wei, Meng‐Yuan Zhang, Yun Zhou, Wei‐Chen Dai, Ning Ding, Xue‐Jiao Leng, Xiao‐Ying Yin, Shan‐Liang Sun, Yan‐Cheng Yu, Nian‐Guang Li, Zhi‐Hao Shi
Kinases, a class of enzymes controlling the phosphorylation of various substrates, are pivotal in both physiological and pathological processes. Although their conserved ATP binding pockets pose challenges for achieving selectivity, this feature offers opportunities for drug repositioning of kinase inhibitors (KIs). This study presents a cost‐effective in silico prediction of KI drug repositioning by analyzing cross‐docking results. We established a KI database (278 unique KIs, 1834 bioactivity data points) and a kinase database (357 kinase structures categorized by the DFG motif) for carrying out cross‐docking. Comparative analysis of the docking scores and reported experimental bioactivity revealed that the Atypical, TK, and TKL superfamilies are suitable for drug repositioning. Among these kinase superfamilies, Olverematinib, Lapatinib, and Abemaciclib displayed enzymatic activity in our focused AKT‐PI3K‐mTOR pathway with IC50 values of 3.3, 3.2 and 5.8 μM. Further cell assays showed IC50 values of 0.2, 1.2 and 0.6 μM in tumor cells. The consistency between prediction and validation demonstrated that repositioning KIs via in silico methods is feasible.
Citations: 0
Updating and profiling the natural product‐likeness of Latin American compound libraries
IF 3.6 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-21 DOI: 10.1002/minf.202400052
Alejandro Gómez‐García, Ann‐Kathrin Prinz, Daniel A. Acuña Jiménez, William J. Zamora, Haruna L. Barazorda‐Ccahuana, Miguel Á. Chávez‐Fumagalli, Marilia Valli, Adriano D. Andricopulo, Vanderlan da S. Bolzani, Dionisio A. Olmedo, Pablo N. Solís, Marvin J. Núñez, Johny R. Rodríguez Pérez, Hoover A. Valencia Sánchez, Héctor F. Cortés Hernández, Oscar M. Mosquera Martinez, Oliver Koch, José L. Medina‐Franco
Compound databases of natural products play a crucial role in drug discovery and development projects and have implications in other areas, such as food chemical research, ecology and metabolomics. Recently, we put together the first version of the Latin American Natural Product database (LANaPDB) as a collective effort of researchers from six countries to assemble a public and representative library of natural products in a geographical region with a large biodiversity. The present work aims to conduct a comparative and extensive profiling of the natural product‐likeness of an updated version of LANaPDB and the ten individual compound databases that form part of LANaPDB. The natural product‐likeness profile of the Latin American compound databases is contrasted with the profile of other major natural product databases in the public domain and a set of small‐molecule drugs approved for clinical use. As part of the extensive characterization, we employed several chemoinformatics metrics of natural product likeness. The results of this study will capture the attention of the global community engaged in natural product databases, not only in Latin America but across the world.
Citations: 0
Prediction of adverse drug reactions due to genetic predisposition using deep neural networks.
IF 3.6 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-01 Epub Date: 2024-06-08 DOI: 10.1002/minf.202400021
Bryan Dafniet, Olivier Taboureau

Drug development is a long and costly process, often limited by the toxicity and adverse drug reactions (ADRs) caused by drug candidates. Even on the market, some drugs can cause strong ADRs that can vary depending on individual polymorphisms. The development of genome-wide association studies (GWAS) allowed the discovery of genetic variants of interest that may cause these effects. In this study, the objective was to investigate a deep learning approach to predict genetic variations potentially related to ADRs. We used single nucleotide polymorphism (SNP) information from dbSNP to create a network based on ADR-drug-target-mutations and extracted interaction matrices to build deep neural network (DNN) models. Considering only information about mutations known to impact drug efficacy and drug safety from PharmGKB and drug adverse reactions based on the MedDRA System Organ Classes (SOCs), these DNN models reached a balanced accuracy of 0.61 on average. Including molecular fingerprints representing structural features of the drugs did not improve the performance of the models. To our knowledge, this is the first model that exploits DNNs to predict ADR-drug-target-mutations. Although some improvements are suggested, these models can be of interest for analyzing multiple compounds across all accessible gene and polymorphism information, and thus pave the way for precision medicine.
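
Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, which makes it less flattering than plain accuracy on imbalanced data. The sketch below is a generic binary-label illustration with made-up labels. It is not the authors' evaluation code and assumes both classes are present.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Imbalanced toy labels: 2 positives, 8 negatives (made-up values).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.6875
```

A balanced accuracy of 0.5 corresponds to chance level for a binary task, which gives some context for the reported value of 0.61.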

Citations: 0
Accelerating Molecular Docking using Machine Learning Methods.
IF 3.6 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-01 Epub Date: 2024-06-08 DOI: 10.1002/minf.202300167
Abdulsalam Y Bande, Sefer Baday

Virtual screening (VS) is one of the well-established approaches in drug discovery; it speeds up the search for a bioactive molecule and reduces the costs and effort associated with experiments. VS helps to narrow down the search within chemical space and allows selecting fewer and more probable candidate compounds for experimental testing. Docking calculations are one of the commonly used and highly appreciated structure-based drug discovery methods. Databases for chemical structures of small molecules have been growing rapidly. However, at the moment virtual screening of large libraries via docking is not very common. In this work, we aim to accelerate docking studies by predicting docking scores without explicitly performing docking calculations. We experimented with an attention-based long short-term memory (LSTM) neural network for an efficient prediction of docking scores, as well as other machine learning models such as XGBoost. By using docking scores of a small number of ligands we trained our models and predicted docking scores of a few million molecules. Specifically, we tested our approaches on 11 datasets that were produced from in-house drug discovery studies. On average, by training models using only 7000 molecules we predicted docking scores of approximately 3.8 million molecules with an R² (coefficient of determination) of 0.77 and a Spearman rank correlation coefficient of 0.85. We designed the system with ease of use in mind. All the user needs to provide is a CSV file containing SMILES and their respective docking scores; the system then outputs a model that the user can use to predict the docking score of a new molecule.
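
The two metrics quoted above measure different things: R² penalises errors in the predicted values, whereas Spearman's coefficient only looks at the ranking. The sketch below computes both in plain Python on made-up docking scores. It is a generic illustration, not the authors' evaluation code, and the `docked` and `predicted` values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical docking scores and model predictions for five ligands.
docked = [-9.1, -7.4, -8.2, -6.0, -10.3]
predicted = [-8.8, -7.9, -8.0, -6.5, -9.6]

print(round(r_squared(docked, predicted), 3))
print(round(spearman(docked, predicted), 3))
```

In this toy case the ranking is reproduced exactly even though the predicted values are off, so Spearman's coefficient is higher than R². For virtual screening, where the goal is to pick the top-ranked compounds, the rank correlation is often the more relevant of the two.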

Citations: 0
FetoML: Interpretable predictions of the fetotoxicity of drugs based on machine learning approaches.
IF 3.6 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-01 Epub Date: 2024-06-08 DOI: 10.1002/minf.202300312
Myeonghyeon Jeong, Sunyong Yoo

Pregnant females may use medications to manage health problems that develop during pregnancy or that they had prior to pregnancy. However, using medications during pregnancy poses a potential risk to the fetus. Assessing the fetotoxicity of drugs is essential to ensure safe treatments, but the current process is challenged by ethical issues, time, and cost. Therefore, the need for in silico models to efficiently assess the fetotoxicity of drugs has recently emerged. Previous studies have proposed successful machine learning models for fetotoxicity prediction and have even suggested molecular substructures that are possibly associated with fetotoxicity risks or protective effects. However, the interpretation of the models' decisions on fetotoxicity prediction for each drug is still insufficient. This study constructed machine learning-based models that can predict the fetotoxicity of drugs while providing explanations for the decisions. For this, permutation feature importance was used to identify the general features that the model relied on most in predicting the fetotoxicity of drugs. In addition, features associated with fetotoxicity for each drug were analyzed using the attention mechanism. The predictive performance of all the constructed models was significantly high (AUROC: 0.854-0.974, AUPR: 0.890-0.975). Furthermore, we conducted literature reviews on the predicted important features and found that they were highly associated with fetotoxicity. We expect that our model will benefit fetotoxicity research by providing an evaluation of fetotoxicity risks for drugs or drug candidates, along with an interpretation of that prediction.
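
Permutation feature importance, mentioned above, measures how much a model's score drops when one feature column is shuffled. The sketch below is a generic, dependency-free illustration and not the FetoML code: the stand-in `model`, the accuracy metric, the toy data, and the parameters `n_repeats` and `seed` are all assumptions made for the example.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in model: uses only feature 0 and ignores feature 1.
def model(row):
    return int(row[0] > 0.5)

X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
     [0.4, 0.2], [0.3, 0.8], [0.2, 0.4], [0.1, 0.6]]
y = [model(row) for row in X]

importances = permutation_importance(model, X, y)
print(importances)  # feature 0 shows a positive drop; feature 1 shows exactly 0.0
```

Because this gives one score per feature over the whole dataset, it describes the model's general behaviour. Per-drug explanations, which the abstract obtains from the attention mechanism, need a separate, instance-level method.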

Citations: 0
Kinematic analysis of kinases and their oncogenic mutations - Kinases and their mutation kinematic analysis.
IF 3.6 Zone 4 Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-01 Epub Date: 2024-06-08 DOI: 10.1002/minf.202300250
Xiyu Chen, Sigrid Leyendecker

Protein kinases are crucial cellular enzymes that facilitate the transfer of phosphate groups from adenosine triphosphate (ATP) to their substrates, thereby regulating numerous cellular activities. Dysfunctional kinase activity often leads to oncogenic conditions. We selected 79 crystal structures from the PDB by structural similarity to 5UG9 and classified them into five clusters based on the position of the phenylalanine side chain in the DFG motif. We apply our kinematic flexibility analysis (KFA) to explore the flexibility of kinases in various activity states and to examine the impact of the activation loop on kinase structure. KFA rapidly decomposes macromolecules into regions of different flexibility, allowing comprehensive analysis of conformational structures. The results reveal that the activation loop acts as a "lock" that stabilizes the active conformation of kinases by rigidifying the adjacent α-helices. Furthermore, we investigate specific kinase mutations, such as the L858R mutation commonly associated with non-small cell lung cancer, which increases flexibility in active-state kinases. In addition, by analyzing the hydrogen bond pattern, we examine the substructure of kinases in different states. Notably, active-state kinases exhibit a higher occurrence of α-helices than inactive-state kinases. This study contributes to the understanding of biomolecular conformation at a level relevant to drug development.

Citations: 0