Molecular Informatics: Latest Articles

Cover Picture: (Mol. Inf. 7/2024)
IF 3.6 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-12 DOI: 10.1002/minf.202480701
Citations: 0
DIGEP-Pred 2.0: A web application for predicting drug-induced cell signaling and gene expression changes.
IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-09 DOI: 10.1002/minf.202400032
Sergey M Ivanov, Anastasia V Rudik, Alexey A Lagunin, Dmitry A Filimonov, Vladimir V Poroikov

The analysis of drug-induced gene expression profiles (DIGEP) is widely used to estimate potential therapeutic and adverse drug effects as well as the molecular mechanisms of drug action. However, the corresponding experimental data are absent for many existing drugs and drug-like compounds. To solve this problem, we created the DIGEP-Pred 2.0 web application, which predicts DIGEP and potential drug targets from the structural formula of drug-like compounds. It is based on the combined use of structure-activity relationships (SARs) and network analysis. SAR models were created using PASS (Prediction of Activity Spectra for Substances) technology with data from the Comparative Toxicogenomics Database (CTD) and the Connectivity Map (CMap) for the prediction of DIGEP, and from PubChem and ChEMBL for the prediction of molecular mechanisms of action (MoA). Using only the structural formula of a compound, the user can obtain information on potential gene expression changes in several cell lines and on drug targets that are potential master regulators responsible for the observed DIGEP. The mean prediction accuracy calculated by leave-one-out cross-validation was 86.5% for 13377 genes and 94.8% for 2932 proteins (CTD data), and 97.9% for 2170 MoAs. SAR models (mean accuracy 87.5%) were also created for CMap data on the MCF7, PC3, and HL60 cell lines with different threshold values for the logarithm of fold changes: 0.5, 0.7, 1, 1.5, and 2. Additionally, data on pathways (KEGG, Reactome), Gene Ontology biological processes, and diseases (DisGeNet) enriched by the predicted genes are provided, together with an estimation of target master regulators based on OmniPath data. The DIGEP-Pred 2.0 web application is freely available at https://www.way2drug.com/digep-pred.
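
The use of several thresholds on the logarithm of fold change can be illustrated with a minimal sketch. It is not the DIGEP-Pred implementation: the function names, the symmetric treatment of up- and down-regulation, and the "unchanged" class are assumptions made here for illustration; only the five threshold values come from the abstract.

```python
# Threshold values quoted in the abstract for the logarithm of fold change.
THRESHOLDS = (0.5, 0.7, 1.0, 1.5, 2.0)


def label_change(log_fc: float, threshold: float) -> str:
    """Label one gene as up, down, or unchanged (hypothetical scheme)."""
    if log_fc >= threshold:
        return "up"
    if log_fc <= -threshold:
        return "down"
    return "unchanged"


def labels_by_threshold(log_fc: float) -> dict:
    """Return the label of one log fold change at every threshold."""
    return {t: label_change(log_fc, t) for t in THRESHOLDS}
```

Under these assumptions, a stricter threshold leaves fewer genes labeled as changed, which is presumably why separate models are reported per threshold.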

Citations: 0
Cumulative phylogenetic, sequence and structural analysis of Insulin superfamily proteins provide unique structure-function insights.
IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-08 DOI: 10.1002/minf.202300160
Shrilakshmi Sheshagiri Rao, Shankar V Kundapura, Debayan Dey, Chandrasekaran Palaniappan, Kanagaraj Sekar, Ananda Kulal, Udupi A Ramagopal

The insulin superfamily proteins (ISPs), in particular insulin, IGFs and relaxin proteins, are key modulators of animal physiology. They are known to have evolved from the same ancestral gene and have diverged into proteins with varied sequences and distinct functions, but they maintain a similar structural architecture stabilized by highly conserved disulphide bridges. The recent surge of sequence data and structures of these proteins prompted a need for a comprehensive analysis that examines the evolution of these sequences (427 sequences) in light of available functional and structural information, including representative complex structures of ISPs with their cognate receptors. This study (a) reveals unusually high sequence conservation of IGFs (>90% conservation in 184 sequences) and provides a possible structure-based rationale for such high conservation; (b) provides an updated definition of the receptor-binding signature motif of the functionally diverse relaxin family members; and (c) identifies a probable non-canonical C-peptide cleavage site in a few insulin sequences. The high conservation of IGFs appears to represent a classic case of resistance to sequence diversity exerted by physiologically important interactions with multiple partners. We also propose a probable mechanism for C-peptide cleavage in a few distinct insulin sequences and redefine the receptor-binding signature motif of the relaxin family. Lastly, we provide a basis for minimally modified insulin mutants with potential therapeutic application, inspired by concomitant changes observed in other insulin superfamily members and supported by molecular dynamics simulation.

Citations: 0
Multi-Task ADME/PK prediction at industrial scale: leveraging large and diverse experimental datasets.
IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-08 DOI: 10.1002/minf.202400079
Moritz Walter, Jens M Borghardt, Lina Humbeck, Miha Skalic

ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters for judging whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models, trained on in-house data generated at Boehringer Ingelheim, to predict ADME and animal PK endpoints. Models were evaluated both at the design stage of a compound (i.e., no experimental data for the test compounds are available) and at the testing stage, when a particular assay would be conducted (i.e., experimental data from earlier assays may be available). Using realistic time-splits, we found a clear performance benefit of multi-task graph-based neural network models over single-task models, which was even stronger when experimental data from earlier assays were available. In an attempt to explain the success of multi-task models, we found that the endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are especially responsible for increased predictivity in more complex ADME and PK endpoints. In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can be best leveraged to optimize the predictivity of ML models.
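
A time-split, as mentioned above, separates training and test compounds by measurement date rather than at random, so the model is always evaluated on compounds made after those it was trained on. The sketch below is a generic illustration, not the authors' pipeline; the record layout (a dict with a `measured` date) and the function name are assumptions.

```python
from datetime import date


def time_split(records, cutoff):
    """Split records into (train, test) by a cutoff date.

    Records measured before the cutoff go to training; records measured
    on or after it go to the test set. The 'measured' key is a
    hypothetical field name used only for this sketch.
    """
    train = [r for r in records if r["measured"] < cutoff]
    test = [r for r in records if r["measured"] >= cutoff]
    return train, test


if __name__ == "__main__":
    data = [
        {"id": "cpd-1", "measured": date(2021, 3, 1)},
        {"id": "cpd-2", "measured": date(2022, 6, 15)},
        {"id": "cpd-3", "measured": date(2023, 1, 10)},
    ]
    train, test = time_split(data, date(2022, 1, 1))
    print([r["id"] for r in train], [r["id"] for r in test])
```

Compared with a random split, this arrangement tends to give a more conservative performance estimate, because later compounds often come from chemical series the model has not seen.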

Citations: 0
Chemoinformatic regression methods and their applicability domain.
IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-01 Epub Date: 2024-05-28 DOI: 10.1002/minf.202400018
Thomas-Martin Dutschmann, Valerie Schlenker, Knut Baumann

The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and of how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles is explained, together with how to deduce specific and general definitions of their domain of applicability.
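
One generic way to define an applicability domain is by distance to the training data: a query is considered in-domain when its mean distance to the k nearest training objects stays below a cutoff. The sketch below illustrates that idea only; it is not taken from the review, and the function names, the Euclidean metric, and the cutoff are assumptions.

```python
import math


def mean_knn_distance(x, training, k=3):
    """Mean Euclidean distance from x to its k nearest training points."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the training set size")
    dists = sorted(math.dist(x, t) for t in training)
    return sum(dists[:k]) / k


def in_domain(x, training, k, cutoff):
    """True if x lies within the distance-based applicability domain."""
    return mean_knn_distance(x, training, k) <= cutoff
```

In practice the cutoff would be calibrated on the training set itself, for example from the distribution of the same statistic over the training objects.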

Citations: 0
Structural analysis of neomycin B and kanamycin A binding Aminoglycosides Modifying Enzymes (AME) and bacterial ribosomal RNA.
IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-01 Epub Date: 2024-06-10 DOI: 10.1002/minf.202300339
Julia Revillo Imbernon, Jean-Marc Weibel, Eric Ennifar, Gilles Prévost, Esther Kellenberger

Aminoglycosides are crucial antibiotics facing challenges from bacterial resistance. This study addresses the importance of aminoglycoside modifying enzymes in the context of escalating resistance. Drawing upon over two decades of structural data in the Protein Data Bank, we focused on two key antibiotics, neomycin B and kanamycin A, to explore how the aminoglycoside structure is exploited by this family of enzymes. A systematic comparison across diverse enzymes and the RNA A-site target identified common characteristics in the recognition mode, while assessing the adaptability of neomycin B and kanamycin A in various environments.

Citations: 0
Comparing search algorithms on the retrosynthesis problem.
IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-07-01 Epub Date: 2024-06-12 DOI: 10.1002/minf.202300259
Milo Roucairol, Tristan Cazenave

In this article, we try two different algorithms, Nested Monte Carlo Search and Greedy Best First Search, on AstraZeneca's open-source retrosynthetic tool, AiZynthFinder. We compare these algorithms to AiZynthFinder's base Monte Carlo Tree Search on a benchmark selected from the PubChem database and by Bayer's chemists. We show that both Nested Monte Carlo Search and Greedy Best First Search outperform AstraZeneca's Monte Carlo Tree Search, with a slight advantage for Nested Monte Carlo Search, while experimenting on a playout heuristic. We also show how the search algorithms are bounded by the quality of the policy network; to improve our results, the next step is to improve the policy network.
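
Greedy Best First Search always expands the open state with the best heuristic value. The sketch below shows the generic algorithm on a toy integer problem. It does not use the AiZynthFinder API; in retrosynthesis the states would be sets of molecules and the heuristic would come from a policy or scoring model, whereas `toy_expand` and `toy_heuristic` here are stand-ins invented for illustration.

```python
import heapq
from itertools import count


def greedy_best_first(start, is_goal, expand, heuristic):
    """Return a path from start to a goal state, or None if none is found."""
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start), next(tie), start, [start])]
    seen = {start}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(
                    frontier, (heuristic(nxt), next(tie), nxt, path + [nxt])
                )
    return None


def toy_expand(n):
    """Toy successor function: subtract one or halve (integer division)."""
    return [n - 1, n // 2] if n > 0 else []


def toy_heuristic(n):
    """Toy heuristic: smaller numbers are assumed closer to the goal (0)."""
    return n


if __name__ == "__main__":
    print(greedy_best_first(10, lambda n: n == 0, toy_expand, toy_heuristic))
```

Because the search follows the heuristic alone and ignores the cost already paid, its results depend heavily on heuristic quality, which is consistent with the abstract's remark that the search is bounded by the policy network.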

Citations: 0
Best of both worlds: An expansion of the state of the art pKa model with data from three industrial partners
IF 3.6 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-21 DOI: 10.1002/minf.202400088
Robert Fraczkiewicz, Huy Quoc Nguyen, Newton Wu, Nina Kausch‐Busies, Sergio Grimbs, Kai Sommer, Antonius ter Laak, Judith Günther, Björn Wagner, Michael Reutlinger
In a unique collaboration between Simulations Plus and several industrial partners, we were able to develop a new version, 11.0, of the previously published in silico pKa model, S+pKa, with considerably improved prediction accuracy. The model's training set was vastly expanded by large amounts of experimental data obtained from F. Hoffmann‐La Roche AG, Genentech Inc., and the Crop Science division of Bayer AG. The previous v7.0 of S+pKa was trained on data from public sources and the Pharmaceutical division of Bayer AG. The model has shown dramatic improvements in predictive accuracy when externally validated on three new contributor compound sets. Less expected was v11.0’s improvement in prediction on new compounds developed at Bayer Pharma after v7.0 was released (2013–2023), even though no additional Bayer Pharma data were contributed to v11.0. We illustrate chemical space coverage by the chemistries encountered in the five domains, public and industrial, outline model construction, and discuss factors contributing to the model's success.
Citations: 0
Exploring drug repositioning possibilities of kinase inhibitors via molecular simulation
IF 3.6 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-21 DOI: 10.1002/minf.202300336
Qing‐Xin Wang, Jiao Cai, Zi‐Jun Chen, Jia‐Chuan Liu, Jing‐Jing Wang, Hai Zhou, Qing‐Qing Li, Zi‐Xuan Wang, Yi‐Bo Wang, Zhen‐Jiang Tong, Jin Yang, Tian‐Hua Wei, Meng‐Yuan Zhang, Yun Zhou, Wei‐Chen Dai, Ning Ding, Xue‐Jiao Leng, Xiao‐Ying Yin, Shan‐Liang Sun, Yan‐Cheng Yu, Nian‐Guang Li, Zhi‐Hao Shi
Kinases, a class of enzymes controlling the phosphorylation of various substrates, are pivotal in both physiological and pathological processes. Although their conserved ATP binding pockets pose challenges for achieving selectivity, this feature offers opportunities for drug repositioning of kinase inhibitors (KIs). This study presents a cost‐effective in silico prediction of KI drug repositioning by analyzing cross‐docking results. We established a KI database (278 unique KIs, 1834 bioactivity data points) and a kinase database (357 kinase structures categorized by the DFG motif) for carrying out cross‐docking. Comparative analysis of the docking scores and reported experimental bioactivity revealed that the Atypical, TK, and TKL superfamilies are suitable for drug repositioning. Among these kinase superfamilies, Olverematinib, Lapatinib, and Abemaciclib displayed enzymatic activity in our focused AKT‐PI3K‐mTOR pathway, with IC50 values of 3.3, 3.2 and 5.8 μM, respectively. Further cell assays showed IC50 values of 0.2, 1.2 and 0.6 μM in tumor cells. The consistency between prediction and validation demonstrated that repositioning KIs via an in silico method is feasible.
Citations: 0
Updating and profiling the natural product‐likeness of Latin American compound libraries
IF 3.6 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-06-21 DOI: 10.1002/minf.202400052
Alejandro Gómez‐García, Ann‐Kathrin Prinz, Daniel A. Acuña Jiménez, William J. Zamora, Haruna L. Barazorda‐Ccahuana, Miguel Á. Chávez‐Fumagalli, Marilia Valli, Adriano D. Andricopulo, Vanderlan da S. Bolzani, Dionisio A. Olmedo, Pablo N. Solís, Marvin J. Núñez, Johny R. Rodríguez Pérez, Hoover A. Valencia Sánchez, Héctor F. Cortés Hernández, Oscar M. Mosquera Martinez, Oliver Koch, José L. Medina‐Franco
Compound databases of natural products play a crucial role in drug discovery and development projects and have implications in other areas, such as food chemical research, ecology and metabolomics. Recently, we put together the first version of the Latin American Natural Product database (LANaPDB) as a collective effort of researchers from six countries to assemble a public and representative library of natural products from a geographical region with high biodiversity. The present work aims to conduct a comparative and extensive profiling of the natural product‐likeness of an updated version of LANaPDB and of the ten individual compound databases that form part of it. The natural product‐likeness profile of the Latin American compound databases is contrasted with the profiles of other major natural product databases in the public domain and of a set of small‐molecule drugs approved for clinical use. As part of the extensive characterization, we employed several chemoinformatics metrics of natural product‐likeness. The results of this study should be of interest to the global community engaged in natural product databases, both in Latin America and elsewhere.
Citations: 0