Molecular Informatics: Latest Articles

A Molecular Representation to Identify Isofunctional Molecules.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-03-01; DOI: 10.1002/minf.202400159
Philippe Pinel, Gwenn Guichaoua, Nicolas Devaux, Yann Gaston-Mathé, Brice Hoffmann, Véronique Stoven

The challenges of drug discovery, from hit identification to clinical development, sometimes involve addressing scaffold hopping issues in order to optimise molecular biological activity or ADME properties, or to mitigate toxicology concerns of a drug candidate. Docking is usually viewed as the method of choice for identifying isofunctional molecules, i.e. highly dissimilar molecules that share common binding modes with a protein target. However, the structure of the protein may not be suitable for docking because of low resolution, or may even be unknown. This problem is frequently encountered with membrane proteins, although they constitute an important category of the druggable proteome. In such cases, ligand-based approaches offer promise but are often inadequate for large-step scaffold hopping, because they usually rely on molecular structure. Therefore, we propose the Interaction Fingerprints Profile (IFPP), a molecular representation that captures molecules' binding modes based on docking experiments against a panel of diverse high-quality protein structures. Evaluation on the LH benchmark demonstrates the value of IFPP for identifying isofunctional molecules. Nevertheless, computation of IFPPs is expensive, which limits their scalability for screening very large molecular libraries. We propose to overcome this limitation by leveraging Metric Learning approaches, allowing fast estimation of molecules' IFPP similarities, thus providing an efficient pre-screening strategy that is applicable to very large molecular libraries. Overall, our results suggest that IFPP provides an interesting and complementary tool alongside existing methods to address challenging scaffold hopping problems effectively in drug discovery.
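
To make the idea of comparing molecules by binding-mode profiles concrete, here is a minimal sketch. It assumes, which the abstract does not state, that a profile is stored as one set of "on" interaction bits per panel protein and that two profiles are compared by averaging per-protein Tanimoto coefficients. The names `tanimoto` and `profile_similarity`, the protein labels, and the toy bit sets are all hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    return len(a & b) / len(a | b)


def profile_similarity(p: dict, q: dict) -> float:
    """Mean per-protein Tanimoto over the panel proteins present in both profiles.

    Assumption: a profile maps a panel-protein label to the set of interaction
    bits observed for the docked pose against that protein.
    """
    shared = sorted(set(p) & set(q))
    if not shared:
        raise ValueError("profiles share no panel protein")
    return sum(tanimoto(p[k], q[k]) for k in shared) / len(shared)


# Hypothetical toy profiles for two molecules docked against a two-protein panel.
mol_a = {"protein_1": {1, 4, 7}, "protein_2": {2, 3}}
mol_b = {"protein_1": {1, 4, 9}, "protein_2": {2, 3}}
score = profile_similarity(mol_a, mol_b)
print(score)
```

The averaging step is only one possible design choice; concatenating the per-protein fingerprints and computing a single coefficient would weight proteins by the number of bits they contribute.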

Molecular Informatics, vol. 44, no. 3, e202400159. Journal Article. Source: Semantic Scholar.
Citations: 0
CoLiNN: A Tool for Fast Chemical Space Visualization of Combinatorial Libraries Without Enumeration.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-03-01; DOI: 10.1002/minf.202400263
Regina Pikalyova, Tagir Akhmetshin, Dragos Horvath, Alexandre Varnek

Visualization of the combinatorial library chemical space provides a comprehensive overview of available compound classes, their diversity, and physicochemical property distribution, all key factors in drug discovery. Typically, this visualization requires time- and resource-consuming compound enumeration, standardization, descriptor calculation, and dimensionality reduction. In this study, we present the Combinatorial Library Neural Network (CoLiNN), designed to predict the projection of compounds on a 2D chemical space map using only their building blocks and reaction information, thus eliminating the need for compound enumeration. Trained on 2.5K virtual DNA-Encoded Libraries (DELs), CoLiNN demonstrated high predictive performance, accurately predicting the compound position on Generative Topographic Maps (GTMs). GTMs predicted by CoLiNN were found to be very similar to the maps built for enumerated structures. In the library comparison task, we compared the GTMs of DELs and the ChEMBL database. The similarity-based DELs/ChEMBL rankings obtained with "true" and CoLiNN-predicted GTMs were consistent. Therefore, CoLiNN has the potential to become the go-to tool for combinatorial compound library design, as it can explore the library design space more efficiently by skipping compound enumeration.
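
A short arithmetic sketch shows why skipping enumeration matters. It is not CoLiNN's architecture, only a count: a fully enumerated combinatorial library grows as the product of the building-block counts at each position, whereas the building blocks themselves grow as the sum. The three-position library and its counts are hypothetical.

```python
from math import prod


def enumerated_size(blocks_per_position):
    """Number of products if every building-block combination is enumerated."""
    return prod(blocks_per_position)


def building_block_count(blocks_per_position):
    """Number of distinct building blocks that must be described instead."""
    return sum(blocks_per_position)


# Hypothetical three-cycle library with 1,000 building blocks per cycle.
counts = [1000, 1000, 1000]
n_products = enumerated_size(counts)
n_blocks = building_block_count(counts)
print(n_products, n_blocks)
```

Under these assumed counts, a model that works from building blocks alone handles thousands of inputs rather than a billion enumerated structures.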

Molecular Informatics, vol. 44, no. 3, e202400263. Journal Article. Source: Semantic Scholar. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11916640/pdf/
Citations: 0
Molecular Odor Prediction Using Olfactory Receptor Information.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-03-01; DOI: 10.1002/minf.202400274
Yuta Wakutsu, Hiromasa Kaneko

In fragrance development, the framework development process is a bottleneck from the perspective of labor, cost, and human resource development. Odors vary greatly depending on the structure and functional groups of the molecule. Although odor has been predicted from only the structure of molecules, its practical application remains elusive. In this study, we developed a model for predicting the odor of molecules that have only small differences in structure. Focusing on the mechanism of human olfaction, we divided the mechanism into three levels and constructed three models: a classification model that predicts the presence or absence of binding between molecules and olfactory receptors, a regression model that predicts the strength of binding, and a classification model that predicts the presence or absence of odor based on the strength of binding. Olfactory receptors were used as descriptors to discriminate between similar molecular odors. Our models predicted odor differences between some similar molecules, including optical isomers.

Molecular Informatics, vol. 44, no. 3, e202400274. Journal Article. Source: Semantic Scholar. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11906144/pdf/
Citations: 0
Comparing Explanations of Molecular Machine Learning Models Generated with Different Methods for the Calculation of Shapley Values.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-03-01; DOI: 10.1002/minf.202500067
Alec Lamens, Jürgen Bajorath

Feature attribution methods from explainable artificial intelligence (XAI) provide explanations of machine learning models by quantifying feature importance for predictions of test instances. While features determining individual predictions have frequently been identified in machine learning applications, the consistency of feature importance-based explanations of machine learning models using different attribution methods has not been thoroughly investigated. We have systematically compared model explanations in molecular machine learning. To this end, we generated a test system of highly accurate compound activity predictions for different targets using different machine learning methods. For these predictions, explanations were computed using methodological variants of the Shapley value formalism, a popular feature attribution approach in machine learning adapted from game theory. Predictions of each model were assessed using a model-agnostic and a model-specific Shapley value-based method. The resulting feature importance distributions were characterized and compared by a global statistical analysis using diverse measures. Unexpectedly, methodological variants for Shapley value calculations yielded distinct feature importance distributions for highly accurate predictions. There was little agreement between alternative model explanations. Our findings suggest that feature importance-based explanations of machine learning predictions should include an assessment of consistency using alternative methods.
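
For readers unfamiliar with the formalism, the following sketch computes exact Shapley values from the game-theoretic definition for a tiny toy problem. It is not one of the methodological variants compared in the study, which the abstract does not name. Practical variants approximate this quantity because the exact sum runs over all feature subsets. The function `toy_value` and its contributions are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function `value` over features 0..n-1.

    Each feature's value is its marginal contribution to every coalition of
    the other features, weighted by |S|! (n - |S| - 1)! / n!.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are present."""
    v = 0.0
    if 0 in coalition:
        v += 1.0  # additive contribution of feature 0
    if 1 in coalition:
        v += 2.0  # additive contribution of feature 1
    if 1 in coalition and 2 in coalition:
        v += 3.0  # interaction between features 1 and 2
    return v


phi = shapley_values(toy_value, 3)
print(phi)
```

In this toy case the interaction term is split equally between features 1 and 2, and the three values sum to the difference between the full-coalition and empty-coalition outputs.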

Molecular Informatics, vol. 44, no. 3, e202500067. Journal Article. Source: Semantic Scholar. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11925390/pdf/
Citations: 0
An Integrated Fuzzy Neural Network and Topological Data Analysis for Molecular Graph Representation Learning and Property Forecasting.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-03-01; DOI: 10.1002/minf.202400335
Phu Pham

Over the past decade, graph neural networks (GNNs) have emerged as a powerful neural architecture for various graph-structured data modelling and task-driven representation learning problems. Recent studies have highlighted the remarkable capabilities of GNNs in handling complex graph representation learning tasks, achieving state-of-the-art results in node/graph classification, regression, and generation. However, most traditional GNN-based architectures, such as GCN and GraphSAGE, still face several challenges in preserving multi-scale topological structures. These models primarily focus on capturing local neighborhood information, often failing to retain the global structural features essential for graph-level representation and classification tasks. Furthermore, their expressiveness is limited when learning topological structures in complex molecular graph datasets. To overcome these limitations, we propose a novel graph neural architecture, named FTPG, that integrates a neuro-fuzzy network with a topological graph learning approach. Specifically, FTPG introduces a novel approach to molecular graph representation and property prediction by integrating multi-scale topological graph learning with advanced neural components. The architecture employs separate graph neural learning modules to capture both local graph-based structures and global topological features. Moreover, to address feature uncertainty in the global-view representation, a multi-layered neuro-fuzzy network is incorporated into the model to enhance the robustness and expressiveness of the learned molecular graph embeddings. This combined approach leverages the strengths of multi-view and multi-modal neural learning, enabling FTPG to deliver superior performance in molecular graph tasks. Extensive experiments on real-world/benchmark molecular datasets demonstrate the effectiveness of the proposed FTPG model. It consistently outperforms state-of-the-art GNN-based baselines from different categories, including canonical local-proximity message-passing, graph transformer-based, and topology-driven approaches.

Molecular Informatics, vol. 44, no. 3, e202400335. Journal Article. Source: Semantic Scholar.
Citations: 0
Discovery of New HER2 Inhibitors via Computational Docking, Pharmacophore Modeling, and Machine Learning.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-02-01; DOI: 10.1002/minf.202400336
Aseel Yasin Matrouk, Haneen Mohammad, Safa Daoud, Mutasem Omar Taha

The human epidermal growth factor receptor 2 (HER2) is a critical oncogene implicated in the development of various aggressive cancers, particularly breast cancer. Discovering novel HER2 inhibitors is crucial for expanding therapeutic options for HER2-related malignancies. In this study, we present a computational workflow that focuses on generating pharmacophores derived from docked poses of a selected list of 15 diverse, potent HER2 inhibitors, utilizing flexible docking. The resulting pharmacophores, along with other physicochemical molecular descriptors, were then evaluated in a machine learning-quantitative structure-activity relationship (ML-QSAR) analysis against 1,272 HER2 inhibitors. Several machine learning methods were assessed, and a genetic function algorithm (GFA) was employed for feature selection. Ultimately, GFA combined with Bagging and J48Graft classifiers produced the best self-consistent and predictive models. These models highlighted the significance of two pharmacophores, Hypo_1 and Hypo_2, in distinguishing potent from less active inhibitors. The successful ML-QSAR models and their associated pharmacophores were used to screen the National Cancer Institute (NCI) database for novel HER2 inhibitors. Three promising anti-HER2 leads were identified, with the top-performing lead demonstrating an experimental anti-HER2 IC50 value of 3.85 μM. Notably, the three inhibitors exhibited distinct chemical scaffolds compared to existing HER2 inhibitors, as indicated by principal component analysis.

Molecular Informatics, vol. 44, no. 2, e202400336. Journal Article. Source: Semantic Scholar.
Citations: 0
MAYA (Multiple ActivitY Analyzer): An Open Access Tool to Explore Structure-Multiple Activity Relationships in the Chemical Universe.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-02-01; DOI: 10.1002/minf.202400306
J Israel Espinoza-Castañeda, José L Medina-Franco

Herein, we introduce MAYA (Multiple Activity Analyzer), a tool designed to automatically construct a chemical multiverse, generating multiple visualizations of the chemical spaces of a compound data set described by structural descriptors of different natures, such as Molecular ACCess Systems (MACCS) keys, extended connectivity fingerprints with different radii, molecular descriptors with pharmaceutical relevance, and bioactivity descriptors. These representations are integrated with various data visualization techniques for automated analysis focused on structure-multiple activity/property relationships, enabling the analysis of various problem sets in user-friendly open-source software. The source code of MAYA is freely available on GitHub at https://github.com/IsrC11/MAYA.git.

Molecular Informatics, vol. 44, no. 2, e202400306. Journal Article. Source: Semantic Scholar. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11812492/pdf/
Citations: 0
An Attempt to Classify Elementary Reactions on the Basis of TS Motifs.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-02-01; DOI: 10.1002/minf.202400040
Kenji Hori, Yujiro Matsuo, Toru Yamaguchi, Kimito Funatsu

Reactions commonly used in synthetic organic chemistry are named after their discoverers or developers. These are called name reactions and generally consist of several elementary reactions. Quantum chemical calculations can optimize the transition state (TS) structures of the elementary reactions. The geometrical feature of a TS is called a TS motif. We have constructed a database (QMRDB) containing TS motif information and continue to add entries to it. In the present study, we extracted 102 elementary reactions from the QMRDB and attempted to classify them using a Kohonen self-organizing map. As a result, all the TS motifs were clustered. By mapping a target compound onto the generated Kohonen map, we expect to be able to easily find the TS motifs most similar to the target.
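The abstract does not say how TS motifs are encoded or how the map was configured, so the following is only a minimal pure-Python sketch of a generic Kohonen self-organizing map, not the authors' implementation. The function names (`best_matching_unit`, `train_som`), the numeric descriptor vectors, the grid size, and the learning-rate and neighbourhood schedules are all illustrative assumptions. It shows the two operations the abstract relies on: training a map on a set of vectors, and looking up the node that a new target vector activates.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return (row, col) of the map node whose weight vector is closest to `vector`."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - vi) ** 2 for wi, vi in zip(w, vector))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols map on `data` (a list of equal-length numeric vectors).

    Hypothetical defaults: linearly decaying learning rate and a Gaussian
    neighbourhood whose width shrinks to a floor of 0.5 grid units.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2
    # Random initial weights in [0, 1); assumes descriptors are scaled to that range.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        # Visit the samples in a fresh random order each epoch.
        for vector in rng.sample(data, len(data)):
            br, bc = best_matching_unit(weights, vector)
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_dist2 / (2 * sigma ** 2))
                    w = weights[r][c]
                    # Pull the node toward the sample, scaled by the neighbourhood.
                    for i in range(dim):
                        w[i] += lr * h * (vector[i] - w[i])
    return weights


# Made-up two-dimensional descriptors standing in for TS motif features.
motifs = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(motifs, rows=3, cols=3)
target_node = best_matching_unit(som, [0.95, 0.9])
```

Under these assumptions, entries whose best-matching unit is `target_node` or a neighbouring node would be the candidates most similar to the target.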

Molecular Informatics, vol. 44, no. 2, e202400040 (2025-02-01). Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11833755/pdf/
Citations: 0
Predicting the Price of Molecules Using Their Predicted Synthetic Pathways.
IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2025-02-01 DOI: 10.1002/minf.202400039
Massina Abderrahmane, Hamza Tajmouati, Vinicius Barros Ribeiro da Silva, Quentin Perron

Numerous metrics currently allow chemists and computational chemists to refine and filter libraries of virtual molecules in order to prioritize their synthesis. Some of the most commonly used metrics and models are QSAR models, docking scores, diverse druggability metrics, and synthetic feasibility scores, to name only a few. To our knowledge, a function that estimates the price of a novel virtual molecule while taking into account the availability and price of starting materials has not previously been considered in the literature. Being able to make such a prediction could improve and accelerate decision-making related to cost of goods. Taking advantage of recent advances in the field of Computer Aided Synthetic Planning (CASP), we investigated whether the predicted retrosynthetic pathways of a given molecule and the prices of its associated starting materials are good features for predicting the price of that compound. In this work, we present a deep learning model, RetroPriceNet, that predicts the price of molecules using their predicted synthetic pathways. On a holdout test set, the model achieves better performance than the state-of-the-art model. The developed approach takes into account the synthetic feasibility of molecules and the availability and prices of the starting materials.

Molecular Informatics, vol. 44, no. 2, e202400039 (2025-02-01). Journal Article.
Citations: 0
Prediction of the Appropriate Temperature and Pressure for Polymer Dissolution Using Machine Learning Models.
IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2025-02-01 DOI: 10.1002/minf.202400193
Dorsa Dadashi, Marjan Kaedi, Parsa Dadashi, Suprakas Sinha Ray

The widespread use of polymer solutions in the chemical industry makes determining optimal dissolution conditions a significant challenge. Traditionally, researchers have relied on experimental methods to estimate the processing parameters needed to dissolve polymers, often requiring numerous iterations of testing at different temperatures and pressures. This approach is both costly and time-consuming. In this study, for the first time, we present a machine learning-based approach to predict the minimum temperature and pressure required for polymer dissolution from the molecular weight and chemical structure of both the polymer and the solvent, and its weight percent. Using a dataset compiled from the existing literature, which includes key factors influencing polymer dissolution, we also extracted chemical bond information from the molecular structures of the polymer-solvent systems. Six machine learning algorithms (linear regression, k-nearest neighbors, regression trees, random forests, multilayer perceptron neural networks, and support vector regression) were employed to develop predictive models. Among these, the Random Forest model achieved the highest accuracy, with R² values of 0.931 and 0.942 for temperature and pressure predictions, respectively. This novel approach eliminates the need for repetitive experimental testing, offering a more efficient pathway to determining dissolution conditions.
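The abstract reports R² without defining it. Assuming it is the usual coefficient of determination, the sketch below shows how such a score is computed from measured and predicted values. The function name `r_squared` and the temperature values are made up for illustration; they are not taken from the paper's dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes `y_true` is not constant, otherwise SS_tot would be zero.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical dissolution temperatures (arbitrary units): measured vs. predicted.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(measured, predicted)
```

A score of 1 means the predictions match the measurements exactly, and a score of 0 means the model does no better than always predicting the mean of the measured values.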

Molecular Informatics, vol. 44, no. 2, e202400193 (2025-02-01). Journal Article.
Citations: 0