Molecular Informatics: Latest Articles

Interpret Gaussian Process Models by Using Integrated Gradients.
IF 2.8 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL | Pub Date: 2025-01-01 | Epub Date: 2024-11-26 | DOI: 10.1002/minf.202400051
Fan Zhang, Naoaki Ono, Shigehiko Kanaya

Gaussian process regression (GPR) is a nonparametric probabilistic model capable of computing not only the predicted mean but also the predicted standard deviation, which represents the confidence level of predictions. It offers great flexibility as it can be non-linearized by designing the kernel function, made robust against outliers by altering the likelihood function, and extended to classification models. Recently, models combining deep learning with GPR, such as Deep Kernel Learning GPR, have been proposed and reported to achieve higher accuracy than GPR. However, due to its nonparametric nature, GPR is challenging to interpret. While Explainable AI (XAI) methods like LIME or kernel SHAP can interpret the predicted mean, interpreting the predicted standard deviation remains difficult. In this study, we propose a novel method to interpret the prediction of GPR by evaluating the importance of explanatory variables. We have incorporated the GPR model with the Integrated Gradients (IG) method to assess the contribution of each feature to the prediction. By evaluating the standard deviation of the posterior distribution, we show that the IG approach provides a detailed decomposition of the predictive uncertainty, attributing it to the uncertainty in individual feature contributions. This methodology not only highlights the variables that are most influential in the prediction but also provides insights into the reliability of the model by quantifying the uncertainty associated with each feature. Through this, we can obtain a deeper understanding of the model's behavior and foster trust in its predictions, especially in domains where interpretability is as crucial as accuracy.
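
The idea of attributing both the predicted mean and the predicted standard deviation to input features can be illustrated with a small sketch. This is not the authors' implementation: the toy GP (`TinyGP`), the RBF kernel with unit length scale, the noise level, the training data, the baseline point, and the finite-difference gradients are all assumptions chosen to keep the example dependency-free. It shows the completeness property of Integrated Gradients, namely that the attributions sum to the difference between the output at the query point and at the baseline.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel (assumed kernel choice)."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


class TinyGP:
    """Minimal GP regressor exposing the posterior mean and standard deviation."""

    def __init__(self, X, y, noise=1e-2):
        self.X = X
        self.K = [[rbf(a, b) + (noise if i == j else 0.0)
                   for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.alpha = solve(self.K, y)

    def mean(self, x):
        return sum(rbf(x, xi) * a for xi, a in zip(self.X, self.alpha))

    def std(self, x):
        k = [rbf(x, xi) for xi in self.X]
        v = solve(self.K, k)
        var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v))
        return math.sqrt(max(var, 0.0))


def integrated_gradients(f, x, baseline, steps=200, eps=1e-5):
    """Midpoint-rule approximation of IG with central finite differences."""
    d = len(x)
    attr = [0.0] * d
    for s in range(steps):
        a = (s + 0.5) / steps
        p = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(d):
            hi = p[:]
            lo = p[:]
            hi[i] += eps
            lo[i] -= eps
            grad_i = (f(hi) - f(lo)) / (2 * eps)
            attr[i] += grad_i * (x[i] - baseline[i]) / steps
    return attr


# Hypothetical toy data: three training points with two features each.
X_train = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.5]]
y_train = [0.0, 1.0, 0.5]
gp = TinyGP(X_train, y_train)

x_query = [1.5, 2.0]
baseline = [0.0, 0.0]  # assumed baseline; the choice affects the attributions
attr_mean = integrated_gradients(gp.mean, x_query, baseline)
attr_std = integrated_gradients(gp.std, x_query, baseline)
print("mean attributions:", attr_mean)
print("std attributions: ", attr_std)
```

In practice an automatic-differentiation framework would replace the finite differences, and the baseline would be chosen with the application in mind.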

Citations: 0
Extended Activity Cliffs-Driven Approaches on Data Splitting for the Study of Bioactivity Machine Learning Predictions.
IF 2.8 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL | Pub Date: 2025-01-01 | Epub Date: 2024-11-18 | DOI: 10.1002/minf.202400054
Kenneth López-Pérez, Ramón Alain Miranda-Quintana

Activity cliffs (ACs) are known to pose a challenge for QSAR modeling. Because machine learning QSAR models are highly data-dependent, they are directly influenced by the activity landscape. We propose several extended similarity and extended SALI methods to study how the distribution of ACs across the training and test sets affects model errors. Non-uniform distributions of ACs and of chemical space tend to lead to worse models than the proposed uniform methods. ML modeling on AC-rich sets needs to be analyzed case by case. The proposed methods can be used as a tool to study datasets, but as far as generalization is concerned, random splitting was the better-performing data-splitting alternative overall.
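
For orientation, the classic pairwise Structure-Activity Landscape Index (SALI), on which the extended variants build, can be sketched as follows. This is the standard pairwise definition, not the extended n-ary methods proposed in the paper. The fingerprints (sets of on-bit indices), the activity values, and the `eps` guard against division by zero are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sali(act_i, act_j, sim, eps=1e-6):
    """Pairwise SALI: activity difference divided by structural dissimilarity."""
    return abs(act_i - act_j) / max(1.0 - sim, eps)


# Hypothetical compounds: (fingerprint on-bits, activity such as pIC50).
compounds = {
    "A": ({1, 2, 3, 4}, 7.9),
    "B": ({1, 2, 3, 5}, 5.1),   # similar to A but much less active
    "C": ({8, 9, 10, 11}, 5.0),  # structurally unrelated to A
}

for (name_i, (fp_i, act_i)), (name_j, (fp_j, act_j)) in combinations(compounds.items(), 2):
    sim = tanimoto(fp_i, fp_j)
    print(name_i, name_j, "similarity:", round(sim, 3), "SALI:", round(sali(act_i, act_j, sim), 3))
```

A high SALI value flags a pair that is structurally similar yet differs strongly in activity, which is the usual working definition of an activity cliff.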

Citations: 0
From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization.
IF 2.8 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL | Pub Date: 2025-01-01 | Epub Date: 2024-12-05 | DOI: 10.1002/minf.202400265
Alexey A Orlov, Tagir N Akhmetshin, Dragos Horvath, Gilles Marcou, Alexandre Varnek

Dimensionality reduction is an important exploratory data analysis method that allows high-dimensional data to be represented in a human-interpretable lower-dimensional space. It is extensively applied in the analysis of chemical libraries, where chemical structure data, represented as high-dimensional feature vectors, are transformed into 2D or 3D chemical space maps. In this paper, commonly used dimensionality reduction techniques, namely Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Generative Topographic Mapping (GTM), are evaluated in terms of neighborhood preservation and visualization capability on sets of small molecules from the ChEMBL database.
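
One common way to quantify neighborhood preservation is the average fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors in the embedding. The abstract does not state which metric the authors use, so the sketch below is only one plausible definition. The data are hypothetical, and the "embedding" is simply the first two coordinates, standing in for the output of PCA, t-SNE, UMAP, or GTM.

```python
import math


def knn(points, k):
    """Return, for each point, the index set of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def neighborhood_preservation(high, low, k):
    """Mean overlap between k-NN sets in the original and embedded spaces."""
    nn_high = knn(high, k)
    nn_low = knn(low, k)
    return sum(len(a & b) / k for a, b in zip(nn_high, nn_low)) / len(high)


# Hypothetical 4-dimensional descriptors for six molecules.
high = [
    (0, 0, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0),
    (5, 5, 0, 0), (5, 6, 1, 0), (6, 5, 0, 1),
]
# Stand-in 2D embedding: keep only the first two coordinates.
low = [p[:2] for p in high]

score = neighborhood_preservation(high, low, k=2)
print("neighborhood preservation:", score)
```

A score of 1.0 means every neighborhood is preserved exactly; lower values indicate that the map distorts local structure.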

Citations: 0
GDMol: Generative Double-Masking Self-Supervised Learning for Molecular Property Prediction.
IF 2.8 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL | Pub Date: 2025-01-01 | Epub Date: 2024-10-24 | DOI: 10.1002/minf.202400146
Yingxu Liu, Qing Fan, Chengcheng Xu, Xiangzhen Ning, Yu Wang, Yang Liu, Yu Xie, Yanmin Zhang, Yadong Chen, Haichun Liu

Background: Effective molecular feature representation is crucial for drug property prediction. Recent years have seen increased attention on graph neural networks (GNNs) that are pre-trained using self-supervised learning techniques, aiming to overcome the scarcity of labeled data in molecular property prediction. Traditional GNNs in self-supervised molecular property prediction typically perform a single masking operation on the nodes and edges of the input molecular graph, which masks only local information and is insufficient for thorough self-supervised training.

Method: Hence, we propose a model for molecular property prediction based on generative double-masking self-supervised learning, termed GDMol. It integrates generative learning into the self-supervised learning framework for latent representation and applies a second round of masking to these latent representations. This enables the model to better capture global information and semantic knowledge of the molecules, yielding a richer, more informative representation and thereby more accurate and robust molecular property prediction.

Results: Our experiments on 5 datasets demonstrated superior performance of GDMol in predicting molecular properties across different domains. Moreover, we used the masking operation to traverse through the gradient changes of each node, the magnitude and sign of which reflect the positive and negative contribution respectively of the local structure in the molecule to the prediction outcome. This in-depth interpretative analysis not only enhances the model's interpretability, but also provides more targeted insights and direction for optimizing drug molecules.

Conclusions: In summary, this research offers novel insights into improving molecular property prediction tasks and paves the way for further research on the application of generative learning and self-supervised learning in the field of chemistry.

Citations: 0
Protein Binding Site Representation in Latent Space.
IF 2.8 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL | Pub Date: 2025-01-01 | Epub Date: 2024-12-18 | DOI: 10.1002/minf.202400205
Frederieke Lohmann, Stephan Allenspach, Kenneth Atz, Carl C G Schiebroek, Jan A Hiss, Gisbert Schneider

Interpretability and reliability of deep learning models are important for computer-based drug discovery. Aiming to understand feature perception by such a model, we investigate a graph neural network for affinity prediction of protein-ligand complexes. We assess a latent representation of ligand binding sites and investigate underlying geometric structure in this latent space and its relation to protein function. We introduce an automated computational pipeline for dimensionality reduction, clustering, hypothesis testing, and visualization of latent space. The results indicate that the learned protein latent space is inherently structured and not randomly distributed. Several of the identified protein binding site clusters in latent space correspond to functional protein families. Ligand size was found to be a determinant of cluster geometry. The computational pipeline proved applicable to latent space analysis and interpretation and can be adapted to work for different datasets and deep learning models.

Citations: 0
Chemography-guided analysis of a reaction path network for ethylene hydrogenation with a model Wilkinson's catalyst.
IF 2.8 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL | Pub Date: 2025-01-01 | Epub Date: 2024-08-09 | DOI: 10.1002/minf.202400063
Philippe Gantzer, Ruben Staub, Yu Harabuchi, Satoshi Maeda, Alexandre Varnek

Visualization and analysis of large chemical reaction networks become rather challenging when conventional graph-based approaches are used. As an alternative, we propose to use the chemical cartography ("chemography") approach, describing the data distribution on a 2-dimensional map. Here, the Generative Topographic Mapping (GTM) algorithm, an advanced chemography approach, has been applied to visualize the reaction path network of a simplified Wilkinson's catalyst-catalyzed hydrogenation containing some 10^5 structures generated with the help of the Artificial Force Induced Reaction (AFIR) method using either Density Functional Theory or Neural Network Potential (NNP) for potential energy surface calculations. Using new atom-permutation-invariant 3D descriptors for structure encoding, we have demonstrated that GTM is able to cluster structures that share the same 2D representation, to visualize the potential energy surface, to provide insight into reaction path exploration as a function of time, and to compare reaction path networks obtained with different methods of energy assessment.

Citations: 0
Improving Molecular Design with Direct Inverse Analysis of QSAR/QSPR Model.
IF 2.8 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL | Pub Date: 2025-01-01 | DOI: 10.1002/minf.202400227
Yuto Shino, Hiromasa Kaneko

Recent advances in machine learning have significantly impacted molecular design, notably the molecular generation method combining the chemical variational autoencoder (VAE) with Gaussian mixture regression (GMR). In this method, a mathematical model is constructed with X as the latent variable of the molecule and Y as the target properties and activities. Through direct inverse analysis of this model, it is possible to generate molecules with the desired target properties. However, this approach outputs many strings that do not follow the simplified molecular input line entry system grammar and generates unrealistic chemical structures in which the properties and activity do not satisfy the target values. In this study, we focus on hierarchical VAE using molecular graphs to address these issues. We confirm that the combination of hierarchical VAE and GMR does not generate invalid outputs and returns molecules that simultaneously satisfy multiple target values. Moreover, we use this method to identify several molecules that are predicted to exhibit activity against drug targets.
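
The "direct inverse analysis" step can be illustrated with Gaussian mixture regression run in the inverse direction: given a joint Gaussian mixture over (X, Y), condition on a target value of Y and read off the expected X. The sketch below is a deliberately reduced toy with scalar X and scalar Y, whereas in the paper X is a VAE latent vector and Y may hold several properties. The mixture parameters and the target value are hypothetical.

```python
import math


def normal_pdf(v, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmr_inverse(components, y_target):
    """Expected x given y for a joint Gaussian mixture over (x, y).

    Each component is (weight, mu_x, mu_y, var_x, var_y, cov_xy).
    """
    # Responsibility of each component for the requested y value.
    resp = [w * normal_pdf(y_target, mu_y, var_y)
            for w, mu_x, mu_y, var_x, var_y, cov_xy in components]
    total = sum(resp)
    # Conditional mean of x within each component.
    cond_means = [mu_x + cov_xy / var_y * (y_target - mu_y)
                  for w, mu_x, mu_y, var_x, var_y, cov_xy in components]
    return sum(r / total * m for r, m in zip(resp, cond_means))


# Hypothetical two-component mixture over (latent x, property y).
mixture = [
    (0.5, -1.0, 0.0, 1.0, 1.0, 0.8),
    (0.5, 3.0, 4.0, 1.0, 1.0, 0.8),
]
x_hat = gmr_inverse(mixture, y_target=4.0)
print("latent value proposed for y = 4.0:", x_hat)
```

In the full method the proposed latent vector would then be passed through the VAE decoder to obtain a molecular structure.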

Citations: 0
Structural Insight on the Selectivity of Calyx[4]Arene-Based Inhibitors of Mg2+-Dependent ATP-Hydrolases.
IF 2.8 | Zone 4 (Medicine) | Q3 CHEMISTRY, MEDICINAL | Pub Date: 2025-01-01 | Epub Date: 2024-12-05 | DOI: 10.1002/minf.202400200
Alexey Rayevsky, Maksym Platonov, Bulgakov Elijah, Dmytro Volochnyuk, Tetyana Veklich, Sergiy Cherenok, Roman Rodik, Vitaliy Kalchenko, Sergiy Kosterin

Located in plasma membranes, ATP hydrolases are involved in several dynamic transport processes, helping to control the movement of ions across cell membranes. ATP hydrolase acts as a transport protein, using energy from ATP hydrolysis to transport molecules against their concentration gradients. In addition to energy metabolism and active transport, ATP hydrolase is essential for maintaining cellular homeostasis and cell function. This study focused on the domain architecture model of P-type ATPases, which participate in the reaction cycles of ATP hydrolysis carried out by the membrane transport systems Na+, K+-ATPase and Ca2+, Mg2+-ATPase. Targeted modulation of Na+, K+-ATPase and Ca2+, Mg2+-ATPase by unnatural drugs is of greatest interest due to the lack of known effectors. This new discovery presents a convenient model based on our recent experimental studies of the membrane structures and myocytes of the uterine smooth muscle, the myometrium. The current study strongly supports the conclusion that nanosized calix[4]arenes functionalised on the upper rings of the macrocycle with biologically active phosphonic acid fragments can serve as selective and potent inhibitors of cation-transporting electroenzymes. This is how we discovered that calix[4]arene of methylenebisphosphonic acid C-97 and calix[4]arene of bis-aminophosphonic acid C-107 selectively and effectively (I0.5 < 100 nM) inhibit the activity of the Mg2+, ATP-dependent electrogenic Na+, K+ plasma membrane pump. As drug discovery in the field of Mg2+-ATPase inhibitors is uncharted territory, basic research holds the key to explaining and predicting the mechanism of interaction and action of different classes of compounds. In light of the presented results, new calix[4]arene compounds can be used as potent inhibitors of Mg2+, ATP-dependent electrogenic ion pumps.

Citations: 0
ERL-ProLiGraph: Enhanced representation learning on protein-ligand graph structured data for binding affinity prediction.
IF 2.8 CAS Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-12-01 Epub Date: 2024-10-15 DOI: 10.1002/minf.202400044
Gloria Geine Paendong, Soualihou Ngnamsie Njimbouom, Candra Zonyfar, Jeong-Dong Kim

Predicting Protein-Ligand Binding Affinity (PLBA) is pivotal in drug development, as accurate estimations of PLBA expedite the identification of promising drug candidates for specific targets, thereby accelerating the drug discovery process. Despite substantial advancements in PLBA prediction, developing an efficient and more accurate method remains non-trivial. Unlike previous computer-aided PLBA studies, which primarily used ligand SMILES and protein sequences represented as strings, this research introduces a Deep Learning-based method, the Enhanced Representation Learning on Protein-Ligand Graph Structured data for Binding Affinity Prediction (ERL-ProLiGraph). The unique aspect of this method is the use of graph representations for both proteins and ligands, with the aim of learning the structural information contained in both to enhance the accuracy of PLBA predictions. In these graphs, nodes represent atomic structures, while edges depict chemical bonds and spatial relationships. The proposed model, leveraging deep-learning algorithms, effectively learns to correlate these graph representations with binding affinities. This graph-based representation approach enhances the model's ability to capture the complex molecular interactions critical in PLBA. This work represents a promising advancement in computational techniques for protein-ligand binding prediction, offering a potential path toward more efficient and accurate predictions in drug development. Comparative analysis indicates that the proposed ERL-ProLiGraph outperforms previous models, showcasing notable efficacy and providing a more suitable approach for accurate PLBA predictions.
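
The graph representation described in the abstract (nodes for atoms, edges for chemical bonds and spatial relationships) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Atom` class, the `build_graph` function, the 4.0 Å distance cutoff, and the toy coordinates are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Atom:
    """A graph node: one atom with an element symbol and 3D coordinates."""
    element: str
    xyz: Tuple[float, float, float]


def build_graph(
    atoms: List[Atom],
    bonds: List[Tuple[int, int]],
    cutoff: float = 4.0,  # assumed distance threshold for spatial edges
) -> Dict[Tuple[int, int], str]:
    """Return edges as {(i, j): kind} with i < j.

    'bond' edges come from the supplied bond list; 'spatial' edges link any
    other pair of atoms whose distance is within the cutoff.
    """
    edges = {(min(i, j), max(i, j)): "bond" for i, j in bonds}
    for i, j in combinations(range(len(atoms)), 2):
        if (i, j) not in edges and dist(atoms[i].xyz, atoms[j].xyz) <= cutoff:
            edges[(i, j)] = "spatial"
    return edges


# Toy example: four atoms on a line, with one covalent bond.
atoms = [
    Atom("C", (0.0, 0.0, 0.0)),
    Atom("O", (1.2, 0.0, 0.0)),
    Atom("N", (3.5, 0.0, 0.0)),
    Atom("C", (9.0, 0.0, 0.0)),
]
edges = build_graph(atoms, bonds=[(0, 1)])
```

In this toy case the last atom lies beyond the cutoff from every other atom and so remains unconnected. A real pipeline would typically also attach feature vectors to nodes and edges before passing the graph to a learning model.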

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639045/pdf/
Citations: 0
The freedom space - a new set of commercially available molecules for hit discovery.
IF 2.8 CAS Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL Pub Date: 2024-12-01 Epub Date: 2024-08-22 DOI: 10.1002/minf.202400114
Mykola V Protopopov, Valentyna V Tararina, Fanny Bonachera, Igor M Dzyuba, Anna Kapeliukha, Serhii Hlotov, Oleksii Chuk, Gilles Marcou, Olga Klimchuk, Dragos Horvath, Erik Yeghyan, Olena Savych, Olga O Tarkhanova, Alexandre Varnek, Yurii S Moroz

The advent of high-performance virtual screening techniques now allows drug designers to explore ultra-large sets of candidate compounds in search of molecules predicted to have desired properties. However, the success of such an endeavor relies heavily on the pertinence (drug-likeness and, foremost, chemical feasibility) of these candidates; otherwise, by the garbage-in/garbage-out principle, virtual screening will return valueless "hits". The huge popularity of the judiciously enumerated Enamine REAL Space is clear proof of the strength of this Big Data trend in drug discovery. Here we describe a new dataset of make-on-demand compounds called the Freedom Space. It follows the principles of the Enamine REAL Space and contains highly feasible molecules (synthesis success rate over 75 percent). However, scaffold and chemography analyses revealed significant differences from both the REAL Space and the biologically annotated compounds in the ChEMBL database. The Freedom Space is a significant extension of the REAL Space and can be utilized for a more comprehensive exploration of the synthetically feasible chemical space in hit finding and hit-to-lead campaigns.

Citations: 0