Molecular Informatics: Latest Articles

GCLmf: A Novel Molecular Graph Contrastive Learning Framework Based on Hard Negatives and Application in Toxicity Prediction.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; Epub Date: 2024-10-18; DOI: 10.1002/minf.202400169
Xinxin Yu, Yuanting Chen, Long Chen, Weihua Li, Yuhao Wang, Yun Tang, Guixia Liu

In silico methods for predicting chemical toxicity can decrease cost and increase efficiency in the early stages of drug discovery. However, because sufficient and reliable toxicity data are hard to obtain, constructing robust and accurate prediction models is challenging. Contrastive learning, a type of self-supervised learning, leverages large amounts of unlabeled data to obtain more expressive molecular representations, which can boost prediction performance on downstream tasks. While molecular graph contrastive learning has attracted growing attention, current models neglect the quality of the negative set. Here, we propose a self-supervised pretraining deep learning framework named GCLmf. We first use molecular fragments that meet specific conditions as hard negative samples to improve the quality of the negative set, which increases the difficulty of the proxy tasks during pretraining and helps the model learn informative representations. GCLmf shows excellent predictive power on various molecular property benchmarks and performs well on 33 toxicity tasks in comparison with multiple baselines. In addition, we investigate the necessity of introducing hard negatives in model building and the impact of the proportion of hard negatives on the model.
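The abstract does not state which contrastive objective GCLmf uses, so the following is only a minimal sketch of a generic InfoNCE-style loss, written to illustrate why hard negatives make the pretraining task more difficult. The embeddings, the `temperature` value, and the function names are all illustrative assumptions, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (assumed form, not GCLmf's exact loss)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical 2-D embeddings, chosen only for illustration.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]                         # augmented view of the same molecule
easy_negatives = [[-1.0, 0.0], [0.0, -1.0]]   # unrelated molecules
hard_negative = [0.8, 0.6]                    # e.g. a fragment embedding close to the anchor

loss_easy = info_nce(anchor, positive, easy_negatives)
loss_hard = info_nce(anchor, positive, easy_negatives + [hard_negative])
print(f"easy negatives only: {loss_easy:.4f}")
print(f"with one hard negative: {loss_hard:.4f}")
```

A negative that sits close to the anchor contributes a large term to the denominator, so the loss rises and the encoder has to separate the two more carefully.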

Citations: 0
Structural and Dynamic Assessment of Disease-Causing Mutations for the Carnitine Transporter OCTN2.
IF 3.1; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; DOI: 10.1002/minf.202400002
Johannes Jokiel, Marcel Bermudez

Primary carnitine deficiency (PCD) is a rare autosomal recessive genetic disorder caused by missense mutations in the SLC22A5 gene encoding the organic carnitine transporter novel type 2 (OCTN2). This study investigates the structural consequences of PCD-causing mutations, focusing on the N32S variant. Using an AlphaFold model, molecular dynamics simulations reveal altered interactions and dynamics, suggesting potential mechanistic changes in carnitine transport. In addition, we identify mutation hotspots (R169, E452) across the SLC family with the major facilitator superfamily (MFS) fold. Our data demonstrate the applicability of structural modeling for linking genetic information and clinical observations and providing a rationale for the influence of disease-causing mutations on protein dynamics.

Citations: 0
Interpret Gaussian Process Models by Using Integrated Gradients.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; Epub Date: 2024-11-26; DOI: 10.1002/minf.202400051
Fan Zhang, Naoaki Ono, Shigehiko Kanaya

Gaussian process regression (GPR) is a nonparametric probabilistic model capable of computing not only the predicted mean but also the predicted standard deviation, which represents the confidence level of predictions. It offers great flexibility as it can be non-linearized by designing the kernel function, made robust against outliers by altering the likelihood function, and extended to classification models. Recently, models combining deep learning with GPR, such as Deep Kernel Learning GPR, have been proposed and reported to achieve higher accuracy than GPR. However, due to its nonparametric nature, GPR is challenging to interpret. While Explainable AI (XAI) methods like LIME or kernel SHAP can interpret the predicted mean, interpreting the predicted standard deviation remains difficult. In this study, we propose a novel method to interpret the prediction of GPR by evaluating the importance of explanatory variables. We have incorporated the GPR model with the Integrated Gradients (IG) method to assess the contribution of each feature to the prediction. By evaluating the standard deviation of the posterior distribution, we show that the IG approach provides a detailed decomposition of the predictive uncertainty, attributing it to the uncertainty in individual feature contributions. This methodology not only highlights the variables that are most influential in the prediction but also provides insights into the reliability of the model by quantifying the uncertainty associated with each feature. Through this, we can obtain a deeper understanding of the model's behavior and foster trust in its predictions, especially in domains where interpretability is as crucial as accuracy.
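As a rough illustration of the idea, the sketch below applies Integrated Gradients to both the predicted mean and the predicted standard deviation of a tiny GPR model. It is not the authors' implementation: the RBF kernel, the toy training data, the zero baseline, the noise level, and the use of finite-difference gradients are all assumptions made to keep the example self-contained.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel (assumed kernel choice)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Toy training set with two explanatory variables (hypothetical values).
X = [[0.0, 0.0], [1.0, 0.5], [0.5, 2.0]]
y = [0.0, 1.0, -0.5]
noise = 1e-2
K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
     for i, a in enumerate(X)]
alpha = solve(K, y)

def gp_mean(x):
    """Posterior predictive mean k(x)^T (K + noise I)^-1 y."""
    return sum(rbf(x, xi) * ai for xi, ai in zip(X, alpha))

def gp_std(x):
    """Posterior standard deviation of the latent function at x."""
    k = [rbf(x, xi) for xi in X]
    v = solve(K, k)
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v))
    return math.sqrt(max(var, 0.0))

def integrated_gradients(f, x, baseline, steps=200, eps=1e-5):
    """IG_i = (x_i - b_i) * integral of df/dx_i along the straight path b -> x."""
    attributions = [0.0] * len(x)
    for s in range(steps):
        t = (s + 0.5) / steps  # midpoint rule
        point = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(len(x)):
            upper = point[:]
            lower = point[:]
            upper[i] += eps
            lower[i] -= eps
            grad = (f(upper) - f(lower)) / (2 * eps)  # central difference
            attributions[i] += grad * (x[i] - baseline[i]) / steps
    return attributions

x = [0.8, 1.0]
baseline = [0.0, 0.0]
attr_mean = integrated_gradients(gp_mean, x, baseline)
attr_std = integrated_gradients(gp_std, x, baseline)
print("mean attributions:", attr_mean)
print("std attributions:", attr_std)
```

By the completeness property of Integrated Gradients, each list of attributions sums (up to numerical error) to the difference between the function value at `x` and at the baseline, which is what allows the predictive uncertainty to be split across features.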

Citations: 0
Extended Activity Cliffs-Driven Approaches on Data Splitting for the Study of Bioactivity Machine Learning Predictions.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; Epub Date: 2024-11-18; DOI: 10.1002/minf.202400054
Kenneth López-Pérez, Ramón Alain Miranda-Quintana

Activity Cliffs (ACs) are known to pose a challenge for QSAR modeling. Because of their high data dependency, machine learning QSAR models are directly influenced by the activity landscape. We propose several extended similarity and extended SALI methods to study how the distribution of ACs across the training and test sets affects model errors. Non-uniform AC and chemical space distributions tend to lead to worse models than the proposed uniform methods. ML modeling on AC-rich sets needs to be analyzed case by case. The proposed methods can be used as a tool to study datasets, but as far as generalization is concerned, random splitting was the better-performing data splitting alternative overall.
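For orientation, the classic pairwise Structure-Activity Landscape Index (SALI) divides the activity difference of two compounds by their structural dissimilarity, so that similar compounds with very different activities score high. The sketch below implements only this pairwise form with Tanimoto similarity; the extended (n-ary) variants proposed in the paper are not reproduced here, and the fingerprint bits and activity values are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def sali(fp_i, act_i, fp_j, act_j, eps=1e-9):
    """Pairwise SALI = |A_i - A_j| / (1 - sim(i, j)); eps avoids division by zero."""
    return abs(act_i - act_j) / (1.0 - tanimoto(fp_i, fp_j) + eps)

# Hypothetical compounds: (on-bits of a fingerprint, pIC50-like activity).
mol_a = ({1, 2, 3, 4}, 8.0)
mol_b = ({1, 2, 3, 5}, 5.0)   # structurally close to A, much less active
mol_c = ({7, 8, 9}, 5.5)      # structurally unrelated to A

sali_ab = sali(mol_a[0], mol_a[1], mol_b[0], mol_b[1])
sali_ac = sali(mol_a[0], mol_a[1], mol_c[0], mol_c[1])
print(f"SALI(A, B) = {sali_ab:.2f}")  # similarity 0.6, activity gap 3.0
print(f"SALI(A, C) = {sali_ac:.2f}")  # similarity 0.0, activity gap 2.5
```

The pair A/B shares most of its bits but differs strongly in activity, so it scores higher than A/C and would be the candidate activity cliff.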

Citations: 0
From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; Epub Date: 2024-12-05; DOI: 10.1002/minf.202400265
Alexey A Orlov, Tagir N Akhmetshin, Dragos Horvath, Gilles Marcou, Alexandre Varnek

Dimensionality reduction is an important exploratory data analysis method that allows high-dimensional data to be represented in a human-interpretable lower-dimensional space. It is extensively applied in the analysis of chemical libraries, where chemical structure data, represented as high-dimensional feature vectors, are transformed into 2D or 3D chemical space maps. In this paper, commonly used dimensionality reduction techniques, namely Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Generative Topographic Mapping (GTM), are evaluated in terms of neighborhood preservation and visualization capability on sets of small molecules from the ChEMBL database.
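The abstract does not define its neighborhood preservation metric, so the sketch below shows one common and simple variant as an assumption: the average fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors on the map. The toy coordinates and the "projection" (dropping the third coordinate) are placeholders for real descriptors and a real PCA, t-SNE, UMAP, or GTM embedding.

```python
import math

def knn(points, k):
    """Index sets of the k nearest neighbors of every point (Euclidean distance)."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: (math.dist(p, points[j]), j))
        neighbors.append(set(order[:k]))
    return neighbors

def neighborhood_preservation(high, low, k):
    """Mean fraction of high-dimensional k-NN retained in the low-dimensional map."""
    nn_high = knn(high, k)
    nn_low = knn(low, k)
    shared = sum(len(a & b) for a, b in zip(nn_high, nn_low))
    return shared / (k * len(high))

# Hypothetical 3-D "descriptor" vectors and a crude 2-D map of them.
high = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
        [0.0, 0.0, 5.0], [1.0, 1.0, 5.0], [5.0, 5.0, 0.0]]
low = [p[:2] for p in high]  # stand-in for an actual embedding

score_identity = neighborhood_preservation(high, high, k=2)
score_map = neighborhood_preservation(high, low, k=2)
print("identity map:", score_identity)
print("2-D map:", score_map)
```

A score of 1.0 means every local neighborhood survives the projection; lower values indicate that the map has pulled apart neighbors or merged points that were distant in the original space.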

Citations: 0
GDMol: Generative Double-Masking Self-Supervised Learning for Molecular Property Prediction.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; Epub Date: 2024-10-24; DOI: 10.1002/minf.202400146
Yingxu Liu, Qing Fan, Chengcheng Xu, Xiangzhen Ning, Yu Wang, Yang Liu, Yu Xie, Yanmin Zhang, Yadong Chen, Haichun Liu

Background: Effective molecular feature representation is crucial for drug property prediction. Recent years have seen increased attention on graph neural networks (GNNs) that are pre-trained using self-supervised learning techniques, aiming to overcome the scarcity of labeled data in molecular property prediction. Traditional GNNs in self-supervised molecular property prediction typically perform a single masking operation on the nodes and edges of the input molecular graph, which masks only local information and is insufficient for thorough self-supervised training.

Method: Hence, we propose a model for molecular property prediction based on generative double-masking self-supervised learning, termed GDMol. It integrates generative learning into the self-supervised learning framework to obtain latent representations and applies a second round of masking to these latent representations, enabling the model to better capture global information and semantic knowledge of the molecules for a richer, more informative representation, thereby achieving more accurate and robust molecular property prediction.

Results: Our experiments on 5 datasets demonstrated superior performance of GDMol in predicting molecular properties across different domains. Moreover, we used the masking operation to traverse through the gradient changes of each node, the magnitude and sign of which reflect the positive and negative contribution respectively of the local structure in the molecule to the prediction outcome. This in-depth interpretative analysis not only enhances the model's interpretability, but also provides more targeted insights and direction for optimizing drug molecules.

Conclusions: In summary, this research offers novel insights on improving molecular property prediction tasks, and paves the way for further research on the application of generative learning and self-supervised learning in the field of chemistry.

Citations: 0
Protein Binding Site Representation in Latent Space.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; Epub Date: 2024-12-18; DOI: 10.1002/minf.202400205
Frederieke Lohmann, Stephan Allenspach, Kenneth Atz, Carl C G Schiebroek, Jan A Hiss, Gisbert Schneider

Interpretability and reliability of deep learning models are important for computer-based drug discovery. Aiming to understand feature perception by such a model, we investigate a graph neural network for affinity prediction of protein-ligand complexes. We assess a latent representation of ligand binding sites and investigate underlying geometric structure in this latent space and its relation to protein function. We introduce an automated computational pipeline for dimensionality reduction, clustering, hypothesis testing, and visualization of latent space. The results indicate that the learned protein latent space is inherently structured and not randomly distributed. Several of the identified protein binding site clusters in latent space correspond to functional protein families. Ligand size was found to be a determinant of cluster geometry. The computational pipeline proved applicable to latent space analysis and interpretation and can be adapted to work for different datasets and deep learning models.

Citations: 0
Chemography-guided analysis of a reaction path network for ethylene hydrogenation with a model Wilkinson's catalyst.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; Epub Date: 2024-08-09; DOI: 10.1002/minf.202400063
Philippe Gantzer, Ruben Staub, Yu Harabuchi, Satoshi Maeda, Alexandre Varnek

Visualization and analysis of large chemical reaction networks become rather challenging when conventional graph-based approaches are used. As an alternative, we propose to use the chemical cartography ("chemography") approach, describing the data distribution on a 2-dimensional map. Here, the Generative Topographic Mapping (GTM) algorithm, an advanced chemography approach, has been applied to visualize the reaction path network of a simplified Wilkinson's catalyst-catalyzed hydrogenation containing some 10^5 structures generated with the help of the Artificial Force Induced Reaction (AFIR) method using either Density Functional Theory or Neural Network Potential (NNP) for potential energy surface calculations. Using new atom-permutation-invariant 3D descriptors for structure encoding, we have demonstrated that GTM is able to cluster structures that share the same 2D representation, to visualize the potential energy surface, to provide insight into the reaction path exploration as a function of time, and to compare reaction path networks obtained with different methods of energy assessment.

Citations: 0
Improving Molecular Design with Direct Inverse Analysis of QSAR/QSPR Model.
IF 2.8; Zone 4 (Medicine); Q3 CHEMISTRY, MEDICINAL; Pub Date: 2025-01-01; DOI: 10.1002/minf.202400227
Yuto Shino, Hiromasa Kaneko

Recent advances in machine learning have significantly impacted molecular design, notably the molecular generation method combining the chemical variational autoencoder (VAE) with Gaussian mixture regression (GMR). In this method, a mathematical model is constructed with X as the latent variable of the molecule and Y as the target properties and activities. Through direct inverse analysis of this model, it is possible to generate molecules with the desired target properties. However, this approach outputs many strings that do not follow the simplified molecular input line entry system (SMILES) grammar and generates unrealistic chemical structures whose properties and activities do not satisfy the target values. In this study, we focus on hierarchical VAE using molecular graphs to address these issues. We confirm that the combination of hierarchical VAE and GMR does not generate invalid outputs and returns molecules that simultaneously satisfy multiple target values. Moreover, we use this method to identify several molecules that are predicted to exhibit activity against drug targets.
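The abstract gives no equations. As a hedged reminder, direct inverse analysis with GMR conventionally rests on the standard conditional form of a Gaussian mixture, sketched below. The symbols $\pi_k$, $\mu_k$, $\Sigma_k$, $w_k$, $m_k$, and $S_k$ are generic mixture notation introduced here for illustration and are not taken from the paper.

```latex
% Joint mixture over latent variables x and target properties y
p(x, y) = \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left(
    \begin{bmatrix} x \\ y \end{bmatrix}
    \,\middle|\,
    \begin{bmatrix} \mu_{x,k} \\ \mu_{y,k} \end{bmatrix},
    \begin{bmatrix} \Sigma_{xx,k} & \Sigma_{xy,k} \\ \Sigma_{yx,k} & \Sigma_{yy,k} \end{bmatrix}
  \right)

% Inverse direction: distribution of x given a target value y
p(x \mid y) = \sum_{k=1}^{K} w_k(y) \, \mathcal{N}\!\left( x \mid m_k(y), S_k \right)

w_k(y) = \frac{\pi_k \, \mathcal{N}(y \mid \mu_{y,k}, \Sigma_{yy,k})}
              {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(y \mid \mu_{y,j}, \Sigma_{yy,j})}

m_k(y) = \mu_{x,k} + \Sigma_{xy,k} \, \Sigma_{yy,k}^{-1} \, (y - \mu_{y,k})

S_k = \Sigma_{xx,k} - \Sigma_{xy,k} \, \Sigma_{yy,k}^{-1} \, \Sigma_{yx,k}
```

Sampling or taking the mode of $p(x \mid y)$ at the desired target $y$ yields latent vectors that a decoder can then turn into candidate structures.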

Citations: 0
Structural Insight on the Selectivity of Calyx[4]Arene-Based Inhibitors of Mg2+-Dependent Atp-Hydrolases.
IF 2.8 Zone 4, Medicine Q3 CHEMISTRY, MEDICINAL Pub Date: 2025-01-01 Epub Date: 2024-12-05 DOI: 10.1002/minf.202400200
Alexey Rayevsky, Maksym Platonov, Bulgakov Elijah, Dmytro Volochnyuk, Tetyana Veklich, Sergiy Cherenok, Roman Rodik, Vitaliy Kalchenko, Sergiy Kosterin

Located in plasma membranes, ATP hydrolases are involved in several dynamic transport processes, helping to control the movement of ions across cell membranes. ATP hydrolase acts as a transport protein, using the energy of ATP hydrolysis to transport molecules against their concentration gradients. In addition to energy metabolism and active transport, ATP hydrolase is essential for maintaining cellular homeostasis and cell function. This study focused on the domain architecture model of P-type ATPases, which participate in the reaction cycles of ATP hydrolysis carried out by two membrane transport systems: Na+, K+-ATPase and Ca2+, Mg2+-ATPase. Targeted modulation of Na+, K+-ATPase and Ca2+, Mg2+-ATPase by unnatural drugs is of great interest because few effectors are known. This work presents a convenient model based on our recent experimental studies of the membrane structures and myocytes of the uterine smooth muscle, the myometrium. The present study strongly supports the view that nanosized calix[4]arenes functionalised on the upper rings of the macrocycle with biologically active phosphonic acid fragments can serve as selective and potent inhibitors of cation-transporting electroenzymes. We found that calix[4]arene of methylenebisphosphonic acid C-97 and calix[4]arene of bis-aminophosphonic acid C-107 selectively and effectively (I0.5 < 100 nM) inhibit the activity of the Mg2+, ATP-dependent electrogenic Na+, K+ plasma membrane pump. As drug discovery in the field of Mg2+-ATPase inhibitors is uncharted territory, basic research holds the key to explaining and predicting the mechanism of interaction and action of different classes of compounds. In light of the presented results, new calix[4]arene compounds can be used as potent inhibitors of Mg2+, ATP-dependent electrogenic ion pumps.

Citations: 0