Molecular Informatics: Latest Articles

Exploring cooperative molecular contacts using a PostgreSQL database system.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-05-01, DOI: 10.1002/minf.202200235
Mael A Briand, Loïc Dreano, Ashenafi Legehar, Evgeni Grazhdankin, Leo Ghemtio, Henri Xhaard

Cooperative molecular contacts play an important role in protein structure and ligand binding. Here, we constructed a PostgreSQL database that stores structural information in the form of atomic environments and allows flexible mining of molecular contacts. Taking the Ser-His-Asp/Glu catalytic triad as a first test case, we demonstrate that the presence of a carboxylate oxygen atom in the vicinity of a His is associated with a shorter Ser-OH..N-His bond in the PDB30 subset. We prospectively mine catalytic triads in unannotated proteins, suggesting catalytic functions for them. As a second test case, we demonstrate that this database system can include ligand atoms, represented by Sybyl atom types, by evaluating the proportion of counter-ions for ligand carboxylate oxygens.
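The kind of contact mining described in this abstract can be sketched as a small relational query. The example below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `contact` table, its columns, and the function name `mean_ser_his_distance` are hypothetical, not the authors' schema. Each row records one atom found in the environment of a given His residue.

```python
import sqlite3

def mean_ser_his_distance(rows):
    """Mean Ser-OG..His distance, split by whether a carboxylate oxygen
    (Asp OD1/OD2 or Glu OE1/OE2) is present in the same His environment.

    rows: iterable of (his_id, res, atom, dist) tuples (hypothetical schema).
    Returns a dict {has_carboxylate (bool): mean distance}.
    """
    con = sqlite3.connect(":memory:")  # SQLite stand-in for a PostgreSQL server
    con.execute("CREATE TABLE contact (his_id TEXT, res TEXT, atom TEXT, dist REAL)")
    con.executemany("INSERT INTO contact VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH ser AS (
            SELECT s.dist AS dist,
                   EXISTS (
                       SELECT 1 FROM contact c
                       WHERE c.his_id = s.his_id
                         AND ((c.res = 'ASP' AND c.atom IN ('OD1', 'OD2'))
                           OR (c.res = 'GLU' AND c.atom IN ('OE1', 'OE2')))
                   ) AS has_carboxylate
            FROM contact s
            WHERE s.res = 'SER' AND s.atom = 'OG'
        )
        SELECT has_carboxylate, AVG(dist) FROM ser GROUP BY has_carboxylate
    """
    result = {bool(flag): mean for flag, mean in con.execute(query)}
    con.close()
    return result

# Toy rows with made-up distances, for illustration only (not data from the paper).
toy_rows = [
    ("H1", "SER", "OG", 2.7), ("H1", "ASP", "OD1", 2.8),
    ("H2", "SER", "OG", 2.6), ("H2", "GLU", "OE2", 2.9),
    ("H3", "SER", "OG", 3.1),
]
print(mean_ser_his_distance(toy_rows))
```

Keeping each atomic environment as plain rows means a cooperative contact becomes a correlated subquery or self-join, which is the flexibility the abstract attributes to the relational design.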

Citations: 0
A machine learning strategy with clustering under sampling of majority instances for predicting drug target interactions.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-05-01, DOI: 10.1002/minf.202200102
Tanya Liyaqat, Tanvir Ahmad

Identifying drug-target interactions (DTIs) is crucial in drug discovery because it narrows the range of candidate searches and speeds up the drug screening process. Because in vitro and in vivo experiments are time-consuming and costly, there has been a surge in computational techniques, especially ML methods, for DTI prediction. This study presents a methodology that uses molecular structures and amino acid sequences to generate PubChem fingerprints for drugs and PSSMs for targets, respectively. The proposed work uses a novel technique, NearestCUS, to handle the class imbalance problem of the benchmark datasets. We use Isomap Embedding to extract features from PSSMs. Feature selection is performed using ANOVA. CatBoost is used for predicting the interaction between drugs and targets for the first time. To quantify the efficacy of NearestCUS, we compared it with other sampling techniques. We found that the proposed methodology performed better than state-of-the-art approaches.
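The abstract does not spell out how NearestCUS works, so the following is not the authors' algorithm. It is a minimal sketch of generic cluster-based undersampling, assuming the common recipe of clustering the majority class with k-means and keeping the one real sample nearest to each centroid. The function names `kmeans` and `cluster_undersample` are illustrative.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct positions
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids

def cluster_undersample(majority, n_keep, seed=0):
    """Keep n_keep majority samples: the real point nearest to each centroid."""
    centroids = kmeans(majority, n_keep, seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        nearest = min(remaining, key=lambda p: math.dist(p, c))
        kept.append(nearest)
        remaining.remove(nearest)  # never pick the same sample twice
    return kept

# Six majority-class points reduced to two representatives.
majority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cluster_undersample(majority, n_keep=2))
```

Choosing real samples near the centroids, rather than the centroids themselves, keeps the reduced majority set inside the original feature space, which matters for discrete features such as binary fingerprints.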

Citations: 0
Fragment-based deep molecular generation using hierarchical chemical graph representation and multi-resolution graph variational autoencoder.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-05-01, DOI: 10.1002/minf.202200215
Zhenxiang Gao, Xinyu Wang, Blake Blumenfeld Gaines, Xuetao Shi, Jinbo Bi, Minghu Song

Graph generative models have recently emerged as an interesting approach to constructing molecular structures atom-by-atom or fragment-by-fragment. In this study, we adopt the fragment-based strategy and decompose each input molecule into a set of small chemical fragments. In drug discovery, a few drug molecules are designed by replacing certain chemical substituents with their bioisosteres or alternative chemical moieties. This inspires us to group decomposed fragments into different fragment clusters according to their local structural environment around bond-breaking positions. In this way, an input structure can be transformed into an equivalent three-layer graph, in which individual atoms, decomposed fragments, or obtained fragment clusters act as graph nodes at each corresponding layer. We further implement a prototype model, named multi-resolution graph variational autoencoder (MRGVAE), to learn embeddings of the constituent nodes at each layer in a fine-to-coarse order. Our decoder adopts a similar but reversed hierarchical structure. It first predicts the next possible fragment cluster, then samples an exact fragment structure out of the determined fragment cluster, and sequentially attaches it to the preceding chemical moiety. Our proposed approach performs comparatively well on molecular evaluation metrics relative to several other graph-based molecular generative models. The introduction of the additional fragment cluster graph layer will hopefully increase the odds of assembling new chemical moieties absent in the original training set and enhance their structural diversity. We hope that our prototyping work will inspire more creative research to explore the possibility of incorporating different kinds of chemical domain knowledge into a similar multi-resolution neural network architecture.

Citations: 0
A new set of KNIME nodes implementing the QPhAR algorithm.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-05-01, DOI: 10.1002/minf.202200245
Stefan Kohlbacher, Gökhan Ibis, Christian Permann, Sharon Bryant, Thierry Langer, Thomas Seidel

Dissemination of novel research methods, especially in the form of chemoinformatics software, depends heavily on how easily they can be applied by non-expert users with little or no programming skill or knowledge of computer science. Visual programming has become widely popular over the last few years, enabling researchers without in-depth programming skills to develop tailored data processing pipelines using elements from a repository of predefined standard procedures. In this work, we present the development of a set of nodes for the KNIME platform implementing the QPhAR algorithm. We show how the developed KNIME nodes can be included in a typical workflow for biological activity prediction. Furthermore, we present best-practice guidelines that should be followed to obtain high-quality QPhAR models. Finally, we show a typical workflow to train and optimise a QPhAR model in KNIME for a set of given input compounds, applying the discussed best practices.

Citations: 0
Discovery of natural-derived Mpro inhibitors as therapeutic candidates for COVID-19: Structure-based pharmacophore screening combined with QSAR analysis.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-04-01, DOI: 10.1002/minf.202200198
Mohammad A Khanfar, Nada Salaas, Reem Abumostafa

The main protease (Mpro) is an essential enzyme for the life cycle of SARS-CoV-2 and a validated target for treatment of COVID-19 infection. Structure-based pharmacophore modeling combined with QSAR calculations was employed to identify new chemical scaffolds of Mpro inhibitors from a natural products repository. Hundreds of pharmacophore models were manually built from their corresponding X-ray crystallographic structures. A pharmacophore model that was validated by receiver operating characteristic (ROC) curve analysis and selected using the statistically optimum QSAR equation was implemented as a 3D-search tool to mine the AnalytiCon Discovery database of natural products. Captured hits that showed the highest predicted inhibitory activities were bioassayed. Three active Mpro inhibitors (pseurotin A, lactupicrin, and alpinetin) were successfully identified with IC50 values in the low micromolar range.
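ROC validation of a screening model is usually summarised by the area under the curve (AUC). As a minimal sketch, and not the authors' validation protocol, the AUC can be computed as the probability that a randomly chosen active is scored above a randomly chosen decoy, with ties counting one half. The function name `roc_auc` and the 1/0 label convention are assumptions made for this example.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of actives (label 1) and decoys (label 0)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    # Each active/decoy pair scores 1 if the active ranks higher, 0.5 on a tie.
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))

# A model that ranks both actives above both decoys separates them perfectly.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to random ranking, so a pharmacophore model is only useful as a search filter when its AUC is clearly above that level.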

Citations: 1
Exploring isofunctional molecules: Design of a benchmark and evaluation of prediction performance.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-04-01, DOI: 10.1002/minf.202200216
Philippe Pinel, Gwenn Guichaoua, Matthieu Najm, Stéphanie Labouille, Nicolas Drizard, Yann Gaston-Mathé, Brice Hoffmann, Véronique Stoven

Identification of novel chemotypes with biological activity similar to a known active molecule is an important challenge in drug discovery called 'scaffold hopping'. Small-, medium-, and large-step scaffold hopping efforts may lead to increasing degrees of chemical structure novelty with respect to the parent compound. In the present paper, we focus on the problem of large-step scaffold hopping. We assembled a high-quality, well-characterized dataset of scaffold hopping examples comprising pairs of active molecules and including a variety of protein targets. This dataset was used to build a benchmark corresponding to the setting of real-life applications: one active molecule is known, and the second active is searched for among a set of decoys chosen in a way that avoids statistical bias. This allowed us to evaluate the performance of computational methods for solving large-step scaffold hopping problems. We assessed how difficult these problems are, particularly for classical 2D and 3D ligand-based methods. We also showed that a machine-learning chemogenomic algorithm outperforms classical methods and we provided some useful hints for future improvements.

Citations: 0
Cover Picture: (Mol. Inf. 4/2023)
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-04-01, DOI: 10.1002/minf.202380401
Citations: 0
A comparison between 2D and 3D descriptors in QSAR modeling based on bio-active conformations.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-04-01, DOI: 10.1002/minf.202200186
Hanoch Senderowitz, Malkeet Singh Bahia, Omer Kaspi, Meir Touitou, Idan Binayev, Seema Dhail, Jacob Spiegel, Netaly Khazanov, Abraham Yosipof

QSAR models are widely and successfully used in many research areas. The success of such models depends heavily on molecular descriptors, typically classified as 1D, 2D, 3D, or 4D. While 3D information is likely important, e.g., for modeling ligand-protein binding, previous comparisons between the performances of 2D and 3D descriptors were inconclusive. Yet in such comparisons the modeled ligands were not necessarily represented by their bioactive conformations. With this in mind, we mined the PDB for sets of protein-ligand complexes sharing the same protein for which uniform activity data were reported. The results, totaling 461 structures spread across six series, were compiled into a carefully curated, first-of-its-kind dataset in which each ligand is represented by its bioactive conformation. Next, each set was characterized by 2D, 3D and 2D + 3D descriptors and modeled using three machine learning algorithms, namely k-Nearest Neighbors, Random Forest and Lasso Regression. Model performance was evaluated on external test sets derived from the parent datasets either randomly or in a rational manner. We found that many more significant models were obtained when combining 2D and 3D descriptors. We attribute these improvements to the ability of 2D and 3D descriptors to code for different, yet complementary molecular properties.

Citations: 2
French dispatch: GTM-based analysis of the Chimiothèque Nationale Chemical Space.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-04-01, DOI: 10.1002/minf.202200208
Polina Oleneva, Yuliana Zabolotna, Dragos Horvath, Gilles Marcou, Fanny Bonachera, Alexandre Varnek

To analyze the Chimiothèque Nationale (CN), the French National Compound Library, in the context of screening and biologically relevant compounds, the library was compared with the ZINC in-stock collection and ChEMBL. This includes the study of chemical space coverage, physicochemical properties and Bemis-Murcko (BM) scaffold populations. More than 5,000 CN-unique scaffolds (relative to the ZINC and ChEMBL collections) were identified. Generative Topographic Maps (GTMs) accommodating those libraries were generated and used to compare the compound populations. Hierarchical GTM ("zooming") was applied to generate an ensemble of maps at various resolution levels, from global overview to precise mapping of individual structures. The respective maps were added to the ChemSpace Atlas website. The analysis of synthetic accessibility in the context of combinatorial chemistry showed that only 29.7% of CN compounds can be fully synthesized using commercially available building blocks.

Citations: 0
A machine learning q-RASPR approach for efficient predictions of the specific surface area of perovskites.
IF 3.6, Zone 4 (Medicine), Q1 Chemistry, Pub Date: 2023-04-01, DOI: 10.1002/minf.202200261
Arkaprava Banerjee, Agnieszka Gajewicz-Skretna, K Roy

In this study, the specific surface area of various perovskites was modeled using a novel quantitative read-across structure-property relationship (q-RASPR) approach, which clubs both Read-Across (RA) and quantitative structure-property relationship (QSPR) together. After optimization of the hyper-parameters, certain similarity-based error measures for each query compound were obtained. Clubbing some of these error-based measures with the previously selected features along with the Read-Across prediction function, a number of machine learning models were developed using Partial Least Squares (PLS), Ridge Regression (RR), Linear Support Vector Regression (LSVR), Random Forest (RF) regression, Gradient Boost (GBoost), Adaptive Boosting (Adaboost), Multiple Layer Perceptron (MLP) regression and k-Nearest Neighbor (kNN) regression. Based on the repeated cross-validation as well as external prediction quality and interpretability, the PLS model ($n_{Training} = 38$, $n_{Test} = 12$, $R_{Train}^{2} = 0.737$, $Q_{LOO}^{2} = 0.637$, $R_{Test}^{2} = 0.898$, $Q_{F1(Test)}^{2} = 0.901$) was selected as the best predictor which underscored the previously reported results. The finally selected model should efficiently predict specific surface areas of other perovskites for their use in photocatalysis. The new q-RASPR method also appears promising for the prediction of several other property endpoints of interest in materials science.

Citations: 6