Molecular Informatics: Latest Articles (Page 5)

In Silico prediction of inhibitors for multiple transporters via machine learning methods.

IF 3.6 Zone 4 (Medicine) Q1 Chemistry

Molecular Informatics

Pub Date : 2024-03-01 Epub Date: 2024-02-06 DOI: 10.1002/minf.202300270

Hao Duan, Chaofeng Lou, Yaxin Gu, Yimeng Wang, Weihua Li, Guixia Liu, Yun Tang

Transporters play an indispensable role in facilitating the transport of nutrients, signaling molecules and the elimination of metabolites and toxins in human cells. Contemporary computational methods have been employed in the prediction of transporter inhibitors. However, these methods often focus on isolated endpoints, overlooking the interactions between transporters and lacking good interpretation. In this study, we integrated a comprehensive dataset and constructed models to assess the inhibitory effects on seven transporters. Both conventional machine learning and multi-task deep learning methods were employed. The results demonstrated that the MLT-GAT model achieved superior performance with an average AUC value of 0.882. It is noteworthy that our model excels not only in prediction performance but also in achieving robust interpretability, aided by GNN-Explainer. It provided valuable insights into transporter inhibition. The reliability of our model's predictions positioned it as a promising and valuable tool in the field of transporter inhibition research. Related data and code are available at https://gitee.com/wutiantian99/transporter_code.git.
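
The headline metric above is an AUC averaged over the seven transporter endpoints. As a minimal sketch of how such a macro-averaged AUC can be computed, assuming binary inhibitor labels and one score per compound for each task, the snippet below uses the rank-sum (Mann-Whitney) identity in plain Python. The task names and numbers are made up for illustration, and this is not the authors' code.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(per_task):
    """Macro-average AUC; per_task maps a task name to (labels, scores)."""
    return mean(roc_auc(y, s) for y, s in per_task.values())


# Hypothetical predictions for two endpoints (the study itself covers seven).
toy = {
    "task_a": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    "task_b": ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
}
print(average_auc(toy))  # prints 0.875
```

Averaging per-task AUCs weights every endpoint equally regardless of how many compounds each one has, which is one common convention for multi-task benchmarks; the paper may define its average differently.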


Citations: 0

Cover Picture: (Mol. Inf. 2/2024)

IF 3.6 Zone 4 (Medicine) Q1 Chemistry

Molecular Informatics

Pub Date : 2024-02-23 DOI: 10.1002/minf.202480201

Citations: 0

Predicting the bandgap and efficiency of perovskite solar cells using machine learning methods.

IF 3.6 Zone 4 (Medicine) Q1 Chemistry

Molecular Informatics

Pub Date : 2024-02-01 Epub Date: 2024-01-04 DOI: 10.1002/minf.202300217

Asad Khan, Jeevan Kandel, Hilal Tayara, Kil To Chong

Rapid and accurate prediction of bandgaps and efficiency of perovskite solar cells is a crucial challenge for various solar cell applications. Existing theoretical and experimental methods often accurately measure these parameters; however, these methods are costly and time-consuming. Machine learning-based approaches offer a promising and computationally efficient method to address this problem. In this study, we trained different machine learning (ML) models using previously reported experimental data. Among the different ML models, the CatBoostRegressor performed better for both bandgap and efficiency approximations. We evaluated the proposed model using k-fold cross-validation and investigated the relative importance of input features using Shapley Additive Explanations (SHAP). SHAP provides valuable insights into the feature contributions to the predictions of the proposed model. Furthermore, we validated the performance of the proposed model using an independent dataset, demonstrating its robustness and generalizability beyond the training data. Our findings show that machine learning-based approaches, with the aid of SHAP, can provide a promising and computationally efficient method for the accurate and rapid prediction of perovskite solar cell properties. The proposed model is expected to facilitate the discovery of new perovskite materials and is freely available at GitHub (https://github.com/AsadKhanJBNU/perovskite_bandgap_and_efficiency.git) for the perovskite community.
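
The k-fold cross-validation mentioned above can be illustrated with a short, dependency-free sketch. It assumes a shuffled split into k nearly equal folds and uses a mean-value baseline as a stand-in for a real regressor such as CatBoostRegressor; the function names and the bandgap-like values are hypothetical and are not taken from the authors' repository.

```python
import random
from math import sqrt


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(y, k=5, seed=0):
    """Per-fold RMSE of a mean-value baseline (placeholder for a real model)."""
    rmses = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)  # "fit" on the training folds only
        errors = [(y[i] - prediction) ** 2 for i in held_out]
        rmses.append(sqrt(sum(errors) / len(errors)))
    return rmses


# Made-up bandgap-like values in eV, for illustration only.
toy_bandgaps = [1.55, 1.60, 1.48, 2.30, 1.75, 1.62, 1.58, 2.10, 1.51, 1.66]
scores = cross_validate(toy_bandgaps, k=5)
print(len(scores), sum(scores) / len(scores))
```

The point of the pattern is that each sample is held out exactly once and the model never sees its held-out fold during fitting, so the averaged error estimates performance on unseen data.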


Citations: 0

Kinetic solubility: Experimental and machine-learning modeling perspectives.

IF 3.6 Zone 4 (Medicine) Q1 Chemistry

Molecular Informatics

Pub Date : 2024-02-01 Epub Date: 2024-01-23 DOI: 10.1002/minf.202300216

Shamkhal Baybekov, Pierre Llompart, Gilles Marcou, Patrick Gizzi, Jean-Luc Galzi, Pascal Ramos, Olivier Saurel, Claire Bourban, Claire Minoletti, Alexandre Varnek

Kinetic aqueous or buffer solubility is an important parameter for measuring the suitability of compounds for high-throughput assays in early drug discovery, while thermodynamic solubility is reserved for later stages of drug discovery and development. Kinetic solubility is also considered to have low inter-laboratory reproducibility because of its sensitivity to protocol parameters [1]. Presumably, this is why little effort has been put into building QSPR models for kinetic, in comparison to thermodynamic, aqueous solubility. Here, we investigate the reproducibility and modelability of kinetic solubility assays. We first analyzed the relationship between kinetic and thermodynamic solubility data, and then examined the consistency of data from different kinetic assays. In this contribution, we report differences between kinetic and thermodynamic solubility data that are consistent with those reported by others [1, 2] and, in contrast to general expectations, good agreement between data from different kinetic solubility campaigns. The latter is confirmed by the high-performing QSPR models obtained when training on merged kinetic solubility datasets. The poor performance of a QSPR model trained on thermodynamic solubility when applied to a kinetic solubility dataset reinforces the conclusion that kinetic and thermodynamic solubilities do not correlate: one cannot be used as an ersatz for the other. This encourages the building of predictive models for kinetic solubility. The kinetic solubility QSPR model developed in this study is freely accessible through the Predictor web service of the Laboratory of Chemoinformatics (https://chematlas.chimie.unistra.fr/cgi-bin/predictor2.cgi).
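
The comparison of kinetic and thermodynamic solubility data described above comes down to asking how well two sets of measurements for the same compounds agree. A minimal sketch, assuming paired logS values for a shared set of compounds, computes the Pearson correlation and the root-mean-square difference. The numbers are invented, and the abstract does not state which agreement statistics the authors used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def rmse(x, y):
    """Root-mean-square difference between paired values."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical paired logS values for the same four compounds.
kinetic = [-4.0, -3.5, -5.0, -4.2]
thermodynamic = [-5.1, -3.0, -4.4, -6.0]
print(pearson_r(kinetic, thermodynamic), rmse(kinetic, thermodynamic))
```

A low correlation together with a large RMSE between the two measurement types would be the kind of evidence behind the claim that one cannot stand in for the other.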


Citations: 0

Integrated workflow for the identification of new GABA_A R positive allosteric modulators based on the in silico screening with further in vitro validation. Case study using Enamine's stock chemical space.

IF 3.6 Zone 4 (Medicine) Q1 Chemistry

Molecular Informatics

Pub Date : 2024-02-01 Epub Date: 2024-01-24 DOI: 10.1002/minf.202300156

Maksym Platonov, Oleksandr Maximyuk, Alexey Rayevsky, Olena Iegorova, Vasyl Hurmach, Yuliia Holota, Elijah Bulgakov, Andrii Cherninskyi, Pavel Karpov, Sergey Ryabukhin, Oleg Krishtal, Dmitriy Volochnyuk

Numerous studies have reported an association between GABA_A R subunit genes and epilepsy, eating disorders, autism spectrum disorders, neurodevelopmental disorders, and bipolar disorders. This study aimed to find potential positive allosteric modulators by combining an in silico approach with further in vitro evaluation of their actual activity. We started from the GABA_A R-diazepam complexes and assembled a lipid-embedded protein ensemble to refine it via molecular dynamics (MD) simulation. Then we focused on the interaction of α1β2γ2 with some Z-drugs (non-benzodiazepine compounds), using Induced Fit Docking (IFD) into the relaxed binding site to generate a pharmacophore model. The pharmacophore model was validated with a reference set and applied to reduce the pre-filtered Enamine database before the main docking procedure. Finally, we succeeded in identifying a set of compounds that met all features of the docking model. The aqueous solubility and stability of these compounds in mouse plasma were assessed. They were then tested for biological activity using rat Purkinje neurons and CHO cells with heterologously expressed human α1β2γ2 GABA_A receptors. Whole-cell patch clamp recordings were used to reveal the GABA-induced currents. Our study represents a convenient and tunable model for the discovery of novel positive allosteric modulators of GABA_A receptors. High-throughput virtual screening of the largest available database of chemical compounds resulted in the selection of 23 compounds. Further electrophysiological tests allowed us to identify the 3 most outstanding active compounds. Considering the structural features of the lead compounds, the study can soon develop into a MedChem project.


Citations: 0

Chemical language models for molecular design.

IF 3.6 Zone 4 (Medicine) Q1 Chemistry

Molecular Informatics

Pub Date : 2024-01-01 Epub Date: 2023-12-12 DOI: 10.1002/minf.202300288

Jürgen Bajorath

In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role. Among others, emerging application areas for CLMs include constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be predicted from protein sequence motifs. Novel off-the-beaten-path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.
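
The requirement above that input data be "presented in a sequential format and tokenized" can be made concrete with a small sketch. Assuming molecules are written as SMILES strings, one simple approach splits each string into chemically meaningful tokens with a regular expression and maps tokens to integer ids. The pattern below is a simplified illustration covering common SMILES syntax, not a tokenizer taken from the article or from any specific CLM.

```python
import re

# Simplified SMILES token pattern (an illustrative assumption, not a full grammar).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [NH3+] or [C@@H]
    r"|Br|Cl"                # two-letter organic-subset atoms
    r"|%\d{2}"               # two-digit ring-closure labels
    r"|[A-Za-z]"             # single-letter atoms (aliphatic or aromatic)
    r"|\d"                   # single-digit ring-closure labels
    r"|[=#\-+\\/().:~@*$]"   # bonds, branches, and other symbols
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Assign a stable integer id to every distinct token."""
    symbols = sorted({t for tokens in token_lists for t in tokens})
    return {symbol: i for i, symbol in enumerate(symbols)}


aspirin = tokenize("CC(=O)Oc1ccccc1C(=O)O")
vocab = build_vocab([aspirin])
print(aspirin)
print([vocab[t] for t in aspirin])
```

Treating `Cl`, `Br`, and bracket atoms as single tokens keeps the sequence aligned with atoms rather than characters, which is the usual reason for not tokenizing SMILES character by character.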


Citations: 0

A community effort in SARS-CoV-2 drug discovery.

IF 2.8 Zone 4 (Medicine) Q3 CHEMISTRY, MEDICINAL

Molecular Informatics

Pub Date : 2024-01-01 Epub Date: 2023-11-14 DOI: 10.1002/minf.202300262

Johannes Schimunek, Philipp Seidl, Katarina Elez, Tim Hempel, Tuan Le, Frank Noé, Simon Olsson, Lluís Raich, Robin Winter, Hatice Gokcan, Filipp Gusev, Evgeny M Gutkin, Olexandr Isayev, Maria G Kurnikova, Chamali H Narangoda, Roman Zubatyuk, Ivan P Bosko, Konstantin V Furs, Anna D Karpenko, Yury V Kornoushenko, Mikita Shuldau, Artsemi Yushkevich, Mohammed B Benabderrahmane, Patrick Bousquet-Melou, Ronan Bureau, Beatrice Charton, Bertrand C Cirou, Gérard Gil, William J Allen, Suman Sirimulla, Stanley Watowich, Nick Antonopoulos, Nikolaos Epitropakis, Agamemnon Krasoulis, Vassilis Itsikalis, Stavros Theodorakis, Igor Kozlovskii, Anton Maliutin, Alexander Medvedev, Petr Popov, Mark Zaretckii, Hamid Eghbal-Zadeh, Christina Halmich, Sepp Hochreiter, Andreas Mayr, Peter Ruch, Michael Widrich, Francois Berenger, Ashutosh Kumar, Yoshihiro Yamanishi, Kam Y J Zhang, Emmanuel Bengio, Yoshua Bengio, Moksh J Jain, Maksym Korablyov, Cheng-Hao Liu, Gilles Marcou, Enrico Glaab, Kelly Barnsley, Suhasini M Iyengar, Mary Jo Ondrechen, V Joachim Haupt, Florian Kaiser, Michael Schroeder, Luisa Pugliese, Simone Albani, Christina Athanasiou, Andrea Beccari, Paolo Carloni, Giulia D'Arrigo, Eleonora Gianquinto, Jonas Goßen, Anton Hanke, Benjamin P Joseph, Daria B Kokh, Sandra Kovachka, Candida Manelfi, Goutam Mukherjee, Abraham Muñiz-Chicharro, Francesco Musiani, Ariane Nunes-Alves, Giulia Paiardi, Giulia Rossetti, S Kashif Sadiq, Francesca Spyrakis, Carmine Talarico, Alexandros Tsengenes, Rebecca C Wade, Conner Copeland, Jeremiah Gaiser, Daniel R Olson, Amitava Roy, Vishwesh Venkatraman, Travis J Wheeler, Haribabu Arthanari, Klara Blaschitz, Marco Cespugli, Vedat Durmaz, Konstantin Fackeldey, Patrick D Fischer, Christoph Gorgulla, Christian Gruber, Karl Gruber, Michael Hetmann, Jamie E Kinney, Krishna M Padmanabha Das, Shreya Pandita, Amit Singh, Georg Steinkellner, Guilhem Tesseyre, Gerhard Wagner, Zi-Fu Wang, Ryan J Yust, Dmitry S Druzhilovskiy, Dmitry A Filimonov, Pavel V Pogodin, 
Vladimir Poroikov, Anastassia V Rudik, Leonid A Stolbov, Alexander V Veselovsky, Maria De Rosa, Giada De Simone, Maria R Gulotta, Jessica Lombino, Nedra Mekni, Ugo Perricone, Arturo Casini, Amanda Embree, D Benjamin Gordon, David Lei, Katelin Pratt, Christopher A Voigt, Kuang-Yu Chen, Yves Jacob, Tim Krischuns, Pierre Lafaye, Agnès Zettor, M Luis Rodríguez, Kris M White, Daren Fearon, Frank Von Delft, Martin A Walsh, Dragos Horvath, Charles L Brooks, Babak Falsafi, Bryan Ford, Adolfo García-Sastre, Sang Yup Lee, Nadia Naffakh, Alexandre Varnek, Günter Klambauer, Thomas M Hermans

The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.


Citations: 0

CIPSI: An open chemical intellectual property service for medicinal chemists.

IF 3.6 Zone 4 (Medicine) Q1 Chemistry

Molecular Informatics

Pub Date : 2024-01-01 Epub Date: 2023-12-12 DOI: 10.1002/minf.202300221

Maria Martinez-Sevillano, Maria J Falaguera, Jordi Mestres

The availability of patent chemical data offers public access to a chemical space that is not well covered by other sources collecting small molecules from scholarly literature. However, open applications to facilitate the search and analysis of biologically-relevant molecular structures present in patents are still largely missing. We have developed CIPSI, an open Chemical Intellectual Property Service @ IMIM to assist medicinal chemists in searching and analysing molecules in SureChEMBL patents. The current version contains 6,240,500 molecules from 236,689 pharmacological patents, of which 5,949,214 are confidently assigned to core chemical structures reminiscent of the Markush structure in the patent claim. The platform includes some graphical tools to facilitate comparative patent analyses between drugs, chemical substructures, and company assignees. CIPSI is available at http://cipsi.imim.es.


Citations: 0

GUIDEMOL: A Python graphical user interface for molecular descriptors based on RDKit.

IF 3.6, Zone 4 (Medicine), Q1 Chemistry

Molecular Informatics

Pub Date : 2024-01-01 Epub Date: 2023-11-20 DOI: 10.1002/minf.202300190

Joao Aires-de-Sousa

GUIDEMOL is a Python computer program based on the RDKit software to process molecular structures and calculate molecular descriptors with a graphical user interface using the tkinter package. It can calculate descriptors already implemented in RDKit as well as grid representations of 3D molecular structures using the electrostatic potential or voxels. The GUIDEMOL app provides easy access to RDKit tools for chemoinformatics users with no programming skills and can be adapted to calculate other descriptors or to trigger other procedures. A command line interface (CLI) is also provided for the calculation of grid representations. The source code is available at https://github.com/jairesdesousa/guidemol.
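The idea of a grid representation based on the electrostatic potential can be illustrated with a minimal, standard-library sketch. It is not GUIDEMOL's implementation: the abstract does not specify the charge model, units, grid layout, or distance handling, so the function name `esp_grid`, its parameters, and the `min_dist` clamp are all hypothetical choices made for illustration.

```python
import math

def esp_grid(atoms, origin, spacing, shape, min_dist=0.5):
    """Sample a point-charge Coulomb potential on a regular 3D grid.

    atoms   : list of (x, y, z, charge) tuples (hypothetical input format)
    origin  : (x, y, z) of the first grid point
    spacing : distance between neighbouring grid points
    shape   : (nx, ny, nz) number of points along each axis
    min_dist: distances are clamped to this value to avoid division by zero
              (an assumption of this sketch)
    Returns a nested list indexed as grid[ix][iy][iz], in charge/distance units.
    """
    nx, ny, nz = shape
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for ix in range(nx):
        gx = origin[0] + ix * spacing
        for iy in range(ny):
            gy = origin[1] + iy * spacing
            for iz in range(nz):
                gz = origin[2] + iz * spacing
                potential = 0.0
                for x, y, z, charge in atoms:
                    # Clamp very short distances so the sum stays finite.
                    dist = max(math.dist((gx, gy, gz), (x, y, z)), min_dist)
                    potential += charge / dist
                grid[ix][iy][iz] = potential
    return grid

# A single +1 charge at the origin, sampled at x = 1 and x = 2.
example = esp_grid([(0.0, 0.0, 0.0, 1.0)], origin=(1.0, 0.0, 0.0),
                   spacing=1.0, shape=(2, 1, 1))
```

In this example the two grid points lie 1 and 2 distance units from the charge, so the sampled values are 1.0 and 0.5. A real workflow would take coordinates and partial charges from a 3D conformer rather than from hand-written tuples.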


Citations: 0

HIt Discovery using docking ENriched by GEnerative Modeling (HIDDEN GEM): A novel computational workflow for accelerated virtual screening of ultra-large chemical libraries.

IF 3.6, Zone 4 (Medicine), Q1 Chemistry

Molecular Informatics

Pub Date : 2024-01-01 Epub Date: 2023-12-19 DOI: 10.1002/minf.202300207

Konstantin I Popov, James Wellnitz, Travis Maxfield, Alexander Tropsha

The recent rapid expansion of make-on-demand, purchasable chemical libraries comprising tens of billions or even trillions of molecules has challenged the efficient application of traditional structure-based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using Docking ENriched by GEnerative Modeling) that greatly accelerates virtual screening. This workflow uniquely integrates machine learning, generative chemistry, massive chemical similarity searching, and molecular docking of small, selected libraries at the beginning and the end of the workflow. For each target, HIDDEN GEM nominates a small number of top-scoring virtual hits prioritized from ultra-large chemical libraries. We have benchmarked HIDDEN GEM by conducting virtual screening campaigns for 16 diverse protein targets using the Enamine REAL Space library comprising 37 billion molecules. We show that HIDDEN GEM yields the highest enrichment factors compared to state-of-the-art accelerated virtual screening methods, while requiring the least computational resources. HIDDEN GEM can be executed with any docking software and employed by users with limited computational resources.
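The abstract reports enrichment factors without stating how they were computed. As a hedged illustration, the sketch below uses the common definition: the hit rate among the top-ranked fraction of a screened library divided by the hit rate of the whole library. The function name `enrichment_factor`, the `top_fraction` parameter, and the convention that lower docking scores are better are assumptions of this sketch, not details taken from the paper.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Enrichment factor of a ranked screen at a given top fraction.

    scores       : docking scores, one per compound (lower = better; assumption)
    is_active    : parallel list of 0/1 (or bool) activity labels
    top_fraction : fraction of the library treated as the "top" selection
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Select at least one compound, even for very small libraries.
    n_top = max(1, round(n_total * top_fraction))

    # Rank by ascending score so the best-scoring compounds come first.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_in_top = sum(1 for _, flag in ranked[:n_top] if flag)

    top_hit_rate = hits_in_top / n_top
    overall_hit_rate = n_actives / n_total
    return top_hit_rate / overall_hit_rate

# Ten compounds, two actives, both ranked in the top 20%.
scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(scores, labels, top_fraction=0.2)
```

Here the top 20% consists of two compounds, both active, against an overall hit rate of 2 in 10, which gives an enrichment factor of 5. A value of 1 corresponds to random selection.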



Citations: 0