Transporters play an indispensable role in moving nutrients and signaling molecules and in eliminating metabolites and toxins in human cells. Computational methods have been used to predict transporter inhibitors, but they often focus on isolated endpoints, overlook interactions between transporters, and offer limited interpretability. In this study, we integrated a comprehensive dataset and constructed models to assess inhibitory effects on seven transporters, using both conventional machine learning and multi-task deep learning methods. The MLT-GAT model performed best, with an average AUC of 0.882. Beyond predictive performance, the model is interpretable with the aid of GNN-Explainer, which provided insights into transporter inhibition. The reliability of its predictions makes the model a promising tool for transporter inhibition research. Related data and code are available at https://gitee.com/wutiantian99/transporter_code.git.
Title: In Silico prediction of inhibitors for multiple transporters via machine learning methods. Authors: Hao Duan, Chaofeng Lou, Yaxin Gu, Yimeng Wang, Weihua Li, Guixia Liu, Yun Tang. Journal: Molecular Informatics. Publication date: 2024-03-01. DOI: 10.1002/minf.202300270.
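The average AUC quoted in this abstract is a mean over per-transporter AUC values. The sketch below shows that bookkeeping under stated assumptions: labels are binary (1 = inhibitor), the AUC is the standard rank-based (Mann-Whitney) formulation, and the task names and numbers are hypothetical placeholders. It is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_task):
    """per_task maps a task name to (labels, scores); returns (unweighted mean AUC, per-task AUCs)."""
    aucs = {task: roc_auc(y, s) for task, (y, s) in per_task.items()}
    return sum(aucs.values()) / len(aucs), aucs


# Hypothetical toy data for two tasks (placeholder names, not transporters from the study).
toy = {
    "task_A": ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    "task_B": ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
}
average, per_task_auc = mean_auc(toy)
```

An unweighted mean treats every task equally regardless of how many compounds it has; whether the study weighted tasks differently is not stated in the abstract.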
Pub Date: 2024-02-01; Epub Date: 2024-01-04; DOI: 10.1002/minf.202300217
Asad Khan, Jeevan Kandel, Hilal Tayara, Kil To Chong
Rapid and accurate prediction of the bandgap and efficiency of perovskite solar cells is a crucial challenge for solar cell applications. Existing theoretical and experimental methods can determine these parameters accurately, but they are costly and time-consuming. Machine learning offers a promising and computationally efficient alternative. In this study, we trained several machine learning (ML) models on previously reported experimental data. Among them, CatBoostRegressor performed best for both bandgap and efficiency prediction. We evaluated the proposed model using k-fold cross-validation and investigated the relative importance of input features using Shapley Additive Explanations (SHAP), which quantify how each feature contributes to the model's predictions. Furthermore, we validated the model on an independent dataset, demonstrating its robustness and generalizability beyond the training data. Our findings show that ML approaches, with the aid of SHAP, can provide accurate, rapid, and computationally efficient prediction of perovskite solar cell properties. The proposed model is expected to facilitate the discovery of new perovskite materials and is freely available on GitHub (https://github.com/AsadKhanJBNU/perovskite_bandgap_and_efficiency.git) for the perovskite community.
Title: Predicting the bandgap and efficiency of perovskite solar cells using machine learning methods. Journal: Molecular Informatics.
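SHAP is built on Shapley values, which attribute a prediction to individual features by averaging each feature's marginal contribution over all feature orderings. The sketch below computes exact Shapley values for a tiny hand-written model. The model, the input, and the all-zero baseline are hypothetical, and this brute-force enumeration is only an illustration of the idea, not the tree-based estimator a SHAP library would use on a CatBoost model.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # switch feature i from its baseline to its actual value
            value = predict(current)
            phi[i] += value - previous
            previous = value
    return [p / len(orderings) for p in phi]


# Hypothetical toy model: two linear terms plus one interaction between features a and c.
def toy_model(features):
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.5 * a * c


phi = shapley_values(toy_model, x=[1.0, 3.0, 2.0], baseline=[0.0, 0.0, 0.0])
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split evenly between the two features involved. The cost grows factorially with the number of features, which is why practical tools rely on model-specific or sampling-based approximations.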
Pub Date: 2024-02-01; Epub Date: 2024-01-23; DOI: 10.1002/minf.202300216
Shamkhal Baybekov, Pierre Llompart, Gilles Marcou, Patrick Gizzi, Jean-Luc Galzi, Pascal Ramos, Olivier Saurel, Claire Bourban, Claire Minoletti, Alexandre Varnek
Kinetic aqueous or buffer solubility is an important parameter for assessing the suitability of compounds for high-throughput assays in early drug discovery, whereas thermodynamic solubility is reserved for later stages of drug discovery and development. Kinetic solubility is also considered to have low inter-laboratory reproducibility because of its sensitivity to protocol parameters [1]. Presumably, this is why little effort has been devoted to building QSPR models for kinetic solubility compared with thermodynamic aqueous solubility. Here, we investigate the reproducibility and modelability of kinetic solubility assays. We first analyzed the relationship between kinetic and thermodynamic solubility data, and then examined the consistency of data from different kinetic assays. We report differences between kinetic and thermodynamic solubility data that are consistent with those reported by others [1, 2] and, contrary to general expectations, good agreement between data from different kinetic solubility campaigns. The latter is confirmed by the high-performing QSPR models obtained by training on merged kinetic solubility datasets. The poor performance of a QSPR model trained on thermodynamic solubility when applied to a kinetic solubility dataset reinforces the conclusion that kinetic and thermodynamic solubilities do not correlate: one cannot be used as a substitute for the other. This encourages building dedicated predictive models for kinetic solubility. The kinetic solubility QSPR model developed in this study is freely accessible through the Predictor web service of the Laboratory of Chemoinformatics (https://chematlas.chimie.unistra.fr/cgi-bin/predictor2.cgi).
Title: Kinetic solubility: Experimental and machine-learning modeling perspectives. Journal: Molecular Informatics.
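Applying a model trained on one solubility type to data of the other type is an external-validation exercise, usually summarized with RMSE and the coefficient of determination. The sketch below uses the standard definitions. The log-solubility values are hypothetical, chosen only to show that a systematic offset can drive the coefficient of determination below zero, and the abstract does not say which metrics the authors reported.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination; negative when the model is worse than predicting the mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical log-solubility values: predictions are offset by a constant 2 log units.
observed = [-4.0, -5.0, -6.0]
predicted = [-2.0, -3.0, -4.0]

error = rmse(observed, predicted)
r2 = r_squared(observed, predicted)
```

In this toy case the ranking of compounds is perfect, yet the coefficient of determination is strongly negative, which is one reason to inspect both metrics when transferring a model between assay types.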
Numerous studies have reported an association between GABAA receptor (GABAAR) subunit genes and epilepsy, eating disorders, autism spectrum disorders, neurodevelopmental disorders, and bipolar disorders. This study aimed to find potential positive allosteric modulators by combining an in silico approach with in vitro evaluation of their activity. Starting from GABAAR-diazepam complexes, we assembled a lipid-embedded protein ensemble and refined it via molecular dynamics (MD) simulation. We then focused on the interaction of α1β2γ2 with several Z-drugs (non-benzodiazepine compounds), using Induced Fit Docking (IFD) into the relaxed binding site to generate a pharmacophore model. The pharmacophore model was validated with a reference set and applied to reduce the pre-filtered Enamine database before the main docking procedure. We identified a set of compounds that met all features of the docking model. The aqueous solubility of these compounds and their stability in mouse plasma were assessed. The compounds were then tested for biological activity in rat Purkinje neurons and in CHO cells heterologously expressing human α1β2γ2 GABAA receptors, with whole-cell patch-clamp recordings used to measure GABA-induced currents. Our study provides a convenient and tunable model for the discovery of novel positive allosteric modulators of GABAA receptors. High-throughput virtual screening of the largest available database of chemical compounds resulted in the selection of 23 compounds, and subsequent electrophysiological tests identified the 3 most active among them. Given the structural features of the lead compounds, the study could soon develop into a medicinal chemistry (MedChem) project.
Title: Integrated workflow for the identification of new GABAA R positive allosteric modulators based on the in silico screening with further in vitro validation. Case study using Enamine's stock chemical space. Authors: Maksym Platonov, Oleksandr Maximyuk, Alexey Rayevsky, Olena Iegorova, Vasyl Hurmach, Yuliia Holota, Elijah Bulgakov, Andrii Cherninskyi, Pavel Karpov, Sergey Ryabukhin, Oleg Krishtal, Dmitriy Volochnyuk. Journal: Molecular Informatics. Publication date: 2024-02-01. DOI: 10.1002/minf.202300156.
Pub Date: 2024-01-01; Epub Date: 2023-12-12; DOI: 10.1002/minf.202300288
Jürgen Bajorath
In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role. Among others, emerging application areas for CLMs include constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be predicted from protein sequence motifs. Novel off-the-beaten-path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.
Title: Chemical language models for molecular design. Journal: Molecular Informatics.
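The requirement that data be "presented in a sequential format and tokenized" can be made concrete with SMILES strings. The sketch below is a minimal tokenizer and integer encoder under stated assumptions: the regular expression is a simplified pattern written for this example (bracket atoms, two-digit ring closures, the two-letter halogens Cl and Br, then single characters), and the special tokens and vocabulary layout are hypothetical choices, not those of any particular CLM.

```python
import re

# Alternatives are tried left to right, so multi-character tokens must come before the catch-all.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|%\d{2}|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into tokens; the split must be lossless."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


def build_vocab(corpus):
    """Assign an integer id to every token seen in the corpus, after three special tokens."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for smi in corpus:
        for tok in tokenize(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles, vocab):
    """Map a SMILES string to a list of ids framed by begin and end markers."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokenize(smiles)] + [vocab["<eos>"]]


vocab = build_vocab(["CCO", "C[C@@H](N)Br"])
ids = encode("CCO", vocab)
```

Keeping `Cl`, `Br`, and bracket atoms as single tokens prevents the model from seeing, for example, a chlorine as a carbon followed by a stray character. The same encode step applies to protein sequences, where each residue letter is a token.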
Pub Date: 2024-01-01; Epub Date: 2023-11-14; DOI: 10.1002/minf.202300262
Johannes Schimunek, Philipp Seidl, Katarina Elez, Tim Hempel, Tuan Le, Frank Noé, Simon Olsson, Lluís Raich, Robin Winter, Hatice Gokcan, Filipp Gusev, Evgeny M Gutkin, Olexandr Isayev, Maria G Kurnikova, Chamali H Narangoda, Roman Zubatyuk, Ivan P Bosko, Konstantin V Furs, Anna D Karpenko, Yury V Kornoushenko, Mikita Shuldau, Artsemi Yushkevich, Mohammed B Benabderrahmane, Patrick Bousquet-Melou, Ronan Bureau, Beatrice Charton, Bertrand C Cirou, Gérard Gil, William J Allen, Suman Sirimulla, Stanley Watowich, Nick Antonopoulos, Nikolaos Epitropakis, Agamemnon Krasoulis, Vassilis Itsikalis, Stavros Theodorakis, Igor Kozlovskii, Anton Maliutin, Alexander Medvedev, Petr Popov, Mark Zaretckii, Hamid Eghbal-Zadeh, Christina Halmich, Sepp Hochreiter, Andreas Mayr, Peter Ruch, Michael Widrich, Francois Berenger, Ashutosh Kumar, Yoshihiro Yamanishi, Kam Y J Zhang, Emmanuel Bengio, Yoshua Bengio, Moksh J Jain, Maksym Korablyov, Cheng-Hao Liu, Gilles Marcou, Enrico Glaab, Kelly Barnsley, Suhasini M Iyengar, Mary Jo Ondrechen, V Joachim Haupt, Florian Kaiser, Michael Schroeder, Luisa Pugliese, Simone Albani, Christina Athanasiou, Andrea Beccari, Paolo Carloni, Giulia D'Arrigo, Eleonora Gianquinto, Jonas Goßen, Anton Hanke, Benjamin P Joseph, Daria B Kokh, Sandra Kovachka, Candida Manelfi, Goutam Mukherjee, Abraham Muñiz-Chicharro, Francesco Musiani, Ariane Nunes-Alves, Giulia Paiardi, Giulia Rossetti, S Kashif Sadiq, Francesca Spyrakis, Carmine Talarico, Alexandros Tsengenes, Rebecca C Wade, Conner Copeland, Jeremiah Gaiser, Daniel R Olson, Amitava Roy, Vishwesh Venkatraman, Travis J Wheeler, Haribabu Arthanari, Klara Blaschitz, Marco Cespugli, Vedat Durmaz, Konstantin Fackeldey, Patrick D Fischer, Christoph Gorgulla, Christian Gruber, Karl Gruber, Michael Hetmann, Jamie E Kinney, Krishna M Padmanabha Das, Shreya Pandita, Amit Singh, Georg Steinkellner, Guilhem Tesseyre, Gerhard Wagner, Zi-Fu Wang, Ryan J Yust, Dmitry S Druzhilovskiy, Dmitry A Filimonov, Pavel V Pogodin, 
Vladimir Poroikov, Anastassia V Rudik, Leonid A Stolbov, Alexander V Veselovsky, Maria De Rosa, Giada De Simone, Maria R Gulotta, Jessica Lombino, Nedra Mekni, Ugo Perricone, Arturo Casini, Amanda Embree, D Benjamin Gordon, David Lei, Katelin Pratt, Christopher A Voigt, Kuang-Yu Chen, Yves Jacob, Tim Krischuns, Pierre Lafaye, Agnès Zettor, M Luis Rodríguez, Kris M White, Daren Fearon, Frank Von Delft, Martin A Walsh, Dragos Horvath, Charles L Brooks, Babak Falsafi, Bryan Ford, Adolfo García-Sastre, Sang Yup Lee, Nadia Naffakh, Alexandre Varnek, Günter Klambauer, Thomas M Hermans
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.
Title: A community effort in SARS-CoV-2 drug discovery. Journal: Molecular Informatics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299051/pdf/
Pub Date: 2024-01-01; Epub Date: 2023-12-12; DOI: 10.1002/minf.202300221
Maria Martinez-Sevillano, Maria J Falaguera, Jordi Mestres
The availability of patent chemical data offers public access to a chemical space that is not well covered by other sources collecting small molecules from scholarly literature. However, open applications to facilitate the search and analysis of biologically-relevant molecular structures present in patents are still largely missing. We have developed CIPSI, an open Chemical Intellectual Property Service @ IMIM to assist medicinal chemists in searching and analysing molecules in SureChEMBL patents. The current version contains 6,240,500 molecules from 236,689 pharmacological patents, of which 5,949,214 are confidently assigned to core chemical structures reminiscent of the Markush structure in the patent claim. The platform includes some graphical tools to facilitate comparative patent analyses between drugs, chemical substructures, and company assignees. CIPSI is available at https://cipsi.org.
Title: CIPSI: An open chemical intellectual property service for medicinal chemists. Journal: Molecular Informatics.
Pub Date: 2024-01-01; Epub Date: 2023-11-20; DOI: 10.1002/minf.202300190
Joao Aires-de-Sousa
GUIDEMOL is a Python computer program based on the RDKit software to process molecular structures and calculate molecular descriptors with a graphical user interface using the tkinter package. It can calculate descriptors already implemented in RDKit as well as grid representations of 3D molecular structures using the electrostatic potential or voxels. The GUIDEMOL app provides easy access to RDKit tools for chemoinformatics users with no programming skills and can be adapted to calculate other descriptors or to trigger other procedures. A command line interface (CLI) is also provided for the calculation of grid representations. The source code is available at https://github.com/jairesdesousa/guidemol.
{"title":"GUIDEMOL: A Python graphical user interface for molecular descriptors based on RDKit.","authors":"Joao Aires-de-Sousa","doi":"10.1002/minf.202300190","DOIUrl":"10.1002/minf.202300190","url":null,"abstract":"<p><p>GUIDEMOL is a Python computer program based on the RDKit software to process molecular structures and calculate molecular descriptors with a graphical user interface using the tkinter package. It can calculate descriptors already implemented in RDKit as well as grid representations of 3D molecular structures using the electrostatic potential or voxels. The GUIDEMOL app provides easy access to RDKit tools for chemoinformatics users with no programming skills and can be adapted to calculate other descriptors or to trigger other procedures. A command line interface (CLI) is also provided for the calculation of grid representations. The source code is available at https://github.com/jairesdesousa/guidemol.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"54230132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01, Epub Date: 2023-12-19, DOI: 10.1002/minf.202300207
Konstantin I Popov, James Wellnitz, Travis Maxfield, Alexander Tropsha
The recent rapid expansion of make-on-demand, purchasable chemical libraries comprising tens of billions or even trillions of molecules has challenged the efficient application of traditional structure-based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using Docking ENriched by GEnerative Modeling) that greatly accelerates virtual screening. This workflow uniquely integrates machine learning, generative chemistry, massive chemical similarity searching, and molecular docking of small, selected libraries at the beginning and the end of the workflow. For each target, HIDDEN GEM nominates a small number of top-scoring virtual hits prioritized from ultra-large chemical libraries. We have benchmarked HIDDEN GEM by conducting virtual screening campaigns for 16 diverse protein targets using the Enamine REAL Space library comprising 37 billion molecules. We show that HIDDEN GEM yields the highest enrichment factors compared with state-of-the-art accelerated virtual screening methods, while requiring the least computational resources. HIDDEN GEM can be executed with any docking software and employed by users with limited computational resources.
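The abstract reports enrichment factors without defining them. A minimal sketch of one commonly used definition is given below: the hit rate among the top-ranked fraction of a library divided by the hit rate of the whole library. The function name `enrichment_factor`, the boolean hit labels, and the convention that a lower docking score is better are assumptions for this example; the paper may define hits and enrichment differently.

```python
def enrichment_factor(scores, is_hit, top_fraction):
    """Enrichment factor of the top-ranked fraction of a scored library.

    scores       -- docking scores, one per molecule (lower = better, assumed)
    is_hit       -- 1/0 or True/False flags marking which molecules count as hits
    top_fraction -- fraction of the library selected, in (0, 1]
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n_total = len(scores)
    n_hits = sum(is_hit)
    if n_total == 0 or n_hits == 0:
        raise ValueError("need a non-empty library with at least one hit")

    # Select the best-scoring molecules (at least one).
    n_selected = max(1, round(n_total * top_fraction))
    ranked = sorted(range(n_total), key=lambda i: scores[i])
    hits_selected = sum(is_hit[i] for i in ranked[:n_selected])

    # Hit rate in the selection relative to the hit rate in the whole library.
    return (hits_selected / n_selected) / (n_hits / n_total)


# Made-up toy library: 10 molecules, 2 hits, both ranked at the top.
toy_scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
toy_hits = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(toy_scores, toy_hits, top_fraction=0.2)
```

With this definition, a value of 1 means the selection is no better than random picking, and larger values mean hits are concentrated among the top-ranked molecules.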
{"title":"HIt Discovery using docking ENriched by GEnerative Modeling (HIDDEN GEM): A novel computational workflow for accelerated virtual screening of ultra-large chemical libraries.","authors":"Konstantin I Popov, James Wellnitz, Travis Maxfield, Alexander Tropsha","doi":"10.1002/minf.202300207","DOIUrl":"10.1002/minf.202300207","url":null,"abstract":"<p><p>Recent rapid expansion of make-on-demand, purchasable, chemical libraries comprising dozens of billions or even trillions of molecules has challenged the efficient application of traditional structure-based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using Docking ENriched by GEnerative Modeling) that greatly accelerates virtual screening. This workflow uniquely integrates machine learning, generative chemistry, massive chemical similarity searching and molecular docking of small, selected libraries in the beginning and the end of the workflow. For each target, HIDDEN GEM nominates a small number of top-scoring virtual hits prioritized from ultra-large chemical libraries. We have benchmarked HIDDEN GEM by conducting virtual screening campaigns for 16 diverse protein targets using Enamine REAL Space library comprising 37 billion molecules. We show that HIDDEN GEM yields the highest enrichment factors as compared to state of the art accelerated virtual screening methods, while requiring the least computational resources. HIDDEN GEM can be executed with any docking software and employed by users with limited computational resources.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11156482/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41139125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}