The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and of how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of a model is restricted by the training data used to fit it. Identifying unusual objects with outlier detection methods can improve model performance. Additionally, proper model evaluation requires defining the limits of the model, often called the applicability domain. As with certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles, and how to derive specific and general definitions of their applicability domain, are explained.
Title: Chemoinformatic regression methods and their applicability domain. Authors: Thomas-Martin Dutschmann, Valerie Schlenker, Knut Baumann. Journal: Molecular Informatics, e202400018. Publication date: 2024-07-01. DOI: 10.1002/minf.202400018
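The abstract does not commit to a single applicability-domain definition. The sketch below is one generic, model-agnostic illustration. A query counts as inside the domain when its mean distance to its k nearest training compounds does not exceed mean + Z·σ of the leave-one-out k-NN distances within the training set. This distance-based criterion is common in the QSAR literature. The descriptor vectors, `k=3` and `z=0.5` are illustrative assumptions, not values taken from the review.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_mean_distance(query, training, k):
    """Mean distance from `query` to its k nearest neighbours in `training`."""
    dists = sorted(euclidean(query, t) for t in training)
    return sum(dists[:k]) / k


def ad_threshold(training, k, z):
    """Threshold = mean + z * sd of the leave-one-out k-NN distances of the training set."""
    ref = []
    for i, x in enumerate(training):
        others = training[:i] + training[i + 1:]
        ref.append(knn_mean_distance(x, others, k))
    mean = sum(ref) / len(ref)
    sd = math.sqrt(sum((d - mean) ** 2 for d in ref) / (len(ref) - 1))
    return mean + z * sd


def in_domain(query, training, k=3, z=0.5):
    """True if `query` lies inside this distance-based applicability domain."""
    return knn_mean_distance(query, training, k) <= ad_threshold(training, k, z)


# Hypothetical 2-D descriptor vectors for five training compounds
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(in_domain((0.4, 0.6), train))  # True: interpolation inside the training cloud
print(in_domain((5.0, 5.0), train))  # False: far extrapolation
```

Other definitions use different distances, such as Mahalanobis distance, leverage or Tanimoto distance on fingerprints. They keep the same "distance plus threshold" structure.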
Pub Date: 2024-07-01; Epub Date: 2024-06-10; DOI: 10.1002/minf.202300339
Julia Revillo Imbernon, Jean-Marc Weibel, Eric Ennifar, Gilles Prévost, Esther Kellenberger
Aminoglycosides are crucial antibiotics facing challenges from bacterial resistance. This study addresses the importance of aminoglycoside modifying enzymes in the context of escalating resistance. Drawing upon over two decades of structural data in the Protein Data Bank, we focused on two key antibiotics, neomycin B and kanamycin A, to explore how the aminoglycoside structure is exploited by this family of enzymes. A systematic comparison across diverse enzymes and the RNA A-site target identified common characteristics in the recognition mode, while assessing the adaptability of neomycin B and kanamycin A in various environments.
Title: Structural analysis of neomycin B and kanamycin A binding Aminoglycosides Modifying Enzymes (AME) and bacterial ribosomal RNA. Journal: Molecular Informatics, e202300339. Publication date: 2024-07-01. DOI: 10.1002/minf.202300339
Pub Date: 2024-07-01; Epub Date: 2024-06-12; DOI: 10.1002/minf.202300259
Milo Roucairol, Tristan Cazenave
In this article we test different search algorithms, namely Nested Monte Carlo Search and Greedy Best First Search, in AstraZeneca's open-source retrosynthesis tool AiZynthFinder. We compare these algorithms with AiZynthFinder's built-in Monte Carlo Tree Search on a benchmark selected from the PubChem database and by Bayer's chemists. We show that both Nested Monte Carlo Search and Greedy Best First Search outperform AstraZeneca's Monte Carlo Tree Search, with a slight advantage for Nested Monte Carlo Search when experimenting with a playout heuristic. We also show that the search algorithms are bounded by the quality of the policy network; to improve our results, the next step is therefore to improve the policy network.
Title: Comparing search algorithms on the retrosynthesis problem. Journal: Molecular Informatics, e202300259. Publication date: 2024-07-01. DOI: 10.1002/minf.202300259
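The abstract names Nested Monte Carlo Search (NMCS) but does not describe it. The sketch below illustrates only the generic algorithm, on a hypothetical toy problem: build a fixed-length bit string that maximizes the number of adjacent changes. It does not use AiZynthFinder or any of its classes. In retrosynthesis, the moves would instead be reaction templates proposed by the policy network, and the score would reflect route solvability. A level-0 search is a random playout. A level-n search tries every move, evaluates each with a level-(n-1) search, and then commits one step along the best sequence found so far.

```python
import random


def legal_moves(state, length):
    """Hypothetical toy domain: append a 0 or 1 until the string has `length` bits."""
    return [] if len(state) >= length else [0, 1]


def score(state):
    """Toy objective: number of positions where consecutive bits differ."""
    return sum(1 for a, b in zip(state, state[1:]) if a != b)


def playout(state, length, rng):
    """Level-0 search: complete the sequence with uniformly random moves."""
    state = list(state)
    while legal_moves(state, length):
        state.append(rng.choice(legal_moves(state, length)))
    return score(state), state


def nmcs(state, level, length, rng):
    """Nested Monte Carlo Search; returns (best score, best complete sequence)."""
    state = list(state)
    if not legal_moves(state, length):  # terminal state: nothing left to search
        return score(state), state
    if level == 0:
        return playout(state, length, rng)
    best_score, best_seq = -1, None
    while legal_moves(state, length):
        for move in legal_moves(state, length):
            s, seq = nmcs(state + [move], level - 1, length, rng)
            if s > best_score:
                best_score, best_seq = s, seq
        # Commit one move along the best sequence memorized so far
        state.append(best_seq[len(state)])
    return best_score, best_seq


rng = random.Random(42)
best, seq = nmcs([], level=2, length=10, rng=rng)
print(best, seq)
```

Greedy Best First Search, the other algorithm named above, uses no random playouts. It keeps a priority queue of open nodes and always expands the one with the best heuristic value.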
Robert Fraczkiewicz, Huy Quoc Nguyen, Newton Wu, Nina Kausch‐Busies, Sergio Grimbs, Kai Sommer, Antonius ter Laak, Judith Günther, Björn Wagner, Michael Reutlinger
In a unique collaboration between Simulations Plus and several industrial partners, we developed version 11.0 of the previously published in silico pKa model S+pKa, with considerably improved prediction accuracy. The model's training set was vastly expanded with large amounts of experimental data from F. Hoffmann‐La Roche AG, Genentech Inc., and the Crop Science division of Bayer AG. The previous version of S+pKa, v7.0, was trained on data from public sources and from the Pharmaceutical division of Bayer AG. The new model showed dramatic improvements in predictive accuracy when externally validated on three new contributor compound sets. Less expected was v11.0's improved prediction for new compounds developed at Bayer Pharma after v7.0 was released (2013–2023), even though Bayer Pharma contributed no additional data to v11.0. We illustrate the chemical space coverage of the chemistries encountered in the five domains, public and industrial, outline model construction, and discuss the factors contributing to the model's success.
Title: Best of both worlds: An expansion of the state of the art pKa model with data from three industrial partners. Journal: Molecular Informatics. Publication date: 2024-06-21. DOI: 10.1002/minf.202400088
Kinases, a class of enzymes that control the phosphorylation of various substrates, are pivotal in both physiological and pathological processes. Although their conserved ATP-binding pockets make selectivity difficult to achieve, this feature offers opportunities for repositioning kinase inhibitors (KIs). This study presents a cost‐effective in silico prediction of KI repositioning based on cross‐docking results. We established a KI database (278 unique KIs, 1834 bioactivity data points) and a kinase database (357 kinase structures categorized by DFG motif) for cross‐docking. Comparative analysis of the docking scores and reported experimental bioactivities revealed that the Atypical, TK, and TKL superfamilies are suitable for drug repositioning. Among these kinase superfamilies, Olverematinib, Lapatinib, and Abemaciclib displayed enzymatic activity in our focus AKT‐PI3K‐mTOR pathway, with IC50 values of 3.3, 3.2, and 5.8 μM, respectively. Further cell assays showed IC50 values of 0.2, 1.2, and 0.6 μM in tumor cells. The agreement between prediction and validation demonstrates that repositioning KIs via in silico methods is feasible.
Title: Exploring drug repositioning possibilities of kinase inhibitors via molecular simulation. Authors: Qing‐Xin Wang, Jiao Cai, Zi‐Jun Chen, Jia‐Chuan Liu, Jing‐Jing Wang, Hai Zhou, Qing‐Qing Li, Zi‐Xuan Wang, Yi‐Bo Wang, Zhen‐Jiang Tong, Jin Yang, Tian‐Hua Wei, Meng‐Yuan Zhang, Yun Zhou, Wei‐Chen Dai, Ning Ding, Xue‐Jiao Leng, Xiao‐Ying Yin, Shan‐Liang Sun, Yan‐Cheng Yu, Nian‐Guang Li, Zhi‐Hao Shi. Journal: Molecular Informatics. Publication date: 2024-06-21. DOI: 10.1002/minf.202300336
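Docking scores are usually compared with bioactivities on a logarithmic scale. The abstract reports raw IC50 values only, so the conversion below is an illustrative worked example rather than the authors' procedure. It expresses the reported values as pIC50 = -log10(IC50 in mol/L). It assumes the enzymatic and cellular values are listed in the same order as the three compounds.

```python
import math


def pic50_from_micromolar(ic50_um):
    """pIC50 = -log10(IC50 in mol/L); the input is an IC50 in micromolar."""
    return -math.log10(ic50_um * 1e-6)


# IC50 values (μM) from the abstract, assumed to follow the compound order: (enzymatic, cellular)
reported = {
    "Olverematinib": (3.3, 0.2),
    "Lapatinib": (3.2, 1.2),
    "Abemaciclib": (5.8, 0.6),
}

for name, (enzyme_um, cell_um) in reported.items():
    print(f"{name}: enzymatic pIC50 = {pic50_from_micromolar(enzyme_um):.2f}, "
          f"cellular pIC50 = {pic50_from_micromolar(cell_um):.2f}")
```

On this scale the enzymatic values fall between about 5.2 and 5.5. A higher pIC50 means a more potent compound.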
Alejandro Gómez‐García, Ann‐Kathrin Prinz, Daniel A. Acuña Jiménez, William J. Zamora, Haruna L. Barazorda‐Ccahuana, Miguel Á. Chávez‐Fumagalli, Marilia Valli, Adriano D. Andricopulo, Vanderlan da S. Bolzani, Dionisio A. Olmedo, Pablo N. Solís, Marvin J. Núñez, Johny R. Rodríguez Pérez, Hoover A. Valencia Sánchez, Héctor F. Cortés Hernández, Oscar M. Mosquera Martinez, Oliver Koch, José L. Medina‐Franco
Compound databases of natural products play a crucial role in drug discovery and development projects and have implications in other areas, such as food chemistry research, ecology, and metabolomics. Recently, we assembled the first version of the Latin American Natural Product database (LANaPDB) as a collective effort of researchers from six countries to build a public and representative library of natural products from a geographical region with high biodiversity. The present work conducts an extensive comparative profiling of the natural product‐likeness of an updated version of LANaPDB and of the ten individual compound databases that make up LANaPDB. The natural product‐likeness profile of the Latin American compound databases is contrasted with that of other major public natural product databases and of a set of small‐molecule drugs approved for clinical use. As part of this extensive characterization, we employed several chemoinformatic metrics of natural product‐likeness. The results of this study are expected to interest the global community working on natural product databases, in Latin America and across the world.
Title: Updating and profiling the natural product‐likeness of Latin American compound libraries. Journal: Molecular Informatics. Publication date: 2024-06-21. DOI: 10.1002/minf.202400052
Pub Date: 2024-06-01; Epub Date: 2024-06-08; DOI: 10.1002/minf.202400021
Bryan Dafniet, Olivier Taboureau
Drug development is a long and costly process, often limited by the toxicity and adverse drug reactions (ADRs) caused by drug candidates. Even after reaching the market, some drugs can cause severe ADRs that vary with an individual's polymorphisms. Genome-wide association studies (GWAS) have enabled the discovery of genetic variants that may cause these effects. In this study, we investigated a deep learning approach to predict genetic variations potentially related to ADRs. We used single nucleotide polymorphism (SNP) information from dbSNP to build an ADR-drug-target-mutation network and extracted interaction matrices from it to build deep neural network (DNN) models. These models considered only mutations known from PharmGKB to affect drug efficacy and safety, and ADRs grouped by MedDRA System Organ Classes (SOCs). On this basis they reached an average balanced accuracy of 0.61. Adding molecular fingerprints representing structural features of the drugs did not improve model performance. To our knowledge, this is the first model that uses DNNs to predict ADR-drug-target-mutation associations. Although further improvements are suggested, these models could be used to analyze multiple compounds across all accessible gene and polymorphism information and thus pave the way for precision medicine.
Title: Prediction of adverse drug reactions due to genetic predisposition using deep neural networks. Journal: Molecular Informatics, e202400021. Publication date: 2024-06-01. DOI: 10.1002/minf.202400021
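Balanced accuracy is the mean of sensitivity and specificity. A model that always predicts the majority class therefore scores 0.5 regardless of class imbalance, and the reported 0.61 should be read against that chance level. The minimal sketch below covers one binary task (for example, one SOC). The labels are invented for illustration, and how the authors averaged across SOCs is not specified here. scikit-learn offers the same metric as `sklearn.metrics.balanced_accuracy_score`.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical labels for one SOC: 4 positives, 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(balanced_accuracy(y_true, y_pred))    # (0.75 + 4/6) / 2, about 0.708
print(balanced_accuracy(y_true, [0] * 10))  # majority-class baseline: 0.5
```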
Pub Date: 2024-06-01; Epub Date: 2024-06-08; DOI: 10.1002/minf.202300167
Abdulsalam Y Bande, Sefer Baday
Virtual screening (VS) is a well-established approach in drug discovery that speeds up the search for bioactive molecules and reduces the cost and effort of experiments. VS narrows down the chemical search space so that fewer, more probable candidate compounds can be selected for experimental testing. Docking is one of the most commonly used and highly regarded structure-based drug discovery methods. Databases of small-molecule chemical structures have been growing rapidly. However, virtual screening of large libraries via docking is currently not very common. In this work, we aim to accelerate docking studies by predicting docking scores without explicitly performing docking calculations. We experimented with an attention-based long short-term memory (LSTM) neural network for efficient prediction of docking scores, as well as with other machine learning models such as XGBoost. We trained our models on the docking scores of a small number of ligands and predicted the docking scores of a few million molecules. Specifically, we tested our approaches on 11 datasets produced in in-house drug discovery studies. On average, training on only 7000 molecules, we predicted the docking scores of approximately 3.8 million molecules. The predictions reached an R² (coefficient of determination) of 0.77 and a Spearman rank correlation coefficient of 0.85. We designed the system with ease of use in mind. The user only needs to provide a CSV file containing SMILES and their respective docking scores, and the system outputs a model that can be used to predict the docking score of a new molecule.
Title: Accelerating Molecular Docking using Machine Learning Methods. Journal: Molecular Informatics, e202300167. Publication date: 2024-06-01. DOI: 10.1002/minf.202300167
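The two metrics reported for the docking-score surrogate can be computed as below. R² compares the squared prediction errors with the variance of the true scores. Spearman's ρ is the Pearson correlation of the ranks, so it rewards only getting the ordering right, which is what matters when selecting top-ranked molecules. The docking scores in the example are invented, and this is not the authors' code. SciPy's `scipy.stats.spearmanr` gives the same rank correlation.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical docking scores (kcal/mol): docked vs. predicted by a surrogate model
docked = [-9.1, -8.4, -7.7, -6.5, -5.9]
predicted = [-8.8, -8.6, -7.5, -6.9, -5.5]
print(round(r2_score(docked, predicted), 3), round(spearman(docked, predicted), 3))
```

In this toy example the surrogate preserves the ordering exactly, so ρ is 1 even though R² is below 1.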
Pub Date: 2024-06-01; Epub Date: 2024-06-08; DOI: 10.1002/minf.202300312
Myeonghyeon Jeong, Sunyong Yoo
Pregnant women may use medications to manage health problems that develop during pregnancy or that existed before it. However, using medications during pregnancy carries a potential risk to the fetus. Assessing the fetotoxicity of drugs is essential to ensure safe treatment, but the current process is constrained by ethical issues, time, and cost. Therefore, a need has recently emerged for in silico models that efficiently assess the fetotoxicity of drugs. Previous studies have proposed successful machine learning models for fetotoxicity prediction and have even suggested molecular substructures possibly associated with fetotoxicity risk or protective effects. However, the models' decisions on the fetotoxicity of individual drugs are still insufficiently interpreted. This study constructed machine learning-based models that predict the fetotoxicity of drugs while providing explanations for their decisions. For this, permutation feature importance was used to identify the general features on which the models relied most when predicting fetotoxicity. In addition, the features associated with fetotoxicity for each drug were analyzed using an attention mechanism. The predictive performance of all the constructed models was high (AUROC: 0.854–0.974; AUPR: 0.890–0.975). Furthermore, a literature review of the predicted important features showed that they were highly associated with fetotoxicity. We expect our model to benefit fetotoxicity research by providing an evaluation of the fetotoxicity risk of drugs or drug candidates, together with an interpretation of that prediction.
Title: FetoML: Interpretable predictions of the fetotoxicity of drugs based on machine learning approaches. Journal: Molecular Informatics, e202300312. Publication date: 2024-06-01. DOI: 10.1002/minf.202300312
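Permutation feature importance measures how much a model's score drops when the values of one feature are shuffled across samples. Shuffling breaks the feature's link to the label while keeping its distribution. The minimal, model-agnostic sketch below uses two binary "substructure" features and a rule-based classifier as hypothetical stand-ins for a trained fetotoxicity model. For fitted scikit-learn estimators, `sklearn.inspection.permutation_importance` implements the same idea; the abstract does not say which implementation the authors used.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with column j replaced by its shuffled values
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def toy_model(X):
    """Hypothetical classifier: flags a compound as fetotoxic if substructure bit 0 is set."""
    return [row[0] for row in X]


# Hypothetical fingerprint bits: feature 0 determines the label, feature 1 is noise
X = [[1, 0], [1, 1], [0, 0], [0, 1]] * 5
y = [row[0] for row in X]
baseline, imp = permutation_importance(toy_model, X, y, accuracy)
print(baseline, imp)
```

The ignored feature gets an importance of exactly zero, and the decisive one a clearly positive drop. This global ranking is what permutation importance provides; per-drug explanations, like the attention analysis above, need a different mechanism.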
Pub Date: 2024-06-01; Epub Date: 2024-06-08; DOI: 10.1002/minf.202300250
Xiyu Chen, Sigrid Leyendecker
Protein kinases are crucial cellular enzymes that catalyze the transfer of phosphate groups from adenosine triphosphate (ATP) to their substrates, thereby regulating numerous cellular activities. Dysfunctional kinase activity often leads to oncogenic conditions. Based on structural similarity to PDB entry 5UG9, we selected 79 crystal structures from the PDB. We then classified them into 5 clusters according to the position of the phenylalanine side chain in the DFG motif. We apply our kinematic flexibility analysis (KFA) to explore the flexibility of kinases in various activity states and to examine the impact of the activation loop on kinase structure. KFA rapidly decomposes macromolecules into regions of different flexibility, allowing comprehensive analysis of conformational structures. The results reveal that the activation loop acts as a "lock" that stabilizes the active conformation of kinases by rigidifying the adjacent α-helices. Furthermore, we investigate specific kinase mutations, such as L858R, which is commonly associated with non-small cell lung cancer and induces increased flexibility in active-state kinases. In addition, by analyzing hydrogen bond patterns, we examine the substructure of kinases in different states. Notably, active-state kinases exhibit a higher occurrence of α-helices than inactive-state kinases. This study contributes to the understanding of biomolecular conformation at a level relevant to drug development.
Title: Kinematic analysis of kinases and their oncogenic mutations - Kinases and their mutation kinematic analysis. Journal: Molecular Informatics, e202300250. Publication date: 2024-06-01. DOI: 10.1002/minf.202300250