Pub Date: 2024-11-26; DOI: 10.1021/acs.jcim.4c01364
Chris Zhang, Meghan Osato and David L. Mobley*
As a model system, the binding pocket of the L99A mutant of T4 lysozyme has been the subject of numerous computational free energy studies. However, previous studies have failed to fully sample and account for the observed changes in the binding pocket of T4 L99A upon binding of a congeneric ligand series, limiting the accuracy of their results. In this work, we resolve in MD simulations the closed, intermediate, and open states of T4 L99A previously reported experimentally, and we establish definitions for these states based on the dynamics of the system. From this analysis, we arrive at two primary conclusions. First, simulation trajectories should not be assigned to discrete states simply on the basis of RMSD to crystal structures, as this can result in misassignment of states. Second, the different metastable conformations studied here need to be treated carefully, as we estimate the time scales for conformational interconversion to be on the order of 10² to 10³ ns, far longer than the time scales of typical binding calculations. We conclude with a discussion of the need to develop enhanced sampling methods that can generally account for significant changes in protein conformation caused by relatively small ligand perturbations.
Title: Kinetics-Based State Definitions for Discrete Binding Conformations of T4 L99A in MD via Markov State Modeling
Journal: Journal of Chemical Information and Modeling, 64(23), pp. 8870–8879
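The interconversion time scales in this abstract are the kind of quantity a Markov state model (MSM) provides through implied timescales, t_i = -τ / ln λ_i, where τ is the lag time and λ_i is the i-th eigenvalue of the transition matrix. The pure-Python sketch below illustrates this for a hypothetical three-state model (closed, intermediate, open); the symmetrized-count estimator, the function names, and the toy count matrix are illustrative assumptions, not the authors' analysis pipeline.

```python
import math


def count_transitions(dtraj, n_states, lag):
    """Count i -> j transitions between frames `lag` steps apart."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1.0
    return counts


def reversible_transition_matrix(counts):
    """Symmetrize counts (C + C^T) and row-normalize, which enforces detailed balance."""
    n = len(counts)
    sym = [[counts[i][j] + counts[j][i] for j in range(n)] for i in range(n)]
    return [[sym[i][j] / sum(sym[i]) for j in range(n)] for i in range(n)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def implied_timescales_3state(tmat, lag_ns):
    """Implied timescales t_i = -lag / ln(lambda_i) for a 3-state transition matrix.

    The stationary eigenvalue is 1, so the other two satisfy
    lambda_2 + lambda_3 = trace - 1 and lambda_2 * lambda_3 = det(T).
    """
    s = tmat[0][0] + tmat[1][1] + tmat[2][2] - 1.0
    p = det3(tmat)
    disc = max(s * s - 4.0 * p, 0.0)  # equals (lambda_2 - lambda_3)^2; clamp round-off
    root = math.sqrt(disc)
    lams = sorted([(s + root) / 2.0, (s - root) / 2.0], reverse=True)
    return [-lag_ns / math.log(lam) for lam in lams if 0.0 < lam < 1.0]


# Toy counts at a 1 ns lag for states 0 = closed, 1 = intermediate, 2 = open (hypothetical).
counts = [[90, 10, 0], [10, 80, 10], [0, 10, 90]]
T = reversible_transition_matrix(counts)
print(implied_timescales_3state(T, lag_ns=1.0))  # ≈ [9.49, 2.80] ns
```

Symmetrizing counts is the simplest way to enforce detailed balance so that the eigenvalues are real; dedicated MSM packages provide maximum-likelihood reversible estimators that are generally preferable for real trajectories.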
Pub Date: 2024-11-26; DOI: 10.1021/acs.jcim.4c02123
Issar Arab, Kris Laukens, Wout Bittremieux
Title: Correction to "Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set"
Journal: Journal of Chemical Information and Modeling
Pub Date: 2024-11-25; Epub Date: 2024-11-03; DOI: 10.1021/acs.jcim.4c01381
Dmitry S Boichenko, Nikita I Kolomoets, Daniil A Boiko, Alexey S Galushko, Alexandra V Posvyatenko, Andrey E Kolesnikov, Ksenia S Egorova, Valentine P Ananikov
The increasing need to understand and control the environmental impact of chemical processes has highlighted the challenge of efficiently evaluating the toxicity of the vast number of chemical compounds and their varying effects on biological systems. In this study, we introduce "Build-a-bio-Strip", a novel online service designed to carry out a quick initial analysis of the toxic impact of chemical processes. This platform enables users to automatically generate toxicity characteristics of chemical reactions using either their own data on the cytotoxicity or median lethal doses of the substances involved or computational predictions based on SMILES strings. The service calculates toxicity metrics, such as bio-Factors and cytotoxicity potentials, that can be used to identify the substances contributing significantly to the overall toxicity of a particular process. This facilitates the selection of safer synthetic routes and the optimization of chemical processes from a toxicity perspective. "Build-a-bio-Strip" represents a step toward safer and more sustainable chemical practices. It is available free of charge at http://app.ananikovlab.ai:8080/.
Title: Build-a-Bio-Strip: An Online Platform for Rapid Toxicity Assessment in Chemical Synthesis
Journal: Journal of Chemical Information and Modeling, pp. 8373–8378
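The abstract does not give the formulas behind bio-Factors or cytotoxicity potentials, so the sketch below uses a hypothetical additive score purely to illustrate how per-substance contributions to a process's overall toxicity can be ranked. The `Substance` fields, the amount/CC50 contribution, and the per-gram normalization are all assumptions, not the platform's definitions.

```python
from dataclasses import dataclass


@dataclass
class Substance:
    name: str
    amount_mmol: float  # amount used in the process (hypothetical field)
    cc50_mM: float      # user-supplied cytotoxicity, CC50 in mmol/L (hypothetical field)


def toxicity_contributions(substances, product_mass_g):
    """Hypothetical additive score, NOT the platform's bio-Factor definition.

    Each substance contributes amount / CC50 (mmol / (mmol/L) = L), the volume of
    medium in which that amount would reach its CC50. The summed score is
    normalized per gram of product; shares are returned largest first.
    """
    raw = {s.name: s.amount_mmol / s.cc50_mM for s in substances}
    total_raw = sum(raw.values())
    shares = sorted(
        ((name, value / total_raw) for name, value in raw.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return total_raw / product_mass_g, shares


substances = [
    Substance("A", amount_mmol=10.0, cc50_mM=100.0),  # abundant but mildly cytotoxic
    Substance("B", amount_mmol=1.0, cc50_mM=1.0),     # minor but highly cytotoxic
]
score, shares = toxicity_contributions(substances, product_mass_g=1.0)
print(score)   # ≈ 1.1 (L per g of product, under the assumed definition)
print(shares)  # B first, with about 91% of the total
```

The toy case shows why contribution-level metrics are useful: a minor component with high cytotoxicity can dominate the overall score.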
Pub Date: 2024-11-25; DOI: 10.1021/acs.jcim.4c01353
Fatemeh Etezadi, Shunichi Ito, Kosuke Yasui, Rodi Kado Abdalkader, Itsunari Minami, Motonari Uesugi, Namasivayam Ganesh Pandian, Haruko Nakano, Atsushi Nakano and Daniel M. Packwood*
The discovery of small organic compounds that induce stem cell differentiation is a time- and resource-intensive process. While data science could, in principle, streamline the discovery of these compounds, novel approaches are required because training data are difficult to acquire for large numbers of example compounds. In this paper, we present the design of a new compound for inducing cardiomyocyte differentiation using simple regression models trained on a data set containing only 80 examples. We introduce decorated shape descriptors, an information-rich molecular feature representation that integrates both molecular shape and hydrophilicity information. Models using these descriptors perform better than models using standard shape-only molecular descriptors. Model overtraining is diagnosed using a new type of sensitivity analysis. Our new compound is designed using a conservative molecular design strategy, and its effectiveness is confirmed through expression profiles of cardiomyocyte-related marker genes, measured by real-time polymerase chain reaction on human iPS cell lines. This work demonstrates a viable data-driven strategy for designing new compounds for stem cell differentiation protocols and should be useful in situations where training data are limited.
Title: Molecular Design for Cardiac Cell Differentiation Using a Small Data Set and Decorated Shape Features
Journal: Journal of Chemical Information and Modeling, 64(23), pp. 8824–8837
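The abstract describes decorated shape descriptors only at a high level (molecular shape combined with hydrophilicity). As a hypothetical illustration of the idea, the sketch below builds a classic distance-histogram shape feature plus a second channel in which each atom pair is weighted by the mean of per-atom hydrophilicity scores; the function name, binning, and weighting rule are assumptions, not the paper's descriptor.

```python
import math
from itertools import combinations


def distance_histograms(coords, hydrophilicity, bin_edges):
    """Return (shape-only, hydrophilicity-decorated) histograms of interatomic distances.

    The shape channel counts atom pairs per distance bin. The decorated channel adds,
    for each pair, the mean of the two atoms' hydrophilicity scores instead of 1, so
    the same geometry yields different features when polar atoms sit elsewhere.
    """
    n_bins = len(bin_edges) - 1
    shape = [0.0] * n_bins
    decorated = [0.0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        for b in range(n_bins):
            if bin_edges[b] <= d < bin_edges[b + 1]:
                shape[b] += 1.0
                decorated[b] += 0.5 * (hydrophilicity[i] + hydrophilicity[j])
                break
    return shape, decorated


# Three atoms on a line; only the two end atoms are (hypothetically) hydrophilic.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shape, decorated = distance_histograms(coords, [1.0, 0.0, 1.0], bin_edges=[0.0, 1.5, 3.0])
print(shape)      # [2.0, 1.0]
print(decorated)  # [1.0, 1.0]
```

With all hydrophilicity scores set to 1, the decorated channel reduces to the plain shape histogram, which makes the added information easy to isolate when comparing against shape-only models.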
Pub Date: 2024-11-25; Epub Date: 2024-11-05; DOI: 10.1021/acs.jcim.4c01186
Yuzhi Xu, Xinxin Liu, Wei Xia, Jiankai Ge, Cheng-Wei Ju, Haiping Zhang, John Z H Zhang
The rapid progression of machine learning, especially deep learning (DL), has catalyzed a new era in drug discovery, introducing innovative approaches for predicting molecular properties. Despite the many methods available for feature representation, efficiently utilizing rich, high-dimensional information remains a significant challenge. To address this challenge, we introduce ChemXTree, a novel graph-based model that integrates a Gate Modulation Feature Unit (GMFU) and a neural decision tree (NDT) in the output layer. Extensive evaluations on benchmark data sets, including MoleculeNet and eight additional drug databases, demonstrate that ChemXTree matches or surpasses current state-of-the-art models. Visualizations of the latent space show that ChemXTree markedly improves the separation between substrates and nonsubstrates. In summary, ChemXTree demonstrates a promising approach for integrating advanced feature extraction with neural decision trees, offering significant improvements in predictive accuracy for drug discovery tasks and opening new avenues for optimizing molecular properties.
Title: ChemXTree: A Feature-Enhanced Graph Neural Network-Neural Decision Tree Framework for ADMET Prediction
Journal: Journal of Chemical Information and Modeling, pp. 8440–8452
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11600499/pdf/
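The abstract names a Gate Modulation Feature Unit and a neural decision tree without specifying them. The sketch below uses two generic stand-ins: an element-wise sigmoid gate on the input features, and a depth-2 soft decision tree whose routing probabilities are sigmoids, so its output is a differentiable, probability-weighted average of leaf values. Neither function is claimed to match ChemXTree's layers; in a real model the inputs would be learned graph embeddings and the parameters would be trained by gradient descent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate_features(x, w, b):
    """Hypothetical element-wise gate: scale feature k by sigmoid(w_k * x_k + b_k)."""
    return [xk * sigmoid(wk * xk + bk) for xk, wk, bk in zip(x, w, b)]


def soft_tree_depth2(x, nodes, leaf_values):
    """Depth-2 soft decision tree.

    nodes = [root, left_child, right_child], each a (weights, bias) pair; the
    probability of routing right at a node is sigmoid(w . x + b). The prediction
    is the sum over the four leaves of (path probability * leaf value).
    """
    def p_right(node):
        weights, bias = node
        return sigmoid(sum(wk * xk for wk, xk in zip(weights, x)) + bias)

    r0, r_left, r_right = (p_right(node) for node in nodes)
    path_probs = [
        (1 - r0) * (1 - r_left),  # left, then left
        (1 - r0) * r_left,        # left, then right
        r0 * (1 - r_right),       # right, then left
        r0 * r_right,             # right, then right
    ]
    prediction = sum(p * v for p, v in zip(path_probs, leaf_values))
    return prediction, path_probs


x = gate_features([2.0, -1.0], w=[0.0, 0.0], b=[0.0, 0.0])  # zero gate params halve each feature
zero = ([0.0, 0.0], 0.0)
pred, probs = soft_tree_depth2(x, [zero, zero, zero], leaf_values=[0.0, 0.0, 1.0, 1.0])
print(x, pred)  # [1.0, -0.5] 0.5
```

Because every leaf receives some probability mass, gradients flow to all node parameters, which is what lets a tree-shaped output layer be trained jointly with a neural feature extractor.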
Pub Date: 2024-11-25; Epub Date: 2024-11-04; DOI: 10.1021/acs.jcim.4c00788
Gabriel T Galdino, Olivier Mailhot, Rafael Najmanovich
The μ-opioid receptor (MOR) is a G-protein coupled receptor involved in nociception and the primary target of opioid drugs. Understanding the relationships among ligand structure, receptor dynamics, and efficacy in activating MOR is crucial for drug discovery and development. Here, we use coarse-grained normal-mode analysis to predict ligand-induced changes in receptor dynamics with the Quantitative Dynamics Activity Relationship (QDAR) DynaSig-ML methodology, training a LASSO regression model on the entropic signatures (ESs) computed from ligand-receptor complexes. We train and validate the methodology using a data set of 179 MOR ligands with experimentally measured efficacies, split into strictly chemically distinct cross-validation sets. By analyzing the coefficients of the ES LASSO model, we identified key residues involved in MOR activation, several of which have mutational data supporting their role in MOR activation. Additionally, we explored a contact-only LASSO model based on ligand-protein interactions. Although this model showed predictive power, it failed to predict efficacy for ligands with low structural similarity to the training set, emphasizing the importance of receptor dynamics for predicting ligand-induced receptor activation. Moreover, the low computational cost of our approach, 3 CPU seconds per ligand-receptor complex, opens the door to its application in large-scale virtual screening. Our work contributes to a better understanding of dynamics-function relationships in the μ-opioid receptor and provides a framework for predicting ligand efficacy based on ligand-induced changes in receptor dynamics.
Title: Understanding and Predicting Ligand Efficacy in the μ-Opioid Receptor through Quantitative Dynamical Analysis of Complex Structures
Journal: Journal of Chemical Information and Modeling, pp. 8549–8561
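LASSO's L1 penalty drives many coefficients exactly to zero, which is what makes the surviving coefficients of an entropic-signature model interpretable as candidate key residues. The sketch below is a minimal cyclic coordinate-descent LASSO using the same objective scaling as scikit-learn's `Lasso`, (1/(2n))||y - Xβ||² + α||β||₁; the toy feature matrix standing in for per-residue entropic signatures is hypothetical, and this is not the authors' implementation.

```python
def soft_threshold(rho, alpha):
    """Proximal operator of the L1 norm: shrink rho toward zero by alpha."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    X is a list of rows (one per ligand-receptor complex); features and target are
    assumed centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return beta


# Hypothetical toy data: columns stand in for two per-residue entropic-signature features.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.0, 0.0]  # "efficacy" depends on the first feature only
print(lasso_coordinate_descent(X, y, alpha=0.1))  # ≈ [1.8, 0.0]
```

The irrelevant feature ends at exactly zero, while the relevant one is shrunk from 2.0 to 1.8 by the penalty; reading off which coefficients are nonzero is the residue-identification step described above.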
Pub Date: 2024-11-25; Epub Date: 2024-11-01; DOI: 10.1021/acs.jcim.4c01667
Bibhu Prasad Behera, Hemangini Naik, V Badireenath Konkimalla
Peptaloid is the first dedicated database for peptide alkaloids, a unique class of naturally derived compounds known for their structural diversity and significant biological activities. Despite their promising potential in drug discovery and therapeutic development, research on peptide alkaloids has been limited by the absence of a comprehensive, centralized resource. Data fragmented across various sources have posed a significant challenge, underscoring the need for a specialized database to facilitate more efficient research and application. Peptaloid addresses this critical gap by providing over 161,000 peptide alkaloid entries, each annotated with structural, physicochemical, and pharmacological properties. Using advanced computational tools and machine learning, Peptaloid generates ADMET profiles that aid in identifying and optimizing therapeutic candidates. Designed for versatility, the database supports various applications beyond drug discovery, including ecology and materials science. As a specialized database for peptide alkaloids, Peptaloid is expected to play a crucial role in fostering innovation and collaboration across scientific disciplines. Peptaloid is accessible at https://peptaloid.niser.ac.in.
Title: Peptaloid: A Comprehensive Database for Exploring Peptide Alkaloid
Journal: Journal of Chemical Information and Modeling, pp. 8387–8395
Pub Date: 2024-11-25; Epub Date: 2024-11-04; DOI: 10.1021/acs.jcim.4c00726
Yujing Zhao, Lei Zhang, Jian Du, Qingwei Meng, Li Zhang, Heshuang Wang, Liang Sun, Qilei Liu
The dissociation rate constant (k_off) significantly affects drug potency and dosing frequency. This work proposes an optimization-based framework for de novo drug design guided by k_off. First, a comprehensive database containing 2,773 unique k_off values is created. Based on this database, a novel generic dissociation kinetic model with a mixture-of-experts architecture is developed, enabling high-throughput, high-accuracy predictions of k_off. The model is then integrated with an optimization-based mathematical programming approach to design drug candidates with low k_off. Finally, the τ-RAMD method is used to rigorously verify the designed drug candidates. In a case study, the framework successfully identified numerous new potential HSP90 inhibitor candidates, achieving up to a 45.7% improvement in residence time (τ = 1/k_off) compared with a known exceptional HSP90 inhibitor. These findings demonstrate the feasibility and effectiveness of the kinetics-guided, optimization-based de novo drug design framework for designing drug candidates with prolonged τ.
Title: Mixture-of-Experts Based Dissociation Kinetic Model for De Novo Design of HSP90 Inhibitors with Prolonged Residence Time
Journal: Journal of Chemical Information and Modeling, pp. 8427–8439
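Two relations from this abstract can be made concrete: a mixture of experts combines several expert models through a gating function, and residence time is τ = 1/k_off. The sketch assumes, hypothetically, that each expert predicts log10(k_off) and that the gate outputs softmax weights; the actual experts and gate of the published model are not described in the abstract. It also checks what a 45.7% longer residence time implies for k_off.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_log10_koff(expert_preds, gate_logits):
    """Softmax-gated mixture of experts over per-expert log10(k_off) predictions."""
    weights = softmax(gate_logits)
    return sum(w * pred for w, pred in zip(weights, expert_preds))


def residence_time(koff):
    """Residence time tau = 1 / k_off (in s when k_off is in 1/s)."""
    return 1.0 / koff


def percent_change(new, ref):
    return 100.0 * (new - ref) / ref


log_koff = moe_log10_koff([-2.0, -3.0], gate_logits=[0.0, 0.0])  # equal gate weights
print(log_koff, residence_time(10 ** log_koff))  # -2.5, ≈ 316 s

# A 45.7% longer residence time means k_off is divided by 1.457, i.e. about 31% lower.
k_ref = 1e-2  # hypothetical reference k_off, 1/s
print(percent_change(residence_time(k_ref / 1.457), residence_time(k_ref)))  # ≈ 45.7
```

Averaging in log space is a design choice in this sketch: k_off values span orders of magnitude, so combining log10 predictions keeps one large expert output from dominating.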
Pub Date: 2024-11-25; Epub Date: 2024-11-11; DOI: 10.1021/acs.jcim.4c01246
Andrés Halabi Diaz, Mario Duque-Noreña, Elizabeth Rincón, Eduardo Chamorro
This study combines machine learning (ML) with conceptual density functional theory (CDFT) to develop OECD-compliant predictive models for the mutagenic activity of aromatic amines (AAs) using a fully No-Code methodology, a comprehensive data set of 251 AAs, leave-one-out cross-validation (LOOCV), and three distinct data splits. We employ the GFN2-xTB method, known for its robustness and speed, to compute descriptors for procarcinogens and their activated metabolites in the vacuum and aqueous phases. We evaluate the effectiveness of different theoretical definitions of electrophilicity within CDFT, namely the PSL, GCV, and CDP schemes, as well as the newly introduced Log QP descriptor, which approximates Log P information. The SPAARC, RandomTree, and JCHAID* ML methods were used to build explainable predictive models with robust internal validation (average correct classifications = 76%, average kappa = 0.29) and external validation (average correct classifications = 79%, average kappa = 0.33) metrics, and the results were compared with those of a multilayer perceptron with two hidden layers. The second CDP definition of electrophilicity in both the vacuum and aqueous phases, together with the newly presented Log QP descriptors, proved the most important for predicting the mutagenic activity of AAs (namely ω_{+Vac}^{CDP2+}, ω_{+Aq}^{CDP2+}, and LogQP1_{+Vac}, respectively). These results indicate that metabolic activation, aqueous solvent properties, the CDP electrophilicity schemes, and Log QP should be considered when building predictive models for the mutagenic activity of AAs. This study offers a replicable, No-Code approach to QSAR research, making high-level ML and CDFT applications accessible to a broader audience. Future work will extend these methods to other compound families to enhance predictive capabilities for mutagenic activity and other biological phenomena.
Title: Synergizing Machine Learning, Conceptual Density Functional Theory, and Biochemistry: No-Code Explainable Predictive Models for Mutagenicity in Aromatic Amines
Journal: Journal of Chemical Information and Modeling, pp. 8510–8520
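Two of the electrophilicity schemes named in this abstract, PSL and GCV, have standard closed forms in terms of the ionization energy I and electron affinity A (often approximated from frontier orbital energies via Koopmans' theorem, I ≈ -E_HOMO and A ≈ -E_LUMO); the CDP scheme is not reproduced here. The sketch also computes Cohen's kappa, which shows how a model can have a fairly high fraction of correct classifications alongside a modest kappa: kappa discounts the agreement expected by chance. The input values are hypothetical, and the η = I - A hardness convention is an assumption (some authors use (I - A)/2, which doubles the PSL index).

```python
def electrophilicity_psl(ionization, affinity):
    """Parr-Szentpaly-Liu index omega = mu^2 / (2 eta), with mu = -(I + A)/2, eta = I - A."""
    mu = -(ionization + affinity) / 2.0
    eta = ionization - affinity
    return mu * mu / (2.0 * eta)


def electroaccepting_gcv(ionization, affinity):
    """Gazquez-Cedillo-Vela electroaccepting power omega+ = (I + 3A)^2 / (16 (I - A))."""
    return (ionization + 3.0 * affinity) ** 2 / (16.0 * (ionization - affinity))


def electrodonating_gcv(ionization, affinity):
    """Gazquez-Cedillo-Vela electrodonating power omega- = (3I + A)^2 / (16 (I - A))."""
    return (3.0 * ionization + affinity) ** 2 / (16.0 * (ionization - affinity))


def cohen_kappa(y_true, y_pred):
    """Chance-corrected agreement for binary labels: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    p_e = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    return (p_o - p_e) / (1 - p_e)


I, A = 9.0, 1.0  # hypothetical ionization energy and electron affinity, eV
print(electrophilicity_psl(I, A))  # 1.5625
print(electroaccepting_gcv(I, A))  # 1.125
print(electrodonating_gcv(I, A))   # 6.125
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))     # 0.5
print(cohen_kappa([0] * 9 + [1], [0] * 10))        # 0.0: 90% correct, yet no better than chance
```

The second kappa example is the extreme case: always predicting the majority class scores 90% correct but earns a kappa of zero.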
Pub Date: 2024-11-25; Epub Date: 2024-11-11; DOI: 10.1021/acs.jcim.4c01621
Sumit Tarafder, Debswapna Bhattacharya
A scoring function that can reliably assess the accuracy of a 3D RNA structural model in the absence of an experimental structure is not only important for model evaluation and selection but also useful for scoring-guided conformational sampling. However, high-fidelity RNA scoring has proven difficult with conventional knowledge-based statistical potentials and currently available machine learning-based approaches. Here, we present lociPARSE, a locality-aware invariant point attention architecture for scoring RNA 3D structures. Unlike existing machine learning methods that estimate superposition-based root-mean-square deviation (RMSD), lociPARSE estimates Local Distance Difference Test (lDDT) scores, which capture the accuracy of each nucleotide and its surrounding local atomic environment in a superposition-free manner, before aggregating this information to predict global structural accuracy. Tested on multiple datasets including CASP15, lociPARSE significantly outperforms existing statistical potentials (rsRNASP, cgRNASP, DFIRE-RNA, and RASP) and machine learning methods (ARES and RNA3DCNN) across complementary assessment metrics. lociPARSE is freely available at https://github.com/Bhattacharya-Lab/lociPARSE.
Title: lociPARSE: A Locality-aware Invariant Point Attention Model for Scoring RNA 3D Structures
Journal: Journal of Chemical Information and Modeling, pp. 8655–8664
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11600500/pdf/
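lDDT is superposition-free because it compares interatomic distances within a local neighborhood rather than aligned coordinates. The sketch below is a simplified lDDT under stated assumptions: one representative atom per nucleotide, the commonly used 15 Å inclusion radius defined on the reference, and the standard 0.5, 1, 2, and 4 Å tolerance thresholds. It omits details of full implementations, such as all-atom pairs and stereochemical checks, and it is not lociPARSE's code; lociPARSE predicts such scores without a reference structure.

```python
import math

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # standard lDDT distance tolerances, in Å


def per_residue_lddt(ref, model, radius=15.0):
    """Simplified per-nucleotide lDDT using one representative atom per nucleotide.

    Pairs are selected by distance in the reference only, so the model never
    has to be superimposed onto the reference.
    """
    scores = []
    for i in range(len(ref)):
        preserved, n_pairs = 0.0, 0
        for j in range(len(ref)):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue
            diff = abs(d_ref - math.dist(model[i], model[j]))
            # Fraction of tolerance thresholds satisfied by this pair distance.
            preserved += sum(diff < t for t in THRESHOLDS) / len(THRESHOLDS)
            n_pairs += 1
        scores.append(preserved / n_pairs if n_pairs else float("nan"))
    return scores


def global_lddt(ref, model, radius=15.0):
    """Average of per-nucleotide scores, skipping nucleotides with no neighbors."""
    scores = [s for s in per_residue_lddt(ref, model, radius) if not math.isnan(s)]
    return sum(scores) / len(scores)


ref = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
shifted = [(x + 100.0, y, z) for x, y, z in ref]  # rigid translation of the reference
distorted = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (13.0, 0.0, 0.0)]
print(global_lddt(ref, shifted))         # 1.0: unaffected by the translation
print(per_residue_lddt(ref, distorted))  # [0.625, 0.625, 0.25]
print(global_lddt(ref, distorted))       # 0.5
```

The per-nucleotide scores localize the error to the displaced nucleotide, which is the kind of local signal that is then aggregated into a global accuracy estimate.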