
Journal of Chemical Information and Modeling: Latest Articles

Ligand-Based Drug Discovery Leveraging State-of-the-Art Machine Learning Methodologies Exemplified by Cdr1 Inhibitor Prediction.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2025-04-16; DOI: 10.1021/acs.jcim.5c00374
The-Chuong Trinh,Pierre Falson,Viet-Khoa Tran-Nguyen,Ahcène Boumendjel
Artificial intelligence (AI) is revolutionizing drug discovery with unprecedented speed and efficiency. In computer-aided drug design, structure-based and ligand-based methodologies are the main driving forces for innovation. In cases where no experimental structure or high-confidence homology/AlphaFold-predicted model of the target is available in 3D, ligand-based strategies are generally preferable. Here, we aim to develop and evaluate new predictive AI models for ligand-based drug discovery. To illustrate our workflow, we propose, as an example, an ensemble classification model for Cdr1 inhibitor prediction. We leverage target-specific experimental data from different sources, various molecular feature types, and multiple state-of-the-art machine learning (ML) algorithms alongside a multi-instance 3D graph neural network (multiple conformations of a single molecule are considered). Bayesian hyperparameter tuning, stacked generalization, and soft voting are involved in our workflow. The final target-specific ensemble model benefits from the classification and screening power of those constituting it. On an external test set structurally dissimilar to the training data, its average precision is 0.755, its F1-score is 0.714, the area under the receiver operating characteristic curve is 0.884, and the balanced accuracy is 0.799. It gives a low false positive rate of 0.1236 on another test set outside the training chemical space, indicating its ability to avoid false positives. The present work highlights the potential of stacking ensemble ML and offers a rigorous general workflow to build ligand-based predictive AI models for other targets.
Citations: 0
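The abstract above combines several classifiers by soft voting and reports balanced accuracy and the false positive rate. As a minimal sketch of those two ideas only, assuming three hypothetical base models that each output a probability of being an inhibitor (the probabilities, labels, and function names below are illustrative and not taken from the paper):

```python
from statistics import mean

def soft_vote(prob_lists):
    """Average the positive-class probabilities of all models, molecule by molecule."""
    return [mean(per_molecule) for per_molecule in zip(*prob_lists)]

def confusion(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def false_positive_rate(tp, fp, tn, fn):
    """Fraction of true negatives that were wrongly predicted positive."""
    return fp / (fp + tn)

# Hypothetical predictions from three base models for five molecules.
model_a = [0.9, 0.2, 0.6, 0.4, 0.7]
model_b = [0.7, 0.4, 0.2, 0.1, 0.6]
model_c = [0.8, 0.3, 0.9, 0.1, 0.8]
y_true = [1, 0, 1, 0, 0]

ensemble_prob = soft_vote([model_a, model_b, model_c])
counts = confusion(y_true, ensemble_prob)
bal_acc = balanced_accuracy(*counts)
fpr = false_positive_rate(*counts)
print(counts, round(bal_acc, 3), round(fpr, 3))
```

Stacked generalization, which the abstract also mentions, would replace the plain average with a meta-model trained on the base models' out-of-fold predictions; that step is not shown here.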
Physics-Informed Multifidelity Gaussian Process: Modeling the Effect of Water and Temperature on the Viscosity of a Deep Eutectic Solvent.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2025-04-16; DOI: 10.1021/acs.jcim.5c00157
Maximilian Fleck,Samir Darouich,Jürgen Pleiss,Niels Hansen,Marcelle B M Spera
Knowledge of shear viscosity as a function of temperature and composition of an aqueous deep eutectic solvent mixture is essential for process design but can be highly challenging and costly to measure. The present work proposes to combine a small set of experimentally determined viscosities with a small set of simulated values within a linear multifidelity approach to predict the dependency of shear viscosity on temperature and composition. This method provides a simple approach that requires a physics-based transformation of viscosity data prior to training, without the need for additional data such as densities. This lowers experimental cost and reduces the number of experiments and simulations required to characterize a specific system. The data-driven component of the model does not concern the viscosity itself but rather the excess free energy term within the framework of a mixture viscosity model according to Eyring's absolute rate theory. Moreover, we illustrate the application of kernel-based machine learning approaches to daily research questions where data availability is limited compared to the data set size typically required for neural networks.
Citations: 0
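The entry above trains on an excess free energy term rather than on viscosity itself. The abstract does not give the exact transformation, so the following is only a sketch under a stated assumption: a simplified Eyring-type mixing rule without molar volumes, ln η_mix = Σ x_i ln η_i + G_E/(RT). The function names and the numerical values are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)

def ideal_log_viscosity(x, eta_pure):
    """Mole-fraction-weighted sum of the log viscosities of the pure components."""
    return sum(xi * math.log(eta_i) for xi, eta_i in zip(x, eta_pure))

def excess_term(eta_mix, x, eta_pure, temperature):
    """Map a mixture viscosity to an excess free-energy-like term in J/mol.

    Assumes the simplified rule ln(eta_mix) = sum_i x_i ln(eta_i) + G_E / (R T).
    """
    return R * temperature * (math.log(eta_mix) - ideal_log_viscosity(x, eta_pure))

def viscosity_from_excess(g_excess, x, eta_pure, temperature):
    """Inverse transformation: recover the mixture viscosity from the excess term."""
    return math.exp(ideal_log_viscosity(x, eta_pure) + g_excess / (R * temperature))

# Hypothetical values: water / deep eutectic solvent mixture, viscosities in mPa s.
x = [0.3, 0.7]
eta_pure = [0.89, 300.0]
temperature = 298.15
eta_measured = 40.0

g_e = excess_term(eta_measured, x, eta_pure, temperature)
eta_back = viscosity_from_excess(g_e, x, eta_pure, temperature)
print(round(g_e, 1), round(eta_back, 6))
```

In a workflow of this kind, the regression model would be fitted to values such as `g_e` over temperature and composition, and predictions would be mapped back to viscosities with the inverse function.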
Addressing Imbalanced Classification Problems in Drug Discovery and Development Using Random Forest, Support Vector Machine, AutoGluon-Tabular, and H2O AutoML.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2025-04-15; DOI: 10.1021/acs.jcim.5c00023
Ayush Garg,Narayanan Ramamurthi,Shyam Sundar Das
Classification models built on class-imbalanced data sets tend to prioritize the accuracy of the majority class, and thus, the minority class generally has a higher misclassification rate. Different techniques are available to address the class imbalance in classification models and can be categorized as data-level, algorithm-level, and hybrid methods. But to the best of our knowledge, an in-depth analysis of the performance of these techniques against the class ratio is not available in the literature. We have addressed this gap in this study and have performed a detailed analysis of the performance of four different techniques to address imbalanced class distribution using machine learning (ML) methods and AutoML tools. To carry out our study, we selected four such techniques, namely (a) threshold optimization using (i) GHOST and (ii) the area under the precision-recall curve (AUPR), (b) the internal balancing method of AutoML and the class-weight option of machine learning methods, and (c) data balancing using SMOTETomek, and generated 27 data sets considering nine different class ratios (i.e., the ratio of positive-class samples to total samples) from three data sets that belong to the drug discovery and development field. We employed random forest (RF) and support vector machine (SVM) as representative ML classifiers and AutoGluon-Tabular (version 0.6.1) and H2O AutoML (version 3.40.0.4) as representative AutoML tools. 
The important findings of our study are as follows: (i) threshold optimization has no effect on ranking metrics such as AUC and AUPR, but AUC and AUPR are affected by class-weighting and SMOTETomek; (ii) for the ML methods RF and SVM, significant percentage improvements of up to 375, 33.33, and 450 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy, which are suitable for performance evaluation of imbalanced data sets; (iii) for the AutoML libraries AutoGluon-Tabular and H2O AutoML, significant percentage improvements of up to 383.33, 37.25, and 533.33 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy; (iv) the general pattern for balanced accuracy is that the percentage improvement increases when the class ratio is systematically decreased from 0.5 to 0.1; in the case of F1 score and MCC, maximum improvement is achieved at the class ratio of 0.3; (v) for both ML and AutoML with balancing, no individual class-balancing technique outperforms all other methods on a significantly higher number of data sets based on F1 score; (vi) the three external balancing techniques combined outperformed the internal balancing methods of the ML and AutoML; (vii) AutoML tools perform as well as the ML models, and in some cases even better, for handling imbalanced classification when applied with imbalance handling techniques. In summary, exploration of multiple data balancing techniques is recom
Citations: 0
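Threshold optimization, one of the techniques compared in the entry above, can be illustrated with a plain grid search that picks the decision threshold maximizing F1. This is a generic sketch, not the GHOST procedure named in the abstract; the scores and labels are hypothetical, with a class ratio of 0.2.

```python
import math

def confusion(y_true, scores, threshold):
    """Return (tp, fp, tn, fn) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and not label:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall; 0.0 when nothing is predicted positive."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_threshold(y_true, scores, metric=f1_score):
    """Scan thresholds 0.05, 0.10, ..., 0.95 and return the first one with the best metric."""
    grid = [i / 20 for i in range(1, 20)]
    return max(grid, key=lambda t: metric(*confusion(y_true, scores, t)))

# Hypothetical validation scores from a classifier trained on imbalanced data.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.47, 0.31, 0.36, 0.22, 0.16, 0.11, 0.12, 0.06, 0.07, 0.02]

default_f1 = f1_score(*confusion(y_true, scores, 0.5))
tuned = best_threshold(y_true, scores)
tuned_counts = confusion(y_true, scores, tuned)
print(default_f1, tuned, f1_score(*tuned_counts), round(mcc(*tuned_counts), 3))
```

Because moving the threshold does not change how the scores rank the samples, ranking metrics such as AUC and AUPR stay the same, which matches finding (i) of the abstract. In practice the threshold should be chosen on a validation split, not on the data used for the final evaluation.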
Investigating Statistical Conditions of Coevolutionary Signals that Enable Algorithmic Predictions of Protein Partners.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2025-04-15; DOI: 10.1021/acs.jcim.5c00052
José Fiorote,João Alves,Letícia Stock,Werner Treptow
This study examines the statistical conditions of coevolutionary signals that allow algorithmic predictions of protein partners based on amino acid sequences rather than 3D structures. It introduces a Markov stochastic model that predicts the number of correct protein partners based on coevolutionary information. The model defines state probabilities using a Poisson mixture of normal distributions, with key parameters including the total number of protein sequences M, the coevolutionary information gap α, and variance σ₀². The model suggests that algorithmic approaches that maximize coevolutionary information cannot effectively resolve partners in protein families with a large number of sequences M ≥ 100. The model shows that true-positive (TP) rates can be enhanced by disregarding mismatches among similar sequences. This approach allows a distinction, in terms of {α, σ₀²}, between optimized solutions with trivial errors and other degenerate solutions. Our findings enable the a priori classification of protein families where partners can be reliably predicted by ignoring trivial errors between similar sequences, advancing the understanding of coevolutionary models for large protein data sets.
Citations: 0
De Novo Design of Cyclic Peptide Binders Based on Fragment Docking and Assembling.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2025-04-14; DOI: 10.1021/acs.jcim.5c00088
Changsheng Zhang,Fanhao Wang,Tiantian Zhang,Yang Yang,Liying Wang,Xiaoling Zhang,Luhua Lai
Cyclic peptides offer distinct advantages in modulating protein-protein interactions (PPIs), including enhanced target specificity, structural stability, reduced toxicity, and minimal immunogenicity. However, most cyclic peptide therapeutics currently in clinical development are derived from natural products or the cyclization of protein loops, with few methodologies available for de novo cyclic peptide design based on target protein structures. To fill this gap, we introduce CycDockAssem, an integrative computational platform that facilitates the systematic generation of head-to-tail cyclic peptides made entirely of natural L- or D-amino acid residues. The cyclic peptide binders are constructed from oligopeptide fragments containing 3-5 amino acids. A fragment library comprising 15 million fragments was created from the Protein Data Bank. The assembly workflow involves dividing the targeted protein surface into two docking boxes; the updated protein-protein docking program SDOCK2.0 is then utilized to identify the best binding fragments for these boxes. The fragments binding in different boxes are concatenated into a ring using two additional peptide fragments as linkers. A ROSETTA script is employed for sequence redesign, while molecular dynamics simulations and MM-PBSA calculations assess the conformational stability and binding free energy. To enhance docking performance, cation-π interactions, backbone hydrogen bonding potential, and explicit water exclusion energy were incorporated into the docking score function of SDOCK2.0, resulting in a significantly improved performance on the updated test set. A mirror design strategy was developed for cyclic peptides composed of D-amino acids, where natural amino acid cyclic peptide binders are first designed for the mirror image of the target protein and the resulting complexes are then mirrored back. CycDockAssem was experimentally validated using tumor necrosis factor α (TNFα) as the target. 
Surface plasmon resonance experiments demonstrated that six of the seven designed cyclic peptides bind TNFα with micromolar affinity, two of which significantly inhibit TNFα downstream gene expression. Overall, CycDockAssem provides a robust strategy for targeted de novo cyclic peptide drug discovery.
Citations: 0
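The mirror design strategy in the entry above rests on a geometric fact: reflecting all coordinates through a plane inverts the handedness of every stereocenter, and reflecting twice restores the original structure. The sketch below demonstrates this with a signed volume as a simple chirality indicator; the coordinates are hypothetical and the helper names are illustrative, not part of CycDockAssem.

```python
def mirror(coords):
    """Reflect every point through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in coords]

def signed_volume(center, a, b, c):
    """Triple product (a - center) . ((b - center) x (c - center)).

    The sign flips when the arrangement of a, b, c around the center is mirrored.
    """
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    return (ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx))

# Hypothetical stereocenter at the origin with three substituent positions.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

original = signed_volume(*points)
reflected = signed_volume(*mirror(points))
restored = mirror(mirror(points))
print(original, reflected, restored == points)
```

A binder built from L-residues against the mirrored target therefore becomes, after the whole complex is reflected back, a D-residue binder against the native target with the same interface geometry.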
Crystal Structure Prediction Using a Self-Attention Neural Network and Semantic Segmentation.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2025-04-14; DOI: 10.1021/acs.jcim.4c02345
Wuling Zhao,Minxia Zhou,Jialin Shao,Jingzheng Ren,Yusha Hu,Yulin Han,Yi Man
The development of new materials is a time-consuming and resource-intensive process. Deep learning has emerged as a promising approach to accelerate this process. However, accurately predicting crystal structures using deep learning remains a significant challenge due to the complex, high-dimensional nature of atomic interactions and the scarcity of comprehensive training data that captures the full diversity of possible crystal configurations. This work developed a neural network model based on a data set comprising thousands of crystallographic information files from existing crystal structure databases. The model incorporates a self-attention mechanism to enhance prediction accuracy by learning and extracting both local and global features of three-dimensional structures, treating the atoms in each crystal as point sets. This approach enables effective semantic segmentation and accurate unit cell prediction. Experimental results demonstrate that for unit cells containing up to 500 atoms, the model achieves a structure prediction accuracy of 89.78%.
Citations: 0
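The entry above treats the atoms of a crystal as a point set and applies self-attention to it. The abstract does not describe the architecture, so the following is only a generic sketch of scaled dot-product self-attention with no learned weights (queries, keys, and values are all the raw coordinates, which is an assumption made for brevity). The fractional coordinates are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(points):
    """Scaled dot-product self-attention with Q = K = V = points.

    Returns the attention weights (one row per point) and the updated features.
    """
    dim = len(points[0])
    scores = [[sum(p * q for p, q in zip(pi, pj)) / math.sqrt(dim) for pj in points]
              for pi in points]
    weights = [softmax(row) for row in scores]
    updated = [[sum(w * pj[k] for w, pj in zip(row, points)) for k in range(dim)]
               for row in weights]
    return weights, updated

# Hypothetical fractional coordinates of three atoms in a unit cell.
atoms = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.5)]

weights, updated = self_attention(atoms)
_, updated_reversed = self_attention(atoms[::-1])
print([round(sum(row), 6) for row in weights])
```

Each output row mixes information from every atom, so it carries both local and global context, and reordering the input atoms only reorders the output rows. That order independence is what makes attention a natural fit for point sets.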
Three-Dimensional CH/π and CH/N Interactions from Quantum-Mechanical and Database Analyses.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2025-04-14; DOI: 10.1021/acs.jcim.5c00124
Daichi Hayakawa,Hiroaki Gouda
Quantum mechanical (QM)-level molecular interaction fields (MIFs) are three-dimensional potential maps that describe the intermolecular interactions surrounding a target molecule, derived through QM calculations. This study employs QM-level MIFs (MIFs(QM)) and analyses of the Cambridge Structural Database (CSD) to uncover the three-dimensional characteristics of CH/π and CH/N interactions in typical nitrogen-containing heterocyclic compounds. Our findings confirm the reliability and applicability of MIF(QM) calculations for analyzing CH/π and CH/N interactions. Additionally, we propose approximation functions of MIFs(QM) and demonstrate that the resulting MIFs(func) are effective for studying CH/π and CH/N interactions in protein/ligand systems.
Citations: 0
Dynamic Prediction of Drug-Target Interactions via Cross-Modal Feature Mapping with Learnable Association Information.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2025-04-14; DOI: 10.1021/acs.jcim.4c02348
Ziyu Wei,Zhengyu Wang,Chang Tang
Predicting drug-target interactions (DTIs) is essential for advancing drug discovery and personalized medicine. However, accurately capturing the intricate binding relationships between drugs and targets remains a significant challenge, particularly when attempting to fully leverage the vast correlation information inherent in molecular data. This complexity is further exacerbated by the structural differences and sequence length disparities between drug molecules and protein targets, which can hinder effective feature alignment and interaction modeling. To address these challenges, we propose a model named LAM-DTI. First, drug and target features are extracted from the original molecular sequence data using a multilayer convolutional neural network. To address the sequence length discrepancy between drug and target features, we apply a connectionist temporal classification module to generate normalized feature sequences. Building on this, we introduce a learnable association information matrix as a flexible intermediary, which dynamically adjusts to capture accurate DTI association information, thereby enhancing cross-modal mapping within a unified latent space. This progressive mapping strategy enables the model to form an interaction projection between drugs and targets, effectively identifying critical interaction regions and guiding the capture of complex interaction-related features. Extensive experiments on three well-known benchmark data sets demonstrate that LAM-DTI significantly outperforms previous models.
Citations: 0
Machine Learning and Structural Dynamics-Based Approach to Reveal Molecular Mechanism of PTEN Missense Mutations Shared by Cancer and Autism Spectrum Disorder.
IF 5.6 Zone 2, Chemistry Q1 CHEMISTRY, MEDICINAL Pub Date : 2025-04-14 DOI: 10.1021/acs.jcim.5c00134
Miao Yang,Jingran Wang,Ziyun Zhou,Wentian Li,Gennady Verkhivker,Fei Xiao,Guang Hu
Missense mutations in oncogenic proteins that are concurrently associated with neurodevelopmental disorders have garnered significant attention. Phosphatase and tensin homologue (PTEN) serves as a paradigmatic model for mapping its mutational landscape and identifying genotypic predictors of distinct phenotypic outcomes, including cancer and autism spectrum disorder (ASD). Despite extensive research into the genotype-phenotype correlations of PTEN mutations, the mechanisms underlying the dual association of specific PTEN mutations with both cancer and ASD (PTEN-cancer/ASD mutations) remain elusive. This study introduces an integrative approach that combines machine learning (ML) with structural dynamics to elucidate the molecular effects of PTEN-cancer/ASD mutations. Analysis of biophysical and network-biology-based signatures reveals a complex energetic and functional landscape. Subsequently, an ML model and corresponding integrated score were developed to classify and predict PTEN-cancer/ASD mutations, underscoring the significance of protein dynamics in predicting cellular phenotypes. Further molecular dynamics simulations demonstrated that PTEN-cancer/ASD mutations induce dynamic alterations characterized by open conformational changes restricted to the P loop and coupled with interdomain allosteric regulation. This research aims to enhance the genotypic and phenotypic understanding of PTEN-cancer/ASD mutations through an interpretable ML model integrated with structural dynamics analysis. By identifying shared mechanisms between cancer and ASD, the findings pave the way for the development of novel therapeutic strategies.
Citations: 0
Mechanistic Insights into the Nonenzymatic Biosynthesis of Artemisinin and Related Natural Products: A Quantum Chemical Study.
IF 5.6 Zone 2, Chemistry Q1 CHEMISTRY, MEDICINAL Pub Date : 2025-04-13 DOI: 10.1021/acs.jcim.5c00034
Maocai Yan,Xinfa Ding,Likun Zhao,Xudong Lü,Hai-Yan He,Shuai Fan,Zhaoyong Yang
Artemisinin (Qinghaosu) is an important antimalaria natural medicine containing a unique endoperoxide bridge in its sesquiterpene structure. The last phase of artemisinin biosynthesis involves conversion of dihydroartemisinic acid (DHAA) to artemisinin, and the detailed mechanism remains unclear. Based on previous experimental studies, this work investigated the possible mechanism of nonenzymatic conversion of DHAA to artemisinin and identified the most chemically plausible reaction pathway using quantum chemical computations. The rate-determining step in this pathway is acid-catalyzed oxidation of the enol by triplet O2, with an overall free energy barrier of 22.5 kcal/mol. This pathway also gives byproducts dihydroarteannuin B and dihydro-epi-arteannuin B. In addition, the nonenzymatic formation mechanism of 21 natural products from Artemisia annua was discussed in this work. These results provide fundamental knowledge of the biosynthetic processes of artemisinin and related natural products, as well as important references for semisynthesis and structural modification studies of artemisinin.
Citations: 0