Pub Date: 2024-12-09 | Epub Date: 2024-11-20 | DOI: 10.1021/acs.jcim.4c01558
Jingzhi Wang, Sida Qin, Xiaohui Zhang, Jixin Zhi
Background: This study explores the pathological mechanisms of atherosclerosis (AS), focusing on the role of macrophages in its formation and development, and potential therapeutic targets.
Methods: The heterogeneity of the AS single-cell data set GSE131778 was analyzed using Seurat. Tissue sequencing data GSE28829 and GSE43292 were analyzed for immune cell abundance using CIBERSORT. Differential genes were identified, and WGCNA was used to create a coexpression network. Hub genes were identified using MCODE and CytoHubba and analyzed with GO and KEGG enrichment analysis, GSVA, and immune infiltration analysis. DrugBank identified potential drugs, and molecular docking verified drug binding to key targets. Key targets were experimentally validated.
Results: Nineteen cell clusters were identified in the GSE131778 data set and classified into ten cell types. Macrophages in AS and normal tissues were identified based on cell abundance. CIBERSORT showed a significant increase in cell cluster 9 in AS samples. Thirty-two hub genes, including CD86, LILRB2, and IRF8, were validated. GO and KEGG analyses indicated that the hub genes primarily affect immune functions. GSVA identified 29 significantly increased pathways in AS samples. Immune infiltration analysis revealed a positive correlation between IRF8, CD86, and LILRB2 expression and macrophage content. Molecular docking suggested CD86 as a potential drug target for AS. qRT-PCR confirmed increased IRF8 and CD86 expression.
Conclusions: CD86, LILRB2, and IRF8 are highly expressed in foam cell samples, with CD86 forming hydrogen bonds with several AS drugs, indicating CD86 as a promising target for AS treatment.
{"title":"Identification of Macrophage-Associated Novel Drug Targets in Atherosclerosis Based on Integrated Transcriptome Features.","authors":"Jingzhi Wang, Sida Qin, Xiaohui Zhang, Jixin Zhi","doi":"10.1021/acs.jcim.4c01558","DOIUrl":"10.1021/acs.jcim.4c01558","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"9009-9020"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142680023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 | Epub Date: 2024-11-22 | DOI: 10.1021/acs.jcim.4c01116
Ho-Joon Lee, Prashant S Emani, Mark B Gerstein
The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but with varying results across targets. Given that ensembling or meta-modeling approaches have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and several meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on 3D structures while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. We further demonstrate improved generalization capability by our models using a large-scale benchmark of affinity prediction as well as a virtual screening application benchmark. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain meaningful improvement in binding affinity prediction.
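The meta-modeling idea in the abstract above can be illustrated with a minimal stacking sketch. It is not the authors' framework: it assumes just two hypothetical base models (say, a docking score and a sequence-based predictor), uses synthetic toy numbers, and fits a plain least-squares meta-model over their outputs. All function names are illustrative.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_meta_model(base_preds: Sequence[Sequence[float]],
                   targets: Sequence[float]) -> List[float]:
    """Least-squares intercept and weights over base-model predictions."""
    rows = [[1.0] + list(p) for p in base_preds]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)  # normal equations


def predict_meta(weights: Sequence[float], preds: Sequence[float]) -> float:
    """Combine base-model predictions for one ligand-protein pair."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], preds))


# Synthetic toy data: (base model A, base model B) affinity predictions.
base_preds = [(5.0, 6.2), (7.1, 6.5), (4.2, 4.0), (8.0, 8.8), (6.0, 5.1)]
# Synthetic "measured" affinities generated from a known linear rule.
targets = [0.5 + 0.6 * a + 0.4 * b for a, b in base_preds]

weights = fit_meta_model(base_preds, targets)
combined = predict_meta(weights, (6.5, 7.0))
```

In practice the meta-model would be fitted on held-out predictions rather than on the data the base models were trained on, and the abstract's inclusion of extra features such as physicochemical descriptors would correspond to adding further columns to `base_preds`.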
{"title":"Improved Prediction of Ligand-Protein Binding Affinities by Meta-modeling.","authors":"Ho-Joon Lee, Prashant S Emani, Mark B Gerstein","doi":"10.1021/acs.jcim.4c01116","DOIUrl":"10.1021/acs.jcim.4c01116","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8684-8704"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632770/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142692246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 | Epub Date: 2024-11-18 | DOI: 10.1021/acs.jcim.4c01278
Romanos Fasoulis, Georgios Paliouras, Lydia E Kavraki
The binding of peptides to class-I Major Histocompatibility Complex (MHC) receptors and their subsequent recognition downstream by T-cell receptors are crucial processes for most multicellular organisms to be able to fight various diseases. Thus, the identification of peptide antigens that can elicit an immune response is of immense importance for developing successful therapies for bacterial and viral infections, and even cancer. Recently, studies have demonstrated the importance of peptide-MHC (pMHC) structural analysis, with pMHC structural modeling methods gradually becoming more popular in peptide antigen identification workflows. Most pMHC structural modeling tools provide an ensemble of candidate peptide poses in the MHC-I cleft, each associated with a score from a scoring function, with the top-scoring pose assumed to be the most representative of the ensemble. However, identifying the binding mode, that is, the peptide pose from the ensemble that is closest to an unavailable native structure, is not trivial. Oftentimes, the peptide poses characterized as best by a protein-ligand scoring function are not the ones that are the most representative of the actual structure. In this work, we frame the peptide binding pose identification problem as a Learning-to-Rank (LTR) problem. We present RankMHC, an LTR-based pMHC binding mode identification predictor, which is specifically trained to predict the most accurate ranking of an ensemble of pMHC conformations. RankMHC outperforms classical peptide-ligand scoring functions, as well as previous Machine Learning (ML)-based binding pose predictors. We further demonstrate that RankMHC can be used with many pMHC structural modeling tools that use different structural modeling protocols.
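Framing pose selection as a Learning-to-Rank problem, as described above, means the model is judged on the order it assigns to poses rather than on any absolute score. The sketch below is a generic pairwise (RankNet-style) logistic loss, not RankMHC's actual objective, which the abstract does not specify. It assumes each pose has a model score and a known RMSD to the native structure during training, and all values are synthetic.

```python
import math
from itertools import combinations
from typing import Sequence


def pairwise_logistic_loss(scores: Sequence[float],
                           rmsds: Sequence[float]) -> float:
    """Mean pairwise logistic loss over all pose pairs in one ensemble.

    For each pair, the pose with the lower RMSD to the native structure
    should receive the higher score; pairs with equal RMSD are skipped.
    """
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if rmsds[i] == rmsds[j]:
            continue
        better, worse = (i, j) if rmsds[i] < rmsds[j] else (j, i)
        margin = scores[better] - scores[worse]
        losses.append(math.log1p(math.exp(-margin)))
    return sum(losses) / len(losses) if losses else 0.0


def top_ranked_pose(scores: Sequence[float]) -> int:
    """Index of the pose a ranker would report as the binding mode."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Synthetic ensemble of three poses (RMSD to native in angstroms).
rmsds = [0.8, 2.5, 4.1]
good_scores = [3.0, 1.0, -2.0]   # ordering agrees with RMSD
bad_scores = [-2.0, 1.0, 3.0]    # ordering is inverted

loss_good = pairwise_logistic_loss(good_scores, rmsds)
loss_bad = pairwise_logistic_loss(bad_scores, rmsds)
best = top_ranked_pose(good_scores)
```

A ranker trained this way never needs the scores to be calibrated energies. Only the order within each ensemble matters, which is why a scoring function with reasonable absolute values can still pick the wrong pose.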
{"title":"RankMHC: Learning to Rank Class-I Peptide-MHC Structural Models.","authors":"Romanos Fasoulis, Georgios Paliouras, Lydia E Kavraki","doi":"10.1021/acs.jcim.4c01278","DOIUrl":"10.1021/acs.jcim.4c01278","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8729-8742"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11633655/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142646381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 | Epub Date: 2024-11-19 | DOI: 10.1021/acs.jcim.4c01232
Wagner da Rocha, Leo Liberti, Antonio Mucherino, Thérèse E Malliavin
Protein structure prediction is generally based on the use of local conformational information coupled with long-range distance restraints. Such restraints can be derived from the knowledge of a template structure or from the analysis of protein sequence alignments in the framework of models arising from the physics of disordered systems. The accuracy of approaches based on sequence alignment, however, is limited when the number of aligned sequences is small. Here, we derive protein conformations using only knowledge of local conformations by means of the interval Branch-and-Prune algorithm. The computational efficiency is directly related to the knowledge of stereochemistry (bond angle and ω values) along the protein sequence and, in particular, to the variations of the torsion angle ω. The impact of stereochemistry variations is particularly strong for protein topologies defined by numerous long-range restraints, as is the case for proteins with β secondary structure. The systematic enumeration of the conformations improves the efficiency of the calculations. The analysis of DNA codons makes it possible to connect the variations of the torsion angle ω to the positions of rare DNA codons.
{"title":"Influence of Stereochemistry in a Local Approach for Calculating Protein Conformations.","authors":"Wagner da Rocha, Leo Liberti, Antonio Mucherino, Thérèse E Malliavin","doi":"10.1021/acs.jcim.4c01232","DOIUrl":"10.1021/acs.jcim.4c01232","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8999-9008"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 | Epub Date: 2024-11-20 | DOI: 10.1021/acs.jcim.4c01524
Christian Hölzer, Rick Oerder, Stefan Grimme, Jan Hamaekers
Conformer ranking is a crucial task for drug discovery, with methods for generating conformers often based on molecular (meta)dynamics or sophisticated sampling techniques. These methods are constrained by the underlying force computation regarding runtime and energy ranking accuracy, limiting their effectiveness for large-scale screening applications. To address these ranking limitations, we introduce ConfRank, a machine learning-based approach that enhances conformer ranking using pairwise training. We demonstrate its performance using GFN-FF-generated conformer ensembles, leveraging the DimeNet++ architecture trained on pairs of 159 760 uncharged organic compounds from the GEOM data set at the r²SCAN-3c reference level. Instead of predicting only on single molecules, this approach captures relative energy differences between conformers, leading to a significant improvement of the overall conformational ranking, outperforming GFN-FF and GFN2-xTB. Thereby, the pairwise RMSD of the relative energy difference of two conformers can be reduced from 5.65 to 0.71 kcal mol⁻¹ on the test data set, allowing up to 81% of all lowest-lying conformers to be identified correctly (GFN-FF: 10%, GFN2-xTB: 47%). The ConfRank approach is cost-effective, allowing for scalable deployment on both CPU and GPU, achieving runtime accelerations of up to 2 orders of magnitude compared to GFN2-xTB. Out-of-sample investigations on CREST-generated conformer ensembles from the QM9 data set and conformers taken from an extended GMTKN55 data set show promising results for the robustness of this approach. Thereby, rank correlation coefficients such as Spearman's can be improved to 0.90 (GFN-FF: 0.39, GFN2-xTB: 0.84), reducing the probability of an incorrect sign flip in pairwise energy comparison from 32 to 7%. On the extended GMTKN55 subsets, the pairwise MAD (RMSD) could be reduced on almost all subsets by up to 62% (58%), with an average improvement of 30% (29%). 
Moreover, an exemplary case study on vancomycin shows similar performance, indicating applicability to larger (bio)molecular structures. Furthermore, we motivate the usage of the pairwise training approach from a theoretical perspective, highlighting that while pairwise training can lead to a decline in single sample prediction of absolute energies for ML models, it significantly enhances conformer ranking performance. The data and models used in this study are available at https://github.com/grimme-lab/confrank.
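The ranking metrics quoted in this abstract (Spearman correlation, pairwise RMSD of relative energies, and the probability of a sign flip in a pairwise comparison) can be made concrete with a small sketch. The definitions below are a plausible reading of those terms, not code from the ConfRank repository, and the energies are synthetic values for a single four-conformer ensemble.

```python
import math
from itertools import combinations
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank positions (0 = lowest value); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(pred)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(pred), ranks(ref)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def sign_flip_fraction(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Fraction of conformer pairs whose predicted energy order is wrong."""
    pairs = list(combinations(range(len(ref)), 2))
    flips = sum(1 for i, j in pairs
                if (pred[i] - pred[j]) * (ref[i] - ref[j]) < 0)
    return flips / len(pairs)


def pairwise_rmsd(pred: Sequence[float], ref: Sequence[float]) -> float:
    """RMSD between predicted and reference pairwise energy differences."""
    pairs = list(combinations(range(len(ref)), 2))
    sq = sum(((pred[i] - pred[j]) - (ref[i] - ref[j])) ** 2 for i, j in pairs)
    return math.sqrt(sq / len(pairs))


# Synthetic relative energies (kcal/mol) for one conformer ensemble.
ref_energy = [0.0, 1.2, 2.5, 3.1]    # hypothetical reference level
pred_energy = [0.1, 1.0, 3.0, 2.8]   # hypothetical model output

rho = spearman(pred_energy, ref_energy)
flip = sign_flip_fraction(pred_energy, ref_energy)
rmsd = pairwise_rmsd(pred_energy, ref_energy)
```

All three quantities depend only on energy differences within an ensemble, so a constant offset in the predicted absolute energies leaves them unchanged. This is consistent with the abstract's remark that pairwise training can degrade absolute-energy prediction while improving ranking.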
{"title":"ConfRank: Improving GFN-FF Conformer Ranking with Pairwise Training.","authors":"Christian Hölzer, Rick Oerder, Stefan Grimme, Jan Hamaekers","doi":"10.1021/acs.jcim.4c01524","DOIUrl":"10.1021/acs.jcim.4c01524","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8909-8925"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142680019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 | Epub Date: 2024-11-20 | DOI: 10.1021/acs.jcim.4c01523
Abhishek Bera, Pritish Joshi, Niladri Patra
Since their inception in antibacterial therapy, macrolide-based antibiotics have significantly shaped the evolutionary pathways of pathogenic bacteria, driving them to develop diverse antimicrobial resistance (AMR) mechanisms. Among these, macrolide esterase, commonly referred to as erythromycin esterase, emerged as a critical defense mechanism, enabling bacteria to detoxify macrolides by hydrolyzing the macrolactone ring within the bacterial cell. In this study, we examine the interactions and conformational dynamics of erythromycin esterase C (EreC), a key member of the Ere enzyme family. We focus on three FDA-approved and widely prescribed macrolides (erythromycin, clarithromycin, and azithromycin), employing classical molecular dynamics, absolute binding free energy calculations, and 2D well-tempered metadynamics simulations to explore their interactions with EreC. To estimate the absolute binding free energies, we used the recently developed "Streamlined Alchemical Free Energy Perturbation (SAFEP)" protocol. The results from our molecular dynamics simulations and subsequent analyses revealed the crucial role of hydrophobic interactions within the macrolide binding cleft of EreC, along with the significant influence of the minor lobe in facilitating overall structural fluctuation. In silico alanine scanning identified the top three hydrophobic residues, i.e., PHE248, MET333, and PHE344, responsible for macrolide binding inside that cleft. According to the free energy calculations, azithromycin and clarithromycin showed greater binding affinities toward EreC than the parent macrolide erythromycin. 
Moreover, 2D metadynamics simulations along with graph theory-based eigenvector centrality analyses revealed a metastable "semiopen" state during the hypothesized "active loop closure" of the EreC protein triggered by subtle conformational changes of an important histidine residue, HIS289, upon macrolide capture, drawing a fascinating parallel to the renowned "Venus flytrap" mechanism.
{"title":"Delving into Macrolide Binding Affinities and Associated Structural Modulations in Erythromycin Esterase C: Insights into the Venus Flytrap Mechanism.","authors":"Abhishek Bera, Pritish Joshi, Niladri Patra","doi":"10.1021/acs.jcim.4c01523","DOIUrl":"10.1021/acs.jcim.4c01523","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8892-8908"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142680021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 | Epub Date: 2024-11-21 | DOI: 10.1021/acs.jcim.4c01139
Peiyao Li, Lan Hua, Zhechao Ma, Wenbo Hu, Ye Liu, Jun Zhu
Drug discovery and development is a complex and costly process, with a substantial portion of the expense dedicated to characterizing the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of new drug candidates. While the advent of deep learning and molecular graph neural networks (GNNs) has significantly enhanced in silico ADMET prediction capabilities, reliably quantifying prediction uncertainty remains a critical challenge. The performance of GNNs is influenced by both the volume and the quality of the data. Hence, determining the reliability and extent of a prediction is as crucial as achieving accurate predictions, especially for out-of-domain (OoD) compounds. This paper introduces a novel GNN model called conformalized fusion regression (CFR). CFR combines a GNN model with a joint mean-quantile regression loss and an ensemble-based conformal prediction (CP) method. Through rigorous evaluation across various ADMET tasks, we demonstrate that our framework provides accurate predictions, reliable probability calibration, and high-quality prediction intervals, outperforming existing uncertainty quantification methods.
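To show what a conformal prediction interval is, the sketch below implements the simplest variant, split conformal prediction with absolute residuals. This is a deliberate simplification: CFR as described above uses an ensemble-based CP method together with a mean-quantile loss, neither of which is reproduced here. The calibration values are synthetic and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals.

    Uses the k-th smallest residual with k = ceil((n + 1) * (1 - alpha));
    if k exceeds n, no finite interval gives the requested coverage.
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(residuals)[k - 1]


def prediction_interval(point: float, q: float) -> Tuple[float, float]:
    """Symmetric interval around a point prediction."""
    return (point - q, point + q)


# Synthetic calibration set: model predictions and measured values
# for some ADMET endpoint (arbitrary units).
cal_pred = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 5.0, 1.5, 4.5, 0.5]
cal_true = [2.5, 3.0, 1.25, 5.0, 2.5, 3.75, 4.0, 1.5, 6.0, 0.75]
residuals = [abs(t - p) for p, t in zip(cal_pred, cal_true)]

q = conformal_quantile(residuals, alpha=0.25)   # target 75% coverage
interval = prediction_interval(3.0, q)          # new compound predicted at 3.0
```

The coverage guarantee of this scheme rests on the calibration and test compounds being exchangeable, which is exactly what fails for out-of-domain compounds. That is one reason the abstract stresses OoD behavior.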
{"title":"Conformalized Graph Learning for Molecular ADMET Property Prediction and Reliable Uncertainty Quantification.","authors":"Peiyao Li, Lan Hua, Zhechao Ma, Wenbo Hu, Ye Liu, Jun Zhu","doi":"10.1021/acs.jcim.4c01139","DOIUrl":"10.1021/acs.jcim.4c01139","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8705-8717"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142685406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 | Epub Date: 2024-11-23 | DOI: 10.1021/acs.jcim.4c01448
Xibing He, Viet Hoang Man, Jie Gao, Junmei Wang
To propose new mechanism-based therapeutics for Alzheimer's disease (AD), it is crucial to study the kinetics and oligomerization/aggregation mechanisms of the hallmark tau proteins, which have various isoforms and are intrinsically disordered. In this study, multiple all-atom (AA) and coarse-grained (CG) force fields (FFs) have been benchmarked on molecular dynamics (MD) simulations of K18 tau (M243-E372), which is a truncated form (130 residues) of full-length tau (441 residues). FF19SB is first excluded because the dynamics are too slow, and the conformations are too stable. All other benchmarked AAFFs (Charmm36m, FF14SB, Gromos54A7, and OPLS-AA) and CGFFs (Martini3 and Sirah2.0) exhibit a trend of shrinking K18 tau into compact structures with the radius of gyration (ROG) around 2.0 nm, which is much smaller than the experimental value of 3.8 nm, within 200 ns of AA-MD or 2000 ns of CG-MD. Gromos54A7, OPLS-AA, and Martini3 shrink much faster than the other FFs. To perform meaningful postanalysis of various properties, we propose a strategy of selecting snapshots with 2.5 < ROG < 4.5 nm, instead of using all sampled snapshots. The calculated chemical shifts of all C, CA, and CB atoms have very good and close root-mean-square error (RMSE) values, while Charmm36m and Sirah2.0 exhibit better chemical shifts of N than other FFs. Comparing the calculated distributions of the distance between the CA atoms of CYS291 and CYS322 with the results of the FRET experiment demonstrates that Charmm36m is a perfect match with the experiment while other FFs exhibit limitations. In summary, Charmm36m is recommended as the best AAFF, and Sirah2.0 is recommended as an excellent CGFF for simulating tau K18.
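The snapshot-selection strategy mentioned above (keeping only frames with 2.5 < ROG < 4.5 nm) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' analysis script: coordinates are plain (x, y, z) tuples in nm, masses default to 1, and the three two-atom "frames" are synthetic stand-ins for real trajectory snapshots.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord],
                       masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted radius of gyration, in the units of the coordinates."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Center of mass, one component at a time.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    sq = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / total)


def select_snapshots(trajectory: Sequence[Sequence[Coord]],
                     low: float = 2.5, high: float = 4.5) -> List[int]:
    """Indices of frames whose ROG lies strictly between low and high (nm)."""
    return [i for i, frame in enumerate(trajectory)
            if low < radius_of_gyration(frame) < high]


# Synthetic two-atom frames (nm): extended, collapsed, over-extended.
trajectory = [
    [(-3.0, 0.0, 0.0), (3.0, 0.0, 0.0)],   # ROG = 3.0 nm, kept
    [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],   # ROG = 1.0 nm, too compact
    [(-5.0, 0.0, 0.0), (5.0, 0.0, 0.0)],   # ROG = 5.0 nm, too extended
]

kept = select_snapshots(trajectory)
```

Filtering this way conditions every downstream average on the chosen ROG window, so properties computed from the kept frames describe that sub-ensemble rather than the full force-field ensemble.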
{"title":"Effects of All-Atom and Coarse-Grained Molecular Mechanics Force Fields on Amyloid Peptide Assembly: The Case of a Tau K18 Monomer.","authors":"Xibing He, Viet Hoang Man, Jie Gao, Junmei Wang","doi":"10.1021/acs.jcim.4c01448","DOIUrl":"10.1021/acs.jcim.4c01448","url":null,"abstract":"<p><p>To propose new mechanism-based therapeutics for Alzheimer's disease (AD), it is crucial to study the kinetics and oligomerization/aggregation mechanisms of the hallmark tau proteins, which have various isoforms and are intrinsically disordered. In this study, multiple all-atom (AA) and coarse-grained (CG) force fields (FFs) have been benchmarked on molecular dynamics (MD) simulations of K18 tau (M243-E372), which is a truncated form (130 residues) of full-length tau (441 residues). FF19SB is first excluded because the dynamics are too slow, and the conformations are too stable. All other benchmarked AAFFs (Charmm36m, FF14SB, Gromos54A7, and OPLS-AA) and CGFFs (Martini3 and Sirah2.0) exhibit a trend of shrinking K18 tau into compact structures with the radius of gyration (ROG) around 2.0 nm, which is much smaller than the experimental value of 3.8 nm, within 200 ns of AA-MD or 2000 ns of CG-MD. Gromos54A7, OPLS-AA, and Martini3 shrink much faster than the other FFs. To perform meaningful postanalysis of various properties, we propose a strategy of selecting snapshots with 2.5 < ROG < 4.5 nm, instead of using all sampled snapshots. The calculated chemical shifts of all C, CA, and CB atoms have very good and close root-mean-square error (RMSE) values, while Charmm36m and Sirah2.0 exhibit better chemical shifts of N than other FFs. Comparing the calculated distributions of the distance between the CA atoms of CYS291 and CYS322 with the results of the FRET experiment demonstrates that Charmm36m is a perfect match with the experiment while other FFs exhibit limitations. 
In summary, Charmm36m is recommended as the best AAFF, and Sirah2.0 is recommended as an excellent CGFF for simulating tau K18.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8880-8891"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142694876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 Epub Date: 2024-11-15 DOI: 10.1021/acs.jcim.4c01516
Nicholas B Smith, Anna L Garden
Global optimization of the structure of atomic nanoparticles is often hampered by the presence of many funnels on the potential energy surface. While broad funnels are readily encountered and easily exploited by the search, narrow funnels are more difficult to locate and explore, presenting a problem if the global minimum is situated in such a funnel. Here, a divide-and-conquer approach is applied to overcome the issue posed by the multifunnel effect using a machine learning approach, without using a priori knowledge of the potential energy surface. This approach begins with a truncated exploration to gather coarse-grained knowledge of the potential energy surface. This is then used to train a machine learning Gaussian mixture model to divide up the potential energy surface into separate regions, with each region then being explored in more detail (or conquered) separately. This scheme was tested on a variety of multifunnel systems and yielded significant improvements to the times taken to locate the global minima of Lennard-Jones (LJ) nanoparticles, LJ75 and LJ104, as well as two metallic systems, Au55 and Pd88. However, difficulties were encountered for LJ98, providing insight into how the scheme could be further improved.
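The "divide" step described in this abstract, in which a Gaussian mixture model splits the coarse samples into regions that are then explored separately, can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the abstract does not say how structures are described, so the single scalar descriptor per sampled minimum, the function names, and the toy data are all assumptions. A small pure-Python expectation-maximisation fit is used so the example is self-contained; in practice a library such as scikit-learn's `GaussianMixture` would be the more usual choice.

```python
import math

VAR_FLOOR = 1e-6  # keeps a component from collapsing onto a single point


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, weights, means, variances):
    """E-step: probability that each sample belongs to each component."""
    out = []
    for x in xs:
        dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1.0  # guard against every density underflowing
        out.append([d / total for d in dens])
    return out


def fit_gmm_1d(xs, n_components=2, n_iter=200):
    """Fit a one-dimensional Gaussian mixture by expectation-maximisation."""
    lo, hi = min(xs), max(xs)
    # Deterministic start: means spread evenly over the data range.
    step = (hi - lo) / max(n_components - 1, 1)
    means = [lo + k * step for k in range(n_components)]
    variances = [max(((hi - lo) / n_components) ** 2, VAR_FLOOR)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        resp = responsibilities(xs, weights, means, variances)
        # M-step: re-estimate each component from its weighted samples.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, VAR_FLOOR)
    return weights, means, variances


def assign_regions(xs, weights, means, variances):
    """Label each sample with its most probable component (its region)."""
    resp = responsibilities(xs, weights, means, variances)
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Hypothetical descriptor values for minima found by a truncated search.
samples = [0.0, 0.1, 0.2, 0.1, 5.0, 5.1, 4.9, 5.2]
model = fit_gmm_1d(samples, n_components=2)
labels = assign_regions(samples, *model)
```

Each label identifies one region; the "conquer" step would then restart the search separately inside each region, seeded from the samples that carry that label.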
{"title":"A Divide-and-Conquer Approach to Nanoparticle Global Optimisation Using Machine Learning.","authors":"Nicholas B Smith, Anna L Garden","doi":"10.1021/acs.jcim.4c01516","DOIUrl":"10.1021/acs.jcim.4c01516","url":null,"abstract":"<p><p>Global optimization of the structure of atomic nanoparticles is often hampered by the presence of many funnels on the potential energy surface. While broad funnels are readily encountered and easily exploited by the search, narrow funnels are more difficult to locate and explore, presenting a problem if the global minimum is situated in such a funnel. Here, a divide-and-conquer approach is applied to overcome the issue posed by the multifunnel effect using a machine learning approach, without using <i>a priori</i> knowledge of the potential energy surface. This approach begins with a truncated exploration to gather coarse-grained knowledge of the potential energy surface. This is then used to train a machine learning Gaussian mixture model to divide up the potential energy surface into separate regions, with each region then being explored in more detail (or conquered) separately. This scheme was tested on a variety of multifunnel systems and yielded significant improvements to the times taken to locate the global minima of Lennard-Jones (LJ) nanoparticles, LJ<sub>75</sub> and LJ<sub>104</sub>, as well as two metallic systems, Au<sub>55</sub> and Pd<sub>88</sub>. 
However, difficulties were encountered for LJ<sub>98</sub>, providing insight into how the scheme could be further improved.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8743-8755"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142638030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-09 Epub Date: 2024-11-22 DOI: 10.1021/acs.jcim.4c01769
Thomas J Summers, Difan Zhang, Josiane A Sobrinho, Ana de Bettencourt-Dias, Roger Rousseau, Vassiliki-Alexandra Glezakou, David C Cantu
Ensemble-average sampling of structures from ab initio molecular dynamics (AIMD) simulations can be used to predict theoretical extended X-ray absorption fine structure (EXAFS) signals that closely match experimental spectra. However, AIMD simulations are time-consuming and resource-intensive, particularly for solvated lanthanide ions, which often form multiple nonrigid geometries with high coordination numbers. To accelerate the characterization of lanthanide structures in solution, we employed the Northwest Potential Energy Surface Search Engine (NWPEsSe), an adaptive-learning global optimization algorithm, to efficiently screen first-shell structures. As case studies, we examine two systems: Eu(NO3)3 dissolved in acetonitrile with a terpyridine ligand (terpyNO2), and Nd(NO3)3 dissolved in acetonitrile. The theoretical spectra for structures identified by NWPEsSe were compared to both experimental and AIMD-derived EXAFS spectra. The NWPEsSe algorithm successfully identified the proper solvation structure for both Eu(NO3)3(terpyNO2) and Nd(NO3)(acetonitrile)3, with the calculated EXAFS signals closely matching the experimental spectra for the Eu-ligand complex and showing good similarity for the Nd salt; the better agreement with the ligand-containing structure is attributed to a less dynamic coordination environment due to the rigid ligand. The key advantage of the global optimization algorithm lies in its ability to sample the coordination environment across the potential energy surface and reduce the time required to identify structures from generally a month to within a week. Additionally, this approach is versatile and can be adapted to characterize main-group metal complexes.
{"title":"Pairing a Global Optimization Algorithm with EXAFS to Characterize Lanthanide Structure in Solution.","authors":"Thomas J Summers, Difan Zhang, Josiane A Sobrinho, Ana de Bettencourt-Dias, Roger Rousseau, Vassiliki-Alexandra Glezakou, David C Cantu","doi":"10.1021/acs.jcim.4c01769","DOIUrl":"10.1021/acs.jcim.4c01769","url":null,"abstract":"<p><p><i>Ensemble</i>-average sampling of structures from <i>ab initio</i> molecular dynamics (AIMD) simulations can be used to predict theoretical extended X-ray absorption fine structure (EXAFS) signals that closely match experimental spectra. However, AIMD simulations are time-consuming and resource-intensive, particularly for solvated lanthanide ions, which often form multiple nonrigid geometries with high coordination numbers. To accelerate the characterization of lanthanide structures in solution, we employed the Northwest Potential Energy Surface Search Engine (NWPEsSe), an adaptive-learning global optimization algorithm, to efficiently screen first-shell structures. As case studies, we examine two systems: Eu(NO<sub>3</sub>)<sub>3</sub> dissolved in acetonitrile with a terpyridine ligand (terpyNO<sub>2</sub>), and Nd(NO<sub>3</sub>)<sub>3</sub> dissolved in acetonitrile. The theoretical spectra for structures identified by NWPEsSe were compared to both experimental and AIMD-derived EXAFS spectra. The NWPEsSe algorithm successfully identified the proper solvation structure for both Eu(NO<sub>3</sub>)<sub>3</sub>(terpyNO<sub>2</sub>) and Nd(NO<sub>3</sub>)(acetonitrile)<sub>3</sub>, with the calculated EXAFS signals closely matching the experimental spectra for the Eu-ligand complex and showing good similarity for the Nd salt; the better agreement with the ligand-containing structure is attributed to a less dynamic coordination environment due to the rigid ligand. 
The key advantage of the global optimization algorithm lies in its ability to sample the coordination environment across the potential energy surface and reduce the time required to identify structures from generally a month to within a week. Additionally, this approach is versatile and can be adapted to characterize main-group metal complexes.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8926-8936"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142685411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}