Pub Date: 2024-06-22; DOI: 10.1021/acs.jcim.3c01905
Fatma Cankara, Simge Senyuz, Ahenk Zeynep Sayin, Attila Gursoy, Ozlem Keskin
Proteins interact through their interfaces, and dysfunction of protein-protein interactions (PPIs) has been associated with various diseases. Therefore, investigating the properties of drug-modulated PPIs and interface-targeting drugs is critical. Here, we present a large curated data set of drug-like molecules in protein interfaces. We further introduce DiPPI (Drugs in Protein-Protein Interfaces), a two-module web site that facilitates the search for such molecules and their properties by exploiting our data set in drug repurposing studies. In the interface module of the web site, we present several properties of interfaces, such as amino acid properties, hotspots, evolutionary conservation of drug-binding amino acids, and post-translational modifications of these residues. On the drug-like molecule side, we list drug-like small molecules and FDA-approved drugs from various databases and highlight those that bind to the interfaces. We further clustered the drugs based on their molecular fingerprints to confine the search for an alternative drug to a smaller space. Drug properties, including Lipinski's rules and various molecular descriptors, are also calculated and made available on the web site to guide the selection of drug molecules. Our data set contains 534,203 interfaces for 98,632 protein structures, of which 55,135 are detected to bind to a drug-like molecule. A total of 2214 drug-like molecules are deposited on our web site, among which 335 are FDA-approved. DiPPI provides users with an easy-to-follow scheme for drug repurposing studies through its well-curated and clustered interface and drug data and is freely available at http://interactome.ku.edu.tr:8501.
Title: DiPPI: A Curated Data Set for Drug-like Molecules in Protein-Protein Interfaces (Journal of Chemical Information and Modeling)
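The DiPPI abstract mentions Lipinski's rules and fingerprint-based clustering of drugs. The following is a minimal sketch of both ideas, not the DiPPI implementation: the `Descriptors` container and both function names are hypothetical, the descriptor values are assumed to be precomputed elsewhere, and fingerprints are assumed to be represented as sets of on-bit indices.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def tanimoto(fp_a: FrozenSet[int], fp_b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

A clustering step could then group molecules whose pairwise Tanimoto similarity exceeds a chosen cutoff; the abstract does not state which fingerprint type or clustering algorithm DiPPI uses.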
Pub Date: 2024-06-22; DOI: 10.1021/acs.jcim.3c02007
Zakaria L Dahmani, Ana Ligia Scott, Catherine Vénien-Bryan, David Perahia, Mauricio G S Costa
Molecular Dynamics Flexible Fitting (MDFF) is a widely used tool to refine high-resolution structures into cryo-EM density maps. Despite many successful applications, MDFF is still limited by its high computational cost, overfitting, accuracy, and performance issues due to entrapment within wrong local minima. Modern ensemble-based MDFF tools have generated promising results in the past decade. In line with these studies, we present MDFF_NM, a stochastic hybrid flexible fitting algorithm combining Normal Mode Analysis (NMA) and simulation-based flexible fitting. Initial tests reveal that, besides accelerating the fitting process, MDFF_NM increases the diversity of fitting routes leading to the target, uncovering ensembles of conformations in closer agreement with experimental data. The potential integration of MDFF_NM with other existing methods and integrative modeling pipelines is also discussed.
Title: MDFF_NM: Improved Molecular Dynamics Flexible Fitting into Cryo-EM Density Maps with a Multireplica Normal Mode-Based Search (Journal of Chemical Information and Modeling)
Pub Date: 2024-06-21; DOI: 10.1021/acs.jcim.4c00578
Muhammad Haseeb, Yang Seon Choi, Mahesh Chandra Patra, Uisuk Jeong, Wang Hee Lee, Naila Qayyum, Hongjoon Choi, Wook Kim, Sangdun Choi
The aberrant secretion of proinflammatory cytokines by immune cells is the principal cause of inflammatory diseases, such as systemic lupus erythematosus and rheumatoid arthritis. Toll-like receptor 7 (TLR7) and TLR9, sequestered to the endosomal compartment of dendritic cells and macrophages, are closely associated with the initiation and progression of these diseases. Therefore, the development of drugs targeting dysregulated endosomal TLRs is imperative to mitigate systemic inflammation. Here, we applied the principles of computer-aided drug discovery to identify a novel low-molecular-weight compound, TLR inhibitory compound 10 (TIC10), and its potent derivative (TIC10g), which demonstrated dual inhibition of TLR7 and TLR9 signaling pathways. Compared to TIC10, TIC10g exhibited a more pronounced inhibition of the TLR7- and TLR9-mediated secretion of the proinflammatory cytokine tumor necrosis factor-α in a mouse macrophage cell line and mouse bone marrow dendritic cells in a concentration-dependent manner. While TIC10g slightly prevented TLR3 and TLR8 activation, it had no impact on cell surface TLRs (TLR1/2, TLR2/6, TLR4, or TLR5), indicating its selectivity for TLR7 and TLR9. Additionally, mechanistic studies suggested that TIC10g interfered with TLR9 activation by CpG DNA and suppressed downstream pathways by directly binding to TLR9. Western blot analysis revealed that TIC10g downregulated the phosphorylation of the p65 subunit of nuclear factor κ-light-chain-enhancer of activated B cells (NF-κB) and mitogen-activated protein kinases (MAPKs), including extracellular-signal-regulated kinase, p38-MAPK, and c-Jun N-terminal kinase. These findings indicate that the novel ligand, TIC10g, is a specific dual inhibitor of endosomal TLRs (TLR7 and TLR9), disrupting MAPK- and NF-κB-mediated proinflammatory gene expression.
Title: Discovery of Novel Small Molecule Dual Inhibitor Targeting Toll-Like Receptors 7 and 9 (Journal of Chemical Information and Modeling)
Combination therapy is an important direction of continuous exploration in medicine, with the core goals of improving treatment efficacy, reducing adverse reactions, and optimizing clinical outcomes. Machine learning holds great promise for improving the prediction of synergistic drug combinations. However, most studies focus on predictive models for a single disease or involve excessive feature categories, making it challenging to predict the majority of new drugs. To address these challenges, the DrugSK comprehensive model was developed, which utilizes SMILES-BERT to extract structural information from 3492 drugs and trains on reactions from 48,756 drug combinations. DrugSK is an ensemble learning model capable of predicting interactions among various drug categories. First, the primary learners are trained on the initial data set. Random forest, support vector machine, and XGBoost models are selected as primary learners and logistic regression as the secondary learner. A new data set, which can be thought of as the predictions of each primary model, is then generated to train the level 2 learner. Finally, the results are filtered using logistic regression. Furthermore, the combination of the new antibacterial drug Drafloxacin with other antibacterial agents was tested. The synergistic effect of Drafloxacin and Isavuconazonium against Candida albicans was confirmed, offering guidance for the clinical treatment of skin infection. DrugSK's predictions are accurate in practical application, and the model can also predict the probability of the outcome. In addition, a tendency of Drafloxacin and antifungal drugs to be synergistic was found. The development of DrugSK will provide a new blueprint for predicting drug combination synergies.
Title: DrugSK: A Stacked Ensemble Learning Framework for Predicting Drug Combinations of Multiple Diseases (Journal of Chemical Information and Modeling)
Pub Date: 2024-06-20; DOI: 10.1021/acs.jcim.4c00296
Siqi Chen, Nan Gao, Chunzhi Li, Fei Zhai, Xiwei Jiang, Peng Zhang, Jibin Guan, Kefeng Li, Rongwu Xiang, Guixia Ling
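The two-level scheme in the DrugSK abstract (primary learners whose predictions form a new data set for a logistic regression secondary learner) can be sketched in plain Python. This is an illustration under stated assumptions, not the DrugSK code: the two toy primary learners stand in for random forest, support vector machine, and XGBoost; all function names are hypothetical; and the use of out-of-fold predictions to build the level 2 data set is a common stacking convention that the abstract does not confirm.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], float]               # maps features to a score in [0, 1]
Learner = Callable[[List[Vector], List[int]], Predictor]


def _sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def centroid_learner(X: List[Vector], y: List[int]) -> Predictor:
    """Toy primary learner: relative distance to the two class centroids."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def predict(x: Vector) -> float:
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 else 0.5
    return predict


def threshold_learner(X: List[Vector], y: List[int]) -> Predictor:
    """Toy primary learner: sigmoid centred between the class means of feature 0
    (assumes class 1 has the larger values of feature 0)."""
    m0 = [x[0] for x, t in zip(X, y) if t == 0]
    m1 = [x[0] for x, t in zip(X, y) if t == 1]
    mid = (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2
    return lambda x: _sigmoid(4.0 * (x[0] - mid))


def logistic_learner(Z: List[Vector], y: List[int],
                     lr: float = 0.5, epochs: int = 500) -> Predictor:
    """Secondary learner: logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            g = _sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            w = [wi - lr * g * zi for wi, zi in zip(w, z)]
            b -= lr * g
    return lambda z: _sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def fit_stack(X: List[Vector], y: List[int],
              primaries: List[Learner], k: int = 3) -> Predictor:
    """Build the level 2 data set from out-of-fold primary predictions, then fit the meta model."""
    Z = [[0.0] * len(primaries) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held = [i for i in range(len(X)) if i % k == fold]
        for j, learner in enumerate(primaries):
            model = learner([X[i] for i in train], [y[i] for i in train])
            for i in held:
                Z[i][j] = model(X[i])
    meta = logistic_learner(Z, y)
    full = [learner(X, y) for learner in primaries]  # refit primaries on all data
    return lambda x: meta([m(x) for m in full])


# Toy, linearly separable data (illustrative values only, not drug data).
class0 = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2), (0.1, 0.3), (0.3, 0.1), (0.0, 0.2)]
class1 = [(1.0, 0.9), (0.9, 1.0), (0.8, 0.8), (0.9, 0.7), (0.7, 0.9), (1.0, 0.8)]
X = [p for pair in zip(class0, class1) for p in pair]
y = [0, 1] * 6
stacked = fit_stack(X, y, [centroid_learner, threshold_learner])
```

Because the secondary learner is a logistic regression, the stacked model returns a probability-like score rather than only a class label, which matches the abstract's remark that DrugSK can predict the probability of the outcome.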
Pub Date: 2024-06-20; DOI: 10.1021/acs.jcim.4c00696
David Ricardo Figueroa Blanco, Pietro Vidossich, Marco De Vivo
DNA polymerases (Pols) add incoming nucleotides (deoxyribonucleoside triphosphates (dNTPs)) to growing DNA strands, a crucial step for DNA synthesis. The insertion of correct (vs incorrect) nucleotides relates to Pols' fidelity, which defines Pols' ability to faithfully replicate DNA strands in a template-dependent manner. We and others have demonstrated that reactant alignment and correct base pairing at the Pols' catalytic site are crucial structural features for fidelity. Here, we first used equilibrium molecular simulations to demonstrate that the local dynamics at the protein-DNA interface in the proximity of the catalytic site is different when correct vs incorrect dNTPs are bound to polymerase β (Pol β). Formation and dynamic stability of specific interatomic interactions around the incoming nucleotide influence the overall binding site architecture. This explains why certain Pols' mutants can affect the local catalytic environment and influence the selection of correct vs incorrect nucleotides. In particular, this is demonstrated here by analyzing the interaction network formed by the residue R283, whose mutant R283A has an experimentally measured lower capacity for differentiating correct (G:dCTP) vs incorrect (G:dATP) base pairing in Pol β. We also used alchemical free-energy calculations to quantify the G:dCTP → G:dATP transformation in Pol β wild-type and mutant R283A. These results correlate well with the experimental trend, thus corroborating our mechanistic insights. Sequence and structural comparisons with other Pols from the same family suggest that these findings may also be valid in similar enzymes.
Title: Correct Nucleotide Selection Is Confined at the Binding Site of Polymerase Enzymes (Journal of Chemical Information and Modeling)
Pub Date: 2024-06-19; DOI: 10.1021/acs.jcim.4c00359
Matthew Glace, Roudabeh S Moazeni-Pourasil, Daniel W Cook, Thomas D Roper
In this work, a new model with broad utility for quantitative spectroscopy development is reported. A primary objective of this work is to create a novel modeling procedure that may allow for higher automation of the model development process. The fundamental concept is simple yet powerful even for complex spectra and is employed with no additional preprocessing. This approach is applicable for several types of spectroscopic data to develop regression models that have similar or greater quality than the current methods. The key modeling steps are a matrix transformation and subsequent feature selection process that are collectively referred to as iterative regression of corrective baselines (IRCB). The transformed matrix (Xtransform) is a linearized form of the original X data set. Features from Xtransform that are predictive of Y can be ranked and selected by ordinary least-squares regression. The best features (rows of Xtransform) are linear depictions of Y that can be utilized to develop regression models with several machine learning models. The IRCB workflow is first detailed by using a case study of Fourier transform infrared (FTIR) spectroscopy for prepared solutions of a three-component mixture. Next, IRCB is applied and compared to benchmark results for the 2006 "Chimiométrie" near-infrared spectroscopy (NIR) soil composition challenge and Raman measurements of a simulated nuclear waste slurry.
Title: Iterative Regression of Corrective Baselines (IRCB): A New Model for Quantitative Spectroscopy (Journal of Chemical Information and Modeling)
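The IRCB abstract states that features (rows of Xtransform) predictive of Y can be ranked and selected by ordinary least-squares regression. The sketch below illustrates only that ranking step, under the assumption that each row is scored by the R² of a univariate least-squares fit against Y; the matrix transformation itself is not described in the abstract and is not reproduced here, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of the least-squares line y ~ a*x + b (the squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature or target carries no linear information
    return (sxy * sxy) / (sxx * syy)


def rank_features(rows: List[Sequence[float]], y: Sequence[float], top_k: int) -> List[int]:
    """Return the indices of the top_k rows, ordered by decreasing R^2 against y."""
    scored = sorted(((r_squared(row, y), i) for i, row in enumerate(rows)), reverse=True)
    return [i for _, i in scored[:top_k]]


# Toy values for illustration only: row 1 is exactly linear in y, row 2 is constant.
y_toy = [1.0, 2.0, 3.0, 4.0]
rows_toy = [[2.0, 1.0, 4.0, 3.0], [2.0, 4.0, 6.0, 8.0], [5.0, 5.0, 5.0, 5.0]]
best = rank_features(rows_toy, y_toy, top_k=2)
```

The selected rows could then be passed to whichever regression model is being developed; the abstract mentions several machine learning models without naming them.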
Pub Date: 2024-06-19; DOI: 10.1021/acs.jcim.4c00417
David F Hahn, Vytautas Gapsys, Bert L de Groot, David L Mobley, Gary Tresadern
In drug discovery, the in silico prediction of binding affinity is one of the major means to prioritize compounds for synthesis. Alchemical relative binding free energy (RBFE) calculations based on molecular dynamics (MD) simulations are nowadays a popular approach for the accurate affinity ranking of compounds. MD simulations rely on empirical force field parameters, which strongly influence the accuracy of the predicted affinities. Here, we evaluate the ability of six different small-molecule force fields to predict experimental protein-ligand binding affinities in RBFE calculations on a set of 598 ligands and 22 protein targets. The public force fields OpenFF Parsley and Sage, GAFF, and CGenFF show comparable accuracy, while OPLS3e is significantly more accurate. However, a consensus approach using Sage, GAFF, and CGenFF leads to accuracy comparable to OPLS3e. While Parsley and Sage are performing comparably based on aggregated statistics across the whole dataset, there are differences in terms of outliers. Analysis of the force field reveals that improved parameters lead to significant improvement in the accuracy of affinity predictions on subsets of the dataset involving those parameters. Lower accuracy can not only be attributed to the force field parameters but is also dependent on input preparation and sampling convergence of the calculations. Especially large perturbations and nonconverged simulations lead to less accurate predictions. The input structures, Gromacs force field files, as well as the analysis Python notebooks are available on GitHub.
Title: Current State of Open Source Force Fields in Protein-Ligand Binding Affinity Predictions (Journal of Chemical Information and Modeling)
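The force field abstract reports that a consensus of Sage, GAFF, and CGenFF reaches accuracy comparable to OPLS3e. A minimal sketch of one way such a consensus could be formed is shown below, assuming a plain arithmetic mean of the per-ligand predictions; the abstract does not specify the averaging scheme, the function names are hypothetical, and the numbers are toy values rather than results from the paper.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error between predicted and reference values."""
    return math.sqrt(mean((p - r) ** 2 for p, r in zip(pred, ref)))


def consensus(predictions: Dict[str, List[float]]) -> List[float]:
    """Average the predictions of all force fields, ligand by ligand."""
    return [mean(values) for values in zip(*predictions.values())]


# Toy relative binding free energies in kcal/mol (illustrative only).
experiment = [0.0, 1.0, -1.0]
toy = {
    "ff_a": [0.5, 1.5, -0.5],    # systematically too positive
    "ff_b": [-0.5, 0.5, -1.5],   # systematically too negative
    "ff_c": [0.0, 1.0, -1.0],
}
combined = consensus(toy)
```

In this toy case the opposite errors of `ff_a` and `ff_b` cancel, so the consensus error is lower than that of either one alone. Real force field errors are correlated, so the benefit in practice depends on the data set.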
Pub Date: 2024-06-19; DOI: 10.1021/acs.jcim.4c00540
Emanuele Falbo, Antonio Lavecchia
In this study, we introduce a novel approach to enhance the accuracy of molecular dynamics simulations by refining the force fields (FFs) through a combination of transferable parameters and molecule-specific characteristics derived from quantum mechanical (QM) calculations. Traditional FFs often prioritize generality over precision, which limits how accurately they capture intra- and intermolecular interactions. To address this, we present an open-source toolkit, called HessFit, designed to integrate QM-derived bonded parameters and atomic charges into existing FFs. In combination with the derivation of bond, angle, torsional, and nonbonded parameters, HessFit can easily extract multiple barrier terms of dihedrals from the QM Hessian and gradient or return all terms through a fitting procedure on the QM potential energy surface. We showcase the effectiveness of HessFit through comprehensive evaluations of vibrational properties across a diverse set of small molecules and demonstrate that experimental results support its ability to predict thermodynamic properties of organic molecules compared to previous state-of-the-art approaches. We further explore its application to Zn2+ metalloprotein models, which are hard systems to treat with automatic approaches. Our results demonstrate that HessFit parameters compete with GAFF2 and OPLS parameters in describing small organic molecules, and its feasibility is also comparable to current FFs used for modeling nonstandard residues in Zn proteins for molecular dynamics simulations. The effectiveness of the HessFit protocol makes it a valuable tool for deriving or extending force field parameters for novel compounds in several molecular modeling applications.
Title: HessFit: A Toolkit to Derive Automated Force Fields from Quantum Mechanical Information (Journal of Chemical Information and Modeling)
Pub Date: 2024-06-18 DOI: 10.1021/acs.jcim.4c00310
Amit Ranjan, Adam Bess, Chris Alvin, Supratik Mukhopadhyay
Drug-target affinity (DTA) prediction is an important task in the early stages of drug discovery. Traditional biological approaches are time-, labor-, and resource-intensive because of the large size of the genomic and chemical spaces. Computational approaches using machine learning have emerged to narrow down the drug candidate search space. However, most of these prediction models rely on a single feature encoding of drugs and targets, ignoring the importance of integrating different dimensions of these features. We propose a deep learning-based approach called Multi-Dimensional Fusion for Drug Target Affinity Prediction (MDF-DTA) that incorporates features of different dimensionality. Our model fuses 1D, 2D, and 3D representations obtained from different pretrained models for both drugs and targets. We evaluated MDF-DTA on two standard benchmark data sets: DAVIS and KIBA. Experimental results show that MDF-DTA outperforms many state-of-the-art techniques in the DTA task across both data sets. Using ablation studies and performance evaluation metrics, we assess the importance of each individual representation and its impact on MDF-DTA.
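To make the idea of fusing several representations concrete, the sketch below concatenates 1D, 2D, and 3D embedding vectors for a drug and a target into one joint feature vector. The abstract does not state how MDF-DTA combines its representations, so plain concatenation is an assumption here, and the names `fuse` and `pair_features` and the toy embeddings are hypothetical.

```python
def fuse(*views):
    """Concatenate per-view embeddings (e.g., 1D, 2D, 3D) into one flat vector."""
    fused = []
    for view in views:
        fused.extend(float(x) for x in view)
    return fused

def pair_features(drug_views, target_views):
    """Build a joint drug-target feature vector from all views of both."""
    return fuse(*drug_views) + fuse(*target_views)

# Hypothetical toy embeddings; real ones would come from pretrained encoders.
drug = {"1d": [0.1, 0.2], "2d": [0.3, 0.4, 0.5], "3d": [0.6]}
target = {"1d": [1.0], "2d": [2.0, 3.0], "3d": [4.0, 5.0]}
features = pair_features(drug.values(), target.values())
```

A downstream regressor would take `features` as input to predict the affinity; dropping one view from `drug` or `target` mimics the kind of ablation the abstract describes.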
{"title":"MDF-DTA: A Multi-Dimensional Fusion Approach for Drug-Target Binding Affinity Prediction.","authors":"Amit Ranjan, Adam Bess, Chris Alvin, Supratik Mukhopadhyay","doi":"10.1021/acs.jcim.4c00310","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c00310","journal":{"name":"Journal of Chemical Information and Modeling"},"publicationDate":"2024-06-18","publicationTypes":"Journal Article"}
Pub Date: 2024-06-18 DOI: 10.1021/acs.jcim.4c00404
Aditya Kulkarni, Michael Bortz, Karl-Heinz Küfer, Maximilian Kohns, Hans Hasse
Many widely used molecular models of water are built from a single Lennard-Jones site on which three point charges are positioned, one negative and two positive. Models from this class, denoted LJ3PC here, are computationally efficient, but it is well known that they cannot represent all relevant properties of water simultaneously with good accuracy. Despite the importance of the LJ3PC water model class, its inherent limitations in simultaneously describing different properties of water have never been studied systematically. This task can only be solved by multicriteria optimization (MCO). However, because of its computational cost, applying MCO to molecular models is a formidable task. We have recently introduced the reduced units method (RUM) to cope with this problem. In the present work, we apply the RUM in a hierarchical scheme to optimize LJ3PC water models, taking into account five objectives: the representation of the vapor pressure, saturated liquid density, self-diffusion coefficient, shear viscosity, and relative permittivity. Of the six parameters of the LJ3PC models, five were varied; only the H-O-H bond angle, which is usually chosen based on physical arguments, was kept constant. Our hierarchical RUM-based approach yields a Pareto set that contains attractive new water models. Furthermore, the results give an idea of what can be achieved by molecular modeling of water with models from the LJ3PC class.
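The notion of a Pareto set can be illustrated with a short sketch: given candidate models scored by their deviation from experiment in each of the five objectives, keep only those that no other candidate beats in every objective. The candidate names and numbers below are hypothetical, and this brute-force filter is only an illustration of Pareto dominance; it does not reproduce the reduced units method or the hierarchical scheme of the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one
    (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_set(candidates):
    """Return the names of candidates that are not dominated by any other candidate."""
    return [
        name
        for name, errs in candidates.items()
        if not any(
            dominates(other, errs)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical relative deviations from experiment for five objectives:
# (vapor pressure, liquid density, self-diffusion, shear viscosity, permittivity)
models = {
    "A": (0.10, 0.01, 0.20, 0.15, 0.05),
    "B": (0.05, 0.02, 0.25, 0.10, 0.08),
    "C": (0.12, 0.02, 0.30, 0.20, 0.09),  # worse than A in every objective
    "D": (0.20, 0.005, 0.10, 0.30, 0.02),
}
front = pareto_set(models)  # -> ["A", "B", "D"]
```

Every model left in `front` represents a different compromise between the objectives; choosing among them is a modeling decision rather than an optimization result.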
{"title":"Hierarchical Multicriteria Optimization of Molecular Models of Water.","authors":"Aditya Kulkarni, Michael Bortz, Karl-Heinz Küfer, Maximilian Kohns, Hans Hasse","doi":"10.1021/acs.jcim.4c00404","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c00404","journal":{"name":"Journal of Chemical Information and Modeling"},"publicationDate":"2024-06-18","publicationTypes":"Journal Article"}