Journal of Chemical Information and Modeling: Latest Articles

CENsible: Interpretable Insights into Small-Molecule Binding with Context Explanation Networks
IF 5.6 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2024-06-07 | DOI: 10.1021/acs.jcim.4c00825
Roshni Bhatt, David Ryan Koes and Jacob D. Durrant*

We present a novel and interpretable approach for assessing small-molecule binding using context explanation networks. Given the specific structure of a protein/ligand complex, our CENsible scoring function uses a deep convolutional neural network to predict the contributions of precalculated terms to the overall binding affinity. We show that CENsible can effectively distinguish active vs inactive compounds for many systems. Its primary benefit over related machine-learning scoring functions, however, is that it retains interpretability, allowing researchers to identify the contribution of each precalculated term to the final affinity prediction, with implications for subsequent lead optimization.
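
The interpretability described above can be illustrated with a minimal sketch. It assumes the final score is a weighted sum of precalculated terms, with the weights supplied by the network for one specific complex. The term names, weight values, and the `explain_score` helper are hypothetical and are not taken from the CENsible code.

```python
def explain_score(weights, terms):
    """Return the total score and each term's contribution (weight * term value).

    Both arguments are dicts keyed by term name. In a context explanation
    network the weights would be predicted per complex; here they are
    hard-coded, hypothetical numbers.
    """
    if set(weights) != set(terms):
        raise ValueError("weights and terms must cover the same term names")
    contributions = {name: weights[name] * terms[name] for name in terms}
    total = sum(contributions.values())
    return total, contributions


# Hypothetical precalculated terms and predicted weights for one complex.
terms = {"steric": -2.0, "hydrophobic": -1.5, "hydrogen_bond": -3.0}
weights = {"steric": 0.5, "hydrophobic": 1.0, "hydrogen_bond": 0.25}

total, contributions = explain_score(weights, terms)

# Rank terms by the magnitude of their contribution to the prediction.
ranked = sorted(contributions, key=lambda n: abs(contributions[n]), reverse=True)
print(total, ranked)
```

Because every contribution is an explicit product, a researcher can read off which term dominates a given prediction, which is the property the abstract highlights for lead optimization.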

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11200255/pdf/
Citations: 0
Improved Highly Mobile Membrane Mimetic Model for Investigating Protein–Cholesterol Interactions
IF 5.6 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2024-06-06 | DOI: 10.1021/acs.jcim.4c00619
Muyun Lihan and Emad Tajkhorshid*

Cholesterol (CHL) plays an integral role in modulating the function and activity of various mammalian membrane proteins. Due to the slow dynamics of lipids, conventional computational studies of protein–CHL interactions rely on either long-time scale atomistic simulations or coarse-grained approximations to sample the process. A highly mobile membrane mimetic (HMMM) has been developed to enhance lipid diffusion and thus used to facilitate the investigation of lipid interactions with peripheral membrane proteins and, with customized in silico solvents to replace phospholipid tails, with integral membrane proteins. Here, we report an updated HMMM model that is able to include CHL, a nonphospholipid component of the membrane, henceforth called HMMM-CHL. To this end, we had to optimize the effect of the customized solvents on CHL behavior in the membrane. Furthermore, the new solvent is compatible with simulations using force-based switching protocols. In the HMMM-CHL, both improved CHL dynamics and accelerated lipid diffusion are integrated. To test the updated model, we have applied it to the characterization of protein–CHL interactions in two membrane protein systems, the human β2-adrenergic receptor (β2AR) and the mitochondrial voltage-dependent anion channel 1 (VDAC-1). Our HMMM-CHL simulations successfully identified CHL binding sites and captured detailed CHL interactions in excellent consistency with experimental data as well as other simulation results, indicating the utility of the improved model in applications where an enhanced sampling of protein–CHL interactions is desired.

Citations: 0
Graphormer-IR: Graph Transformers Predict Experimental IR Spectra Using Highly Specialized Attention
IF 5.6 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2024-06-06 | DOI: 10.1021/acs.jcim.4c00378
Cailum M. K. Stienstra, Liam Hebert, Patrick Thomas, Alexander Haack, Jason Guo and W. Scott Hopkins*

Infrared (IR) spectroscopy is an important analytical tool in various chemical and forensic domains and a great deal of effort has gone into developing in silico methods for predicting experimental spectra. A key challenge in this regard is generating highly accurate spectra quickly to enable real-time feedback between computation and experiment. Here, we employ Graphormer, a graph neural network (GNN) transformer, to predict IR spectra using only simplified molecular-input line-entry system (SMILES) strings. Our data set includes 53,528 high-quality spectra, measured in five different experimental media (i.e., phases), for molecules containing the elements H, C, N, O, F, Si, S, P, Cl, Br, and I. When using only atomic numbers for node encodings, Graphormer-IR achieved a mean test spectral information similarity (SISμ) value of 0.8449 ± 0.0012 (n = 5), which surpasses that of the current state-of-the-art model Chemprop-IR (SISμ = 0.8409 ± 0.0014, n = 5) with only 36% of the encoded information. Augmenting node embeddings with additional node-level descriptors in learned embeddings generated through a multilayer perceptron improves scores to SISμ = 0.8523 ± 0.0006, a total improvement of 19.7σ (t = 19). These improved scores show how Graphormer-IR excels in capturing long-range interactions like hydrogen bonding, anharmonic peak positions in experimental spectra, and stretching frequencies of uncommon functional groups. Scaling our architecture to 210 attention heads demonstrates specialist-like behavior for distinct IR frequencies that improves model performance. Our model utilizes novel architectures, including a global node for phase encoding, learned node feature embeddings, and a one-dimensional (1D) smoothing convolutional neural network (CNN). Graphormer-IR’s innovations underscore its value over traditional message-passing neural networks (MPNNs) due to its expressive embeddings and ability to capture long-range intramolecular relationships.

Citations: 0
Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design.
IF 5.6 | Zone 2 (Chemistry) | Q1 Social Sciences | Pub Date: 2024-06-06 | DOI: 10.1021/acs.jcim.4c00634
Jie Li, Oufan Zhang, Kunyang Sun, Yingze Wang, Xingyi Guan, Dorian Bagni, Mojtaba Haghighatlari, Fiona L Kearns, Conor Parks, Rommie E Amaro, Teresa Head-Gordon

Determining the viability of a new drug molecule is a time- and resource-intensive task that makes computer-aided assessments a vital approach to rapid drug discovery. Here we develop a machine learning algorithm, iMiner, that generates novel inhibitor molecules for target proteins by combining deep reinforcement learning with real-time 3D molecular docking using AutoDock Vina, thereby simultaneously creating chemical novelty while constraining molecules for shape and molecular compatibility with target active sites. Moreover, through the use of various types of reward functions, we have introduced novelty in generative tasks for new molecules such as chemical similarity to a target ligand, molecules grown from known protein bound fragments, and creation of molecules that enforce interactions with target residues in the protein active site. The iMiner algorithm is embedded in a composite workflow that filters out Pan-assay interference compounds, Lipinski rule violations, uncommon structures in medicinal chemistry, and poor synthetic accessibility with options for cross-validation against other docking scoring functions and automation of a molecular dynamics simulation to measure pose stability. We also allow users to define a set of rules for the structures they would like to exclude during the training process and postfiltering steps. Because our approach relies only on the structure of the target protein, iMiner can be easily adapted for the future development of other inhibitors or small molecule therapeutics of any target protein.
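
One of the filtering steps mentioned above, the check for Lipinski rule violations, can be sketched as follows. This is a minimal illustration using the standard rule-of-five thresholds and is not the iMiner implementation. The `Descriptors` class, the `max_violations` parameter, and the example descriptor values are hypothetical, and the descriptors are assumed to be computed elsewhere (for example by a cheminformatics toolkit).

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_filter(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a molecule only if it has at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations


# Illustrative values only; not measured or computed data.
small = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
large = Descriptors(mol_weight=720.0, logp=6.3, h_donors=6, h_acceptors=12)
print(passes_filter(small), passes_filter(large))
```

How many violations a workflow tolerates is a design choice; the abstract does not state the threshold, so it is exposed here as a parameter.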

Citations: 0
Synergizing Chemical Structures and Bioassay Descriptions for Enhanced Molecular Property Prediction in Drug Discovery
IF 5.6 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2024-06-05 | DOI: 10.1021/acs.jcim.4c00765
Maximilian G. Schuh, Davide Boldini* and Stephan A. Sieber*

The precise prediction of molecular properties can greatly accelerate the development of new drugs. However, in silico molecular property prediction approaches have been limited so far to assays for which large amounts of data are available. In this study, we develop a new computational approach leveraging both the textual description of the assay of interest and the chemical structure of target compounds. By combining these two sources of information via self-supervised learning, our tool can provide accurate predictions for assays where no measurements are available. Remarkably, our approach achieves state-of-the-art performance on the FS-Mol benchmark for zero-shot prediction, outperforming a wide variety of deep learning approaches. Additionally, we demonstrate how our tool can be used for tailoring screening libraries for the assay of interest, showing promising performance in a retrospective case study on a high-throughput screening campaign. By accelerating the early identification of active molecules in drug discovery and development, this method has the potential to streamline the identification of novel therapeutics.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11200265/pdf/
Citations: 0
Knowledge Graph Convolutional Network with Heuristic Search for Drug Repositioning
IF 5.6 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2024-06-05 | DOI: 10.1021/acs.jcim.4c00737
Xiang Du, Xinliang Sun and Min Li*

Drug repositioning is a strategy of repurposing approved drugs for treating new indications, which can accelerate the drug discovery process, reduce development costs, and lower the safety risk. The advancement of biotechnology has significantly accelerated the speed and scale of biological data generation, offering significant potential for drug repositioning through biomedical knowledge graphs that integrate diverse entities and relations from various biomedical sources. To fully learn the semantic information and topological structure information from the biological knowledge graph, we propose a knowledge graph convolutional network with a heuristic search, named KGCNH, which can effectively utilize the diversity of entities and relationships in biological knowledge graphs, as well as topological structure information, to predict the associations between drugs and diseases. Specifically, we design a relation-aware attention mechanism to compute the attention scores for each neighboring entity of a given entity under different relations. To address the challenge of randomness of the initial attention scores potentially impacting model performance and to expand the search scope of the model, we designed a heuristic search module based on Gumbel-Softmax, which uses attention scores as heuristic information and introduces randomness to assist the model in exploring more optimal embeddings of drugs and diseases. Following this module, we derive the relation weights, obtain the embeddings of drugs and diseases through neighborhood aggregation, and then predict drug–disease associations. Additionally, we employ feature-based augmented views to enhance model robustness and mitigate overfitting issues. We have implemented our method and conducted experiments on two data sets. The results demonstrate that KGCNH outperforms competing methods. 
In particular, case studies on lithium and quetiapine confirm that KGCNH can retrieve more actual drug–disease associations in the top prediction results.
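
The Gumbel-Softmax step mentioned above can be sketched in a few lines. This is a generic illustration of the technique rather than the KGCNH module. It assumes the attention scores are used directly as logits, and the function name, the temperature default, and the example scores are hypothetical.

```python
import math
import random


def gumbel_softmax(scores, temperature=1.0, rng=random):
    """Perturb scores with Gumbel(0, 1) noise, then apply a tempered softmax.

    The noise lets lower-scored neighbors occasionally receive high weight,
    which is the source of the extra exploration.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    noisy = []
    for s in scores:
        # Clamp the uniform sample away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((s + gumbel) / temperature)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical attention scores for three neighboring entities.
attention_scores = [2.0, 0.5, -1.0]
sample = gumbel_softmax(attention_scores, temperature=1.0, rng=random.Random(0))
print(sample)
```

Lower temperatures push each sample toward a one-hot choice, while higher temperatures spread the weight more evenly across neighbors.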

Citations: 0
Peptide Drug Design Using Alchemical Free Energy Calculation: An Application and Validation on Agonists of Ghrelin Receptor
IF 5.6 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2024-06-05 | DOI: 10.1021/acs.jcim.4c00414
Qin Zeng, Guangpeng Meng, Bingyu Zhao, Haodian Lin, Yuqing Guan, Xiaobin Qin, Yu Yuan*, Yuanbo Li* and Qiantao Wang*

With recent large-scale applications and validations, the relative binding free energy (RBFE) calculated using alchemical free energy methods has been proven to be an accurate measure to probe the binding of small-molecule drug candidates. On the other hand, given the flexibility of peptides, it is of great interest to find out whether sufficient sampling could be achieved within the typical time scale of such calculation, and a similar level of accuracy could be reached for peptide drugs. However, the systematic evaluation of such calculations on protein–peptide systems has been less reported. Most reported studies of peptides were restricted to a limited number of data points or lacking experimental support. To demonstrate the applicability of the alchemical free energy method for protein–peptide systems in a typical real-world drug discovery project, we report an application of the thermodynamic integration (TI) method to the RBFE calculation of ghrelin receptor and its peptide agonists. Along with the calculation, the synthesis and in vitro EC50 activity of relamorelin and 17 new peptide derivatives were also reported. A cost-effective criterion to determine the data collection time was proposed for peptides in the TI simulation. The average of three TI repeats yielded a mean absolute error of 0.98 kcal/mol and Pearson’s correlation coefficient (R) of 0.77 against the experimental free energy derived from the in vitro EC50 activity, showing good repeatability of the proposed method and a slightly better agreement than the results obtained from the arbitrary time frames up to 20 ns. Although it is limited by having one target and a deduced binding pose, we hope that this study can add some insights into alchemical free energy calculation of protein–peptide systems, providing theoretical assistance to the development of peptide drugs.
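
The comparison against experiment described above can be sketched as follows. The abstract does not say how the experimental free energy was derived from EC50, so this sketch assumes the common approximation ΔΔG ≈ RT ln(EC50_new / EC50_ref), which treats EC50 as a stand-in for a binding constant. The function names and the temperature default are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def relative_free_energy(ec50_ref, ec50_new, temperature=298.15):
    """Approximate relative binding free energy (kcal/mol) from two EC50 values.

    Assumption: the EC50 ratio approximates the ratio of binding constants.
    Both EC50 values must use the same units.
    """
    return R_KCAL * temperature * math.log(ec50_new / ec50_ref)


def mean_absolute_error(predicted, experimental):
    """Mean absolute deviation between two equal-length sequences."""
    if len(predicted) != len(experimental):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# A tenfold loss in potency corresponds to roughly +1.36 kcal/mol at 298.15 K.
print(relative_free_energy(ec50_ref=1.0, ec50_new=10.0))
```

With calculated and experimental ΔΔG values in two lists, `mean_absolute_error` and `pearson_r` give the two summary statistics quoted in the abstract.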

Citations: 0
Mining the Dynamical Properties of Substrate and FAD Binding Pockets of LSD1: Hints for New Inhibitor Design Direction
IF 5.6 Zone 2 Chemistry Q1 CHEMISTRY, MEDICINAL Pub Date: 2024-06-05 DOI: 10.1021/acs.jcim.4c00398
Kecheng Yang* and Hongmin Liu

Lysine-specific demethylase 1 (LSD1), a highly sophisticated epigenetic regulator, orchestrates a range of critical cellular processes and holds promising therapeutic potential for treating diverse diseases. However, clinical research targeting LSD1 has progressed very slowly. After 20 years of research, only one small-molecule drug, BEA-17, targeting the degradation of LSD1 and CoREST has been approved by the U.S. Food and Drug Administration. The primary reason for this may be the lack of abundant structural data regarding its intricate functions. To gain a deeper understanding of its conformational dynamics and guide the drug design process, we conducted molecular dynamics simulations to explore the conformational states of LSD1 in the apo state and under the influence of the cofactors flavin adenine dinucleotide (FAD) and CoREST. Our results showed that, across all states, the substrate binding pocket exhibited high flexibility, whereas the FAD binding pocket remained more stable. These distinct dynamical properties are essential for LSD1’s ability to bind various substrates while maintaining efficient demethylation activity. Both pockets can be enlarged by merging with adjacent pockets, although only the substrate binding pocket can shrink into smaller pockets. These new pocket shapes can inform inhibitor design, particularly for selective FAD-competitive inhibitors of LSD1, given the presence of numerous FAD-dependent enzymes in the human body. More interestingly, in the absence of FAD binding, the united substrate and FAD binding pocket is partitioned by the conserved residue Tyr761, offering valuable insights for the design of inhibitors that disrupt the crucial steric role of Tyr761 and the redox role of FAD. Additionally, we identified pockets that positively or negatively correlate with the substrate and FAD binding pockets, which can be exploited for the design of allosteric or concurrent inhibitors. Our results reveal the intricate dynamical properties of LSD1 as well as multiple novel conformational states, which deepen our understanding of its sophisticated functions and aid in the rational design of new inhibitors.
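One way to picture the "positively or negatively correlated pockets" mentioned above is to correlate per-frame pocket volumes from a trajectory. The sketch below is only an illustration under stated assumptions: the abstract does not say which pocket-detection tool or correlation measure was used, so the volume time series, the pocket names, and the 0.5 cutoff are all hypothetical, and plain Pearson correlation is assumed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def classify_pockets(reference, others, threshold=0.5):
    """Label each pocket by how its volume co-varies with a reference pocket.

    `threshold` is an arbitrary illustrative cutoff, not a published value.
    """
    labels = {}
    for name, volumes in others.items():
        r = pearson_r(reference, volumes)
        if r >= threshold:
            labels[name] = "positive"
        elif r <= -threshold:
            labels[name] = "negative"
        else:
            labels[name] = "uncorrelated"
    return labels


# Hypothetical per-frame volumes (cubic angstroms) for a reference pocket
# and three neighbouring pockets; real input would come from pocket
# detection run on each trajectory frame.
substrate_pocket = [100.0, 120.0, 140.0, 160.0]
neighbours = {
    "pocket_A": [50.0, 60.0, 70.0, 80.0],   # grows with the reference
    "pocket_B": [80.0, 70.0, 60.0, 50.0],   # shrinks as the reference grows
    "pocket_C": [10.0, 12.0, 10.0, 12.0],   # weak relationship
}

labels = classify_pockets(substrate_pocket, neighbours)
for name, label in labels.items():
    print(name, label)
```

Pockets labelled "positive" or "negative" in such an analysis are the kind of candidates the abstract suggests for allosteric or concurrent inhibitor design. A real study would also need far more frames and a significance test.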

Citations: 0
Identification of Family-Specific Features in Cas9 and Cas12 Proteins: A Machine Learning Approach Using Complete Protein Feature Spectrum
IF 5.6 Zone 2 Chemistry Q1 CHEMISTRY, MEDICINAL Pub Date: 2024-06-05 DOI: 10.1021/acs.jcim.4c00625
Sita Sirisha Madugula, Pranav Pujar, Bharani Nammi, Shouyi Wang, Vindi M. Jayasinghe-Arachchige, Tyler Pham, Dominic Mashburn, Maria Artiles and Jin Liu*

The recent development of CRISPR-Cas technology holds promise to correct gene-level defects for genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that can edit the gene of interest assisted by guide RNA. However, these Cas proteins suffer from inherent limitations such as large size, low cleavage efficiency, and off-target effects, hindering their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families. In this study, we aim to elucidate the unique protein features associated with Cas9 and Cas12 families and identify the features distinguishing each family from non-Cas proteins. Here, we built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins from non-Cas proteins, respectively, using the complete protein feature spectrum (13,494 features) encoding various physicochemical, topological, constitutional, and coevolutionary information on Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and non-Cas proteins. All the models were evaluated rigorously on the test and independent data sets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 92% and 95% on their respective independent data sets, while the multiclass classifier achieved an F1 score of close to 0.98. We observed that Quasi-Sequence-Order (QSO) descriptors like Schneider.lag and Composition descriptors like charge, volume, and polarizability are predominant in the Cas12 family. Conversely, Amino Acid Composition descriptors, especially Tripeptide Composition (TPC), predominate in the Cas9 family. Four of the top 10 descriptors identified in Cas9 classification are tripeptides PWN, PYY, HHA, and DHI, which are seen to be conserved across all Cas9 proteins and located within different catalytically important domains of the Streptococcus pyogenes Cas9 (SpCas9) structure. Among these, DHI and HHA are well-known to be involved in the DNA cleavage activity of the SpCas9 protein. Mutation studies have highlighted the significance of the PWN tripeptide in PAM recognition and DNA cleavage activity of SpCas9, while Y450 from the PYY tripeptide plays a crucial role in reducing off-target effects and improving the specificity in SpCas9. Leveraging our machine learning (ML) pipeline, we identified numerous Cas9 and Cas12 family-specific features. These features offer valuable insights for future experimental and computational studies aimed at designing Cas systems with enhanced gene-editing properties. These features suggest plausible structural modifications that can effectively guide the development of Cas proteins with improved editing capabilities.
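The Tripeptide Composition (TPC) descriptor highlighted above can be illustrated with a short sketch. It uses the common definition of TPC, in which each overlapping tripeptide is counted and divided by the number of windows (sequence length minus 2). Whether the authors' feature pipeline normalizes in exactly this way is an assumption, and the toy sequence below is invented for illustration and is not a real Cas9 fragment.

```python
from collections import Counter


def tripeptide_composition(sequence):
    """Return {tripeptide: frequency} over all overlapping 3-residue windows.

    Frequencies are counts divided by (len(sequence) - 2). Only tripeptides
    that occur are stored; the full descriptor would have 20**3 = 8000 slots.
    """
    seq = sequence.upper()
    n_windows = len(seq) - 2
    if n_windows <= 0:
        return {}
    counts = Counter(seq[i:i + 3] for i in range(n_windows))
    return {tri: count / n_windows for tri, count in counts.items()}


def motif_frequencies(sequence, motifs):
    """Look up the TPC value of selected tripeptides (0.0 if absent)."""
    tpc = tripeptide_composition(sequence)
    return {motif: tpc.get(motif, 0.0) for motif in motifs}


# Hypothetical toy sequence containing the four tripeptides named in the abstract.
toy_sequence = "MKPWNAAPYYGGHHALLDHIKK"
cas9_motifs = ["PWN", "PYY", "HHA", "DHI"]

for motif, freq in motif_frequencies(toy_sequence, cas9_motifs).items():
    print(f"{motif}: {freq:.3f}")
```

A vector of such frequencies, one entry per tripeptide, is the kind of input a Random Forest classifier could rank by feature importance, which is how individual tripeptides can surface among the top descriptors.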

Citations: 0
ChatGPT Combining Machine Learning for the Prediction of Nanozyme Catalytic Types and Activities.
IF 5.6 Zone 2 Chemistry Q1 Social Sciences Pub Date: 2024-06-03 DOI: 10.1021/acs.jcim.4c00600
Liping Sun, Jili Hu, Yinfeng Yang, Yongkang Wang, Zijian Wang, Yong Gao, Yiqi Nie, Can Liu, Hongxing Kan

The design of nanozymes with superior catalytic activities is a prerequisite for broadening their biomedical applications. Previous studies have exerted significant effort in theoretical calculation and experimental trials for enhancing the catalytic activity of nanozymes. Machine learning (ML) provides a forward-looking aid in predicting nanozyme catalytic activity. However, this requires a significant amount of human effort for data collection. In addition, the prediction accuracy urgently needs to be improved. Herein, we demonstrate that ChatGPT can collaborate with humans to efficiently collect data. We establish four qualitative models (random forest (RF), decision tree (DT), adaboost random forest (adaboost-RF), and adaboost decision tree (adaboost-DT)) for predicting nanozyme catalytic types, such as peroxidase, oxidase, catalase, superoxide dismutase, and glutathione peroxidase. Furthermore, we use five quantitative models (random forest (RF), decision tree (DT), support vector regression (SVR), gradient boosting regression (GBR), and fully connected deep neural network (DNN)) to predict nanozyme catalytic activities. We find that the GBR model demonstrates superior prediction performance for nanozyme catalytic activities (R² = 0.6476 for Km and R² = 0.95 for Kcat). Moreover, an open-access web resource, AI-ZYMES, with a ChatGPT-based nanozyme copilot is developed for predicting nanozyme catalytic types and activities and guiding the synthesis of nanozymes. The accuracy of the nanozyme copilot's responses reaches more than 90% through retrieval-augmented generation. This study provides a new potential application for ChatGPT in the field of nanozymes.
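The R² values quoted above are most naturally read as coefficients of determination for the regression models, although the abstract does not define the metric explicitly. A minimal sketch of that metric follows, assuming the usual definition 1 - SS_res / SS_tot. The measured and predicted values are invented and do not come from the AI-ZYMES data set.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted values (e.g. log-scaled Km), for illustration only.
measured = [-1.2, -0.4, 0.3, 0.9, 1.5]
predicted = [-1.0, -0.6, 0.5, 0.7, 1.4]

print(f"R^2 = {r_squared(measured, predicted):.3f}")
```

An R² of 1 means the predictions reproduce the measurements exactly, and 0 means the model does no better than always predicting the mean. On that reading, the reported 0.95 for Kcat indicates a much tighter fit than the 0.6476 for Km.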

Citations: 0