
Journal of Computer-Aided Molecular Design: Latest Articles

Examining unsupervised ensemble learning using spectroscopy data of organic compounds
IF 3.5 | CAS Zone 3 (Biology) | Q1 Chemistry | Pub Date: 2022-11-21 | DOI: 10.1007/s10822-022-00488-9
Kedan He, Djenerly G. Massena

One solution to the challenge of choosing an appropriate clustering algorithm is to combine different clusterings into a single consensus clustering result, known as a cluster ensemble (CE). This ensemble learning strategy can provide more robust and stable solutions across different domains and datasets. However, not all clusterings in the ensemble contribute to the final data partition. Cluster ensemble selection (CES) aims to select a subset from a large library of clustering solutions to form a smaller cluster ensemble that performs as well as or better than the set of all available clustering solutions. In this paper, we investigate four CES methods for categorizing structurally distinct organic compounds using high-dimensional IR and Raman spectroscopy data. The Single Quality Selection (SQI) method is used with various quality indices to form a subset of the ensemble from its highest-quality members. The Bagging method, usually applied in supervised learning, ranks ensemble members by the normalized mutual information (NMI) between each member and consensus solutions generated from randomly sampled subsets of the full ensemble. The hierarchical cluster and select method (HCAS-SQI) uses the diversity matrix of ensemble members to select a diverse set of the highest-quality members. Furthermore, a combining strategy can merge subsets selected with multiple quality indices (HCAS-MQI) to refine the clustering solutions in the ensemble. The IR + Raman hybrid ensemble library is created by merging two complementary “views” of the organic compounds; this inherently more diverse library gives the best full-ensemble consensus results.
Overall, the Bagging method is recommended because it provides the most robust results, which are better than or comparable to the full-ensemble consensus solutions.
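The Bagging-style ranking described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the consensus function (majority vote on pairwise co-clustering, with connected components taken as clusters), the subset fraction, the number of rounds and all function names are assumptions. NMI here uses the square-root normalization I(A;B)/√(H(A)H(B)).

```python
import math
import random
from collections import Counter
from itertools import combinations


def nmi(a, b):
    """NMI between two labelings, normalized as I(A;B) / sqrt(H(A) * H(B))."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(nij / n * math.log(n * nij / (ca[i] * cb[j])) for (i, j), nij in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:  # a single-cluster labeling carries no information
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def consensus(partitions):
    """Assumed consensus function: link samples co-clustered by > 50% of the members."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        together = sum(p[i] == p[j] for p in partitions)
        if together / len(partitions) > 0.5:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def bagging_rank(ensemble, n_rounds=50, frac=0.5, seed=0):
    """Rank members by mean NMI to consensuses of random sub-ensembles (best first)."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(ensemble)))
    scores = [0.0] * len(ensemble)
    for _ in range(n_rounds):
        cons = consensus(rng.sample(ensemble, k))
        for m, member in enumerate(ensemble):
            scores[m] += nmi(member, cons) / n_rounds
    order = sorted(range(len(ensemble)), key=lambda m: -scores[m])
    return order, scores


# Toy ensemble of four clusterings of six samples (hypothetical data)
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as above, different label names
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],  # a partition that disagrees with the others
]
order, scores = bagging_rank(ensemble)
print(order)  # the disagreeing member (index 3) is ranked last
```

Because NMI is invariant to label names, the relabeled member scores the same as its twins; a CES step would then keep the top-ranked members.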

Citations: 0
Comprehensive evaluation of end-point free energy techniques in carboxylated-pillar[6]arene host–guest binding: II. regression and dielectric constant
IF 3.5 | CAS Zone 3 (Biology) | Q1 Chemistry | Pub Date: 2022-11-17 | DOI: 10.1007/s10822-022-00487-w
Xiao Liu, Lei Zheng, Yalong Cong, Zhihao Gong, Zhixiang Yin, John Z. H. Zhang, Zhirong Liu, Zhaoxi Sun

End-point free energy calculations are a powerful tool that has been widely applied to protein–ligand and protein–protein interactions. These end-point techniques are often regarded as offering intermediate accuracy and computational cost compared with more rigorous statistical mechanics methods (e.g., alchemical transformations) and coarser molecular docking. However, this intermediate level of accuracy is not observed for relatively simple, prototypical host–guest systems. Specifically, in our previous work on a set of carboxylated-pillar[6]arene host–guest complexes, end-point methods gave free energy estimates that deviated significantly from the experimental reference and also ranked the binding affinities incorrectly. These observations suggest that standard end-point free energy techniques are unsuitable for host–guest systems and must be modified before they can be used in practice. In this work, we consider two ways to improve the performance of end-point techniques. The first is PBSA_E regression, which varies the weights of the individual free energy terms in the end-point calculation; the second treats the interior dielectric constant as an additional variable in the end-point equation. Through a detailed investigation of the calculation procedure and the simulation outcomes, we show that these two treatments (regression and dielectric constant) modify the end-point equation in a somewhat similar way, namely by weakening the electrostatic contribution and strengthening the non-polar terms, although many detailed differences remain between them. With the trained end-point scheme, the RMSE of the computed affinities decreases from ~12 kcal/mol for the standard approach to ~2.4 kcal/mol, which is comparable to another modified end-point method (ELIE) trained with system-specific data.
Tuning the PBSA_E weighting factors with host-specific data further decreases the prediction error to ~2.1 kcal/mol. These observations, together with the highly efficient computation on optimized structures, suggest that regression (i.e., PBSA_E and its GBSA_E extension) is a practically applicable solution that brings end-point methods back into the toolbox for host–guest binding. In contrast, the variable-dielectric-constant scheme cannot effectively reduce the discrepancy between experiment and calculation for absolute binding affinities, although it does improve the computed affinity ranking. This behavior differs somewhat from the protein–ligand case and points to differences between host–guest and biomacromolecular (protein–ligand and protein–protein) systems. Therefore, tools that work for protein–ligand complexes may be unsuitable for host–guest binding, and numerical validation is necessary to identify genuinely workable solutions in these ‘prototypical’ cases.
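A reweighted end-point equation of the kind described above can be sketched as an ordinary least-squares fit of per-term weights to experimental binding free energies. This is a hypothetical sketch, not the authors' PBSA_E protocol: the choice of four energy terms plus an intercept, the normal-equation solver, and the synthetic numbers (which are not data from the paper) are all assumptions. The standard end-point equation corresponds to unit weights and a zero intercept.

```python
import math


def solve(A, b):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_weights(terms, dg_exp):
    """Least-squares weights w minimizing sum_i (sum_k w_k * terms[i][k] - dg_exp[i])^2."""
    m = len(terms[0])
    xtx = [[sum(t[i] * t[j] for t in terms) for j in range(m)] for i in range(m)]
    xty = [sum(t[i] * y for t, y in zip(terms, dg_exp)) for i in range(m)]
    return solve(xtx, xty)  # normal equations (X^T X) w = X^T y


def predict(terms, w):
    return [sum(wk * tk for wk, tk in zip(w, t)) for t in terms]


def rmse(pred, ref):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


# Synthetic complexes (kcal/mol): [dE_ele, dE_vdW, dG_polar, dG_nonpolar, 1 for intercept]
terms = [
    [-50.0, -20.0, 45.0, -3.0, 1.0],
    [-30.0, -25.0, 28.0, -4.0, 1.0],
    [-80.0, -15.0, 70.0, -2.5, 1.0],
    [-10.0, -30.0, 12.0, -5.0, 1.0],
    [-60.0, -22.0, 50.0, -3.5, 1.0],
    [-40.0, -18.0, 41.0, -2.0, 1.0],
]
true_w = [0.2, 0.8, 0.25, 1.5, -2.0]  # hypothetical weights used to generate "experiment"
dg_exp = predict(terms, true_w)

standard_w = [1.0, 1.0, 1.0, 1.0, 0.0]  # unweighted end-point sum
fitted_w = fit_weights(terms, dg_exp)
print(rmse(predict(terms, standard_w), dg_exp), rmse(predict(terms, fitted_w), dg_exp))
```

In practice the fit would be trained on one set of complexes and its RMSE reported on held-out ones; fitting and scoring on the same data, as in this toy, overstates accuracy.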

Citations: 6
Reliable gas-phase tautomer equilibria of drug-like molecule scaffolds and the issue of continuum solvation
IF 3.5 | CAS Zone 3 (Biology) | Q1 Chemistry | Pub Date: 2022-11-02 | DOI: 10.1007/s10822-022-00480-3
Andreas H. Göller

Accurate calculation of relative tautomer energies in different environments is a prerequisite for computing many parameters relevant to drug discovery. This work provides a thorough benchmark of the semiempirical methods AM1, PM3 and GFN2-xTB; the force field OPLS4; Hartree–Fock and HF-3c; the density functionals PBEh-3c, B97-3c, r2SCAN-3c, PBE, PBE0, TPSS, r2SCAN, ωB97X-V, M06-2X, B3LYP and B2PLYP; and second-order perturbation theory (MP2) against the gold-standard coupled-cluster method DLPNO-CCSD(T), using the def2-QZVPP basis set. The best-performing method overall is M06-2X, whereas r2SCAN-3c performs best among the cost-optimized methods. Applying these two methods to a challenging subset of the SAMPL2 challenge provides evidence that deviations from experiment are caused by deficiencies in current continuum solvation methods.
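Benchmarks of this kind typically convert per-tautomer total energies into relative energies, compare each method with the reference by a mean absolute error (MAE), and relate a relative free energy ΔG to a tautomer ratio through K = exp(−ΔG/RT). The sketch below assumes energies in hartree, 1 hartree = 627.5095 kcal/mol, and uses hypothetical method names and numbers; it is not the paper's workflow or data.

```python
import math

HARTREE_TO_KCAL = 627.5095
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def relative_energies(energies_hartree):
    """Energies of each tautomer relative to the most stable one, in kcal/mol."""
    e_min = min(energies_hartree)
    return [(e - e_min) * HARTREE_TO_KCAL for e in energies_hartree]


def mae(values, reference):
    return sum(abs(v - r) for v, r in zip(values, reference)) / len(reference)


def equilibrium_constant(delta_g_kcal, temperature=298.15):
    """K = [B]/[A] for A <-> B, with delta_g_kcal = G_B - G_A."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature))


rel = relative_energies([-100.000, -100.002])  # hypothetical total energies

# Hypothetical relative tautomer energies (kcal/mol) for three tautomer pairs
reference = [1.2, 4.5, 0.3]  # e.g., from a coupled-cluster reference
methods = {"method_A": [1.0, 5.1, 0.1], "method_B": [2.5, 3.0, 1.4]}
ranking = sorted(methods, key=lambda name: mae(methods[name], reference))
print(rel, ranking, equilibrium_constant(1.364))
```

At room temperature a gap of about 1.36 kcal/mol corresponds to roughly a 10:1 tautomer ratio, which is why errors of 1 kcal/mol already matter for predicted populations.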

Citations: 4
The FMO2 analysis of the ligand-receptor binding energy: the Biscarbene-Gold(I)/DNA G-Quadruplex case study
IF 3.5 | CAS Zone 3 (Biology) | Q1 Chemistry | Pub Date: 2022-11-01 | DOI: 10.1007/s10822-022-00484-z
Roberto Paciotti, Cecilia Coletti, Alessandro Marrone, Nazzareno Re

In this work, the ab initio fragment molecular orbital (FMO) method was applied to calculate and analyze the binding energies of two biscarbene-Au(I) derivatives, [Au(9-methylcaffein-8-ylidene)₂]⁺ and [Au(1,3-dimethylbenzimidazol-2-ylidene)₂]⁺, to the DNA G-quadruplex structure. The FMO2 binding energy accounts for the ligand–receptor complex as well as the isolated ligand and receptor in their energy-minimum states, and therefore describes ligand–receptor affinity better than simple pair interaction energies (PIE). Our results highlight important features of the binding of biscarbene-Au(I) derivatives to the DNA G-quadruplex, indicating that the total deformation-polarization energy and the desolvation penalty of the ligands are the main terms destabilizing binding. Pair interaction energy decomposition analysis (PIEDA) between the ligand and the nucleobases suggests that the main interaction terms are electrostatic and charge-transfer energies, supporting the hypothesis that the Au(I) ion can engage in π-cation interactions that further stabilize the ligand–receptor complex. Moreover, polar groups on the carbene ring, such as C=O, can improve the charge-transfer interaction with the K⁺ ion. These findings can be used to design new, potent biscarbene-Au(I) DNA G-quadruplex binders as promising anticancer drugs. The procedure described in this work can be applied to any ligand–receptor system and is particularly useful when binding is strongly governed by polarization, charge-transfer and dispersion interactions, which are properly evaluated by ab initio methods.
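The relationship between the FMO2 total energy, pair interaction energies and the binding energy can be sketched as follows. This is a simplified gas-phase illustration, not the authors' protocol: it uses the two-body expansion E = Σ_I E_I + Σ_{I>J} ΔE_IJ with ΔE_IJ = E_IJ − E_I − E_J, ignores the embedding-potential and solvation terms of a real FMO calculation, and all fragment names and energies (hartree) are hypothetical. The difference between the binding energy and the summed ligand–receptor PIEs is the deformation-polarization term.

```python
from itertools import combinations


def pie(monomers, dimers, i, j):
    """Simplified pair interaction energy dE_IJ = E_IJ - E_I - E_J (no embedding correction)."""
    e_ij = dimers[(i, j)] if (i, j) in dimers else dimers[(j, i)]
    return e_ij - monomers[i] - monomers[j]


def fmo2_energy(monomers, dimers):
    """Two-body FMO expansion: sum of monomer energies plus all pair interaction energies."""
    pairs = combinations(sorted(monomers), 2)
    return sum(monomers.values()) + sum(pie(monomers, dimers, i, j) for i, j in pairs)


def binding_analysis(monomers, dimers, ligand, e_ligand_opt, e_receptor_opt):
    """Binding energy relative to separately optimized ligand and receptor, split into
    the summed ligand-receptor PIEs and the remaining deformation-polarization term."""
    e_bind = fmo2_energy(monomers, dimers) - e_ligand_opt - e_receptor_opt
    pie_lr = sum(pie(monomers, dimers, ligand, f) for f in monomers if f != ligand)
    return e_bind, pie_lr, e_bind - pie_lr


# Hypothetical fragments: one ligand (L) and two receptor fragments (R1, R2), in hartree
monomers = {"L": -100.000, "R1": -200.000, "R2": -300.000}
dimers = {("L", "R1"): -300.010, ("L", "R2"): -400.020, ("R1", "R2"): -500.005}
e_bind, pie_lr, deformation = binding_analysis(monomers, dimers, "L", -100.004, -500.008)
print(e_bind, pie_lr, deformation)
```

In this toy case the ligand–receptor PIEs are more attractive than the net binding energy, and the positive remainder plays the destabilizing role the abstract attributes to deformation-polarization.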

Citations: 2
From oncoproteins to spike proteins: the evaluation of intramolecular stability using hydropathic force field
IF 3.5 | CAS Zone 3 (Biology) | Q1 Chemistry | Pub Date: 2022-10-31 | DOI: 10.1007/s10822-022-00477-y
Federica Agosta, Glen E. Kellogg, Pietro Cozzini

Evaluating the intramolecular stability of proteins plays a key role in understanding their biological behavior and mechanism of action. Small structural alterations, such as mutations arising from single nucleotide polymorphisms, can affect biological activity and pharmacological modulation. SARS-CoV-2 mutations that affect viral replication, susceptibility to antibody neutralization, and the action of antiviral drugs are just one example. In this work, the intramolecular stability of mutated proteins, such as the Spike glycoprotein and its complexes with its human target, is evaluated through HINT (Hydropathic INTeractions) intramolecular energy scoring, originally conceived by Abraham and Kellogg and based on the “Extension of the fragment method to calculate amino acid zwitterion and side-chain partition coefficients” by Abraham and Leo (Proteins: Struct. Funct. Genet. 1987, 2:130–152). HINT is proposed as a fast and reliable tool for evaluating the stability of any mutated system. This work has been written in honor of Prof. Donald J. Abraham (1936–2021).
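A HINT-style intramolecular score is a sum of pairwise hydropathic terms over atom pairs. The sketch below assumes the commonly cited pair form b_ij = a_i S_i a_j S_j T_ij R_ij with R_ij = exp(−r_ij), where a is an atomic hydrophobic constant, S a solvent-accessible surface area and T_ij a sign factor. It omits the Lennard-Jones term, bonded-pair exclusions and real atom typing; the atom records, the simplified T_ij rule and the wild-type/mutant comparison are hypothetical. Positive HINT scores indicate favorable interactions, so a drop in the total after a mutation suggests destabilization.

```python
import math
from itertools import combinations


def t_factor(kind_i, kind_j):
    """Hypothetical simplified sign rule: like-charged polar pairs (acid-acid, base-base)
    are unfavorable; every other pair keeps the sign of a_i * a_j."""
    if kind_i == kind_j and kind_i in ("acid", "base"):
        return -1.0
    return 1.0


def hint_score(atoms):
    """Sum of b_ij = a_i S_i a_j S_j T_ij exp(-r_ij) over all atom pairs (r in angstrom)."""
    total = 0.0
    for (a1, s1, xyz1, k1), (a2, s2, xyz2, k2) in combinations(atoms, 2):
        r = math.dist(xyz1, xyz2)
        total += a1 * s1 * a2 * s2 * t_factor(k1, k2) * math.exp(-r)
    return total


# Hypothetical atom records: (hydrophobic constant a, SASA S, coordinates, polar kind)
wild_type = [(0.5, 10.0, (0.0, 0.0, 0.0), "apolar"), (0.5, 10.0, (2.0, 0.0, 0.0), "apolar")]
mutant = [(0.5, 10.0, (0.0, 0.0, 0.0), "apolar"), (-0.5, 10.0, (2.0, 0.0, 0.0), "base")]
delta = hint_score(mutant) - hint_score(wild_type)  # negative: mutant is less favorable
print(delta)
```

Replacing a hydrophobic contact with a hydrophobic–polar one flips the sign of the pair term, which is the kind of change a stability scan over mutations would flag.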

Citations: 3
Physicochemical QSAR analysis of hERG inhibition revisited: towards a quantitative potency prediction
IF 3.5 | CAS Zone 3 (Biology) | Q1 Chemistry | Pub Date: 2022-10-28 | DOI: 10.1007/s10822-022-00483-0
Kiril Lanevskij, Remigijus Didziapetris, Andrius Sazonovas

In an earlier study (Didziapetris R & Lanevskij K (2016). J Comput Aided Mol Des. 30:1175–1188), we collected a database of publicly available hERG inhibition data for almost 6700 drug-like molecules and built a probabilistic Gradient Boosting classifier with a minimal set of physicochemical descriptors (log P, pKa, molecular size and topology parameters). This approach favored interpretability over statistical performance but still achieved an overall classification accuracy of 75%. In the current follow-up work, we expanded the database (provided in the Supplementary Information) to almost 9400 molecules and performed temporal validation of the model on a set of novel chemicals from recently published lead optimization projects. Validation results showed almost no performance degradation compared with the original study. Additionally, we rebuilt the model using the AFT (Accelerated Failure Time) learning objective in XGBoost, which accepts both quantitative and censored data, as often reported in protein inhibition studies. The new model achieved a similar level of accuracy in discriminating hERG blockers from non-blockers at a 10 µM threshold, which may be close to the performance ceiling for methods that aim to describe only non-specific ligand interactions with hERG. However, this model outputs quantitative potency values (IC₅₀) and is not tied to a particular classification cut-off. pIC₅₀ values from patch-clamp measurements can be predicted with R² ≈ 0.4 and MAE < 0.5, which enables ligands to be ranked by their expected potency. This approach can be valuable for quantitative modeling of various ADME and drug safety endpoints for which censored data are highly prevalent.
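How censored potency reports can enter a regression loss can be sketched with interval labels on pIC₅₀ = −log₁₀(IC₅₀ in mol/L) and a Gaussian interval-censored likelihood, which is the idea behind an AFT-style objective. This is an illustrative sketch, not the authors' model: the function names, the normal error model and the σ value are assumptions. XGBoost's own `survival:aft` objective instead takes lower and upper label bounds (`label_lower_bound`, `label_upper_bound`) and models the logarithm of the target, so its conventions for open bounds differ from this sketch.

```python
import math


def pic50_bounds(relation, ic50_um):
    """Interval [lower, upper] on pIC50 for a report 'IC50 <relation> ic50_um' (in µM)."""
    p = 6.0 - math.log10(ic50_um)  # pIC50 = -log10(IC50 in mol/L)
    if relation == "=":
        return p, p
    if relation == ">":  # weaker than the highest tested concentration
        return -math.inf, p
    if relation == "<":  # more potent than the lowest tested concentration
        return p, math.inf
    raise ValueError(f"unknown relation: {relation}")


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_nll(pred, lower, upper, sigma=0.5):
    """Negative log-likelihood of an exact or interval-censored label under N(pred, sigma^2)."""
    if lower == upper:  # exact measurement: Gaussian density
        z = (lower - pred) / sigma
        return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))
    prob = normal_cdf((upper - pred) / sigma) - normal_cdf((lower - pred) / sigma)
    return -math.log(max(prob, 1e-300))


# "IC50 > 10 µM" (a non-blocker at the 10 µM threshold) means pIC50 < 5
low, up = pic50_bounds(">", 10.0)
print(low, up, censored_nll(4.0, low, up), censored_nll(6.0, low, up))
```

A prediction inside the censored interval is barely penalized, while one on the wrong side of the 10 µM cut-off is penalized heavily, so censored and exact measurements can be trained on together.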

Citations: 5
Improved prediction and characterization of blood-brain barrier penetrating peptides using estimated propensity scores of dipeptides
IF 3.5 | CAS Zone 3 (Biology) | Q1 Chemistry | Pub Date: 2022-10-26 | DOI: 10.1007/s10822-022-00476-z
Phasit Charoenkwan, Pramote Chumnanpuen, Nalini Schaduangrat, Pietro Lio’, Mohammad Ali Moni, Watshara Shoombuatong

The blood-brain barrier (BBB) is the primary barrier between the blood and the central nervous system, a highly selective semipermeable border formed by vascular endothelial cells. Because the BBB prevents drugs circulating in the blood from crossing into the interstitial fluid of the brain, where neurons reside, developing drug delivery systems that can penetrate the BBB remains a challenge that many researchers are actively working on. Blood-brain barrier penetrating peptides (B3PPs) therefore offer an alternative neurotherapeutic route for brain-related disorders, since they can facilitate drug delivery into the brain. Meanwhile, computational methods that identify and characterize B3PPs effectively and cost-effectively play an important role in both basic research and the pharmaceutical industry. Although a few computational methods for B3PP identification have been developed, they may fall short in generalization ability and interpretability. In this study, a novel and efficient predictor based on the scoring card method (termed SCMB3PP) is presented to improve B3PP identification and characterization. To overcome the limitations of black-box computational approaches, SCMB3PP automatically estimates the propensities of amino acids and dipeptides to be B3PPs. Both cross-validation and independent tests indicate that SCMB3PP achieves impressive performance and outperforms various popular machine learning-based methods and existing methods on multiple independent test datasets. Furthermore, the SCMB3PP-derived amino acid propensities were used to identify informative biophysical and biochemical properties for characterizing B3PPs. Finally, a user-friendly online web server (http://pmlabstack.pythonanywhere.com/SCMB3PP) was established to identify novel, potential B3PPs cost-effectively. This novel computational approach is expected to facilitate the large-scale identification of high-potential B3PP candidates for follow-up experimental validation.
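The scoring-card idea summarized above (per-dipeptide propensity scores that keep the prediction interpretable) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SCMB3PP implementation: the initial propensity of each dipeptide is assumed to be its mean composition in positive peptides minus that in negative peptides, a peptide's score is the average propensity of its consecutive dipeptides, an amino acid's propensity is taken as the mean over the dipeptides containing it, and a threshold of 0 separates the classes. Any further optimization of the score card that the published method performs is omitted, and all function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # the 400 dipeptides


def dipeptides_of(seq):
    """Consecutive dipeptides of a sequence, skipping non-standard residues."""
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    return [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 dipeptides among the sequence's dipeptides."""
    pairs = dipeptides_of(seq)
    counts = Counter(pairs)
    total = len(pairs) or 1  # avoid division by zero for very short sequences
    return {d: counts[d] / total for d in DIPEPTIDES}


def estimate_propensities(positives, negatives):
    """Assumed initial score card: mean composition in positives minus in negatives."""
    def mean_composition(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {d: sum(c[d] for c in comps) / len(comps) for d in DIPEPTIDES}

    pos, neg = mean_composition(positives), mean_composition(negatives)
    return {d: pos[d] - neg[d] for d in DIPEPTIDES}


def score_peptide(seq, card):
    """Peptide score = average propensity of its consecutive dipeptides."""
    pairs = dipeptides_of(seq)
    return sum(card[p] for p in pairs) / len(pairs) if pairs else 0.0


def amino_acid_propensity(aa, card):
    """Mean propensity of the 39 dipeptides that contain the amino acid."""
    values = [v for d, v in card.items() if aa in d]
    return sum(values) / len(values)


def predict(seq, card, threshold=0.0):
    """Hypothetical decision rule: call a peptide a B3PP if its score exceeds the threshold."""
    return score_peptide(seq, card) > threshold


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real B3PP data.
    card = estimate_propensities(["RRWRR", "KRWKR"], ["DEDEG", "EDGDE"])
    for peptide in ("RRWKR", "DEGED"):
        print(peptide, round(score_peptide(peptide, card), 4), predict(peptide, card))
```

Because every score is a plain average over a lookup table, the contribution of each dipeptide (and, by aggregation, each amino acid) to a prediction can be read off directly, which is the interpretability argument the abstract makes against black-box models.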

Citations: 3
Enabling data-limited chemical bioactivity predictions through deep neural network transfer learning
IF 3.5 Zone 3 (Biology) Q1 Chemistry Pub Date: 2022-10-22 DOI: 10.1007/s10822-022-00486-x
Ruifeng Liu, Srinivas Laxminarayan, Jaques Reifman, Anders Wallqvist

The main limitation in developing deep neural network (DNN) models to predict the bioactivity of chemicals is the lack of sufficient assay data to train the network's classification layers. Focusing on feedforward DNNs that take atom- and bond-based structural fingerprints as input, we examined whether the layers of a DNN fully trained on a large dataset to predict one property could be reused to build DNNs that predict other related or unrelated properties from limited data. Specifically, we assessed whether, and under what conditions, the dense layers of a pre-trained DNN could be transferred to another DNN trained on limited data. We carried out a quantitative study of more than 400 pairs of assay datasets, in which fully trained layers from a large dataset were used to augment training on a small dataset. We found that the higher the correlation r between two assay datasets, the more effectively transfer learning reduced the prediction errors of the DNN trained on the smaller dataset. The mean squared prediction error decreased by 10 to 20% for every 0.1 increase in r² between the datasets, with most of the reduction attributable to transferring the first dense layer. Transferring additional dense layers yielded no further benefit, suggesting that deeper dense layers carry more specialized, assay-specific information. Importantly, depending on the dataset correlation, the training sample size could be reduced by up to tenfold without any loss of prediction accuracy.
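The layer-transfer step described above can be sketched in plain Python. This is a minimal illustration under assumptions, not the authors' code: each network is represented as a list of (weights, biases) dense layers with ReLU hidden units and a single linear output, the fingerprint length and first-layer width must match between the source and target networks, and randomly initialized weights stand in for the pre-trained source model. Fine-tuning the remaining layers on the small assay dataset is not shown, and all function names are hypothetical.

```python
import random


def init_mlp(n_in, hidden, n_out, seed=0):
    """Feedforward net as a list of dense layers, each a (weights, biases) pair."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, n_out]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # weights[j][i] connects input i to output unit j
        w = [[rng.gauss(0.0, fan_in ** -0.5) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((w, [0.0] * fan_out))
    return layers


def forward(layers, x):
    """ReLU on hidden layers, linear output (a regression head)."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wij * xj for wij, xj in zip(row, x)) + bj for row, bj in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def transfer_first_dense_layer(source, target):
    """Return a copy of `target` whose first dense layer is copied from `source`."""
    w_src, b_src = source[0]
    w_tgt, _ = target[0]
    if len(w_src) != len(w_tgt) or len(w_src[0]) != len(w_tgt[0]):
        raise ValueError("first-layer shapes must match (same fingerprint length and width)")
    first = ([row[:] for row in w_src], b_src[:])
    rest = [([row[:] for row in w], b[:]) for w, b in target[1:]]
    return [first] + rest


if __name__ == "__main__":
    source = init_mlp(16, [8, 4], 1, seed=1)  # stand-in for a net trained on a large assay
    target = init_mlp(16, [8, 4], 1, seed=2)  # fresh net for the small assay
    model = transfer_first_dense_layer(source, target)
    fingerprint = [1, 0] * 8  # toy 16-bit structural fingerprint
    print(forward(model, fingerprint))
```

Copying (rather than sharing) the transferred weights keeps the source model untouched if the target's first layer is later fine-tuned; freezing it instead would simply mean excluding it from the small-dataset training step.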

Citations: 1
Protocol for iterative optimization of modified peptides bound to protein targets
IF 3.5 Zone 3 (Biology) Q1 Chemistry Pub Date: 2022-10-19 DOI: 10.1007/s10822-022-00482-1
Rodrigo Ochoa, Pilar Cossio, Thomas Fox

Peptides are commonly used as therapeutic agents. However, they are prone to degradation and instability. Replacing natural amino acids with non-natural ones can avoid these problems and potentially improve affinity for the target protein. Here, we present a computational pipeline that optimizes peptides by introducing non-natural amino acids while improving their binding affinity. The workflow is an iterative computational evolution algorithm, inspired by the PARCE protocol, that performs single-point mutations on the peptide sequence using modules from the Rosetta framework. The modifications can be guided by structural properties or prior knowledge of the biological system. At each mutation step, the affinity for the protein is estimated by sampling conformations of the complex and applying a consensus metric over various open protein-ligand scoring functions. Mutations are accepted or rejected based on the score differences, allowing iterative optimization of the initial peptide. The sampling/scoring scheme was benchmarked on a set of protein-peptide complexes with experimentally reported affinity values. In addition, a basic application using a known protein-peptide complex is provided. This structure- and dynamics-based approach allows users to optimize bound peptides, and the code can be customized for further applications. The protocol, called mPARCE, is available at: https://github.com/rochoa85/mPARCE/.
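The mutate, score, and accept loop described above can be illustrated with a short Python sketch. It is not mPARCE itself: the Rosetta-based mutation and conformational sampling are replaced by direct calls to hypothetical scoring functions, the consensus metric is assumed to be the plain mean of those scores (lower meaning tighter binding), and acceptance is assumed to follow a Metropolis criterion on the score difference. The abstract only states that mutations are accepted based on score differences, so the exact rule, the temperature, and the residue alphabet here are assumptions; the toy scorers and residue codes are illustrative only.

```python
import math
import random


def consensus_score(peptide, scorers):
    """Assumed consensus metric: mean of several scores (lower = tighter binding)."""
    return sum(score(peptide) for score in scorers) / len(scorers)


def metropolis_accept(delta, temperature, rng):
    """Always accept improvements; accept worse moves with probability exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def optimize_peptide(peptide, alphabet, scorers, n_steps=200, temperature=1.0, seed=0):
    """Iterative single-point mutation with score-based acceptance; returns the best peptide seen."""
    rng = random.Random(seed)
    current = list(peptide)  # copy so the caller's sequence is not modified
    current_score = consensus_score(current, scorers)
    best, best_score = current[:], current_score
    for _ in range(n_steps):
        pos = rng.randrange(len(current))
        candidate = current[:]
        candidate[pos] = rng.choice([r for r in alphabet if r != current[pos]])
        candidate_score = consensus_score(candidate, scorers)
        if metropolis_accept(candidate_score - current_score, temperature, rng):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current[:], current_score
    return best, best_score


def toy_scorer_a(peptide):
    """Hypothetical scorer rewarding tryptophan (stand-in for a real scoring function)."""
    return -2.0 * sum(r == "TRP" for r in peptide)


def toy_scorer_b(peptide):
    """Hypothetical scorer rewarding TRP and the non-natural residue NLE (norleucine)."""
    return -1.0 * sum(r in ("TRP", "NLE") for r in peptide)


if __name__ == "__main__":
    start = ["ALA", "GLY", "ALA", "GLY", "ALA"]
    alphabet = ["ALA", "GLY", "TRP", "PHE", "NLE"]
    print(optimize_peptide(start, alphabet, [toy_scorer_a, toy_scorer_b], n_steps=300))
```

Accepting occasional worse moves lets the search leave local optima, while tracking the best sequence seen guarantees the returned peptide is never worse than the starting one under the chosen consensus score.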

Citations: 2
An overview of the SAMPL8 host–guest binding challenge
IF 3.5 Zone 3 (Biology) Q1 Chemistry Pub Date: 2022-10-14 DOI: 10.1007/s10822-022-00462-5
Martin Amezcua, Jeffry Setiadi, Yunhui Ge, David L. Mobley

The SAMPL series of challenges aims to focus the community on specific modeling challenges, while testing and, ideally, driving progress in computational methods that help guide pharmaceutical drug discovery. In this study, we report the results of the SAMPL8 host–guest blind challenge for predicting absolute binding affinities. SAMPL8 focused on two host–guest datasets: one involving the cucurbituril CB8 (with a series of common drugs of abuse) and another involving two different Gibb deep-cavity cavitands. The latter dataset included a previously featured deep-cavity cavitand (TEMOA) as well as a new variant (TEETOA), both binding a series of relatively rigid, fragment-like guests. Challenge participants employed a reasonably wide variety of methods, many of them based on molecular simulations, and predictive accuracy was mixed. As in some previous SAMPL iterations (SAMPL6 and SAMPL7), we found that one way to achieve greater accuracy was to apply empirical corrections to the binding free energy predictions, taking advantage of prior binding data for these hosts. Another approach that performed well was a hybrid MD-based method with reweighting to a force-matched QM potential. In the cavitand challenge, an alchemical method using the AMOEBA polarizable force field performed best, with an RMSE below 1 kcal/mol, while another alchemical approach (ATM/GAFF2-AM1BCC/TIP3P/HREM) achieved an RMSE below 1.75 kcal/mol. The work discussed here also highlights several important lessons; for example, retrospective reference calculations demonstrate that predicted binding free energies are sensitive to ethyl group sampling and/or the guest's starting pose, providing guidance for improving future studies on these systems.
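Two quantities mentioned above, the RMSE used to compare submissions and an empirical correction fitted to prior binding data, can be illustrated with a short Python sketch. The correction here is assumed to be a simple least-squares linear fit of experimental against predicted binding free energies; individual SAMPL participants may have used other forms, and the numbers in the usage example are made up for illustration, not challenge data. Function names are hypothetical.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between paired binding free energies (kcal/mol)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / len(predicted))


def fit_linear_correction(predicted, experimental):
    """Ordinary least-squares fit of experimental = slope * predicted + intercept."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0:
        raise ValueError("predictions must not all be identical")
    sxy = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    slope = sxy / sxx
    return slope, mean_e - slope * mean_p


def apply_correction(predicted, slope, intercept):
    """Map raw predictions through the fitted linear correction."""
    return [slope * p + intercept for p in predicted]


if __name__ == "__main__":
    # Hypothetical prior data for one host: raw predictions vs. experiment (kcal/mol).
    prior_pred, prior_exp = [-5.0, -7.0, -9.0], [-4.0, -5.0, -6.0]
    slope, intercept = fit_linear_correction(prior_pred, prior_exp)
    print("raw RMSE:", rmse(prior_pred, prior_exp))
    print("corrected new predictions:", apply_correction([-6.0, -8.0], slope, intercept))
```

A correction like this only helps when the prior data come from the same host and method, so that the systematic error it removes is likely to carry over to the new guests.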

Citations: 13
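Book学术
Book学术 provides a free academic search service that helps scholars in China and abroad search Chinese- and English-language literature, and is committed to offering the most convenient, high-quality service experience. Contact: info@booksci.cn
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Network Security Record No. 11010802042870 · Beijing ICP Registration No. 2023020795-1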