Pub Date: 2024-08-27 DOI: 10.1007/s10822-024-00571-3
Hyosoon Jang, Sangmin Seo, Sanghyun Park, Byung Ju Kim, Geon-Woo Choi, Jonghwan Choi, Chihyun Park
Over the last decade, automatic chemical design frameworks for discovering molecules with drug-like properties have progressed significantly. Among them, the variational autoencoder (VAE) is a cutting-edge approach that models a tractable latent space of the molecular space. In particular, the use of a VAE along with a property estimator has attracted considerable interest because it enables gradient-based optimization of a given molecule. However, although successful results have been achieved experimentally, the theoretical background and prerequisites for the correct operation of this method have not yet been clarified. We therefore theoretically analyze and rigorously reconstruct the entire framework. From the perspective of parameterized distributions and information theory, we first describe how the previous model overcomes the limitations of the beta VAE in discovering molecules with the desired properties. Furthermore, we describe the prerequisites for training that model. Next, from the log-likelihood perspective of each term, we reformulate the objectives for exploring the latent space to generate drug-like molecules. The distributional constraints defined in this study keep the search away from invalid molecules. We demonstrated that our model could discover a novel chemical compound targeting BCL-2 family proteins in a de novo approach. Through theoretical analysis and practical implementation, the importance of these prerequisites and constraints for operating the model was verified.
Title: De novo drug design through gradient-based regularized search in information-theoretically controlled latent space.
Journal: Journal of Computer-Aided Molecular Design
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11349835/pdf/
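The idea of gradient-based search in a latent space under a distributional constraint can be illustrated with a toy sketch. This is not the authors' model: the linear property estimator, the standard-normal prior penalty, the weights `w`, and the step size are all assumptions chosen so the example is easy to follow and has a known optimum.

```python
# Toy sketch: gradient ascent on a latent vector z with a prior-based penalty.
# The estimator, weights, and hyperparameters are hypothetical, not from the paper.

def property_estimate(z, w):
    """Hypothetical linear property estimator f(z) = w . z."""
    return sum(wi * zi for wi, zi in zip(w, z))

def regularized_objective(z, w, lam):
    """f(z) minus a penalty equal to lam * (-log N(z; 0, I)) up to a constant."""
    return property_estimate(z, w) - 0.5 * lam * sum(zi * zi for zi in z)

def objective_gradient(z, w, lam):
    """Analytic gradient of the regularized objective: w - lam * z."""
    return [wi - lam * zi for wi, zi in zip(w, z)]

def latent_search(z0, w, lam=1.0, lr=0.1, steps=200):
    """Plain gradient ascent starting from the latent point z0."""
    z = list(z0)
    for _ in range(steps):
        grad = objective_gradient(z, w, lam)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z

w = [0.5, -1.0, 2.0]          # hypothetical estimator weights
z_start = [0.0, 0.0, 0.0]     # latent code of the starting molecule (assumed)
z_opt = latent_search(z_start, w)
print(z_opt)
```

With the penalty switched off (`lam=0`) the update keeps moving along `w` without bound, which is a simple picture of how an unconstrained search can drift into regions the decoder was never trained on. With the penalty, this toy objective has its maximum at `w / lam`.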
Pub Date: 2024-08-23 DOI: 10.1007/s10822-024-00572-2
Kaipeng Li, Lijun Liu
The human Hippo signaling pathway is an evolutionarily conserved regulatory network that controls organ development and has been implicated in various cancers. Transcriptional enhanced associate domain-4 (TEAD4) is the final nuclear effector of the Hippo pathway and is activated by Yes-associated protein (YAP), which binds through two separate regions, the α1-helix and the Ω-loop. Previous efforts have all focused on deriving peptide inhibitors from YAP to target TEAD4. Here, we instead attempted to rationally design a so-called 'YAP helixα1-trap' based on TEAD4 to target YAP, using dynamics simulation and energetics analysis together with experimental assays at the molecular and cellular levels. The trap is a native double-stranded helical hairpin covering a specific YAP-binding site on the TEAD4 surface; it is expected to form a three-helix bundle with the α1-helical region of YAP, thus competitively disrupting the TEAD4-YAP interaction. The hairpin was further stapled by a disulfide bridge across its two helical arms. Circular dichroism showed that the stapling can effectively constrain the trap into a native-like structured conformation in the free state, thus largely minimizing the entropy penalty upon binding to YAP. Affinity assays revealed that the stapling can improve the trap's binding potency to the YAP α1-helix by up to 8.5-fold at the molecular level; the trap also exhibited a good tumor-suppressing effect at the cellular level when fused with the TAT cell-permeation sequence. These results suggest that YAP helixα1-trap-mediated blockade of the Hippo pathway may be a new and promising therapeutic strategy against cancers.
Title: Computational design and experimental confirmation of a disulfide-stapled YAP helixα1-trap derived from TEAD4 helical hairpin to selectively capture YAP α1-helix with potent antitumor activity.
Journal: Journal of Computer-Aided Molecular Design
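To put the reported 8.5-fold potency gain on an energy scale, a fold change in affinity can be converted to a binding free-energy difference with ΔΔG = -RT ln(fold). The sketch below assumes the fold change is a ratio of dissociation constants and that the temperature is 298.15 K; the abstract states neither, so the resulting number is only indicative.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold_change(fold, temperature_k=298.15):
    """Free-energy difference (kcal/mol) for a fold improvement in affinity.

    Assumes fold = Kd(reference) / Kd(improved); negative values mean tighter binding.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(fold)

ddg = ddg_from_fold_change(8.5)
print(f"{ddg:.2f} kcal/mol")
```

Under these assumptions an 8.5-fold improvement corresponds to roughly 1.3 kcal/mol of additional binding free energy.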
Pub Date: 2024-08-20 DOI: 10.1007/s10822-024-00569-x
Daniel A M Pais, Jan-Peter A Mayer, Karin Felderer, Maria B Batalha, Timo Eichner, Sofia T Santos, Raman Kumar, Sandra D Silva, Hitto Kaufmann
The development of novel therapeutic proteins is a lengthy and costly process, with an average attrition rate of 91% (Thomas et al. Clinical Development Success Rates and Contributing Factors 2011-2020, 2021). To increase the probability of success and ensure robust drug supply beyond approval, it is essential to assess the developability profile of new potential drug candidates as early and broadly as possible in development (Jain et al. MAbs, 2023. https://doi.org/10.1016/j.copbio.2011.06.002 ). Predicting these properties in silico is expected to be the next leap in innovation as it would enable significantly reduced development timelines combined with broader screens at lower costs. However, developing predictive algorithms typically requires substantial datasets generated under very defined conditions, a limiting factor especially for new classes of therapeutic proteins that hold immense clinical promise. Here we describe a strategy for assessing the developability of a novel class of small therapeutic Anticalin® proteins using machine learning in conjunction with a knowledge-driven approach. The knowledge-driven approach considers developability attributes such as aggregation propensity, charge variants, immunogenicity, specificity, thermal stability, hydrophobicity, and potential post-translational modifications, to calculate a holistic developability score. Based on sequence-derived descriptors as input parameters we established novel statistical models designed to predict the developability scores for Anticalin proteins. The best models yielded low root mean square errors across the entire dataset and were further validated by removing input data from individual screening campaigns and predicting developability scores for those drug candidates. The adoption of the described workflow will enable significantly streamlined preclinical development of Anticalin drug candidates and could potentially be applied to other therapeutic protein scaffolds.
Title: Holistic in silico developability assessment of novel classes of small proteins using publicly available sequence-based predictors.
Journal: Journal of Computer-Aided Molecular Design
Pub Date: 2024-08-16 DOI: 10.1007/s10822-024-00570-4
Daniel K Gehlhaar, Daniel J Mermelstein
Enhancing virtual screening enrichment has become an urgent problem in computational chemistry, driven by increasingly large databases of commercially available compounds, without a commensurate drop in in vitro screening costs. Docking these large databases is possible with cloud-scale computing. However, rapid docking necessitates compromises in scoring, often leading to poor enrichment and an abundance of false positives in docking results. This work describes a new scoring function composed of two parts - a knowledge-based component that predicts the probability of a particular atom type being in a particular receptor environment, and a tunable weight matrix that converts the probability predictions into a dimensionless score suitable for virtual screening enrichment. This score, the FitScore, represents the compatibility between the ligand and the binding site and is capable of a high degree of enrichment across standardized docking test sets.
Title: FitScore: a fast machine learning-based score for 3D virtual screening enrichment.
Journal: Journal of Computer-Aided Molecular Design
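The abstract does not say which enrichment metric was used. One common choice is the enrichment factor, which compares the hit rate among the top-ranked fraction of a screen with the hit rate of the whole library. The sketch below assumes that higher scores mean better predicted binders; the scores and activity labels are made up for illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.1):
    """Enrichment factor for the top `fraction` of compounds ranked by score.

    EF = (actives in top set / size of top set) / (all actives / library size).
    Assumes a higher score means a better predicted binder.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_actives = sum(1 for active in is_active if active)
    if total_actives == 0:
        raise ValueError("enrichment is undefined without any actives")
    n_total = len(scores)
    n_top = max(1, int(round(n_total * fraction)))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, active in ranked[:n_top] if active)
    return (hits_in_top / n_top) / (total_actives / n_total)

# Hypothetical screen: 10 compounds, 2 actives, both ranked at the top.
scores = [9.1, 8.7, 5.0, 4.8, 4.1, 3.9, 3.5, 2.2, 1.0, 0.4]
labels = [True, True, False, False, False, False, False, False, False, False]
ef_20 = enrichment_factor(scores, labels, fraction=0.2)
print(ef_20)
```

An enrichment factor of 1 means the ranking is no better than random selection; the maximum attainable value depends on the fraction of actives in the library.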
Pub Date: 2024-08-10 DOI: 10.1007/s10822-024-00568-y
Yuanyuan Shu, Jianda Yue, Yaqi Li, Yekui Yin, Jiaxu Wang, Tingting Li, Xiao He, Songping Liang, Gaihua Zhang, Zhonghua Liu, Ying Wang
Lactate dehydrogenase A (LDHA) is highly expressed in many tumor cells and promotes the conversion of pyruvate to lactic acid in the glucose pathway, providing energy and synthetic precursors for the rapid proliferation of tumor cells. Inhibition of LDHA has therefore become a widely pursued tumor treatment strategy. However, the research and development of highly efficient, low-toxicity small-molecule LDHA inhibitors still faces challenges. To discover potential inhibitors of LDHA, virtual screening based on molecular docking was performed on the Specs database of more than 260,000 compounds and the Chemdiv-smart database of more than 1,000 compounds. Through molecular dynamics (MD) simulation studies, we identified 12 potential LDHA inhibitors, all of which can stably bind to the human LDHA protein and form multiple interactions with its active-center residues. To verify the inhibitory activities of these compounds, we established an enzyme activity assay system and measured their inhibitory effects on recombinant human LDHA. The results showed that Compound 6 inhibits the catalytic effect of LDHA on pyruvate in a dose-dependent manner, with an EC50 value of 14.54 ± 0.83 µM. Further in vitro experiments showed that Compound 6 significantly inhibits the proliferation of various tumor cell lines, such as pancreatic cancer cells and lung cancer cells, reduces intracellular lactic acid content, and increases the intracellular reactive oxygen species (ROS) level. In summary, through virtual screening and in vitro validation, we found that Compound 6 is a small-molecule inhibitor of LDHA, providing a good lead compound for the research and development of LDHA-targeted anti-tumor drugs.
Title: Development of human lactate dehydrogenase A inhibitors: high-throughput screening, molecular dynamics simulation and enzyme activity assay.
Journal: Journal of Computer-Aided Molecular Design
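A dose-dependent inhibition with a single EC50 is often described with a Hill-type curve. The sketch below assumes a Hill slope of 1 and complete inhibition at saturating concentration; the abstract reports only the EC50 (14.54 µM), so the curve shape here is an assumption rather than the authors' fitted model.

```python
def fractional_activity(conc_um, ec50_um, hill=1.0):
    """Remaining enzyme activity (1.0 = uninhibited) at an inhibitor concentration.

    Simple Hill-type model: activity = 1 / (1 + (c / EC50) ** hill).
    """
    if conc_um < 0 or ec50_um <= 0:
        raise ValueError("concentration must be >= 0 and EC50 must be > 0")
    return 1.0 / (1.0 + (conc_um / ec50_um) ** hill)

EC50_UM = 14.54  # value reported for Compound 6
for conc in (0.0, 1.0, 14.54, 100.0):
    print(conc, round(fractional_activity(conc, EC50_UM), 3))
```

By construction the modeled activity is exactly one half at the EC50 and falls monotonically as the concentration rises.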
Pub Date: 2024-08-02 DOI: 10.1007/s10822-024-00567-z
Lucas A Garro, Matias F Andrada, Esteban G Vega-Hissi, Sonia Barberis, Juan C Garro Martinez
Antioxidant agents play an essential role in the food industry by improving the oxidative stability of food products. In recent years, the search for new natural antioxidants has increased because of the potentially high toxicity of chemical additives. The synthesis and evaluation of the antioxidant activity of peptides is therefore a field of current research. In this study, we performed a Quantitative Structure-Activity Relationship (QSAR) analysis of 19 cysteine-containing dipeptides and 19 cysteine-containing tripeptides. The main objective is to provide information on the relationship between the structure of peptides and their antioxidant activity. For this purpose, 1D and 2D molecular descriptors were calculated using the PaDEL software, which provides information about the structure, shape, size, charge, polarity, solubility and other aspects of the compounds. Separate QSAR models were developed for di- and tripeptides. The statistical parameters for the dipeptide model (R²_train = 0.947 and R²_test = 0.804) and for the tripeptide model (R²_train = 0.923 and R²_test = 0.847) indicate that the generated models have high predictive capacity. The influence of the cysteine position was then analyzed by predicting the antioxidant activity of new di- and tripeptides and comparing them with glutathione. In dipeptides, except for SC, TC and VC, the activity increases when cysteine is at the N-terminal position. For tripeptides, we observed a notable increase in activity when cysteine is placed in the N-terminal position.
Title: Development of QSARs for cysteine-containing di- and tripeptides with antioxidant activity: influence of the cysteine position.
Journal: Journal of Computer-Aided Molecular Design
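The R² values quoted for the training and test sets are coefficients of determination. A minimal sketch of the usual definition, R² = 1 - SS_res / SS_tot, is shown below; the activity values are invented for illustration, and the abstract does not say whether the test-set R² was computed with this exact formula or one of its external-validation variants.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot

# Hypothetical observed vs. predicted antioxidant activities.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
print(round(r2, 3))
```

A gap between training and test R², as in the reported values, is the usual sign to check for overfitting before relying on predictions for new peptides.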
Pub Date: 2024-07-25 DOI: 10.1007/s10822-024-00566-0
Laura Guasch, Niels Maeder, John G Cumming, Christian Kramer
Nonadditivity (NA) in Structure-Activity and Structure-Property Relationship (SAR) data is a rare but very information rich phenomenon. It can indicate conformational flexibility, structural rearrangements, and errors in assay results and structural assignment. While purely ligand-based conformational causes of NA are rather well understood and mundane, other factors are less so and cause surprising NA that has a huge influence on SAR analysis and ML model performance. We here report a systematic analysis across a wide range of properties (20 on-target biological activities and 4 physicochemical ADME-related properties) to understand the frequency of various different phenomena that may lead to NA. A set of novel descriptors were developed to characterize double transformation cycles and identify trends in NA. Double transformation cycles were classified into "surprising" and "mundane" categories, with the majority being classed as mundane. We also examined commonalities among surprising cycles, finding LogP differences to have the most significant impact on NA. A distinct behavior of NA for on-target sets compared to ADME sets was observed. Finally, we show that machine learning models struggle with highly nonadditive data, indicating that a better understanding of NA is an important future research direction.
Title: From mundane to surprising nonadditivity: drivers and impact on ML models.
Journal: Journal of Computer-Aided Molecular Design
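A double transformation cycle links four compounds that differ by two independent structural changes, and nonadditivity measures how much the effect of one change depends on the other. The sketch below uses a common convention in which the effect of the first change is compared with and without the second; the compound labels, pActivity values, and sign convention are assumptions for illustration and are not taken from the paper.

```python
def nonadditivity(p_parent, p_change1, p_change2, p_both):
    """Nonadditivity of a double transformation cycle in log units.

    p_parent  : activity of the starting compound
    p_change1 : activity after the first change only
    p_change2 : activity after the second change only
    p_both    : activity after both changes
    Returns (effect of change 1 with change 2 present) minus
    (effect of change 1 alone); 0 means perfectly additive.
    """
    effect_with_change2 = p_both - p_change2
    effect_alone = p_change1 - p_parent
    return effect_with_change2 - effect_alone

# Hypothetical pIC50 values for one cycle.
na_additive = nonadditivity(6.0, 7.0, 6.5, 7.5)
na_cooperative = nonadditivity(6.0, 7.0, 6.5, 9.0)
print(na_additive, na_cooperative)
```

Because the value combines four measurements, experimental noise in each assay result adds up, which is one reason small nonadditivity values are usually treated with caution.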
Pub Date: 2024-07-17 DOI: 10.1007/s10822-024-00564-2
Alexander C Brueckner, Benjamin Shields, Palani Kirubakaran, Alexander Suponya, Manoranjan Panda, Shana L Posy, Stephen Johnson, Sirish Kaushik Lakkaraju
Molecular dynamics (MD) simulation is a powerful tool for characterizing ligand-protein conformational dynamics and offers significant advantages over docking and other rigid structure-based computational methods. However, setting up, running, and analyzing MD simulations continues to be a multi-step process making it cumbersome to assess a library of ligands in a protein binding pocket using MD. We present an automated workflow that streamlines setting up, running, and analyzing Desmond MD simulations for protein-ligand complexes using machine learning (ML) models. The workflow takes a library of pre-docked ligands and a prepared protein structure as input, sets up and runs MD with each protein-ligand complex, and generates simulation fingerprints for each ligand. Simulation fingerprints (SimFP) capture protein-ligand compatibility, including stability of different ligand-pocket interactions and other useful metrics that enable easy rank-ordering of the ligand library for pocket optimization. SimFPs from a ligand library are used to build & deploy ML models that predict binding assay outcomes and automatically infer important interactions. Unlike relative free-energy methods that are constrained to assess ligands with high chemical similarity, ML models based on SimFPs can accommodate diverse ligand sets. We present two case studies on how SimFP helps delineate structure-activity relationship (SAR) trends and explain potency differences across matched-molecular pairs of (1) cyclic peptides targeting PD-L1 and (2) small molecule inhibitors targeting CDK9.
Title: MDFit: automated molecular simulations workflow enables high throughput assessment of ligands-protein dynamics.
Journal: Journal of Computer-Aided Molecular Design
Pub Date : 2024-07-17DOI: 10.1007/s10822-024-00565-1
Wen-Chieh Huang, Chia-Hung Hsu, Titus V Albu, Chia-Ning Yang
Adenosine deaminases acting on RNA (ADARs) are pivotal RNA-editing enzymes responsible for converting adenosine to inosine within double-stranded RNA (dsRNA). Dysregulation of ADAR1 editing activity, often arising from genetic mutations, has been linked to elevated interferon levels and the onset of autoinflammatory diseases. However, understanding the molecular underpinnings of this dysregulation is impeded by the lack of an experimentally determined structure for the ADAR1 deaminase domain. In this computational study, we used homology modeling and AlphaFold2 to construct structural models of the ADAR1 deaminase domain in the wild type and in two pathogenic variants, R892H and Y1112F, to decipher the structural basis of their reduced deaminase activity. Our findings illuminate the critical role of structural complementarity between the ADAR1 deaminase domain and dsRNA in enzyme-substrate recognition. Specifically, the relative positions of E1008 and K1120 must be maintained so that they can insert into the minor and major grooves of the substrate dsRNA, respectively, facilitating the flipping-out of the adenosine so that it can be accommodated within a cavity surrounding E912. Both amino acid replacements studied, R892H at the orthosteric site and Y1112F at the allosteric site, alter the position of K1120 and ultimately hinder substrate RNA binding.
Title: Structural impacts of two disease-linked ADAR1 mutants: a molecular dynamics study.
Journal of Computer-Aided Molecular Design, 2024-07-17. https://doi.org/10.1007/s10822-024-00565-1
Pub Date : 2024-05-30DOI: 10.1007/s10822-024-00563-3
Konrad Diedrich, Christiane Ehrt, Joel Graef, Martin Poppinga, Norbert Ritter, Matthias Rarey
In this work, we present the frontend of GeoMine and showcase its application, focusing on the new features of its latest version. GeoMine is a search engine for ligand-bound and predicted empty binding sites in the Protein Data Bank. In addition to its basic text-based search functionalities, GeoMine offers a geometric query type for searching binding sites with a specific relative spatial arrangement of chemical features such as heavy atoms and intermolecular interactions. In contrast to a text search, which requires only simple and easy-to-formulate user input, a 3D input is more complex, and specifying it can be challenging for users. GeoMine's new version addresses this issue from the graphical user interface perspective by introducing an additional visualization concept and a new query template type. In its latest version, GeoMine extends its query-building capabilities primarily through input formulation in 2D. The 2D editor is fully synchronized with GeoMine's 3D editor and provides the same functionality. It enables template-free query generation and template-based query selection directly in 2D pose diagrams. In addition, query generation with the 3D editor now supports predicted empty binding sites of AlphaFold structures as query templates. GeoMine is freely accessible on the ProteinsPlus web server (https://proteins.plus).
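To make the notion of a geometric query concrete, the following minimal sketch checks whether a binding site contains typed feature points whose pairwise distances agree with those of a query within a tolerance. It is a brute-force illustration under stated assumptions: the feature types, coordinates, the 0.5 Å tolerance, and the function name are hypothetical, and it does not reflect how GeoMine stores or searches its data.

```python
from itertools import permutations
from math import dist


def find_match(query, site, tol=0.5):
    """Return a tuple mapping each query point to a site point index, or None.

    `query` and `site` are lists of (feature_type, (x, y, z)) pairs.
    A mapping is accepted when feature types agree and every pairwise
    distance in the query is reproduced in the site within `tol`.
    """
    n = len(query)
    for candidate in permutations(range(len(site)), n):
        # Feature types must agree point by point.
        if any(query[i][0] != site[j][0] for i, j in enumerate(candidate)):
            continue
        # All pairwise distances must agree within the tolerance.
        if all(
            abs(
                dist(query[a][1], query[b][1])
                - dist(site[candidate[a]][1], site[candidate[b]][1])
            )
            <= tol
            for a in range(n)
            for b in range(a + 1, n)
        ):
            return candidate
    return None


# Hypothetical query: three features with distances 3, 4 and 5 between them.
query = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("aromatic", (0.0, 4.0, 0.0)),
]

# Hypothetical site containing a similar arrangement plus an unrelated donor.
site_hit = [
    ("aromatic", (10.0, 14.1, 5.0)),
    ("donor", (10.0, 10.0, 5.0)),
    ("acceptor", (13.2, 10.0, 5.0)),
    ("donor", (20.0, 20.0, 20.0)),
]

# Hypothetical site whose donor-acceptor distance is far too long.
site_miss = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (8.0, 0.0, 0.0)),
    ("aromatic", (0.0, 4.0, 0.0)),
]

hit = find_match(query, site_hit)
miss = find_match(query, site_miss)
```

Because only relative distances are compared, the match is independent of how the site is translated or rotated. Enumerating permutations is feasible only for a handful of points, so searching a large structure collection would need indexing, which this sketch does not attempt.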
Title: User-centric design of a 3D search interface for protein-ligand complexes.
Journal of Computer-Aided Molecular Design, 2024-05-30. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11139749/pdf/