Pub Date: 2024-10-29; DOI: 10.1007/s10822-024-00576-y
Adiran Garaizar Suarez, Andreas H. Göller, Michael E. Beck, Sadra Kashef Ol Gheta, Katharina Meier
Title: Comparative assessment of physics-based in silico methods to calculate relative solubilities
Journal: Journal of Computer-Aided Molecular Design, 38 (1)
Relative solubility, i.e. whether a given molecule is more soluble in one solvent than in others, is a critical parameter for pharmaceutical and agricultural formulation development, chemical synthesis, materials science, and environmental chemistry. In silico predictions of this variable can help reduce the number of experiments and the waste of solvents, and can support synthesis optimization. In this study, we evaluate the performance of different physics-based methods for predicting relative solubilities. Our assessment involves quantum mechanics-based COSMO-RS and molecular dynamics-based free energy methods using the OPLS4, the open-source OpenFF Sage, and the GAFF force fields, spanning over 200 solvent–solute combinations. Our investigation highlights the important role of compound multimerization, an effect that must be accounted for to obtain accurate relative solubility predictions. The performance of these methods varies, with significant differences in precision depending on both the method used and the solute considered; these results offer an improved understanding of the predictive power of physics-based methods in chemical research.
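As a rough illustration of how solvation free energies translate into a relative solubility, the sketch below uses the standard relation log10(S_A/S_B) = -(ΔG_A - ΔG_B) / (RT ln 10). It assumes that the same solid form is in equilibrium with both solutions and that the solutions are dilute, so the solid-state term cancels. The function name, the temperature, and the numerical values are illustrative assumptions and are not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def log10_relative_solubility(dg_solv_a, dg_solv_b, temperature=298.15):
    """Return log10(S_A / S_B) from solvation free energies in kcal/mol.

    Assumes the same solid form and dilute solutions in both solvents,
    so only the difference in solvation free energy matters.
    """
    ddg = dg_solv_a - dg_solv_b
    return -ddg / (R_KCAL * temperature * math.log(10))


# Hypothetical values: solvation is 1.36 kcal/mol more favourable in solvent A.
log_ratio = log10_relative_solubility(-6.36, -5.00)
ratio = 10 ** log_ratio  # close to a tenfold higher solubility in solvent A
```

A difference of about 1.36 kcal/mol corresponds to roughly one log unit at room temperature, which is why small free-energy errors can change a solvent ranking.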
Pub Date: 2024-10-29; DOI: 10.1007/s10822-024-00573-1
Tugay Direk, Osman Doluca
Title: Computational Identification and Illustrative Standard for Representation of Unimolecular G-Quadruplex Secondary Structures (CIIS-GQ)
Journal: Journal of Computer-Aided Molecular Design, 38 (1)
G-quadruplexes are a large group of nucleic acid–based structures. In recent years, they have attracted attention because of their biological roles in telomeres and promoter regions. These structures show wide topological diversity; however, the development of methods for the structural classification of G-quadruplexes has long been neglected, and only a limited number of studies have aimed to establish a secondary structure classification method. The situation has become even more complex since the discovery of bulged and mismatched G-quadruplexes, because most of the available tools fail to distinguish these non-canonical G-quadruplex motifs. Moreover, interpreting the output of these tools still requires expert knowledge. In this study, we propose a new method, CIIS-GQ, for the identification of unimolecular G-quadruplexes and their classification by secondary structure based on three-dimensional structural data. Briefly, the coordinates of guanines are processed to identify tetrads, loops and bulges. The secondary structure is then presented as a depiction that shows the loop types, the bulges, and the guanines that participate in each tetrad. Moreover, CIIS-GQ identifies non-guanine nucleotides that join the G-tetrads and form multiplets. Finally, the results are compared with the DSSR and ElTetrado classification methods, and the advantages of the proposed depiction for representing secondary structures are discussed. The source code of the method can be accessed via https://github.com/TugayDirek/CIIS-GQ.
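The idea of processing guanine coordinates to find tetrads can be sketched as follows. This is not the CIIS-GQ implementation: it is a minimal toy that assumes a tetrad is a closed ring of four guanines in which each one donates N1-H...O6 and N2-H...N7 Hoogsteen hydrogen bonds to the next. The 3.5 Å cutoff, the function names, the residue numbers, and the artificial coordinates are all assumptions made for illustration.

```python
import math

CUTOFF = 3.5  # assumed donor-acceptor distance cutoff in angstrom


def donates_to(g_donor, g_acceptor, cutoff=CUTOFF):
    """True if the donor's N1 and N2 lie within the cutoff of the acceptor's O6 and N7."""
    return (math.dist(g_donor["N1"], g_acceptor["O6"]) <= cutoff
            and math.dist(g_donor["N2"], g_acceptor["N7"]) <= cutoff)


def find_tetrads(guanines):
    """Return sorted residue-id tuples for every closed ring of four bonded guanines."""
    donors = {a: {b for b in guanines if b != a and donates_to(guanines[a], guanines[b])}
              for a in guanines}
    tetrads = set()
    for a in guanines:
        for b in donors[a]:
            for c in donors[b]:
                for d in donors[c]:
                    if a in donors[d] and len({a, b, c, d}) == 4:
                        tetrads.add(tuple(sorted((a, b, c, d))))
    return sorted(tetrads)


def toy_ring(ids):
    """Artificial (not physically realistic) coordinates: each guanine donates to the next."""
    n = len(ids)
    anchors = [(10.0 * i, 0.0, 0.0) for i in range(n)]
    ring = {}
    for i, rid in enumerate(ids):
        ax, ay, az = anchors[i]
        px, py, pz = anchors[(i - 1) % n]  # donor site of the previous guanine
        ring[rid] = {"N1": (ax, ay, az), "N2": (ax, ay + 2.0, az),
                     "O6": (px, py, pz + 2.9), "N7": (px, py + 2.0, pz + 2.9)}
    return ring


guanines = toy_ring([2, 8, 14, 20])
# An isolated guanine far from the ring; it should not appear in any tetrad.
guanines[25] = {"N1": (60.0, 60.0, 60.0), "N2": (60.0, 62.0, 60.0),
                "O6": (63.0, 60.0, 60.0), "N7": (63.0, 62.0, 60.0)}
tetrads = find_tetrads(guanines)
```

With real structures the atom coordinates would come from a PDB or mmCIF file, and loops and bulges would be derived afterwards from the sequence positions of the guanines in each tetrad.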
Pub Date: 2024-10-24; DOI: 10.1007/s10822-024-00575-z
Phuc-Chau Do, Vy T. T. Le
Title: Steered molecular dynamics simulation as a post-process to optimize the iBRAB-designed Fab model
Journal: Journal of Computer-Aided Molecular Design, 38 (1)
Therapeutic monoclonal antibodies are an effective means of treating acute infectious diseases. However, finding out which of the vast number of human antibodies can cure a disease requires a long time and advanced technology. The previously introduced iBRAB method relies on studied antibodies to design a broad-spectrum antibody capable of neutralizing antigens of many different Influenza A viral strains. To evaluate the antigen-binding fragment (Fab) as an applicable drug, therapeutic antibody profiles, which provide guidelines collected from clinical-stage therapeutic antibodies, were used to assess different measurements. Although the evaluated values were within an accepted range, modification of the amino acid sequence was required for better properties. Steered molecular dynamics (SMD) simulation was therefore used to determine the binding capacity of amino acids in the functional region, and the profile of Fab amino acids interacting with the antigen was established as a reference for modification. As a result, the model was modified by eliminating amino acids at positions 96–97 in the heavy chain and 26–27, 91, 96–97, and 102–103 in the light chain, which gave better Therapeutic Antibody Profiler evaluations than the original design. SMD simulation is thus a promising computational approach for post-modification in rational drug design.
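For readers unfamiliar with SMD, the following sketch shows the quantities that a constant-velocity pulling simulation typically reports: the spring force F = k (x0 + v t - x) on the pulled group and the accumulated external work W = v ∫ F dt. The abstract does not describe the pulling protocol, so the spring constant, the velocity, the units, and the synthetic trajectory below are hypothetical.

```python
def smd_force(k, v, t, x, x0=0.0):
    """Spring force on the pulled group: F = k * (x0 + v*t - x)."""
    return k * (x0 + v * t - x)


def pulling_work(times, forces, v):
    """External work W = v * integral of F dt, using the trapezoidal rule."""
    integral = 0.0
    for i in range(1, len(times)):
        integral += 0.5 * (forces[i] + forces[i - 1]) * (times[i] - times[i - 1])
    return v * integral


k = 10.0   # spring constant (hypothetical units)
v = 0.01   # pulling velocity (hypothetical units)
times = [float(t) for t in range(101)]
# Synthetic trajectory: the pulled group lags behind the moving spring anchor.
positions = [0.8 * v * t for t in times]
forces = [smd_force(k, v, t, x) for t, x in zip(times, positions)]
work = pulling_work(times, forces, v)
```

In practice the force trace would be read from the simulation output, and the peak force or the work could then be compared between variants of the Fab model.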
Pub Date: 2024-10-16; DOI: 10.1007/s10822-024-00574-0
Ann E. Cleves, Himani Tandon, Ajay N. Jain
Title: Structure-based pose prediction: Non-cognate docking extended to macrocyclic ligands
Journal: Journal of Computer-Aided Molecular Design, 38 (1)
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10822-024-00574-0.pdf
So-called “cross-docking” is the prediction of the bound configuration of small-molecule ligands that differ from the cognate ligand of a protein co-crystal structure. This is a much more challenging problem than re-docking the cognate ligand, particularly when the new ligand is structurally dissimilar from prior known ones. We have updated the previously introduced PINC (“PINC Is Not Cognate”) benchmark which introduced the idea of temporal segregation to measure cross-docking performance. The temporal set encompasses 846 future ligands for ten targets based on information from the earliest 25% of X-ray co-crystal structures known for each target. Here, we extend the benchmark to include thirteen targets where the bound poses of 128 macrocyclic ligands are to be predicted based on knowledge from structures of bound non-macrocyclic ligands. Performance was roughly equivalent for both the temporally-split non-macrocyclic ligand set and the macrocycle prediction set. Using standard and fully automatic protocols for the Surflex-Dock and ForceGen methods, across the combined 974 non-macrocyclic and macrocyclic ligands, the top-scoring pose family was correct 68% of the time, with the top-two pose families achieving a 79% success rate. Correct poses among all those predicted were identified 92% of the time. These success rates far exceeded those observed for the alternative methods AutoDock Vina and Gnina on both sets.
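The top-1, top-2, and any-pose success rates quoted above can be computed along the following lines. This sketch makes two simplifying assumptions that are not stated in the abstract: a pose counts as correct when its RMSD to the crystallographic pose is at most 2.0 Å, and individual ranked poses stand in for the pose families used in the paper. The ligand names and RMSD values are invented.

```python
def top_n_success(rmsd_by_ligand, n, threshold=2.0):
    """Fraction of ligands with at least one of the n best-ranked poses within the threshold."""
    hits = sum(1 for rmsds in rmsd_by_ligand.values()
               if any(r <= threshold for r in rmsds[:n]))
    return hits / len(rmsd_by_ligand)


# Hypothetical RMSD values in angstrom, listed in rank order for each ligand.
ranked_rmsds = {
    "lig_a": [0.9, 3.1, 4.0],
    "lig_b": [3.4, 1.2, 5.5],
    "lig_c": [4.8, 6.0, 1.7],
    "lig_d": [5.2, 4.4, 3.9],
}

top1 = top_n_success(ranked_rmsds, 1)
top2 = top_n_success(ranked_rmsds, 2)
any_pose = top_n_success(ranked_rmsds, 3)

# The combined benchmark size follows from the two sets named in the abstract.
combined_ligands = 846 + 128
```

The gap between the top-1 rate and the any-pose rate separates scoring failures, where a correct pose was generated but ranked too low, from sampling failures, where no correct pose was generated at all.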
Pub Date: 2024-08-27; DOI: 10.1007/s10822-024-00571-3
Hyosoon Jang, Sangmin Seo, Sanghyun Park, Byung Ju Kim, Geon-Woo Choi, Jonghwan Choi, Chihyun Park
Title: De novo drug design through gradient-based regularized search in information-theoretically controlled latent space
Journal: Journal of Computer-Aided Molecular Design, 38 (1)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11349835/pdf/
Over the last decade, automatic chemical design frameworks for discovering molecules with drug-like properties have progressed significantly. Among them, the variational autoencoder (VAE) is a cutting-edge approach that models a tractable latent space of the molecular space. In particular, the use of a VAE together with a property estimator has attracted considerable interest because it enables gradient-based optimization of a given molecule. However, although successful results have been achieved experimentally, the theoretical background and the prerequisites for the correct operation of this method have not yet been clarified. We therefore theoretically analyze and rigorously reconstruct the entire framework. From the perspective of parameterized distributions and information theory, we first describe how the previous model overcomes the limitations of the beta VAE in discovering molecules with the desired properties. Furthermore, we describe the prerequisites for training this model. Next, from the log-likelihood perspective of each term, we reformulate the objectives for exploring the latent space to generate drug-like molecules. We also define distributional constraints that steer the search away from invalid molecules. We demonstrate that our model can discover a novel chemical compound targeting BCL-2 family proteins in a de novo approach. The theoretical analysis and the practical implementation together verify the importance of these prerequisites and constraints for operating the model.
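The general idea of a gradient-based, regularized search in a latent space can be illustrated with a toy example. The sketch below is not the authors' model: the neural property estimator is replaced by a hand-written quadratic function, and the distributional constraint is represented by a simple penalty (lam / 2) * ||z||^2 that keeps the latent vector close to a standard normal prior. All names and values are assumptions chosen for illustration.

```python
def property_estimate(z, target):
    """Toy differentiable property: highest when z equals the (hypothetical) target."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def objective_gradient(z, target, lam):
    """Gradient of property_estimate(z) - (lam / 2) * ||z||^2 with respect to z."""
    return [-2.0 * (zi - ti) - lam * zi for zi, ti in zip(z, target)]


def optimize_latent(z0, target, lam=0.5, step=0.1, n_steps=200):
    """Plain gradient ascent on the regularized objective, starting from z0."""
    z = list(z0)
    for _ in range(n_steps):
        grad = objective_gradient(z, target, lam)
        z = [zi + step * gi for zi, gi in zip(z, grad)]
    return z


z_start = [0.0, 0.0]              # latent code of a starting molecule (hypothetical)
target = [2.0, -1.0]              # location of the property optimum (hypothetical)
z_opt = optimize_latent(z_start, target)
```

The search settles at 2 * target / (2 + lam) rather than at the target itself, which shows the trade-off: the regularizer holds the latent vector nearer to the region where a decoder would be expected to produce valid molecules, at some cost in the estimated property.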
Pub Date: 2024-08-23; DOI: 10.1007/s10822-024-00572-2
Kaipeng Li, Lijun Liu
Title: Computational design and experimental confirmation of a disulfide-stapled YAP helixα1-trap derived from TEAD4 helical hairpin to selectively capture YAP α1-helix with potent antitumor activity
Journal: Journal of Computer-Aided Molecular Design, 38 (1)
The human Hippo signaling pathway is an evolutionarily conserved regulatory network that controls organ development and has been implicated in various cancers. Transcriptional enhanced associate domain-4 (TEAD4) is the final nuclear effector of the Hippo pathway and is activated by Yes-associated protein (YAP) through binding to two separate YAP regions, the α1-helix and the Ω-loop. Previous efforts have all focused on deriving peptide inhibitors from YAP to target TEAD4. Here, we instead attempted to rationally design a so-called ‘YAP helixα1-trap’ based on TEAD4 to target YAP, using dynamics simulation and energetics analysis as well as experimental assays at the molecular and cellular levels. The trap is a native double-stranded helical hairpin covering a specific YAP-binding site on the TEAD4 surface; it is expected to form a three-helix bundle with the α1-helical region of YAP and thus competitively disrupt the TEAD4–YAP interaction. The hairpin was further stapled by a disulfide bridge across its two helical arms. Circular dichroism showed that the stapling effectively constrains the trap into a native-like structured conformation in the free state, thus largely minimizing the entropy penalty upon binding to YAP. Affinity assays revealed that the stapling improves the binding potency of the trap for the YAP α1-helix by up to 8.5-fold at the molecular level, and the stapled trap also exhibited a good tumor-suppressing effect at the cellular level when fused to the TAT cell permeation sequence. The YAP helixα1-trap-mediated blockade of the Hippo pathway may therefore be a new and promising therapeutic strategy against cancers.
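To put the reported 8.5-fold affinity gain in energetic terms, the fold change in dissociation constant can be converted into a binding free-energy difference with ΔΔG = -RT ln(fold). The abstract does not state the assay temperature, so 298.15 K is assumed here, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold, temperature=298.15):
    """Binding free-energy change in kcal/mol for a `fold`-times tighter dissociation constant."""
    return -R_KCAL * temperature * math.log(fold)


# 8.5-fold improvement reported for the stapled trap; the temperature is an assumption.
ddg = ddg_from_fold_change(8.5)  # roughly -1.27 kcal/mol
```

A gain of a little over 1 kcal/mol is consistent in magnitude with the proposed mechanism of a reduced entropy penalty on binding, although the abstract itself does not report free energies.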
Pub Date: 2024-08-20; DOI: 10.1007/s10822-024-00569-x
Daniel A. M. Pais, Jan-Peter A. Mayer, Karin Felderer, Maria B. Batalha, Timo Eichner, Sofia T. Santos, Raman Kumar, Sandra D. Silva, Hitto Kaufmann
The development of novel therapeutic proteins is a lengthy and costly process, with an average attrition rate of 91% (Thomas et al. Clinical Development Success Rates and Contributing Factors 2011–2020, 2021). To increase the probability of success and ensure robust drug supply beyond approval, it is essential to assess the developability profile of new potential drug candidates as early and broadly as possible in development (Jain et al. MAbs, 2023. https://doi.org/10.1016/j.copbio.2011.06.002). Predicting these properties in silico is expected to be the next leap in innovation as it would enable significantly reduced development timelines combined with broader screens at lower costs. However, developing predictive algorithms typically requires substantial datasets generated under very defined conditions, a limiting factor especially for new classes of therapeutic proteins that hold immense clinical promise. Here we describe a strategy for assessing the developability of a novel class of small therapeutic Anticalin® proteins using machine learning in conjunction with a knowledge-driven approach. The knowledge-driven approach considers developability attributes such as aggregation propensity, charge variants, immunogenicity, specificity, thermal stability, hydrophobicity, and potential post-translational modifications, to calculate a holistic developability score. Based on sequence-derived descriptors as input parameters we established novel statistical models designed to predict the developability scores for Anticalin proteins. The best models yielded low root mean square errors across the entire dataset and were further validated by removing input data from individual screening campaigns and predicting developability scores for those drug candidates. The adoption of the described workflow will enable significantly streamlined preclinical development of Anticalin drug candidates and could potentially be applied to other therapeutic protein scaffolds.
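The validation scheme described above, in which data from individual screening campaigns are withheld and then predicted, can be sketched as a leave-one-campaign-out loop scored by root mean square error. The abstract does not specify the statistical models or descriptors, so a single-descriptor least-squares line stands in for them here, and the campaign labels and data are invented.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def leave_one_campaign_out(records):
    """records: (campaign, descriptor, score) tuples. Returns the RMSE per held-out campaign."""
    results = {}
    for held_out in sorted({c for c, _, _ in records}):
        train = [(x, y) for c, x, y in records if c != held_out]
        test = [(x, y) for c, x, y in records if c == held_out]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        results[held_out] = rmse([y for _, y in test], [a * x + b for x, _ in test])
    return results


# Hypothetical, noise-free data that follow score = 2 * descriptor + 1.
records = [("A", 0.0, 1.0), ("A", 1.0, 3.0),
           ("B", 2.0, 5.0), ("B", 3.0, 7.0),
           ("C", 4.0, 9.0), ("C", 5.0, 11.0)]
per_campaign_rmse = leave_one_campaign_out(records)
```

Holding out whole campaigns rather than random candidates gives a stricter estimate of how a model would perform on a new screening campaign, because candidates from one campaign tend to resemble each other.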