Pub Date: 2024-07-02; DOI: 10.1021/acs.jcim.4c00609
Inbal Tuvi-Arad, Yaffa Shalit, Gil Alon
We present a comprehensive and updated Python-based open software to calculate continuous symmetry measures (CSMs) and their related continuous chirality measure (CCM) of molecules across chemistry. These descriptors quantify the distortion levels of molecular structures on a continuous scale and have proven insightful in numerous studies. The input includes the coordinates of the molecular geometry and a desired cyclic symmetry point group (i.e., Cs, Ci, Cn, or Sn). The results include the coordinates of the nearest symmetric structure that belongs to the desired symmetry point group, the permutation that defines the symmetry operation, the direction of the symmetry element in space, and a number between zero and 100 representing the level of symmetry or chirality. Rather than treating symmetry as a binary property by which a structure is either symmetric or asymmetric, the CSM approach quantifies the level of gray between black and white and allows one to follow the course of change. The software can be downloaded from https://github.com/continuous-symmetry-measure/csm or used online at https://csm.ouproj.org.il.
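To make the zero-to-100 scale concrete, the following is a minimal sketch of a continuous symmetry measure for the inversion group Ci only. It assumes the atom permutation is already known and uses the commonly cited normalization, 100 times the squared distance to the nearest symmetric structure divided by the squared distance of the atoms from their centroid. The function name `inversion_csm` and its signature are hypothetical; this is not the API of the CSM package, which also searches over permutations and symmetry-element directions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def inversion_csm(coords: Sequence[Point], perm: Sequence[int]) -> Tuple[float, List[Point]]:
    """Return (CSM value, nearest Ci-symmetric structure) for a given permutation.

    Assumption: perm[k] is the atom that atom k is exchanged with under
    inversion, and perm is an involution (perm[perm[k]] == k).
    """
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in coords]

    # Average each atom with the inverted image of its partner; the result
    # satisfies nearest[perm[k]] == -nearest[k], i.e. it is Ci-symmetric.
    nearest = [
        tuple((centered[k][i] - centered[perm[k]][i]) / 2.0 for i in range(3))
        for k in range(n)
    ]

    # Squared distance between the original and the symmetric structure.
    numerator = sum(
        (centered[k][i] - nearest[k][i]) ** 2 for k in range(n) for i in range(3)
    )
    # Squared distance of the original atoms from their centroid.
    denominator = sum(c * c for p in centered for c in p)
    if denominator == 0.0:
        raise ValueError("all atoms coincide; the measure is undefined")

    return 100.0 * numerator / denominator, nearest
```

A value of zero means the structure already has an inversion center for that permutation; larger values indicate a larger distortion from Ci symmetry.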
Title: CSM Software: Continuous Symmetry and Chirality Measures for Quantitative Structural Analysis. Journal of Chemical Information and Modeling, 2024-07-02. https://doi.org/10.1021/acs.jcim.4c00609
Pub Date: 2024-07-02; DOI: 10.1021/acs.jcim.4c00311
Zachary J. Gale-Day, Laura Shub, Kangway V. Chuang, Michael J. Keiser
Message passing neural networks (MPNNs) on molecular graphs generate continuous and differentiable encodings of small molecules with state-of-the-art performance on protein–ligand complex scoring tasks. Here, we describe the proximity graph network (PGN) package, an open-source toolkit that constructs ligand–receptor graphs based on atom proximity and allows users to rapidly apply and evaluate MPNN architectures for a broad range of tasks. We demonstrate the utility of PGN by introducing benchmarks for affinity and docking score prediction tasks. Graph networks generalize better than fingerprint-based models and perform strongly for the docking score prediction task. Overall, MPNNs with proximity graph data structures augment the prediction of ligand–receptor complex properties when ligand–receptor data are available.
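As an illustration of what building a ligand–receptor graph from atom proximity can mean, here is a minimal sketch that connects every ligand atom to every receptor atom within a distance cutoff. The function name `proximity_edges`, the edge format, and the 4.5 Å default are assumptions chosen for the example; they are not taken from the PGN package.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Edge = Tuple[int, int, float]  # (ligand atom index, receptor atom index, distance)


def proximity_edges(
    ligand: Sequence[Point],
    receptor: Sequence[Point],
    cutoff: float = 4.5,  # hypothetical cutoff in angstroms
) -> List[Edge]:
    """Return ligand-receptor edges for atom pairs closer than `cutoff`."""
    edges: List[Edge] = []
    for i, lig_atom in enumerate(ligand):
        for j, rec_atom in enumerate(receptor):
            distance = math.dist(lig_atom, rec_atom)
            if distance <= cutoff:
                edges.append((i, j, distance))
    return edges
```

The brute-force double loop is adequate for a sketch; for whole binding pockets a spatial index such as a k-d tree would usually replace it.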
Title: Proximity Graph Networks: Predicting Ligand Affinity with Message Passing Neural Networks. Journal of Chemical Information and Modeling, 2024-07-02. https://doi.org/10.1021/acs.jcim.4c00311
Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00485
Peter Eckmann, Jake Anderson, Rose Yu, Michael K Gilson
Predicting the activities of new compounds against biophysical or phenotypic assays based on the known activities of one or a few existing compounds is a common goal in early stage drug discovery. This problem can be cast as a "few-shot learning" challenge, and prior studies have developed few-shot learning methods to classify compounds as active versus inactive. However, the ability to go beyond classification and rank compounds by expected affinity is more valuable. We describe Few-Shot Compound Activity Prediction (FS-CAP), a novel neural architecture trained on a large bioactivity data set to predict compound activities against an assay outside the training set, based on only the activities of a few known compounds against the same assay. Our model aggregates encodings generated from the known compounds and their activities to capture assay information and uses a separate encoder for the new compound whose activity is to be predicted. The new method provides encouraging results relative to traditional chemical-similarity-based techniques as well as other state-of-the-art few-shot learning methods in tests on a variety of ligand-based drug discovery settings and data sets. The code for FS-CAP is available at https://github.com/Rose-STL-Lab/FS-CAP.
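For orientation, the kind of traditional chemical-similarity baseline that such a model is compared against can be sketched as a similarity-weighted average of the known activities. This is not FS-CAP itself. The fingerprints are assumed to be sets of "on" bit indices, and both function names are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_weighted_activity(
    query: Fingerprint,
    support: Sequence[Tuple[Fingerprint, float]],
) -> float:
    """Predict the query activity from a few (fingerprint, activity) pairs."""
    weights = [tanimoto(query, fp) for fp, _ in support]
    total = sum(weights)
    if total == 0.0:
        # No structural overlap at all: fall back to the plain mean.
        return sum(activity for _, activity in support) / len(support)
    return sum(w * activity for w, (_, activity) in zip(weights, support)) / total
```

Because the output is a continuous value, new compounds can be ranked by predicted activity rather than only classified as active or inactive.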
Title: Ligand-Based Compound Activity Prediction via Few-Shot Learning. Journal of Chemical Information and Modeling, 2024-07-01. https://doi.org/10.1021/acs.jcim.4c00485
Foods possess a range of unexplored functionalities; however, fully identifying these functions empirically is challenging. In this study, we propose an in silico approach to comprehensively predict the functionalities of foods, including processed foods. The prediction applies machine learning to biomedical big data. We focus on disease-related protein pathways and statistically evaluate how the constituent compounds of a food collaboratively regulate these pathways. The method was applied to 876 foods and 83 diseases, revealing both food functionalities and their underlying mechanisms of action on a large scale. Additionally, the approach identifies food combinations that potentially affect molecular pathways, based on interrelationships between food functions within disease-related pathways. The proposed method holds potential for advancing preventive healthcare.
Title: Revealing Comprehensive Food Functionalities and Mechanisms of Action through Machine Learning. Authors: Nanako Inoue, Tomokazu Shibata, Yusuke Tanaka, Hiromu Taguchi, Ryusuke Sawada, Kenshin Goto, Shogo Momokita, Morihiro Aoyagi, Takashi Hirao, Yoshihiro Yamanishi. Journal of Chemical Information and Modeling, 2024-07-01. https://doi.org/10.1021/acs.jcim.4c00061
Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00366
Qing Liu, Dakuo He, Mengmeng Fan, Jinpeng Wang, Zeyu Cui, Hao Wang, Yan Mi, Ning Li, Qingqi Meng, Yue Hou
Ameliorating microglia-mediated neuroinflammation is a crucial strategy in developing new drugs for neurodegenerative diseases. Plant compounds are an important screening target for the discovery of such drugs. However, due to the spatial complexity of phytochemicals, it is particularly important to evaluate the effectiveness of compounds while excluding cytotoxic substances in the early stages of compound screening. Traditional high-throughput screening methods suffer from high cost and low efficiency. A computational model based on machine learning provides a novel avenue for cytotoxicity determination. In this study, a microglia cytotoxicity classifier was developed using a machine learning approach. First, we proposed a data splitting strategy based on the Murcko generic scaffold of each molecule. Under this splitting strategy, three machine learning approaches were coupled with three kinds of molecular representation methods to construct microglia cytotoxicity classifiers, which were then compared and assessed by predictive accuracy (ACC), balanced accuracy (BA), F1-score, and Matthews correlation coefficient (MCC). Then, the recursive feature elimination integrated with support vector machine (RFE-SVC) dimension reduction method was applied to the high-dimensional molecular fingerprints to further improve model performance. Among all the microglia cytotoxicity classifiers, the SVM coupled with the ECFP4 fingerprint after feature selection (ECFP4-RFE-SVM) obtained the most accurate classification for the test set (ACC of 0.99, BA of 0.99, F1-score of 0.99, MCC of 0.97). Finally, the Shapley additive explanations (SHAP) method was used to interpret the microglia cytotoxicity classifier, and key substructure SMARTS were identified as structural alerts. Experimental results show that ECFP4-RFE-SVM has reliable classification capability for microglia cytotoxicity, and that SHAP can not only provide a rational explanation for microglia cytotoxicity predictions but also offer a guideline for subsequent structural modifications to reduce cytotoxicity.
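The four reported metrics can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not the study's data.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute ACC, BA, F1 and MCC from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # recall on the positive (cytotoxic) class
    specificity = tn / (tn + fp)  # recall on the negative class
    balanced_accuracy = (sensitivity + specificity) / 2.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": accuracy, "BA": balanced_accuracy, "F1": f1, "MCC": mcc}
```

Balanced accuracy and MCC are less sensitive to class imbalance than plain accuracy, which is one reason to report them side by side.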
Title: Prediction and Interpretation Microglia Cytotoxicity by Machine Learning. Journal of Chemical Information and Modeling, 2024-07-01. https://doi.org/10.1021/acs.jcim.4c00366
Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00457
Lewis Mervin, Alexey Voronov, Mikhail Kabeshov, Ola Engkvist
Machine-learning (ML) and deep-learning (DL) approaches are increasingly deployed within the design-make-test-analyze (DMTA) drug design cycle to predict molecular properties of interest for small molecules. Despite this uptake, there are only a few automated packages to aid their development and deployment that also support uncertainty estimation, model explainability, and other key aspects of model usage. This represents a key unmet need within the field, and the large number of molecular representations and algorithms (and associated parameters) means it is nontrivial to robustly optimize, evaluate, reproduce, and deploy models. Here, we present QSARtuna, a molecule property prediction modeling pipeline, written in Python and utilizing the Optuna, Scikit-learn, RDKit, and ChemProp packages, which enables efficient and automated comparison between molecular representations and machine learning models. The platform was designed from the outset around the increasingly important aspects of model uncertainty quantification and explainability. We provide details of our framework and illustrative examples that demonstrate the capability of the software when applied to simple molecular property prediction, reaction/reactivity prediction, and DNA-encoded library enrichment classification. We hope that the release of QSARtuna will further spur innovation in automatic ML modeling and provide a platform for education in best practices for molecular property modeling. The code for the QSARtuna framework is made freely available via GitHub.
Title: QSARtuna: An Automated QSAR Modeling Platform for Molecular Property Prediction in Drug Design. Journal of Chemical Information and Modeling, 2024-07-01. https://doi.org/10.1021/acs.jcim.4c00457
Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00343
Benjamin Ries, Irfan Alibay, Nithishwer Mouroug Anand, Philip C Biggin, Aniket Magarkar
Absolute binding free energies (ABFEs) play a crucial role in drug development, particularly as part of the lead discovery process. In recent work, we showed how in silico predictions could directly support drug development by ranking and recommending favorable ideas over unfavorable ones. Here, we demonstrate a Python workflow that enables the calculation of ABFEs with minimal manual input, namely the receptor PDB and ligand SDF files, and outputs a .tsv file containing the ranked ligands and their corresponding binding free energies. The implementation uses Snakemake to structure and control the execution of tasks, allowing for dynamic control of parameters and execution patterns. We provide an example of a benchmark system that demonstrates the effectiveness of the automated workflow.
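The final output step, a tab-separated table of ligands ranked by binding free energy, might look roughly like the sketch below. The column names, the kcal/mol unit, and the function name `write_ranked_tsv` are assumptions for illustration; the actual workflow's file layout may differ.

```python
import csv
import io
from typing import Dict, TextIO


def write_ranked_tsv(results: Dict[str, float], handle: TextIO) -> None:
    """Write ligands sorted from most to least favorable binding free energy.

    `results` maps a ligand name to its estimated free energy; more negative
    values mean stronger predicted binding, so they are ranked first.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["rank", "ligand", "dG_kcal_per_mol"])  # hypothetical header
    ranked = sorted(results.items(), key=lambda item: item[1])
    for rank, (ligand, free_energy) in enumerate(ranked, start=1):
        writer.writerow([rank, ligand, f"{free_energy:.2f}"])


# Usage with made-up values, written to an in-memory buffer.
buffer = io.StringIO()
write_ranked_tsv({"lig_a": -7.5, "lig_b": -9.1}, buffer)
```

Writing through a file-like handle keeps the function easy to test and lets the same code target a real `.tsv` file opened with `open(path, "w", newline="")`.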
Title: Automated Absolute Binding Free Energy Calculation Workflow for Drug Discovery. Journal of Chemical Information and Modeling, 2024-07-01. https://doi.org/10.1021/acs.jcim.4c00343
Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.3c01738
Francesco Cappelluti, Lorenzo Gontrani, Alessandro Mariani, Simone Galliano, Marilena Carbone, Matteo Bonomo
Deep eutectic solvents (DESs) have attracted increasing attention in recent years due to their broad applicability in different fields, but their computer-aided discovery, which would avoid time-consuming trial-and-error investigation, is still lagging. In this paper, a set of nine DESs, composed of choline chloride as a hydrogen-bond acceptor and nine functionalized phenols as hydrogen-bond donors, is simulated using classical molecular dynamics to investigate the possible formation of a DES. Voronoi tessellation analysis is employed to produce an intuitive and straightforward representation of the degree of mixing between the different components of the solutions, which permits the definition of a metric quantifying the propensity of the components to produce a uniform solution. The computational findings agree with the experimental results, confirming that Voronoi tessellation analysis can act as a lightweight yet powerful approach for the high-throughput screening of mixtures with a view to designing new DESs.
Title: Voronoi Tessellation as a Tool for Predicting the Formation of Deep Eutectic Solvents. Journal of Chemical Information and Modeling, 2024-07-01. https://doi.org/10.1021/acs.jcim.3c01738
Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00316
Jakob Liu, Andreas Fischer, Monika Cserjan-Puschmann, Nico Lingg, Chris Oostenbrink
The Caspase-based fusion protein technology (CASPON) allows for universal cleavage of fusion tags from proteins of interest to reconstitute the native N-terminus. While the CASPON enzyme has been optimized to be promiscuous toward a diversity of N-terminal peptides, the cleavage efficacy for larger proteins can be surprisingly low. We develop an efficient means to rationalize and predict the cleavage efficiency based on a structural representation of the intrinsically disordered N-terminal peptides and their putative interactions with the CASPON enzyme. The number of favorably interacting N-terminal conformations agrees very well with the experimentally observed cleavage efficiency, consistent with a conformational selection model. The method relies on computationally cheap molecular dynamics simulations to efficiently generate a diverse collection of N-terminal conformations, followed by a simple procedure that fits them into the CASPON enzyme. It can be readily used to assess CASPON cleavability a priori.
Title: Caspase-Based Fusion Protein Technology: Substrate Cleavability Described by Computational Modeling and Simulation. Journal of Chemical Information and Modeling, 2024-07-01. https://doi.org/10.1021/acs.jcim.4c00316
Pub Date : 2024-07-01 DOI: 10.1021/acs.jcim.4c00572
Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W Coley, Regina Barzilay
Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extraction of reaction data at the document level. OpenChemIE approaches the problem in two steps: extracting relevant information from individual modalities and then integrating the results to obtain a final list of reactions. For the first step, we employ specialized neural models that each address a specific task for chemistry information extraction, such as parsing molecules or reactions from text or figures. We then integrate the information from these modules using chemistry-informed algorithms, allowing for the extraction of fine-grained reaction data from reaction condition and substrate scope investigations. Our machine learning models attain state-of-the-art performance when evaluated individually, and we meticulously annotate a challenging dataset of reaction schemes with R-groups to evaluate our pipeline as a whole, achieving an F1 score of 69.5%. Additionally, the reaction extraction results of OpenChemIE attain an accuracy score of 64.3% when directly compared against the Reaxys chemical database. OpenChemIE is most suited for information extraction on organic chemistry literature, where molecules are generally depicted as planar graphs or written in text and can be consolidated into a SMILES format. We provide OpenChemIE freely to the public as an open-source package, as well as through a web interface.
{"title":"OpenChemIE: An Information Extraction Toolkit for Chemistry Literature.","authors":"Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W Coley, Regina Barzilay","doi":"10.1021/acs.jcim.4c00572","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c00572","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"141475404","RegionNum":2,"RegionCategory":"Chemistry","Total":0}
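The OpenChemIE abstract above reports an F1 score for the extracted reactions. As a reminder of what that metric combines, the sketch below computes F1 from a set of predicted reactions and a set of reference reactions. The abstract does not state how a predicted reaction is matched to a reference one, so the exact-string matching used here, the function name `f1_score`, and the example reaction strings are assumptions for illustration only and are not taken from the OpenChemIE package.

```python
from typing import Set


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall over two sets of reactions.

    Assumption: a prediction is correct only if its string is identical
    to a reference string. Real evaluations may match more leniently.
    """
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reaction strings in "reactants>>product" form.
predicted = {"A.B>>C", "D>>E", "F>>G"}
gold = {"A.B>>C", "D>>E", "H>>I", "J>>K"}
print(f"F1 = {f1_score(predicted, gold):.3f}")
```

In this toy case two of three predictions are correct and two of four reference reactions are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7.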