Journal of Chemical Information and Modeling: Latest Articles

CSM Software: Continuous Symmetry and Chirality Measures for Quantitative Structural Analysis.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2024-07-02; DOI: 10.1021/acs.jcim.4c00609
Inbal Tuvi-Arad, Yaffa Shalit, Gil Alon

We present a comprehensive and updated Python-based open software package for calculating continuous symmetry measures (CSMs) and the related continuous chirality measure (CCM) of molecules across chemistry. These descriptors quantify the distortion of molecular structures on a continuous scale and have proven insightful in numerous studies. The input consists of the coordinates of the molecular geometry and a desired cyclic symmetry point group (i.e., Cs, Ci, Cn, or Sn). The results include the coordinates of the nearest symmetric structure that belongs to the desired symmetry point group, the permutation that defines the symmetry operation, the direction of the symmetry element in space, and a number between zero and 100 representing the level of symmetry or chirality. Rather than treating symmetry as a binary property by which a structure is either symmetric or asymmetric, the CSM approach quantifies the level of gray between black and white and allows one to follow the course of change. The software can be downloaded from https://github.com/continuous-symmetry-measure/csm or used online at https://csm.ouproj.org.il.
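To make the "number between zero and 100" concrete, here is a minimal sketch for the simplest case, inversion symmetry (Ci), with the atom pairing supplied by the caller. It assumes the commonly quoted definition CSM = 100 × Σ|Q_k − P_k|² / Σ|Q_k − Q_0|², where Q_k are the original coordinates, P_k the nearest symmetric structure, and Q_0 the centroid. The function name `csm_inversion` and its arguments are hypothetical and are not the API of the csm package, which also searches over permutations and symmetry-element directions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def csm_inversion(coords: Sequence[Vec], perm: Sequence[int]) -> Tuple[float, List[Vec]]:
    """Return (CSM, nearest Ci-symmetric structure) for a FIXED permutation.

    perm[k] is the atom that inversion maps atom k onto. It must pair atoms,
    i.e. perm[perm[k]] == k. No search over permutations is performed here.
    """
    n = len(coords)
    if any(perm[perm[k]] != k for k in range(n)):
        raise ValueError("perm must pair atoms (perm[perm[k]] == k)")

    # Shift the centroid to the origin; the inversion center is placed there.
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    q = [tuple(c[i] - centroid[i] for i in range(3)) for c in coords]

    # Nearest symmetric structure: average each atom with the inverted
    # image (-Q) of its partner.
    nearest = [
        tuple((q[k][i] - q[perm[k]][i]) / 2.0 for i in range(3)) for k in range(n)
    ]

    origin = (0.0, 0.0, 0.0)
    numerator = sum(math.dist(q[k], nearest[k]) ** 2 for k in range(n))
    denominator = sum(math.dist(q[k], origin) ** 2 for k in range(n))
    if denominator == 0.0:
        raise ValueError("all atoms coincide with the centroid")
    return 100.0 * numerator / denominator, nearest
```

A perfectly centrosymmetric square with opposite corners paired gives a measure of zero; displacing one corner gives a small positive value, which is the "level of gray" the abstract refers to.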

Citations: 0
Proximity Graph Networks: Predicting Ligand Affinity with Message Passing Neural Networks
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2024-07-02; DOI: 10.1021/acs.jcim.4c00311
Zachary J. Gale-Day, Laura Shub, Kangway V. Chuang, Michael J. Keiser
Message passing neural networks (MPNNs) on molecular graphs generate continuous and differentiable encodings of small molecules with state-of-the-art performance on protein–ligand complex scoring tasks. Here, we describe the proximity graph network (PGN) package, an open-source toolkit that constructs ligand–receptor graphs based on atom proximity and allows users to rapidly apply and evaluate MPNN architectures for a broad range of tasks. We demonstrate the utility of PGN by introducing benchmarks for affinity and docking score prediction tasks. Graph networks generalize better than fingerprint-based models and perform strongly for the docking score prediction task. Overall, MPNNs with proximity graph data structures augment the prediction of ligand–receptor complex properties when ligand–receptor data are available.
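The idea of building a ligand–receptor graph from atom proximity can be sketched as follows. This is an illustration only: the function name `proximity_edges`, the atom tuple layout, and the 4.0 Å default cutoff are assumptions made for the example and are not taken from the PGN package.

```python
import math
from typing import List, Sequence, Tuple

# (element symbol, (x, y, z) in angstroms); a hypothetical minimal atom record
Atom = Tuple[str, Tuple[float, float, float]]


def proximity_edges(
    ligand: Sequence[Atom],
    receptor: Sequence[Atom],
    cutoff: float = 4.0,
) -> List[Tuple[int, int, float]]:
    """Return (ligand_index, receptor_index, distance) for every atom pair
    whose distance is within the cutoff. These pairs become the
    intermolecular edges of the graph; covalent bonds inside each molecule
    would be added separately.
    """
    edges: List[Tuple[int, int, float]] = []
    for i, (_, xyz_l) in enumerate(ligand):
        for j, (_, xyz_r) in enumerate(receptor):
            d = math.dist(xyz_l, xyz_r)
            if d <= cutoff:
                edges.append((i, j, d))
    return edges
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single binding pocket; a spatial grid or k-d tree would be the usual choice for larger systems.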
Citations: 0
Ligand-Based Compound Activity Prediction via Few-Shot Learning.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00485
Peter Eckmann, Jake Anderson, Rose Yu, Michael K Gilson

Predicting the activities of new compounds against biophysical or phenotypic assays based on the known activities of one or a few existing compounds is a common goal in early stage drug discovery. This problem can be cast as a "few-shot learning" challenge, and prior studies have developed few-shot learning methods to classify compounds as active versus inactive. However, the ability to go beyond classification and rank compounds by expected affinity is more valuable. We describe Few-Shot Compound Activity Prediction (FS-CAP), a novel neural architecture trained on a large bioactivity data set to predict compound activities against an assay outside the training set, based on only the activities of a few known compounds against the same assay. Our model aggregates encodings generated from the known compounds and their activities to capture assay information and uses a separate encoder for the new compound whose activity is to be predicted. The new method provides encouraging results relative to traditional chemical-similarity-based techniques as well as other state-of-the-art few-shot learning methods in tests on a variety of ligand-based drug discovery settings and data sets. The code for FS-CAP is available at https://github.com/Rose-STL-Lab/FS-CAP.
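For orientation, the kind of "traditional chemical-similarity-based" baseline the abstract compares against can be sketched as a Tanimoto-weighted average of the known activities. This is not FS-CAP and not necessarily the exact baseline the authors used; fingerprints are represented here as plain sets of on-bit indices, and all names are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_weighted_activity(
    query: FrozenSet[int],
    context: Sequence[Tuple[FrozenSet[int], float]],
) -> float:
    """Predict the query compound's activity from a few (fingerprint, activity)
    pairs measured in the same assay, weighting each known activity by its
    similarity to the query.
    """
    if not context:
        raise ValueError("at least one known compound is required")
    weights = [tanimoto(query, fp) for fp, _ in context]
    total = sum(weights)
    if total == 0.0:
        # Nothing resembles the query: fall back to the plain mean.
        return sum(act for _, act in context) / len(context)
    return sum(w * act for w, (_, act) in zip(weights, context)) / total
```

Such a baseline yields a continuous score, so it can rank compounds rather than only classify them, which is the setting the abstract emphasizes.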

Citations: 0
Revealing Comprehensive Food Functionalities and Mechanisms of Action through Machine Learning.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00061
Nanako Inoue, Tomokazu Shibata, Yusuke Tanaka, Hiromu Taguchi, Ryusuke Sawada, Kenshin Goto, Shogo Momokita, Morihiro Aoyagi, Takashi Hirao, Yoshihiro Yamanishi

Foods possess a range of unexplored functionalities; however, fully identifying these functions through empirical means presents significant challenges. In this study, we have proposed an in silico approach to comprehensively predict the functionalities of foods, encompassing even processed foods. This prediction is accomplished through the utilization of machine learning on biomedical big data. Our focus revolves around disease-related protein pathways, wherein we statistically evaluate how the constituent compounds collaboratively regulate these pathways. The proposed method has been employed across 876 foods and 83 diseases, leading to an extensive revelation of both food functionalities and their underlying operational mechanisms. Additionally, this approach identifies food combinations that potentially affect molecular pathways based on interrelationships between food functions within disease-related pathways. Our proposed method holds potential for advancing preventive healthcare.

Citations: 0
Prediction and Interpretation Microglia Cytotoxicity by Machine Learning.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00366
Qing Liu, Dakuo He, Mengmeng Fan, Jinpeng Wang, Zeyu Cui, Hao Wang, Yan Mi, Ning Li, Qingqi Meng, Yue Hou

Ameliorating microglia-mediated neuroinflammation is a crucial strategy in developing new drugs for neurodegenerative diseases. Plant compounds are an important screening target for the discovery of drugs for the treatment of neurodegenerative diseases. However, due to the spatial complexity of phytochemicals, it becomes particularly important to evaluate the effectiveness of compounds while avoiding the inclusion of cytotoxic substances in the early stages of compound screening. Traditional high-throughput screening methods suffer from high cost and low efficiency. A computational model based on machine learning provides a novel avenue for cytotoxicity determination. In this study, a microglia cytotoxicity classifier was developed using a machine learning approach. First, we proposed a data splitting strategy based on the Murcko generic scaffold of each molecule. Under this split, three machine learning approaches were coupled with three kinds of molecular representation methods to construct microglia cytotoxicity classifiers, which were then compared and assessed by predictive accuracy (ACC), balanced accuracy (BA), F1-score, and Matthews correlation coefficient (MCC). Then, recursive feature elimination integrated with a support vector machine (RFE-SVC) was introduced as a dimension reduction method for high-dimensional molecular fingerprints to further improve model performance. Among all the microglia cytotoxicity classifiers, the SVM coupled with the ECFP4 fingerprint after feature selection (ECFP4-RFE-SVM) obtained the most accurate classification on the test set (ACC of 0.99, BA of 0.99, F1-score of 0.99, MCC of 0.97). Finally, the Shapley additive explanations (SHAP) method was used to interpret the microglia cytotoxicity classifier, and key substructures were identified as structural alerts. Experimental results show that ECFP4-RFE-SVM has reliable classification capability for microglia cytotoxicity, and that SHAP can not only provide a rational explanation for microglia cytotoxicity predictions but also offer a guideline for subsequent molecular cytotoxicity modifications.
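The four reported metrics all follow from a binary confusion matrix. The sketch below uses their standard definitions; the function name and the convention of returning 0.0 when a denominator vanishes are choices made for this example, not details from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, balanced accuracy, F1-score, and Matthews correlation
    coefficient from true/false positive/negative counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    acc = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    ba = (sensitivity + specificity) / 2.0

    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {"ACC": acc, "BA": ba, "F1": f1, "MCC": mcc}
```

Balanced accuracy and MCC are the more informative numbers when cytotoxic and non-cytotoxic compounds are unevenly represented, because plain accuracy can be high for a classifier that mostly predicts the majority class.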

Citations: 0
QSARtuna: An Automated QSAR Modeling Platform for Molecular Property Prediction in Drug Design.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00457
Lewis Mervin, Alexey Voronov, Mikhail Kabeshov, Ola Engkvist

Machine-learning (ML) and deep-learning (DL) approaches are increasingly deployed within the design-make-test-analyze (DMTA) drug design cycle to predict molecular properties of interest for small molecules. Despite this uptake, there are only a few automated packages to aid their development and deployment that also support uncertainty estimation, model explainability, and other key aspects of model usage. This represents a key unmet need within the field, and the large number of molecular representations and algorithms (and associated parameters) means it is nontrivial to robustly optimize, evaluate, reproduce, and deploy models. Here, we present QSARtuna, a molecular property prediction modeling pipeline, written in Python and utilizing the Optuna, Scikit-learn, RDKit, and ChemProp packages, which enables efficient and automated comparison between molecular representations and machine learning models. The platform was designed from the outset around the increasingly important aspects of model uncertainty quantification and explainability. We describe the framework and provide illustrative examples demonstrating the capability of the software when applied to simple molecular property prediction, reaction/reactivity prediction, and DNA-encoded library enrichment classification. We hope that the release of QSARtuna will further spur innovation in automatic ML modeling and provide a platform for teaching best practices in molecular property modeling. The code for the QSARtuna framework is made freely available via GitHub.

Citations: 0
Automated Absolute Binding Free Energy Calculation Workflow for Drug Discovery.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.4c00343
Benjamin Ries, Irfan Alibay, Nithishwer Mouroug Anand, Philip C Biggin, Aniket Magarkar

Absolute binding free energies (ABFEs) play a crucial role in drug development, particularly as part of the lead discovery process. In recent work, we showed how in silico predictions could directly support drug development by ranking and recommending favorable ideas over unfavorable ones. Here, we demonstrate a Python workflow that enables the calculation of ABFEs with minimal manual input (such as the receptor PDB and ligand SDF files) and outputs a .tsv file containing the ranked ligands and their corresponding binding free energies. The implementation uses Snakemake to structure and control the execution of tasks, allowing for dynamic control of parameters and execution patterns. We provide an example of a benchmark system that demonstrates the effectiveness of the automated workflow.
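The final ranking step can be pictured with a short sketch. The abstract does not specify the layout of the .tsv file, so the column names, the two-decimal formatting, and the function name `ranked_tsv` below are assumptions; the only convention relied on is that a more negative binding free energy means stronger predicted binding.

```python
import csv
import io
from typing import Dict


def ranked_tsv(dg_by_ligand: Dict[str, float]) -> str:
    """Return tab-separated text listing ligands from most to least
    favorable binding free energy (most negative first)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["rank", "ligand", "dG_bind_kcal_per_mol"])  # hypothetical header
    ranked = sorted(dg_by_ligand.items(), key=lambda item: item[1])
    for rank, (name, dg) in enumerate(ranked, start=1):
        writer.writerow([rank, name, f"{dg:.2f}"])
    return buffer.getvalue()
```

The returned text can be written to disk with an ordinary `open(path, "w")` call, or produced by a Snakemake rule as its declared output.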

Citations: 0
Voronoi Tessellation as a Tool for Predicting the Formation of Deep Eutectic Solvents.
IF 5.6; Zone 2 (Chemistry); Q1 CHEMISTRY, MEDICINAL; Pub Date: 2024-07-01; DOI: 10.1021/acs.jcim.3c01738
Francesco Cappelluti, Lorenzo Gontrani, Alessandro Mariani, Simone Galliano, Marilena Carbone, Matteo Bonomo

Deep eutectic solvents (DESs) have attracted increasing attention in recent years due to their broad applicability in different fields, but their computer-aided discovery, which would avoid time-consuming trial-and-error investigation, is still lagging. In this paper, a set of nine DESs, each composed of choline chloride as the hydrogen-bond acceptor and one of nine functionalized phenols as the hydrogen-bond donor, is simulated by using classical molecular dynamics to investigate the possible formation of a DES. Voronoi tessellation analysis is employed to produce an intuitive and straightforward representation of the degree of mixing between the different components of the solutions, thereby permitting the definition of a metric that quantifies the propensity of the components to produce a uniform solution. The computational findings agree with the experimental results, thus confirming that Voronoi tessellation analysis can act as a lightweight yet powerful approach for the high-throughput screening of mixtures with a view to designing new DESs.

Citations: 0
Caspase-Based Fusion Protein Technology: Substrate Cleavability Described by Computational Modeling and Simulation.
IF 5.6 Zone 2 Chemistry Q1 CHEMISTRY, MEDICINAL Pub Date: 2024-07-01 DOI: 10.1021/acs.jcim.4c00316
Jakob Liu, Andreas Fischer, Monika Cserjan-Puschmann, Nico Lingg, Chris Oostenbrink

The caspase-based fusion protein technology (CASPON) allows for universal cleavage of fusion tags from proteins of interest to reconstitute the native N-terminus. While the CASPON enzyme has been optimized to be promiscuous toward a diversity of N-terminal peptides, the cleavage efficiency for larger proteins can be surprisingly low. We develop an efficient means to rationalize and predict the cleavage efficiency based on a structural representation of the intrinsically disordered N-terminal peptides and their putative interactions with the CASPON enzyme. The number of favorably interacting N-terminal conformations agrees very well with the experimentally observed cleavage efficiency, consistent with a conformational selection model. The method relies on computationally cheap molecular dynamics simulations to generate a diverse collection of N-terminal conformations, followed by a simple procedure that fits them into the CASPON enzyme. It can be readily used to assess CASPON cleavability a priori.
Citations: 0
OpenChemIE: An Information Extraction Toolkit for Chemistry Literature.
IF 5.6 Zone 2 Chemistry Q1 CHEMISTRY, MEDICINAL Pub Date: 2024-07-01 DOI: 10.1021/acs.jcim.4c00572
Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W Coley, Regina Barzilay

Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extraction of reaction data at the document level. OpenChemIE approaches the problem in two steps: extracting relevant information from individual modalities and then integrating the results to obtain a final list of reactions. For the first step, we employ specialized neural models that each address a specific task for chemistry information extraction, such as parsing molecules or reactions from text or figures. We then integrate the information from these modules using chemistry-informed algorithms, allowing for the extraction of fine-grained reaction data from reaction condition and substrate scope investigations. Our machine learning models attain state-of-the-art performance when evaluated individually, and we meticulously annotate a challenging dataset of reaction schemes with R-groups to evaluate our pipeline as a whole, achieving an F1 score of 69.5%. Additionally, the reaction extraction results of OpenChemIE attain an accuracy score of 64.3% when directly compared against the Reaxys chemical database. OpenChemIE is most suited for information extraction on organic chemistry literature, where molecules are generally depicted as planar graphs or written in text and can be consolidated into a SMILES format. We provide OpenChemIE freely to the public as an open-source package, as well as through a web interface.
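
For readers unfamiliar with the F1 score quoted in the abstract, the following is a minimal sketch of how a set-based F1 can be computed for extracted reactions. It is not OpenChemIE's evaluation code: the function name `f1_score`, the exact-string matching criterion, and the toy reaction strings are all assumptions made for illustration.

```python
def f1_score(predicted, gold):
    """Return (precision, recall, F1) for two collections of reactions.

    Assumption: a predicted reaction counts as correct only if its string
    matches a gold reaction exactly. Real evaluations may use a more
    lenient or chemistry-aware matching rule.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): reactions written as "reactants>>products" strings.
gold = {"CCO>>CC=O", "CC(=O)O.CO>>CC(=O)OC", "c1ccccc1Br>>c1ccccc1"}
predicted = {"CCO>>CC=O", "CC(=O)O.CO>>CC(=O)OC", "CCN>>CC#N"}

precision, recall, f1 = f1_score(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the three predictions match the gold set, so precision, recall, and F1 all come out to two thirds. A reported figure such as 69.5% would be the same quantity computed over a full annotated dataset.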
Citations: 0