Pub Date : 2024-07-03 DOI: 10.1021/acs.jcim.3c01934
Cheng Wang, Chuang Yuan, Yahui Wang, Yuying Shi, Tao Zhang, Gary J Patti
Libraries of collision cross-section (CCS) values have the potential to facilitate compound identification in metabolomics. Although computational methods provide an opportunity to increase library size rapidly, accurate prediction of CCS values remains challenging due to the structural diversity of small molecules. Here, we developed a machine learning (ML) model that integrates graph attention networks and multimodal molecular representations to predict CCS values on the basis of chemical class. Our approach, referred to as MGAT-CCS, had superior performance in comparison to other ML models in CCS prediction. MGAT-CCS achieved a median relative error of 0.47%/1.14% (positive/negative mode) and 1.40%/1.63% (positive/negative mode) for lipids and metabolites, respectively. When MGAT-CCS was applied to real-world metabolomics data, it reduced the number of false metabolite candidates by roughly 25% across multiple sample types ranging from plasma and urine to cells. To facilitate its application, we developed a user-friendly stand-alone web server for MGAT-CCS that is freely available at https://mgat-ccs-web.onrender.com. This work represents a step forward in predicting CCS values and can potentially facilitate the identification of small molecules when using ion mobility spectrometry coupled with mass spectrometry.
Title: Predicting Collision Cross-Section Values for Small Molecules through Chemical Class-Based Multimodal Graph Attention Network. Journal: Journal of Chemical Information and Modeling
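The median relative error quoted for MGAT-CCS can be illustrated with a short calculation. This is a minimal sketch, assuming the metric is the median of |predicted - measured| / measured expressed in percent; the function name and the CCS values are hypothetical and are not taken from the paper.

```python
from statistics import median


def median_relative_error(predicted, measured):
    """Median of |predicted - measured| / measured, in percent."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]
    return median(errors)


# Hypothetical CCS values in square angstroms, for illustration only.
measured_ccs = [150.0, 200.0, 250.0]
predicted_ccs = [151.5, 198.0, 250.0]
mre = median_relative_error(predicted_ccs, measured_ccs)
```

The median is less sensitive to a few badly predicted compounds than the mean, which may be one reason it is often reported for CCS prediction.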
Pub Date : 2024-07-03 DOI: 10.1021/acs.jcim.4c00806
Ciceron Ayala-Orozco, Hamid Teimouri, Angela Medvedeva, Bowen Li, Alex Lathem, Gang Li, Anatoly B Kolomeisky, James M Tour
One of the most challenging tasks in modern medicine is to find novel efficient cancer therapeutic methods with minimal side effects. The recent discovery of several classes of organic molecules known as "molecular jackhammers" is a promising development in this direction. It is known that these molecules can directly target and eliminate cancer cells with no impact on healthy tissues. However, the underlying microscopic picture remains poorly understood. We present a study that utilizes theoretical analysis together with experimental measurements to clarify the microscopic aspects of jackhammers' anticancer activities. Our physical-chemical approach combines statistical analysis with chemoinformatics methods to design and optimize molecular jackhammers. By correlating specific physical-chemical properties of these molecules with their abilities to kill cancer cells, several important structural features are identified and discussed. Although our theoretical analysis enhances understanding of the molecular interactions of jackhammers, it also highlights the need for further research to comprehensively elucidate their mechanisms and to develop a robust physical-chemical framework for the rational design of targeted anticancer drugs.
Title: Chemoinformatics Insights on Molecular Jackhammers and Cancer Cells. Journal: Journal of Chemical Information and Modeling
Pub Date : 2024-07-03 DOI: 10.1021/acs.jcim.4c00072
Haihan Liu, Baichun Hu, Peiying Chen, Xiao Wang, Hanxun Wang, Shizun Wang, Jian Wang, Bin Lin, Maosheng Cheng
In drug discovery, molecular docking methods face challenges in accurately predicting energy. Scoring functions used in molecular docking often fail to simulate complex protein-ligand interactions fully and accurately, leading to biases and inaccuracies in virtual screening and target predictions. We introduce "Docking Score ML", developed from an analysis of over 200,000 docked complexes from 155 known targets for cancer treatments. The scoring functions used are founded on bioactivity data sourced from ChEMBL and have been fine-tuned using both supervised machine learning and deep learning techniques. We validated our approach extensively using multiple data sets, including a validation of the selectivity mechanism and the DUDE, DUD-AD, and LIT-PCBA data sets, and performed a multitarget analysis on drugs like sunitinib. To enhance prediction accuracy, feature fusion techniques were explored. By merging the capabilities of the Graph Convolutional Network (GCN) with multiple docking functions, our results indicated a clear superiority of our methodologies over conventional approaches. These advantages demonstrate that Docking Score ML is an efficient and accurate tool for virtual screening and reverse docking.
Title: Docking Score ML: Target-Specific Machine Learning Models Improving Docking-Based Virtual Screening in 155 Targets. Journal: Journal of Chemical Information and Modeling
Pub Date : 2024-07-02 DOI: 10.1021/acs.jcim.4c00555
Shuyu Wang, Hongxing Yue, Xiaoming Yuan
Deep learning holds great potential for expediting the discovery of new polymers from the vast chemical space. However, accurately predicting polymer properties for practical applications based on their monomer composition has long been a challenge. The main obstacles include insufficient data, ineffective representation encoding, and lack of explainability. To address these issues, we propose an interpretable model called the Polymer Graph Convolutional Neural Network (PGCNN) that can accurately predict various polymer properties. This model is trained using the RadonPy data set and validated using experimental data samples. By integrating evidential deep learning with the model, we can quantify the uncertainty of predictions and enable sample-efficient training through uncertainty-guided active learning. Additionally, we demonstrate that the global attention of the graph embedding can aid in discovering underlying physical principles by identifying important functional groups within polymers and associating them with specific material attributes. Lastly, we explore the high-throughput screening capability of our model by rapidly identifying thousands of promising candidates with low and high thermal conductivity from a pool of one million hypothetical polymers. In summary, our research not only advances our mechanistic understanding of polymers using explainable AI but also paves the way for data-driven trustworthy discovery of polymer materials.
Title: Accelerating Polymer Discovery with Uncertainty-Guided PGCNN: Explainable AI for Predicting Properties and Mechanistic Insights. Journal: Journal of Chemical Information and Modeling
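The selection step of uncertainty-guided active learning mentioned in the PGCNN abstract can be sketched as follows. This is only an illustration, assuming the acquisition rule is "label the candidates with the largest predicted uncertainty"; the function name, polymer labels, and uncertainty values are hypothetical, and the paper's actual acquisition strategy may differ.

```python
def select_most_uncertain(candidates, uncertainties, batch_size):
    """Return the batch_size candidates with the largest predicted uncertainty."""
    if len(candidates) != len(uncertainties):
        raise ValueError("candidates and uncertainties must have equal length")
    ranked = sorted(
        zip(candidates, uncertainties),
        key=lambda pair: pair[1],
        reverse=True,  # most uncertain first
    )
    return [name for name, _ in ranked[:batch_size]]


# Hypothetical pool of unlabeled polymers with model-predicted uncertainties.
pool = ["polymer_A", "polymer_B", "polymer_C", "polymer_D"]
predicted_uncertainty = [0.05, 0.40, 0.10, 0.25]
next_batch = select_most_uncertain(pool, predicted_uncertainty, batch_size=2)
```

In a full loop, the selected candidates would be labeled, added to the training set, and the model retrained before the next selection round.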
Pub Date : 2024-07-02 DOI: 10.1021/acs.jcim.4c00609
Inbal Tuvi-Arad, Yaffa Shalit, Gil Alon
We present a comprehensive and updated Python-based open software to calculate continuous symmetry measures (CSMs) and their related continuous chirality measure (CCM) of molecules across chemistry. These descriptors are used to quantify distortion levels of molecular structures on a continuous scale and were proven insightful in numerous studies. The input information includes the coordinates of the molecular geometry and a desired cyclic symmetry point group (i.e., Cs, Ci, Cn, or Sn). The results include the coordinates of the nearest symmetric structure that belongs to the desired symmetry point group, the permutation that defines the symmetry operation, the direction of the symmetry element in space, and a number, between zero and 100, representing the level of symmetry or chirality. Rather than treating symmetry as a binary property by which a structure is either symmetric or asymmetric, the CSM approach quantifies the level of gray between black and white and allows one to follow the course of change. The software can be downloaded from https://github.com/continuous-symmetry-measure/csm or used online at https://csm.ouproj.org.il.
Title: CSM Software: Continuous Symmetry and Chirality Measures for Quantitative Structural Analysis. Journal: Journal of Chemical Information and Modeling
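To make the 0 to 100 scale concrete, the sketch below evaluates a continuous symmetry measure for inversion symmetry (Ci) only. It does not use the CSM software's API. It assumes the commonly cited definition (100 times the squared distance to the nearest symmetric structure, divided by the squared size of the centered structure) and takes the atom-pairing permutation as given, whereas the real software searches for it. The function name and coordinates are hypothetical.

```python
def inversion_csm(coords, perm):
    """CSM with respect to inversion for a fixed pairing permutation.

    coords: list of (x, y, z) tuples.
    perm:   perm[k] is the index of the atom that atom k maps onto under
            inversion; it must be its own inverse (perm[perm[k]] == k).
    """
    n = len(coords)
    centroid = [sum(p[a] for p in coords) / n for a in range(3)]
    centered = [[p[a] - centroid[a] for a in range(3)] for p in coords]
    norm = sum(x * x for p in centered for x in p)
    if norm == 0.0:
        raise ValueError("all atoms coincide; the measure is undefined")
    deviation = 0.0
    for k, q in enumerate(centered):
        partner = centered[perm[k]]
        # The nearest Ci-symmetric position of atom k is (q - partner) / 2,
        # so its displacement from q is (q + partner) / 2.
        deviation += sum(((q[a] + partner[a]) / 2.0) ** 2 for a in range(3))
    return 100.0 * deviation / norm


# Hypothetical four-atom structures: a perfect square and a distorted one.
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
distorted = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.3)]
pairing = [1, 0, 3, 2]  # opposite atoms are exchanged by inversion
csm_square = inversion_csm(square, pairing)
csm_distorted = inversion_csm(distorted, pairing)
```

A value of zero means the structure already has the symmetry; larger values indicate a larger distortion from it.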
Pub Date : 2024-07-02 DOI: 10.1021/acs.jcim.4c00311
Zachary J. Gale-Day, Laura Shub, Kangway V. Chuang, Michael J. Keiser
Message passing neural networks (MPNNs) on molecular graphs generate continuous and differentiable encodings of small molecules with state-of-the-art performance on protein–ligand complex scoring tasks. Here, we describe the proximity graph network (PGN) package, an open-source toolkit that constructs ligand–receptor graphs based on atom proximity and allows users to rapidly apply and evaluate MPNN architectures for a broad range of tasks. We demonstrate the utility of PGN by introducing benchmarks for affinity and docking score prediction tasks. Graph networks generalize better than fingerprint-based models and perform strongly for the docking score prediction task. Overall, MPNNs with proximity graph data structures augment the prediction of ligand–receptor complex properties when ligand–receptor data are available.
Title: Proximity Graph Networks: Predicting Ligand Affinity with Message Passing Neural Networks. Journal: Journal of Chemical Information and Modeling
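The idea of a ligand-receptor graph built from atom proximity can be sketched in a few lines. This is not the PGN package's API. It is a minimal illustration that assumes an edge is drawn between a ligand atom and a receptor atom whenever their distance is within a cutoff; the function name, coordinates, and the 4.0 angstrom cutoff are hypothetical.

```python
from math import dist


def build_proximity_edges(ligand_atoms, receptor_atoms, cutoff):
    """Return (ligand_index, receptor_index) pairs within cutoff distance."""
    edges = []
    for i, lig in enumerate(ligand_atoms):
        for j, rec in enumerate(receptor_atoms):
            if dist(lig, rec) <= cutoff:
                edges.append((i, j))
    return edges


# Hypothetical coordinates in angstroms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
receptor = [(0.0, 0.0, 3.0), (10.0, 0.0, 0.0)]
edges = build_proximity_edges(ligand, receptor, cutoff=4.0)
```

A message passing network would then exchange information along these edges, together with the covalent bonds inside the ligand and the receptor.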
Pub Date : 2024-07-01 DOI: 10.1021/acs.jcim.4c00485
Peter Eckmann, Jake Anderson, Rose Yu, Michael K Gilson
Predicting the activities of new compounds against biophysical or phenotypic assays based on the known activities of one or a few existing compounds is a common goal in early stage drug discovery. This problem can be cast as a "few-shot learning" challenge, and prior studies have developed few-shot learning methods to classify compounds as active versus inactive. However, the ability to go beyond classification and rank compounds by expected affinity is more valuable. We describe Few-Shot Compound Activity Prediction (FS-CAP), a novel neural architecture trained on a large bioactivity data set to predict compound activities against an assay outside the training set, based on only the activities of a few known compounds against the same assay. Our model aggregates encodings generated from the known compounds and their activities to capture assay information and uses a separate encoder for the new compound whose activity is to be predicted. The new method provides encouraging results relative to traditional chemical-similarity-based techniques as well as other state-of-the-art few-shot learning methods in tests on a variety of ligand-based drug discovery settings and data sets. The code for FS-CAP is available at https://github.com/Rose-STL-Lab/FS-CAP.
Title: Ligand-Based Compound Activity Prediction via Few-Shot Learning. Journal: Journal of Chemical Information and Modeling
Pub Date : 2024-07-01 DOI: 10.1021/acs.jcim.4c00061
Nanako Inoue, Tomokazu Shibata, Yusuke Tanaka, Hiromu Taguchi, Ryusuke Sawada, Kenshin Goto, Shogo Momokita, Morihiro Aoyagi, Takashi Hirao, Yoshihiro Yamanishi
Foods possess a range of unexplored functionalities; however, fully identifying these functions through empirical means presents significant challenges. In this study, we propose an in silico approach to comprehensively predict the functionalities of foods, including processed foods. The prediction is made by applying machine learning to biomedical big data. We focus on disease-related protein pathways and statistically evaluate how the constituent compounds collaboratively regulate these pathways. The proposed method has been applied to 876 foods and 83 diseases, revealing both food functionalities and their underlying mechanisms of action on a large scale. Additionally, this approach identifies food combinations that potentially affect molecular pathways based on interrelationships between food functions within disease-related pathways. Our proposed method holds potential for advancing preventive healthcare.
Title: Revealing Comprehensive Food Functionalities and Mechanisms of Action through Machine Learning. Journal: Journal of Chemical Information and Modeling
Pub Date : 2024-07-01 DOI: 10.1021/acs.jcim.4c00366
Qing Liu, Dakuo He, Mengmeng Fan, Jinpeng Wang, Zeyu Cui, Hao Wang, Yan Mi, Ning Li, Qingqi Meng, Yue Hou
Ameliorating microglia-mediated neuroinflammation is a crucial strategy in developing new drugs for neurodegenerative diseases. Plant compounds are an important screening target for the discovery of such drugs. However, due to the spatial complexity of phytochemicals, it is particularly important to evaluate the effectiveness of compounds while excluding cytotoxic substances in the early stages of compound screening. Traditional high-throughput screening methods suffer from high cost and low efficiency. A computational model based on machine learning provides a novel avenue for cytotoxicity determination. In this study, a microglia cytotoxicity classifier was developed using a machine learning approach. First, we proposed a data splitting strategy based on the Murcko generic scaffold of each molecule. Under this split, three machine learning approaches were coupled with three kinds of molecular representation to construct microglia cytotoxicity classifiers, which were then compared and assessed by accuracy, balanced accuracy, F1-score, and Matthews correlation coefficient. Then, recursive feature elimination integrated with a support vector machine (RFE-SVC) was introduced as a dimension reduction method for high-dimensional molecular fingerprints to further improve model performance. Among all the microglia cytotoxicity classifiers, the SVM coupled with the ECFP4 fingerprint after feature selection (ECFP4-RFE-SVM) obtained the most accurate classification for the test set (ACC of 0.99, BA of 0.99, F1-score of 0.99, MCC of 0.97). Finally, the Shapley additive explanations (SHAP) method was used to interpret the microglia cytotoxicity classifier, and key substructure SMARTS were identified as structural alerts. Experimental results show that ECFP4-RFE-SVM has reliable classification capability for microglia cytotoxicity, and that SHAP can not only provide a rational explanation for microglia cytotoxicity predictions but also offer a guideline for subsequent molecular cytotoxicity modifications.
Title: Prediction and Interpretation Microglia Cytotoxicity by Machine Learning. Journal: Journal of Chemical Information and Modeling
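The four evaluation metrics named in the microglia cytotoxicity abstract (accuracy, balanced accuracy, F1-score, and Matthews correlation coefficient) follow directly from the confusion matrix of a binary classifier. The sketch below uses their standard definitions; the function name and the labels are hypothetical and do not reproduce the paper's data. It assumes both classes occur in the true labels.

```python
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Return accuracy, balanced accuracy, F1, and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # recall on the positive (cytotoxic) class
    specificity = tn / (tn + fp)  # recall on the negative class
    balanced_accuracy = (sensitivity + specificity) / 2.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": accuracy, "BA": balanced_accuracy, "F1": f1, "MCC": mcc}


# Hypothetical labels: 1 = cytotoxic, 0 = non-cytotoxic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Balanced accuracy and MCC are reported alongside plain accuracy because they remain informative when the two classes are of unequal size.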
Pub Date : 2024-07-01DOI: 10.1021/acs.jcim.4c00457
Lewis Mervin, Alexey Voronov, Mikhail Kabeshov, Ola Engkvist
Machine-learning (ML) and deep-learning (DL) approaches are increasingly deployed within the design-make-test-analyze (DMTA) drug design cycle to predict molecular properties of small molecules. Despite this uptake, there are only a few automated packages to aid their development and deployment that also support uncertainty estimation, model explainability, and other key aspects of model usage. This represents a key unmet need within the field, and the large number of molecular representations and algorithms (and associated parameters) means it is nontrivial to robustly optimize, evaluate, reproduce, and deploy models. Here, we present QSARtuna, a molecular property prediction modeling pipeline written in Python that uses the Optuna, Scikit-learn, RDKit, and ChemProp packages to enable efficient, automated comparison of molecular representations and machine learning models. The platform was designed from the outset around the increasingly important aspects of model uncertainty quantification and explainability. We describe the framework and provide illustrative examples that demonstrate the capability of the software when applied to simple molecular property prediction, reaction/reactivity prediction, and DNA-encoded library enrichment classification. We hope that the release of QSARtuna will further spur innovation in automatic ML modeling and provide a platform for education in best practices in molecular property modeling. The code for the QSARtuna framework is freely available via GitHub.
{"title":"QSARtuna: An Automated QSAR Modeling Platform for Molecular Property Prediction in Drug Design.","authors":"Lewis Mervin, Alexey Voronov, Mikhail Kabeshov, Ola Engkvist","doi":"10.1021/acs.jcim.4c00457","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c00457","url":"","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141475405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}