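Ligand Based Virtual Screening of Molecular Compounds in Drug Discovery Using GCAN Fingerprint and Ensemble Machine Learning Algorithm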

Impact Factor: 2.2 | Zone 4, Computer Science | JCR Q2 (Computer Science) | Computer Systems Science and Engineering | Pub Date: 2023-01-01 | DOI: 10.32604/csse.2023.033807

R. Ani, O. S. Deepa, B. R. Manju


Citations: 0

Abstract

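Drug development is slow because it requires sifting through a large collection of candidate compounds, most of them inactive, to find the few that can bind to a disease protein. Virtual screening is increasingly used in pharmaceutical research and is crucial in the early phases of drug research and development, where it narrows the search for chemical compounds. Because databases contain more and more ligands, the method needs to be fast and accurate. Neural network fingerprints have been created that are more effective than the well-known Extended Connectivity Fingerprint (ECFP). However, although a conventional graph network generates a better-encoded fingerprint, it considers only the largest sub-graph when learning the representation, and its average or maximum pooling layer also brings in unrelated information. To address these problems, this article proposes the Graph Convolutional Attention Network (GCAN), a graph neural network with an attention mechanism that gives more weight to the nodes or sub-graphs used to create the molecular fingerprint. The generated fingerprint is then used to classify drugs with ensemble learning: ensemble stacking is applied with Support Vector Machines (SVM), Random Forest, Naive Bayes, Decision Trees, AdaBoost, and Gradient Boosting as base classifiers. Compared with existing models, the proposed GCAN fingerprint with an ensemble model achieves relatively high accuracy, sensitivity, specificity, and area under the curve. The ensemble learning with the generated molecular fingerprint yields 91% accuracy, outperforming earlier approaches.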
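The contrast the abstract draws between average pooling and an attention mechanism can be illustrated with a minimal sketch. This is not the authors' GCAN: the function names, the toy node features, and the fixed `attention_vector` are assumptions made for illustration, and in a real network the attention parameters would be learned. The sketch only shows how softmax attention weights let some atoms (nodes) contribute more to the fingerprint than a plain average allows.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_readout(node_features):
    """Average pooling: every node contributes equally to the fingerprint."""
    n = len(node_features)
    dim = len(node_features[0])
    return [sum(node[d] for node in node_features) / n for d in range(dim)]


def attention_readout(node_features, attention_vector):
    """Attention pooling: nodes are weighted by softmax(attention_vector . node).

    attention_vector is a hypothetical stand-in for learned parameters.
    """
    scores = [
        sum(a * x for a, x in zip(attention_vector, node))
        for node in node_features
    ]
    weights = softmax(scores)
    dim = len(node_features[0])
    fingerprint = [
        sum(w * node[d] for w, node in zip(weights, node_features))
        for d in range(dim)
    ]
    return fingerprint, weights


# Toy molecule with three atoms and two features per atom (made-up values).
nodes = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
attn = [2.0, 0.0]  # assumed: favours atoms with a high first feature

fingerprint, weights = attention_readout(nodes, attn)
baseline = mean_readout(nodes)
print("attention weights:", weights)
print("attention fingerprint:", fingerprint)
print("mean-pooled fingerprint:", baseline)
```

In this toy case the first atom receives the largest weight, so its feature is more strongly represented in the attention fingerprint than in the mean-pooled one, where the two other atoms dilute it.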
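The evaluation measures named in the abstract (accuracy, sensitivity, specificity) all follow from the confusion matrix of a binary active/inactive classifier. The sketch below computes them on made-up labels; the data and function names are illustrative assumptions and are unrelated to the paper's reported 91% accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = active compound."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on actives), specificity (recall on inactives)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up example: 3 active and 5 inactive compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

Because screening libraries typically contain far more inactive than active compounds, sensitivity and specificity are worth reporting alongside accuracy: a classifier that labels everything inactive can score high accuracy while finding no actives.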




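Source journal

Computer Systems Science and Engineering (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 3.10
Self-citation rate: 13.60%
Articles published: 308
Review time: >12 weeks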

About the journal: The journal is devoted to the publication of high-quality papers on theoretical developments in computer systems science and their applications in computer systems engineering. Original research papers, state-of-the-art reviews, and technical notes are invited for publication. All papers will be refereed by acknowledged experts in the field and may be (i) accepted without change, (ii) returned for amendment and subsequent re-refereeing, or (iii) rejected on the grounds of either relevance or content. The submission of a paper implies that, if accepted for publication, it will not be published elsewhere in the same form, in any language, without the prior consent of the Publisher.
