Molecule generation toward target protein (SARS-CoV-2) using reinforcement learning-based graph neural network via knowledge graph.

Network Modeling and Analysis in Health Informatics and Bioinformatics (IF 2; JCR Q3, MATHEMATICAL & COMPUTATIONAL BIOLOGY). Pub Date: 2023-01-01. DOI: 10.1007/s13721-023-00409-2

Amit Ranjan, Hritik Kumar, Deepshikha Kumari, Archit Anand, Rajiv Misra

Volume 12, Issue 1, page 13. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9817447/pdf/

Citations: 0

Abstract

AI-driven approaches are widely used in drug discovery, where candidate molecules are generated and then evaluated against a target protein through binding affinity prediction. However, generating new compounds with desirable molecular properties, such as the Quantitative Estimate of Drug-likeness (QED) and Dopamine Receptor D2 (DRD2) activity, while obeying chemical rules remains challenging. To address these challenges, we propose a graph-based deep learning framework to generate potential therapeutic drugs targeting a SARS-CoV-2 protein. The framework consists of two modules: a novel reinforcement learning (RL)-based graph generative module with a knowledge graph (KG), and a graph early fusion approach (GEFA) for binding affinity prediction. The first module uses a gated graph neural network (GGNN) model in an RL environment to generate novel molecular compounds with desired properties, and a custom-built KG to screen the generated molecules. The second module uses GEFA to predict binding affinity scores between the generated compounds and target proteins. Experiments show that fine-tuning the GGNN model in the RL environment steers generation toward the desired properties and yields 100% valid and 100% unique compounds under different scoring functions. Additionally, KG-based screening reduces the search space of generated candidate molecules by 96.64% while retaining 95.38% of the promising binding molecules against the SARS-CoV-2 3C-like protease (3CLpro). The top-ranked generated compound achieved a binding affinity score of 8.185. In addition, we compared the top-ranked generated compounds with Indinavir on several parameters, including drug-likeness and medicinal chemistry, as a qualitative analysis from a drug development perspective.

Supplementary information: The online version contains supplementary material available at 10.1007/s13721-023-00409-2.
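The abstract reports its generation and screening results as percentages. The sketch below shows one way such figures are commonly computed. It is illustrative only: the definitions are assumptions, not taken from the paper. Validity is taken as parsable molecules over all samples, and uniqueness as distinct canonical molecules over valid ones. "Reduction" is read as the fraction of generated candidates the KG screen discards, and "retention" as the fraction of promising binders that survive it. Under that reading, a 96.64% reduction leaves 3.36% of the candidates for binding affinity prediction. The names `validity_and_uniqueness`, `screening_stats` and `TOY_CANONICAL` are hypothetical, and the toy data is made up for illustration.

```python
from typing import Callable, Optional, Sequence, Set, Tuple


def validity_and_uniqueness(
    samples: Sequence[str],
    canonicalize: Callable[[str], Optional[str]],
) -> Tuple[float, float]:
    """Return (validity, uniqueness) for a batch of generated SMILES strings.

    Assumed definitions (common conventions, not taken from the paper):
      validity   = valid samples / all samples
      uniqueness = distinct canonical molecules / valid samples
    `canonicalize` (hypothetical hook) returns a canonical string for a
    valid molecule and None for an invalid one.
    """
    canonical = [canonicalize(s) for s in samples]
    valid = [c for c in canonical if c is not None]
    validity = len(valid) / len(samples) if samples else 0.0
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


def screening_stats(
    generated: Set[str],
    kept: Set[str],
    promising: Set[str],
) -> Tuple[float, float]:
    """Return (search-space reduction, retention of promising binders).

    Assumed definitions:
      reduction = 1 - |kept| / |generated|
      retention = |kept & promising| / |promising|
    """
    if not generated or not promising:
        raise ValueError("generated and promising must be non-empty")
    reduction = 1 - len(kept) / len(generated)
    retention = len(kept & promising) / len(promising)
    return reduction, retention


# Toy data (hypothetical). This lookup stands in for a real toolkit:
# "OCC" and "CCO" are both ethanol, and "C1CC" (unclosed ring) is invalid.
TOY_CANONICAL = {"CCO": "CCO", "OCC": "CCO", "CCN": "CCN"}
validity, uniqueness = validity_and_uniqueness(
    ["CCO", "OCC", "CCN", "C1CC"], TOY_CANONICAL.get
)
print(f"validity={validity:.2f} uniqueness={uniqueness:.2f}")
# prints: validity=0.75 uniqueness=0.67

generated = {f"mol{i}" for i in range(100)}
kept = {"mol0", "mol1", "mol2", "mol3"}  # the screen keeps 4 of 100 candidates
promising = {"mol0", "mol1", "mol2", "mol50"}  # mol50 is lost by the screen
reduction, retention = screening_stats(generated, kept, promising)
print(f"reduction={reduction:.2%} retention={retention:.2%}")
# prints: reduction=96.00% retention=75.00%
```

The canonicalizer is passed in as a function so the metric code does not depend on any particular cheminformatics toolkit. In practice, it would wrap real SMILES parsing and canonicalization rather than the toy lookup used here.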




Source journal: Network Modeling and Analysis in Health Informatics and Bioinformatics (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 5.40

Self-citation rate: 4.30%


Journal description: NetMAHIB publishes original research articles and reviews reporting how graph theory, statistics, linear algebra and machine learning techniques can be effectively used for modelling and analysis in health informatics and bioinformatics. It aims to create synergy between these disciplines by providing a forum for disseminating the latest developments and research findings, so that results can be shared with readers across institutions, governments, researchers, students, and industry. The journal emphasizes fundamental contributions on new methodologies, discoveries and techniques that have general applicability and that form the basis for network-based modelling, knowledge discovery, knowledge sharing and decision support, to the benefit of patients, healthcare professionals and society in traditional and advanced emerging settings, including eHealth and mHealth.