PUResNetV2.0：利用稀疏表示改进配体结合位点预测的深度学习模型

IF 7.1 2区化学 Q1 CHEMISTRY, MULTIDISCIPLINARY Journal of Cheminformatics Pub Date : 2024-06-07 DOI:10.1186/s13321-024-00865-6

Kandel Jeevan, Shrestha Palistha, Hilal Tayara, Kil T. Chong

{"title":"PUResNetV2.0：利用稀疏表示改进配体结合位点预测的深度学习模型","authors":"Kandel Jeevan, Shrestha Palistha, Hilal Tayara, Kil T. Chong","doi":"10.1186/s13321-024-00865-6","DOIUrl":null,"url":null,"abstract":"<div><p>Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom (DCA) success rate and a 74.7% F1 Score on the Holo801 dataset, outperforming existing methods. However, its performance in specific cases, such as RNA, DNA, peptide-like ligand, and ion binding site prediction, was limited due to constraints in our training data. Our findings underscore the potential of sparse representation in LBSP, especially for oligomeric structures, suggesting PUResNetV2.0 as a promising tool for computational drug discovery.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"16 1","pages":""},"PeriodicalIF":7.1000,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00865-6","citationCount":"0","resultStr":"{\"title\":\"PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction\",\"authors\":\"Kandel Jeevan, Shrestha Palistha, Hilal Tayara, Kil T. Chong\",\"doi\":\"10.1186/s13321-024-00865-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom (DCA) success rate and a 74.7% F1 Score on the Holo801 dataset, outperforming existing methods. However, its performance in specific cases, such as RNA, DNA, peptide-like ligand, and ion binding site prediction, was limited due to constraints in our training data. Our findings underscore the potential of sparse representation in LBSP, especially for oligomeric structures, suggesting PUResNetV2.0 as a promising tool for computational drug discovery.</p></div>\",\"PeriodicalId\":617,\"journal\":{\"name\":\"Journal of Cheminformatics\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":7.1000,\"publicationDate\":\"2024-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00865-6\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cheminformatics\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://link.springer.com/article/10.1186/s13321-024-00865-6\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-024-00865-6","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

摘要

准确预测蛋白质中的配体结合位点（LBSP）对药物发现至关重要。我们开发了 ProteinUNetResNetV2.0（PUResNetV2.0），利用蛋白质结构的稀疏表示来提高配体结合位点预测的准确性。我们的训练数据集包括来自 4729 个蛋白质家族的蛋白质复合物。在基准数据集上进行的评估表明，PUResNetV2.0 在 Holo801 数据集上取得了 85.4% 的距离中心原子（DCA）成功率和 74.7% 的 F1 分数，优于现有方法。然而，由于训练数据的限制，它在特定情况下的表现有限，如 RNA、DNA、类肽配体和离子结合位点预测。我们的发现强调了稀疏表示在 LBSP 中的潜力，尤其是在寡聚结构方面，这表明 PUResNetV2.0 是一种很有前途的计算药物发现工具。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction

Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom (DCA) success rate and a 74.7% F1 Score on the Holo801 dataset, outperforming existing methods. However, its performance in specific cases, such as RNA, DNA, peptide-like ligand, and ion binding site prediction, was limited due to constraints in our training data. Our findings underscore the potential of sparse representation in LBSP, especially for oligomeric structures, suggesting PUResNetV2.0 as a promising tool for computational drug discovery.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore

14.10

自引率

7.00%

发文量

审稿时长

3 months

期刊介绍： Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.