DeepBP: Ensemble deep learning strategy for bioactive peptide prediction.

IF 3.3 · Zone 3 (Biology) · JCR Q2 (Biochemical Research Methods) · BMC Bioinformatics · Pub Date: 2024-11-11 · DOI: 10.1186/s12859-024-05974-5

Ming Zhang, Jianren Zhou, Xiaohua Wang, Xun Wang, Fang Ge

BMC Bioinformatics 25(1): 352 (2024). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11556071/pdf/

Citations: 0

Abstract

Background: Bioactive peptides are short chains of amino acids that play crucial roles in the body, such as regulating physiological processes, promoting immune responses, and exerting antibacterial effects. Because of these roles, they have broad application potential in drug development, food science, and biotechnology. Understanding their biological mechanisms can also suggest new ideas for drug discovery and disease treatment.

Results: This study uses generative adversarial capsule networks (CapsuleGAN), gated recurrent units (GRU), and convolutional neural networks (CNN) as base classifiers and combines them into an ensemble through weighted voting. The ensemble is evaluated on an angiotensin-converting enzyme (ACE) inhibitory peptide dataset and an anticancer peptide (ACP) dataset. We first used the protein language model ESM-2 (Evolutionary Scale Modeling) to extract features for both datasets. We then trained the three deep learning models (CapsuleGAN, GRU, and CNN), adjusting the model parameters throughout training. Finally, in the voting stage, each model was assigned a weight based on its prediction accuracy, so that each model's performance is fully used. On the ACE inhibitory peptide dataset, the balanced accuracy is 0.926, the Matthews correlation coefficient (MCC) is 0.831, and the area under the curve (AUC) is 0.966; on the ACP dataset, the accuracy (ACC) is 0.779 and the MCC is 0.558. The results on both datasets are superior to those of existing methods, demonstrating the effectiveness of the approach.
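
The voting stage and the reported metrics can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the code below assumes weights proportional to each model's accuracy and a weighted average of positive-class probabilities (soft voting) with a 0.5 threshold. All function names and the toy numbers are hypothetical and are not taken from the authors' repository.

```python
from math import sqrt


def accuracy_weights(accuracies):
    """Turn per-model accuracies into voting weights that sum to 1.

    Assumption: weights are proportional to accuracy; the paper's exact
    weighting rule is not stated in the abstract.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("accuracies must sum to a positive value")
    return [a / total for a in accuracies]


def weighted_soft_vote(model_probs, weights, threshold=0.5):
    """Average per-model positive-class probabilities using the weights.

    model_probs holds one list of probabilities per model, all scoring
    the same samples. Returns (ensemble_probs, predicted_labels).
    """
    if len(weights) != len(model_probs):
        raise ValueError("need one weight per model")
    n_samples = len(model_probs[0])
    if any(len(p) != n_samples for p in model_probs):
        raise ValueError("all models must score the same samples")
    ensemble = [
        sum(w * p[i] for w, p in zip(weights, model_probs))
        for i in range(n_samples)
    ]
    labels = [1 if s >= threshold else 0 for s in ensemble]
    return ensemble, labels


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (both classes must be present)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: made-up probabilities standing in for the three base models.
y_true = [1, 0, 1, 0]
capsulegan_probs = [0.9, 0.2, 0.4, 0.6]
gru_probs = [0.8, 0.3, 0.8, 0.4]
cnn_probs = [0.6, 0.1, 0.3, 0.2]

# Hypothetical per-model accuracies used only to derive the weights.
weights = accuracy_weights([0.5, 1.0, 0.75])
ensemble_probs, y_pred = weighted_soft_vote(
    [capsulegan_probs, gru_probs, cnn_probs], weights
)
print(y_pred, balanced_accuracy(y_true, y_pred), mcc(y_true, y_pred))
```

Soft voting is only one reasonable reading of "voting with different weights"; a weighted hard vote over predicted labels would be an equally plausible variant, and the authors' repository is the reference for what was actually done.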

Conclusion: In this study, CapsuleGAN, GRU, and CNN were employed as base classifiers in an ensemble that achieved good results on both datasets and surpassed existing methods. Predicting peptides with strong ACE inhibitory activity and ACPs more accurately and quickly is valuable, and this work offers insights for predicting other functional peptides. The source code and dataset for this experiment are publicly available at https://github.com/Zhou-Jianren/bioactive-peptides.

Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.