WGAN-GP_Glu: A semi-supervised model based on double generator-Wasserstein GAN with gradient penalty algorithm for glutarylation site identification

IF 7.0 · Zone 2 (Medicine) · JCR Q1 (Biology) · Computers in Biology and Medicine, vol. 184, Article 109328 · Pub Date: 2024-11-14 · DOI: 10.1016/j.compbiomed.2024.109328

Qiao Ning, Zedong Qi


Citations: 0

Abstract


WGAN-GP_Glu: A semi-supervised model based on double generator-Wasserstein GAN with gradient penalty algorithm for glutarylation site identification

As an important post-translational modification, glutarylation plays a crucial role in a variety of cellular functions. Diverse computational methods for glutarylation site identification have recently been proposed. However, the class imbalance problem, caused by data noise and the uncertainty of non-glutarylation sites, remains a major challenge. In this article, we propose a novel semi-supervised learning algorithm, called WGAN-GP_Glu, for identifying reliable non-glutarylation lysine sites among those without glutarylation annotation. WGAN-GP_Glu is a multi-module framework consisting mainly of a reliable negative sample selection module, a deep feature extraction module, and a glutarylation site prediction module. In the reliable negative sample selection module, we design an improved Wasserstein GAN with Gradient Penalty (WGAN-GP), named ReliableWGAN-GP, which comprises two generators, G1 and G2, and a discriminator D, and which can select reliable non-glutarylation samples from a large number of unlabeled samples. Generator G1 generates noise data from unlabeled samples. Generator G2 takes both the positive samples and the noise data as inputs to improve the discriminative capability of discriminator D. Then, a convolutional neural network and a bidirectional long short-term memory network combined with an attention mechanism are used to extract deep features from glutarylation samples and reliable non-glutarylation samples. Finally, a glutarylation site prediction module based on a three-layer fully connected network makes class predictions for the samples. On the independent test data set, the sensitivity, specificity, accuracy and Matthews correlation coefficient of WGAN-GP_Glu reach 90.58%, 95.82%, 94.44% and 0.8645, respectively, surpassing existing methods for glutarylation site prediction. Therefore, WGAN-GP_Glu can serve as a powerful tool for identifying glutarylation sites, and the ReliableWGAN-GP algorithm is effective in selecting reliable negative samples. The data and code are available at https://github.com/xbbxhbc/WGAN-GP_Glu.git.
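The four evaluation measures named in the abstract (sensitivity, specificity, accuracy and Matthews correlation coefficient) have standard definitions in terms of confusion-matrix counts. The sketch below is an illustration of those textbook formulas only. It is not taken from the authors' repository; the function name `binary_metrics` and the example counts are hypothetical and do not reproduce the paper's data or results.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, mcc) for a binary classifier.

    tp/fn count positive (glutarylation) sites predicted correctly/incorrectly;
    tn/fp count negative (non-glutarylation) sites predicted correctly/incorrectly.
    Ratios with an empty denominator are reported as 0.0 (an assumed convention).
    """
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # MCC uses all four cells, so it stays informative under class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sn, sp, acc, mcc = binary_metrics(tp=90, fn=10, tn=190, fp=10)
print(f"Sn={sn:.4f} Sp={sp:.4f} Acc={acc:.4f} MCC={mcc:.4f}")
```

Because positives and negatives are usually unbalanced in site-prediction data, accuracy alone can look high even when sensitivity is poor, which is one reason MCC is commonly reported alongside it.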


Source journal: Computers in Biology and Medicine (Engineering & Technology: Engineering, Biomedical)

CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days

About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.