GPSD: a hybrid learning framework for the prediction of phosphatase-specific dephosphorylation sites.

IF 6.8; Zone 2 (Biology); Q1 BIOCHEMICAL RESEARCH METHODS; Briefings in Bioinformatics; Pub Date: 2024-11-22; DOI: 10.1093/bib/bbae694
Cheng Han, Shanshan Fu, Miaomiao Chen, Yujie Gou, Dan Liu, Chi Zhang, Xinhe Huang, Leming Xiao, Miaoying Zhao, Jiayi Zhang, Qiang Xiao, Di Peng, Yu Xue
Briefings in Bioinformatics, volume 26, issue 1. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695897/pdf/
Citations: 0

Abstract

Protein phosphorylation is dynamically and reversibly regulated by protein kinases and protein phosphatases and plays an essential role in orchestrating a wide range of biological processes. Although a number of tools have been developed for predicting kinase-specific phosphorylation sites (p-sites), computational prediction of phosphatase-specific dephosphorylation sites remains a great challenge. In this study, we manually curated 4393 experimentally identified site-specific phosphatase-substrate relationships for 3463 dephosphorylation sites occurring on phosphoserine, phosphothreonine, and/or phosphotyrosine residues from the literature and public databases. We then developed a hybrid learning framework, the group-based prediction system for the prediction of phosphatase-specific dephosphorylation sites (GPSD). For model training, we integrated 10 types of sequence features and used three types of machine learning methods: penalized logistic regression, deep neural networks, and transformer neural networks. First, a pretrained model was constructed using 561,416 nonredundant p-sites and then fine-tuned to generate computational models for predicting general dephosphorylation sites. In addition, 103 individual phosphatase-specific predictors were constructed via transfer learning and meta-learning. For site prediction, one or more protein sequences in FASTA format can be submitted, and the prediction results are shown together with additional annotations, such as protein-protein interactions, structural information, and disorder propensity. The online service of GPSD is freely available at https://gpsd.biocuckoo.cn/. We believe that GPSD can serve as a valuable tool for further analysis of dephosphorylation.
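As a rough illustration of the input side described in the abstract (FASTA sequences, candidate sites on serine, threonine, or tyrosine), the sketch below parses FASTA text and extracts a fixed-length peptide window around each S/T/Y residue. This is not GPSD's actual preprocessing: the function names, the flank size of 7, and the `*` padding character are all assumptions chosen for the example, and the abstract does not specify how GPSD builds its 10 sequence feature types.

```python
from typing import Iterator

# Residues that can carry a phosphate group per the abstract (S/T/Y).
PHOSPHO_RESIDUES = {"S", "T", "Y"}


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence} (hypothetical helper)."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Use the first token of the header line as the identifier.
            header = parts[0] if parts else f"seq{len(records) + 1}"
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def candidate_windows(
    seq: str, flank: int = 7, pad: str = "*"
) -> Iterator[tuple[int, str, str]]:
    """Yield (1-based position, residue, window) for every S/T/Y in seq.

    The window holds `flank` residues on each side of the site; positions
    beyond the sequence ends are filled with `pad`. The flank size is an
    assumption for illustration, not a value taken from the paper.
    """
    padded = pad * flank + seq + pad * flank
    for i, aa in enumerate(seq):
        if aa in PHOSPHO_RESIDUES:
            yield i + 1, aa, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    demo = ">demo example protein\nMKSAT\nYPG\n"
    for name, sequence in read_fasta(demo).items():
        for pos, aa, window in candidate_windows(sequence, flank=2):
            print(name, pos, aa, window)
```

Windows of this kind could then be encoded and passed to any classifier; which encoding and model to use is left open here, since the abstract names the method families but not their configuration.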

Source journal: Briefings in Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.