CoNglyPred:利用 ESM-2 和结构特征以及图形网络和共注意力准确预测 N-连接糖基化位点。

IF 3.4 4区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS Proteomics Pub Date : 2024-10-03 DOI:10.1002/pmic.202400210
Hongmei Wang, Long Zhao, Ziyuan Yu, Ximin Zeng, Shaoping Shi
{"title":"CoNglyPred:利用 ESM-2 和结构特征以及图形网络和共注意力准确预测 N-连接糖基化位点。","authors":"Hongmei Wang, Long Zhao, Ziyuan Yu, Ximin Zeng, Shaoping Shi","doi":"10.1002/pmic.202400210","DOIUrl":null,"url":null,"abstract":"<p><p>N-Linked glycosylation is crucial for various biological processes such as protein folding, immune response, and cellular transport. Traditional experimental methods for determining N-linked glycosylation sites entail substantial time and labor investment, which has led to the development of computational approaches as a more efficient alternative. However, due to the limited availability of 3D structural data, existing prediction methods often struggle to fully utilize structural information and fall short in integrating sequence and structural information effectively. Motivated by the progress of protein pretrained language models (pLMs) and the breakthrough in protein structure prediction, we introduced a high-accuracy model called CoNglyPred. Having compared various pLMs, we opt for the large-scale pLM ESM-2 to extract sequence embeddings, thus mitigating certain limitations associated with manual feature extraction. Meanwhile, our approach employs a graph transformer network to process the 3D protein structures predicted by AlphaFold2. The final graph output and ESM-2 embedding are intricately integrated through a co-attention mechanism. Among a series of comprehensive experiments on the independent test dataset, CoNglyPred outperforms state-of-the-art models and demonstrates exceptional performance in case study. In addition, we are the first to report the uncertainty of N-linked glycosylation predictors using expected calibration error and expected uncertainty calibration error.</p>","PeriodicalId":224,"journal":{"name":"Proteomics","volume":null,"pages":null},"PeriodicalIF":3.4000,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CoNglyPred: Accurate Prediction of N-Linked Glycosylation Sites Using ESM-2 and Structural Features With Graph Network and Co-Attention.\",\"authors\":\"Hongmei Wang, Long Zhao, Ziyuan Yu, Ximin Zeng, Shaoping Shi\",\"doi\":\"10.1002/pmic.202400210\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>N-Linked glycosylation is crucial for various biological processes such as protein folding, immune response, and cellular transport. Traditional experimental methods for determining N-linked glycosylation sites entail substantial time and labor investment, which has led to the development of computational approaches as a more efficient alternative. However, due to the limited availability of 3D structural data, existing prediction methods often struggle to fully utilize structural information and fall short in integrating sequence and structural information effectively. Motivated by the progress of protein pretrained language models (pLMs) and the breakthrough in protein structure prediction, we introduced a high-accuracy model called CoNglyPred. Having compared various pLMs, we opt for the large-scale pLM ESM-2 to extract sequence embeddings, thus mitigating certain limitations associated with manual feature extraction. Meanwhile, our approach employs a graph transformer network to process the 3D protein structures predicted by AlphaFold2. The final graph output and ESM-2 embedding are intricately integrated through a co-attention mechanism. Among a series of comprehensive experiments on the independent test dataset, CoNglyPred outperforms state-of-the-art models and demonstrates exceptional performance in case study. In addition, we are the first to report the uncertainty of N-linked glycosylation predictors using expected calibration error and expected uncertainty calibration error.</p>\",\"PeriodicalId\":224,\"journal\":{\"name\":\"Proteomics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-10-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proteomics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1002/pmic.202400210\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proteomics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1002/pmic.202400210","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

N-连接糖基化对蛋白质折叠、免疫反应和细胞运输等各种生物过程至关重要。确定N-连接糖基化位点的传统实验方法需要投入大量的时间和人力,因此人们开始开发计算方法作为更有效的替代方法。然而,由于三维结构数据有限,现有的预测方法往往难以充分利用结构信息,无法有效整合序列和结构信息。在蛋白质预训练语言模型(pLMs)取得进展和蛋白质结构预测取得突破的推动下,我们引入了一种名为 CoNglyPred 的高精度模型。在比较了各种 pLM 后,我们选择了大规模 pLM ESM-2 来提取序列嵌入,从而减少了人工特征提取的某些局限性。同时,我们的方法采用图转换器网络来处理 AlphaFold2 预测的三维蛋白质结构。最终的图输出和 ESM-2 嵌入通过共同关注机制错综复杂地结合在一起。在对独立测试数据集进行的一系列综合实验中,CoNglyPred 的表现优于最先进的模型,并在案例研究中表现出卓越的性能。此外,我们还首次使用预期校准误差和预期不确定性校准误差报告了N-连接糖基化预测因子的不确定性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
CoNglyPred: Accurate Prediction of N-Linked Glycosylation Sites Using ESM-2 and Structural Features With Graph Network and Co-Attention.

N-Linked glycosylation is crucial for various biological processes such as protein folding, immune response, and cellular transport. Traditional experimental methods for determining N-linked glycosylation sites entail substantial time and labor investment, which has led to the development of computational approaches as a more efficient alternative. However, due to the limited availability of 3D structural data, existing prediction methods often struggle to fully utilize structural information and fall short in integrating sequence and structural information effectively. Motivated by the progress of protein pretrained language models (pLMs) and the breakthrough in protein structure prediction, we introduced a high-accuracy model called CoNglyPred. Having compared various pLMs, we opt for the large-scale pLM ESM-2 to extract sequence embeddings, thus mitigating certain limitations associated with manual feature extraction. Meanwhile, our approach employs a graph transformer network to process the 3D protein structures predicted by AlphaFold2. The final graph output and ESM-2 embedding are intricately integrated through a co-attention mechanism. Among a series of comprehensive experiments on the independent test dataset, CoNglyPred outperforms state-of-the-art models and demonstrates exceptional performance in case study. In addition, we are the first to report the uncertainty of N-linked glycosylation predictors using expected calibration error and expected uncertainty calibration error.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Proteomics
Proteomics 生物-生化研究方法
CiteScore
6.30
自引率
5.90%
发文量
193
审稿时长
3 months
期刊介绍: PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.
期刊最新文献
Omics Studies in CKD: Diagnostic Opportunities and Therapeutic Potential. Proteome integral solubility alteration via label-free DIA approach (PISA-DIA), game changer in drug target deconvolution. Transforming peptide hormone prediction: The role of AI in modern proteomics. Integrative Proteomic and Phosphoproteomic Profiling Reveals the Salt-Responsive Mechanisms in Two Rice Varieties (Oryza Sativa subsp. Japonica and Indica). Proteomics analysis of round and wrinkled pea (Pisum sativum L.) seeds during different development periods.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1