DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers

IF 2.4 · CAS Tier 3 (Biology) · JCR Q3 (Biochemical Research Methods) · Current Bioinformatics · Pub Date: 2024-02-02 · DOI: 10.2174/0115748936283134240109054157
Necla Nisa Soylu, Emre Sefer
{"title":"DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers","authors":"Necla Nisa Soylu, Emre Sefer","doi":"10.2174/0115748936283134240109054157","DOIUrl":null,"url":null,"abstract":"Introduction:: More recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have performed the best on some language tasks by contextualizing word embeddings for a better dynamic representation. Their proteinspecific versions, such as ProtBERT, generated dynamic protein sequence embeddings, which resulted in better performance for several bioinformatics tasks. Besides, a number of different protein post-translational modifications are prominent in cellular tasks such as development and differentiation. The current biological experiments can detect these modifications, but within a longer duration and with a significant cost. Methods:: In this paper, to comprehend the accompanying biological processes concisely and more rapidly, we propose DEEPPTM to predict protein post-translational modification (PTM) sites from protein sequences more efficiently. Different than the current methods, DEEPPTM enhances the modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and reveals the associations between different modification types and protein sequence content. Additionally, it can infer several different modifications over different species. Results:: Human and mouse ROC AUCs for predicting Succinylation modifications were 0.988 and 0.965 respectively, once 10-fold cross-validation is applied. Similarly, we have obtained 0.982, 0.955, and 0.953 ROC AUC scores on inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DEEPPTM lessens the time spent in laboratory experiments while outperforming the competing methods as well as baselines on inferring all 4 modification sites. In our case, attention-based deep learning methods such as vision transformers look more favorable to learning from ProtBERT features than more traditional deep learning and machine learning techniques. Conclusion:: Additionally, the protein-specific ProtBERT model is more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.2174/0115748936283134240109054157","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: Recent self-supervised deep language models, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved the best performance on several language tasks by contextualizing word embeddings into dynamic representations. Protein-specific versions, such as ProtBERT, produce dynamic protein sequence embeddings that improve performance on several bioinformatics tasks. Moreover, a number of protein post-translational modifications are prominent in cellular processes such as development and differentiation. Current biological experiments can detect these modifications, but only slowly and at significant cost.

Methods: In this paper, to characterize the accompanying biological processes more concisely and rapidly, we propose DEEPPTM, which predicts protein post-translational modification (PTM) sites from protein sequences more efficiently. Unlike current methods, DEEPPTM improves modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and it reveals associations between different modification types and protein sequence content. It can also infer several different modification types across different species.

Results: Under 10-fold cross-validation, ROC AUCs for predicting succinylation modifications were 0.988 on human data and 0.965 on mouse data. Similarly, we obtained ROC AUC scores of 0.982, 0.955, and 0.953 for inferring ubiquitination, crotonylation, and glycation sites, respectively. In detailed computational experiments, DEEPPTM reduces the time spent on laboratory experiments while outperforming competing methods and baselines on all four modification types. In our setting, attention-based deep learning methods such as vision transformers learn more effectively from ProtBERT features than more traditional deep learning and machine learning techniques.

Conclusion: The protein-specific ProtBERT model is also more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.
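The pipeline described above pairs ProtBERT embeddings with an attention-based classifier. As a minimal sketch of the embedding step, the snippet below extracts per-residue ProtBERT vectors for a sequence window around a candidate modification site using the public Rostlab/prot_bert checkpoint on HuggingFace. The window size and the choice of the last hidden layer are illustrative assumptions, not the authors' exact DeepPTM configuration.

```python
# Minimal sketch: per-residue ProtBERT embeddings around a candidate PTM site.
# Window size and layer choice are illustrative, not DeepPTM's exact setup.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertModel.from_pretrained("Rostlab/prot_bert")
model.eval()

def site_embedding(sequence: str, site: int, window: int = 16) -> torch.Tensor:
    """Embed a +/- `window` residue context around `site` (0-based index)."""
    left = max(0, site - window)
    fragment = sequence[left : site + window + 1]
    # ProtBERT expects residues separated by spaces, e.g. "M K T A ..."
    tokens = tokenizer(" ".join(fragment), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**tokens).last_hidden_state  # shape (1, L+2, 1024)
    return hidden[0, 1:-1]  # drop [CLS]/[SEP]; one 1024-d vector per residue

emb = site_embedding("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", site=7)
print(emb.shape)  # torch.Size([24, 1024]) for this 24-residue fragment
```

These per-residue embedding matrices are the kind of input an attention-based classifier such as a ViT head could consume, one matrix per candidate site.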
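For the evaluation protocol, the sketch below reproduces the reported scheme of 10-fold cross-validated ROC AUC using scikit-learn. The logistic-regression head and random toy data are placeholders standing in for DeepPTM's vision-transformer classifier and its real site embeddings.

```python
# Minimal sketch of the evaluation protocol: stratified 10-fold CV with ROC AUC.
# The logistic-regression head is a placeholder for the paper's ViT classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def cv_auc(X: np.ndarray, y: np.ndarray, folds: int = 10) -> float:
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    aucs = []
    for train_idx, test_idx in skf.split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]  # P(site is modified)
        aucs.append(roc_auc_score(y[test_idx], scores))
    return float(np.mean(aucs))

# Toy usage with random features; real inputs would be flattened site embeddings.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 32)), rng.integers(0, 2, size=200)
print(f"mean ROC AUC over 10 folds: {cv_auc(X, y):.3f}")
```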
Source Journal
Current Bioinformatics (Biology - Biochemical Research Methods)
CiteScore: 6.60
Self-citation rate: 2.50%
Annual articles: 77
Review time: >12 weeks
About the Journal: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth reviews and mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of topics at the intersection of biology and computer and information science. The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, and environmental protection.
Latest Articles in This Journal
Mining Transcriptional Data for Precision Medicine: Bioinformatics Insights into Inflammatory Bowel Disease
Prediction of miRNA-disease Associations by Deep Matrix Decomposition Method based on Fused Similarity Information
TCM@MPXV: A Resource for Treating Monkeypox Patients in Traditional Chinese Medicine
Identifying Key Clinical Indicators Associated with the Risk of Death in Hospitalized COVID-19 Patients
A Parallel Implementation for Large-Scale TSR-based 3D Structural Comparisons of Protein and Amino Acid