Support vector machine prediction of N- and O-glycosylation sites using whole sequence information and subcellular localization

IPSJ Transactions on Bioinformatics (Q3, Biochemistry, Genetics and Molecular Biology), vol. 2, pp. 25-35. Pub Date: 2009-12-01. DOI: 10.2197/IPSJTBIO.2.25

Kenta Sasaki, Nobuyoshi Nagamine, Y. Sakakibara


Citations: 16

Abstract

Background: Glycans, or sugar chains, are one of the three types of chain (DNA, protein and glycan) that constitute living organisms; they are often called “the third chain of the living organism”. Based on the SWISS-PROT database, about half of all proteins are estimated to be glycosylated. Glycosylation is one of the most important post-translational modifications, affecting the tertiary structure of proteins and many of their critical functions, including cellular communication. To computationally predict N-glycosylation and O-glycosylation sites, we developed three kinds of support vector machine (SVM) models, which use local information, general protein information and/or subcellular localization. These features were chosen in consideration of the binding specificity of glycosyltransferases and the characteristic subcellular localization of glycoproteins.

Results: In our computational experiment, the model integrating all three kinds of information achieved about 90% accuracy in predicting both N-glycosylation and O-glycosylation sites. Moreover, we applied our model to a protein whose glycosylation sites had not been previously identified and showed that the predicted glycosylation sites were structurally reasonable.

Conclusions: In the present study, we developed a comprehensive and effective computational method for predicting glycosylation sites that is applicable at a genome-wide level.
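The abstract names three kinds of input (local information, general protein information, and subcellular localization) but does not give the encoding. The following is a minimal sketch of how such inputs might be combined into one feature vector per candidate site, not the authors' actual feature set. The window half-width, the use of amino acid composition as the "general protein information", the localization vocabulary, and all function names are assumptions made for illustration. Candidate residues are taken to be Asn for N-glycosylation and Ser/Thr for O-glycosylation.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical localization vocabulary, chosen only for this illustration.
LOCALIZATIONS = ["secreted", "membrane", "cytoplasm", "nucleus"]


def candidate_sites(seq: str, kind: str) -> List[int]:
    """Positions of residues that could carry an N-glycan (Asn) or O-glycan (Ser/Thr)."""
    targets = "N" if kind == "N" else "ST"
    return [i for i, aa in enumerate(seq) if aa in targets]


def window_features(seq: str, pos: int, half_width: int = 3) -> List[int]:
    """Local information: one-hot encoding of residues in a window centred on pos."""
    feats: List[int] = []
    for i in range(pos - half_width, pos + half_width + 1):
        # Positions past either end of the sequence are encoded as all zeros.
        residue = seq[i] if 0 <= i < len(seq) else None
        feats.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return feats


def composition_features(seq: str) -> List[float]:
    """General protein information (assumed here): whole-sequence amino acid frequencies."""
    n = max(len(seq), 1)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def localization_features(loc: str) -> List[int]:
    """Subcellular localization as a one-hot vector over LOCALIZATIONS."""
    return [1 if loc == name else 0 for name in LOCALIZATIONS]


def site_vector(seq: str, pos: int, loc: str, half_width: int = 3) -> List[float]:
    """Concatenate the three feature groups for one candidate site."""
    return (
        window_features(seq, pos, half_width)
        + composition_features(seq)
        + localization_features(loc)
    )


if __name__ == "__main__":
    seq = "MKNGSTAN"  # toy sequence, not real data
    for pos in candidate_sites(seq, "N"):
        print(pos, len(site_vector(seq, pos, "secreted")))
```

Each vector produced this way could be passed, together with a glycosylated/non-glycosylated label, to any SVM implementation. Dropping the second or third feature group would correspond to a model that uses less than the full set of information.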


Source journal

IPSJ Transactions on Bioinformatics (Biochemistry, Genetics and Molecular Biology: miscellaneous)

CiteScore

1.90

Self-citation rate

0.00%

Articles published