Principal Component Analysis of O-linked Glycosylation Sites in Protein Sequence
Xue-mei Yang, Yenwei Chen, M. Ito, I. Nishikawa
Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2007), 2007-11-26
DOI: 10.1109/IIH-MSP.2007.248
In this paper, we analyze the structure of O-glycosylated proteins in detail by calculating positional probability functions (PPFs) and principal components. We found that the content of proline, serine, threonine, and alanine is higher in O-glycosylated proteins than in non-glycosylated proteins. We also found that serine near the N or C terminus and threonine near the N terminus are easily glycosylated. Prediction was also treated as a classification problem: the test protein sequence is projected onto the common subspace, the distance between the projection and each class center is calculated, and the test sequence is assigned to the "nearest" class. The prediction accuracy is about 60%-100%.
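The classification step described in the abstract (project onto a common subspace, then assign to the nearest class center) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation. The one-hot window encoding, the pooled PCA over both classes, the Euclidean distance, the function names, and the toy residue windows are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(window):
    """One-hot encode a residue window (hypothetical encoding, not from the paper)."""
    vec = np.zeros((len(window), len(AMINO_ACIDS)))
    for pos, residue in enumerate(window):
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # unknown residues are left as all-zero rows
            vec[pos, idx] = 1.0
    return vec.ravel()

def fit_subspace(X, n_components):
    """PCA via SVD on the pooled, mean-centered data; returns (mean, basis)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, basis):
    """Project one sample or a matrix of samples onto the subspace."""
    return (X - mean) @ basis.T

def class_centers(Z, labels):
    """Mean of the projected samples for each class label."""
    centers = {}
    for label in sorted(set(labels)):
        mask = np.array([l == label for l in labels])
        centers[label] = Z[mask].mean(axis=0)
    return centers

def nearest_center(x, mean, basis, centers):
    """Assign x to the class whose center is closest (Euclidean) in the subspace."""
    z = project(x, mean, basis)
    dists = {label: np.linalg.norm(z - c) for label, c in centers.items()}
    return min(dists, key=dists.get)

# Toy windows centered on S/T; made up for illustration, not real data.
train = [
    ("PASTP", "glycosylated"),
    ("APTPA", "glycosylated"),
    ("TPSAP", "glycosylated"),
    ("KLSEV", "non-glycosylated"),
    ("DETKL", "non-glycosylated"),
    ("GVSLK", "non-glycosylated"),
]
X = np.array([encode_window(w) for w, _ in train])
labels = [label for _, label in train]

mean, basis = fit_subspace(X, n_components=2)
centers = class_centers(project(X, mean, basis), labels)

query = encode_window("PATPA")
print(nearest_center(query, mean, basis, centers))
```

With real data, the number of retained components and the window length would need to be chosen by validation; the values used here are arbitrary.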