Predicting the Subcellular Localization of Proteins with Multiple Sites Based on N-Terminal Signals
Xumi Qu, Yuehui Chen, Shanping Qiao
2013 International Conference on Information Science and Cloud Computing Companion, published 2013-12-07
DOI: 10.1109/ISCC-C.2013.101 (https://doi.org/10.1109/ISCC-C.2013.101)
Citations: 4
Abstract
Subcellular localization is an important attribute of a protein in bioinformatics, closely related to its function, signal transduction, and biological processes. Great progress has been made in this research field in recent years, but existing prediction methods still have shortcomings. For example, the extracted feature information is often not complete enough to achieve higher prediction accuracy, and some important protein information and the correlation within the amino acid sequence are usually ignored. In addition, some proteins reside in two, three, or even more locations, yet they have been treated as having only one. In this study, we divide each protein sequence into two parts according to its N-terminal sorting signal and extract pseudo amino acid composition features from each part separately. We then use multi-label KNN (ML-KNN) to handle proteins that have two, three, or even more locations. The jackknife test yields satisfactory results.
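
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common Chou-style pseudo amino acid composition with a single physicochemical property (Kyte-Doolittle hydropathy), a hypothetical cut index `cut` that marks the end of the N-terminal sorting signal (in practice this would come from a signal predictor), and hypothetical parameter values `lam=2` and `weight=0.05`. The function names `pse_aac` and `split_features` are illustrative only, and the abstract does not state which properties or parameters were actually used.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed property; not stated in the abstract).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Standardize the property to zero mean and unit variance over the 20 residues.
_mu = mean(HYDROPATHY.values())
_sd = pstdev(HYDROPATHY.values())
NORM = {aa: (v - _mu) / _sd for aa, v in HYDROPATHY.items()}


def pse_aac(seq, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    Assumes seq contains only the 20 standard residues and is longer than lam.
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    # Conventional amino acid composition: 20 normalized frequencies.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # Sequence-order correlation factors for gaps 1..lam.
    thetas = []
    for j in range(1, lam + 1):
        diffs = [(NORM[seq[i]] - NORM[seq[i + j]]) ** 2
                 for i in range(len(seq) - j)]
        thetas.append(sum(diffs) / len(diffs))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


def split_features(seq, cut, lam=2, weight=0.05):
    """Concatenate PseAAC vectors of the N-terminal part and the remainder."""
    n_part, rest = seq[:cut], seq[cut:]
    return pse_aac(n_part, lam, weight) + pse_aac(rest, lam, weight)


# Made-up sequence and cut position, for illustration only.
features = split_features("MKTLLLTLVVVTIVCLDLGYTARAEQ", cut=20)
print(len(features))
```

Each half of the resulting vector sums to 1 by construction, so the two parts contribute on a comparable scale when the concatenated vector is passed to a distance-based multi-label classifier such as ML-KNN.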