Prediction of Protein Functional Sites Using Novel String Kernels
C. Das, P. Maji
2008 International Conference on Information Technology, 2008-12-17
DOI: 10.1109/ICIT.2008.11 (https://doi.org/10.1109/ICIT.2008.11)
Citations: 0
Abstract
Most pattern recognition algorithms cannot take amino acids directly as inputs because they are nonnumerical variables; the sequences must therefore be encoded before input. To this end, a novel string kernel is introduced that maps a nonnumerical sequence space to a numerical feature space. The proposed string kernel builds on the conventional bio-basis function and is termed the novel bio-basis function. It is designed on the principle of asymmetry of biological distance, which is calculated using an amino acid mutation matrix. The concept of the zone of influence of a bio-basis is introduced in the proposed string kernel to normalize the asymmetric distance. An efficient method for selecting bio-bases for the novel string kernel is described, integrating the concepts of the Fisher ratio and the degree of resemblance. The effectiveness of the proposed string kernel and bio-bases selection method, along with a comparison with existing kernels and related selection methods, is demonstrated on different protein data sets.
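As a rough illustration of the encoding idea, the sketch below maps a sequence to a numerical feature vector with one value per bio-basis. It uses the conventional bio-basis function in the form it is commonly stated, exp(γ (h(x, b) − h(b, b)) / h(b, b)), and not the novel kernel, whose exact definition is not given in the abstract. The scoring function `toy_score`, the example sequences, and the value of `gamma` are hypothetical stand-ins; a real application would take h from an amino acid mutation matrix. The `fisher_ratio` helper shows the standard two-class Fisher ratio that could rank candidate bio-bases; the degree-of-resemblance criterion is not sketched because the abstract does not define it.

```python
import math
from statistics import mean, pvariance


def toy_score(a: str, b: str) -> float:
    """Hypothetical substitution score; a stand-in for a real mutation matrix."""
    return 2.0 if a == b else 0.5


def similarity(x: str, b: str, score=toy_score) -> float:
    """Sum the position-wise substitution scores of two equal-length sequences."""
    if len(x) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(score(p, q) for p, q in zip(x, b))


def bio_basis(x: str, b: str, gamma: float = 1.0, score=toy_score) -> float:
    """Assumed conventional form: exp(gamma * (h(x, b) - h(b, b)) / h(b, b))."""
    h_bb = similarity(b, b, score)
    return math.exp(gamma * (similarity(x, b, score) - h_bb) / h_bb)


def encode(x: str, bases: list[str], gamma: float = 1.0) -> list[float]:
    """Map one sequence to a numerical vector, one feature per bio-basis."""
    return [bio_basis(x, b, gamma) for b in bases]


def fisher_ratio(pos: list[float], neg: list[float]) -> float:
    """Squared difference of class means over the sum of class variances.

    Assumes at least one class has nonzero variance.
    """
    return (mean(pos) - mean(neg)) ** 2 / (pvariance(pos) + pvariance(neg))


# Hypothetical bio-bases and query sequence, for illustration only.
bases = ["ACD", "WYF"]
features = encode("ACE", bases)
```

A sequence identical to a bio-basis scores exactly 1, and the value decays toward 0 as similarity drops, so each feature reads as closeness to one basis. A basis whose feature values separate functional from non-functional examples well yields a large Fisher ratio.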