Prediction of Protein Functional Sites Using Novel String Kernels

2008 International Conference on Information Technology. Pub Date: 2008-12-17. DOI: 10.1109/ICIT.2008.11

C. Das, P. Maji

Citations: 0

Abstract

In most pattern recognition algorithms, amino acids cannot be used directly as inputs because they are nonnumerical variables; they must therefore be encoded before input. To this end, a novel string kernel is introduced that maps a nonnumerical sequence space to a numerical feature space. The proposed string kernel is developed from the conventional bio-basis function and is termed the novel bio-basis function. It is designed on the principle of the asymmetry of biological distance, which is calculated using an amino acid mutation matrix. The concept of the zone of influence of a bio-basis is introduced in the proposed string kernel to normalize the asymmetric distance. An efficient method for selecting bio-bases for the novel string kernel is described that integrates the concepts of the Fisher ratio and degree of resemblance. The effectiveness of the proposed string kernel and bio-bases selection method, along with a comparison with the existing kernel and related selection methods, is demonstrated on different protein data sets.
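As a rough illustration of the encoding idea, the sketch below computes the conventional bio-basis function that the abstract names as its starting point, not the novel kernel itself, whose exact form the abstract does not give. It assumes the form usually quoted in the bio-basis literature, f(x, b) = exp(gamma * (h(x, b) - h(b, b)) / h(b, b)), where h sums position-wise scores from a mutation matrix. The score table, the three-letter alphabet, and the names `TOY_SCORES`, `homology`, `bio_basis`, and `encode` are all hypothetical and chosen only for this example.

```python
import math

# Hypothetical similarity scores over a toy 3-letter alphabet.
# A real application would use an amino acid mutation matrix instead.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "G"): 1, ("A", "L"): -1,
    ("G", "G"): 5, ("G", "L"): -3,
    ("L", "L"): 6,
}


def score(a, b):
    """Look up the similarity of two residues, in either order."""
    return TOY_SCORES[(a, b)] if (a, b) in TOY_SCORES else TOY_SCORES[(b, a)]


def homology(x, b):
    """Sum of position-wise similarity scores of two equal-length strings."""
    if len(x) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(score(p, q) for p, q in zip(x, b))


def bio_basis(x, b, gamma=1.0):
    """Assumed conventional form: exp(gamma * (h(x,b) - h(b,b)) / h(b,b)).

    Requires h(b, b) > 0, which holds for the toy table above.
    """
    h_bb = homology(b, b)
    return math.exp(gamma * (homology(x, b) - h_bb) / h_bb)


def encode(x, bases, gamma=1.0):
    """Map a string to a numerical vector, one component per bio-basis."""
    return [bio_basis(x, b, gamma) for b in bases]


features = encode("AAL", ["AGL", "LLL"])
```

Each sequence thus becomes a fixed-length numerical vector with one component per bio-basis string, which is the kind of input a standard pattern recognition algorithm can consume. How the bio-bases are chosen, and how the zone of influence normalizes the distance, is specific to the paper and is not reproduced here.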


