Protein sequence motif patterns using adaptive Fuzzy C-Means granular computing model

2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering. Published 2013-04-15. DOI: 10.1109/ICPRIME.2013.6496454

M. Chitralegha, K. Thangavel


Citations: 6

Abstract

Data mining is the process of extracting hidden predictive information from large databases. In bioinformatics, data mining helps researchers meet the challenge of mining large amounts of biomolecular data to discover real knowledge. Major research efforts in bioinformatics involve sequence analysis, protein structure prediction and gene finding. Proteins are prominent molecules in our cells and are involved in virtually all cell functions. The activities and functions of proteins can be determined by protein sequence motifs, which are identified from segments of protein sequences. Not all segments are important for producing good motif patterns, and the generated sequence segments have no classes or labels. Hence, an unsupervised segment selection technique, the Singular Value Decomposition (SVD) entropy method, is adopted to select significant sequence segments. In this work, weighted K-Means and Adaptive Fuzzy C-Means are applied to the selected segments to generate granules, since such a large number of segments cannot be clustered directly. The granules generated by the weighted K-Means algorithm are further clustered using K-Means, and the granules generated by the Adaptive Fuzzy C-Means algorithm are clustered using weighted K-Means. The two proposed models are compared with the K-Means granular computing model. The experimental results show that Adaptive Fuzzy C-Means with weighted K-Means produces better results than the K-Means and weighted K-Means granular computing methods.
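The abstract does not spell out the SVD entropy criterion. The sketch below assumes the common formulation: each singular value's normalized energy v_j = s_j^2 / sum_k s_k^2 feeds an entropy normalized to [0, 1], and each segment is scored by how much that entropy drops when the segment is left out. The function names, the top-k selection rule, and the assumption that each row of X is a numeric encoding of one sequence segment are illustrative choices. This is not the authors' implementation.

```python
import numpy as np

def svd_entropy(X: np.ndarray) -> float:
    """Normalized SVD entropy: 0 = one dominant component, 1 = flat spectrum."""
    s = np.linalg.svd(X, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0 or len(s) < 2:
        return 0.0
    v = energy / total
    v = v[v > 0]  # treat 0 * log(0) as 0
    return float(-(v * np.log(v)).sum() / np.log(len(s)))

def entropy_contributions(X: np.ndarray) -> np.ndarray:
    """Leave-one-out contribution of each row (segment) to the SVD entropy."""
    full = svd_entropy(X)
    return np.array([full - svd_entropy(np.delete(X, i, axis=0))
                     for i in range(X.shape[0])])

def select_segments(X: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical selection rule: keep the k rows with the largest contribution."""
    ce = entropy_contributions(X)
    return np.argsort(-ce, kind="stable")[:k]
```

Leaving out one row at a time costs one SVD per segment. On a large segment set a cheaper approximation would probably be needed. The cut-off rule (top k here) is also a free parameter. A threshold such as "contribution above the mean plus one standard deviation" would work equally well.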

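The two-level granular model (granules first, then clustering inside each granule) can be illustrated with the following sketch. It uses standard Fuzzy C-Means in place of the paper's adaptive variant, because the abstract does not give the adaptation rule. Inside each granule it runs a sample-weighted K-Means. Using a segment's FCM membership as its weight is an assumption made here for illustration, and all function names and parameters are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM (not the paper's adaptive variant). Returns centers and U of shape (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)  # memberships of each point sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # distance of every point to every center, shape (c, n); eps avoids division by zero
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    Um = U ** m
    return (Um @ X) / Um.sum(axis=1, keepdims=True), U

def weighted_kmeans(X, w, k, iters=100, seed=0):
    """K-Means in which each sample's pull on its centroid is scaled by w[j]."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new = centers.copy()
        for i in range(k):
            mask = labels == i
            if w[mask].sum() > 0:  # skip empty or zero-weight clusters
                new[i] = np.average(X[mask], axis=0, weights=w[mask])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def granular_clustering(X, n_granules, k_per_granule, m=2.0, seed=0):
    """Hypothetical pipeline: FCM granules, then weighted K-Means inside each granule."""
    X = np.asarray(X, dtype=float)
    _, U = fuzzy_c_means(X, n_granules, m=m, seed=seed)
    granule = U.argmax(axis=0)  # hard-assign each segment to its highest-membership granule
    results = {}
    for g in range(n_granules):
        idx = np.flatnonzero(granule == g)
        if len(idx) == 0:
            continue
        # assumption: a segment's membership in granule g is its weight within that granule
        labels, centers = weighted_kmeans(X[idx], U[g, idx], k_per_granule, seed=seed)
        results[g] = (idx, labels, centers)
    return results
```

Hard-assigning each segment to its highest-membership granule keeps the granules disjoint. The membership weights still let borderline segments pull less on the centroids inside their granule.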