Protein sequence motif patterns using adaptive Fuzzy C-Means granular computing model

M. Chitralegha, K. Thangavel
Published in: 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering
Publication date: 2013-04-15
DOI: 10.1109/ICPRIME.2013.6496454 (https://doi.org/10.1109/ICPRIME.2013.6496454)
Citations: 6

Abstract

Data mining is the process of extracting hidden predictive information from large databases. In bioinformatics, it helps researchers mine large amounts of biomolecular data to discover useful knowledge. Major research efforts in bioinformatics involve sequence analysis, protein structure prediction, and gene finding. Proteins are prominent molecules in our cells and are involved in virtually all cell functions. The activities and functions of proteins can be determined from protein sequence motifs, which are identified from segments of protein sequences. Not all segments are important for producing good motif patterns, and the generated sequence segments have no classes or labels. An unsupervised segment selection technique, the Singular Value Decomposition (SVD) entropy method, is therefore adopted to select significant sequence segments. In the proposed work, Weighted K-Means and Adaptive Fuzzy C-Means are applied to the selected segments to generate granules, since such a large number of segments cannot be clustered directly. Each granule generated by the Weighted K-Means algorithm is further clustered using the K-Means algorithm, and the granules generated by the Adaptive Fuzzy C-Means algorithm are clustered using Weighted K-Means. The two proposed models are compared with the K-Means granular computing model. The experimental results show that Adaptive Fuzzy C-Means with Weighted K-Means produces better results than the K-Means and Weighted K-Means granular computing methods.
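The SVD entropy selection step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used definition of SVD entropy (Shannon entropy of the normalized squared singular values) and a leave-one-out contribution score per segment. The function names, the encoding of segments as rows of a numeric matrix, and the default threshold (the mean contribution) are all assumptions made for this example.

```python
import numpy as np


def svd_entropy(matrix):
    """Normalized entropy of the squared singular values, in [0, 1]."""
    s = np.linalg.svd(matrix, compute_uv=False)
    if s.size < 2:
        return 0.0
    v = s ** 2
    total = v.sum()
    if total == 0:
        return 0.0
    v = v / total
    v = v[v > 0]  # skip exact zeros so that log() is defined
    return float(-(v * np.log(v)).sum() / np.log(s.size))


def segment_contributions(segments):
    """Leave-one-out score: entropy of all rows minus entropy without row i."""
    full = svd_entropy(segments)
    n = segments.shape[0]
    return np.array(
        [full - svd_entropy(np.delete(segments, i, axis=0)) for i in range(n)]
    )


def select_segments(segments, threshold=None):
    """Return indices of rows whose contribution exceeds the threshold.

    The default threshold (mean contribution) is an assumption of this
    sketch, not a value taken from the paper.
    """
    ce = segment_contributions(segments)
    if threshold is None:
        threshold = ce.mean()
    return np.flatnonzero(ce > threshold)
```

With segments stored one per row, `select_segments(segments)` returns the row indices to keep for the later granulation and clustering stages. The leave-one-out loop performs one SVD per segment, so a data set of realistic size would likely need batching or an approximation.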