Constraint Frequent Motif Detection in sequence datasets

Mr. E. Ramanujam, Dr. S. Padmavathi
2012 Fourth International Conference on Advanced Computing (ICoAC), published 2012-12-01
DOI: 10.1109/ICOAC.2012.6416844 (https://doi.org/10.1109/ICOAC.2012.6416844)
Citations: 3

Abstract

The subsequence motif mining problem has a large class of applications, in bioinformatics and beyond, such as protein-protein interaction, protein motif mining, DNA classification, and web log analysis. Existing algorithms detect contiguous exact and approximate patterns but restrict the user to a fixed pattern length. Although many algorithms have been proposed for the related problem, they suffer from poor scalability and time inefficiency, and some extract only non-contiguous exact patterns without noise, which limits their adaptation to other applications. This paper presents Constraint Frequent Motif Detection (CFMD), an algorithm that extracts both contiguous and non-contiguous patterns from short or long sequences of any length in biological databases. CFMD combines two data mining techniques: a trie-like Frequent Pattern tree (FP-Tree), which organizes the patterns so that the most commonly occurring ones lie along paths from the root to the leaf nodes, and constraints that restrict the growth of the FP-Tree and reduce its search space. The proposed CFMD is fast and scalable in extracting both contiguous and non-contiguous patterns from sequences. Its performance is demonstrated on both real and synthetic datasets.
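The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea it describes: growing a trie of patterns from the root and using constraints to cut off branches. It is not the authors' CFMD implementation. The names `Node`, `mine_motifs`, `min_support`, `max_gap`, and `max_len` are hypothetical, and the specific constraints (a minimum support, a maximum gap between matched symbols, and a maximum pattern length) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One trie node; the path from the root to this node spells a pattern."""
    support: int = 0
    children: dict = field(default_factory=dict)


def mine_motifs(sequences, min_support, max_gap=0, max_len=4):
    """Return (trie_root, {pattern: support}).

    Support is the number of sequences that contain the pattern.
    max_gap=0 keeps contiguous patterns only; max_gap>0 also allows up to
    max_gap skipped symbols between consecutive pattern symbols.
    These parameters are illustrative assumptions, not taken from the paper.
    """
    root = Node(support=len(sequences))
    found = {}

    def grow(node, pattern, ends):
        # ends maps a sequence index to the positions where `pattern` ends.
        if len(pattern) == max_len:
            return  # length constraint: stop growing this branch
        candidates = {}
        for sid, positions in ends.items():
            seq = sequences[sid]
            for p in positions:
                # The first symbol may start anywhere; later symbols must
                # fall within the gap window after the previous match.
                stop = len(seq) if not pattern else min(p + 2 + max_gap, len(seq))
                for q in range(p + 1, stop):
                    candidates.setdefault(seq[q], {}).setdefault(sid, set()).add(q)
        for symbol, next_ends in sorted(candidates.items()):
            support = len(next_ends)
            if support < min_support:
                continue  # support constraint: prune the infrequent branch
            child = Node(support=support)
            node.children[symbol] = child
            found[pattern + symbol] = support
            grow(child, pattern + symbol, next_ends)

    grow(root, "", {sid: {-1} for sid in range(len(sequences))})
    return root, found


if __name__ == "__main__":
    toy = ["ACGT", "ACCT", "AGGT"]  # made-up toy sequences
    _, contiguous = mine_motifs(toy, min_support=3, max_gap=0)
    _, gapped = mine_motifs(toy, min_support=3, max_gap=2)
    print(contiguous)
    print(gapped)
```

In this toy run, allowing gaps lets the pattern `AT` (an `A` followed by a `T` with up to two symbols skipped) reach full support, whereas the contiguous setting finds only single symbols. Pruning a child also discards every longer pattern beneath it, which is how a constraint shrinks the search space in this sketch.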