Constraint Frequent Motif Detection in sequence datasets
Mr. E. Ramanujam, Dr. S. Padmavathi
2012 Fourth International Conference on Advanced Computing (ICoAC), published 2012-12-01
DOI: 10.1109/ICOAC.2012.6416844 (https://doi.org/10.1109/ICOAC.2012.6416844)
Citations: 3
Abstract
The subsequence motif mining problem has a large class of applications, particularly in bioinformatics, such as protein-protein interaction, protein motif mining, and DNA classification, as well as web log analysis and the like. Existing algorithms detect contiguous exact and approximate patterns but restrict the user to a fixed pattern length. Many algorithms proposed for this problem suffer from poor scalability and time inefficiency, and some extract only non-contiguous exact patterns without tolerating noise, which limits their adaptation to other applications. This paper presents Constraint Frequent Motif Detection (CFMD), an algorithm that extracts both contiguous and non-contiguous patterns from short or long sequences of any length in biological databases. CFMD combines two data mining techniques: a trie-like Frequent Pattern tree (FP-Tree), which arranges patterns so that the most commonly occurring ones lie along root-to-leaf paths, and constraints that restrict the growth of the FP-Tree and reduce its search space. The proposed CFMD is fast and scalable in extracting both contiguous and non-contiguous patterns. Its performance is demonstrated on both real and synthetic datasets.
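The general idea of growing a trie of frequent patterns under constraints can be sketched as follows. This is a minimal illustration and not the authors' CFMD: the abstract does not specify the constraints or the tree layout, so the function name `mine_motifs` and the parameters `min_support`, `max_len`, and `max_gap` are assumptions chosen for the example. Support is counted as the number of sequences that contain a pattern, and `max_gap=0` yields contiguous patterns only.

```python
from collections import defaultdict


class TrieNode:
    """One node per pattern prefix; the path from the root spells the pattern."""

    def __init__(self):
        self.children = {}
        self.support = 0  # number of sequences containing the pattern


def mine_motifs(sequences, min_support=2, max_len=4, max_gap=0):
    """Grow a trie of frequent patterns; return (root, {pattern: support}).

    Hypothetical constraints (not taken from the paper):
      min_support: minimum number of sequences that must contain a pattern
      max_len:     maximum pattern length
      max_gap:     maximum symbols skipped between consecutive pattern symbols
    """
    root = TrieNode()
    found = {}

    def extend(ends):
        # ends maps sequence index -> positions where the current pattern ends.
        # Collect every symbol reachable within the gap constraint.
        candidates = defaultdict(lambda: defaultdict(set))
        for i, positions in ends.items():
            seq = sequences[i]
            for p in positions:
                stop = min(p + 2 + max_gap, len(seq))
                for q in range(p + 1, stop):
                    candidates[seq[q]][i].add(q)
        return candidates

    def attach(node, pattern, candidates):
        for symbol in sorted(candidates):
            ends = candidates[symbol]
            if len(ends) < min_support:
                continue  # support constraint prunes this branch
            child = TrieNode()
            child.support = len(ends)
            node.children[symbol] = child
            found[pattern + symbol] = child.support
            if len(pattern) + 1 < max_len:  # length constraint stops growth
                attach(child, pattern + symbol, extend(ends))

    # Level 1: every position of every symbol is a possible pattern start.
    first = defaultdict(lambda: defaultdict(set))
    for i, seq in enumerate(sequences):
        for q, symbol in enumerate(seq):
            first[symbol][i].add(q)
    attach(root, "", first)
    return root, found
```

For example, `mine_motifs(["ACGT", "ACGA", "TACG"], min_support=3, max_len=3)` keeps `ACG` and its frequent sub-patterns, while `T` is pruned because it occurs in only two sequences. Pruning a node also discards every pattern that would extend it, which is how constraints of this kind shrink the search space.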