Jiahong Wang, Yoshiaki Asanuma, Eiichiro Kodama, T. Takata, Jie Li
{"title":"Mining Sequential Patterns More Efficiently by Reducing the Cost of Scanning Sequence Databases","authors":"Jiahong Wang, Yoshiaki Asanuma, Eiichiro Kodama, T. Takata, Jie Li","doi":"10.2197/IPSJDC.2.768","DOIUrl":null,"url":null,"abstract":"Sequential pattern mining is a useful technique used to discover frequent subsequences as patterns in a sequence database. Depending on the application, sequence databases vary by number of sequences, number of individual items, average length of sequences, and average length of potential patterns. In addition, to discover the necessary patterns in a sequence database, the support threshold may be set to different values. Thus, for a sequential pattern-mining algorithm, responsiveness should be achieved for all of these factors. For that purpose, we propose a candidate-driven pattern-growth sequential pattern-mining algorithm called FSPM (Fast Sequential Pattern Mining). A useful property of FSPM is that the sequential patterns concerning a user-specified item can be mined directly. Extensive experimental results show that, in most cases FSPM outperforms existing algorithms. An analytical performance study shows that it is the inherent potentiality of FSPM that makes it more effective.","PeriodicalId":432390,"journal":{"name":"Ipsj Digital Courier","volume":"8 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ipsj Digital Courier","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2197/IPSJDC.2.768","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Sequential pattern mining is a useful technique used to discover frequent subsequences as patterns in a sequence database. Depending on the application, sequence databases vary by number of sequences, number of individual items, average length of sequences, and average length of potential patterns. In addition, to discover the necessary patterns in a sequence database, the support threshold may be set to different values. Thus, for a sequential pattern-mining algorithm, responsiveness should be achieved for all of these factors. For that purpose, we propose a candidate-driven pattern-growth sequential pattern-mining algorithm called FSPM (Fast Sequential Pattern Mining). A useful property of FSPM is that the sequential patterns concerning a user-specified item can be mined directly. Extensive experimental results show that, in most cases FSPM outperforms existing algorithms. An analytical performance study shows that it is the inherent potentiality of FSPM that makes it more effective.