Mining frequent itemsets based on projection array
Haitao He, Hai-Yan Cao, Ruixia Yao, Jiadong Ren, C. Hu
2010 International Conference on Machine Learning and Cybernetics, 2010-07-11. DOI: 10.1109/ICMLC.2010.5581018
Citations: 8
Abstract
Frequent itemset mining is a crucial problem in data mining. Although many algorithms have been proposed, they can suffer from high computational cost and space complexity on dense databases, especially when mining long frequent itemsets or when the support threshold is very low. To address this problem, a new data structure called P Array is proposed. Like Bit Table FI, P Array uses the data both horizontally and vertically, and the itemsets that co-occur with each single frequent item are found by computing intersections in P Array. A new algorithm, called MFIPA, is then built on P Array. First, the frequent itemsets that have the same support as a single frequent item are found directly by joining that item with every nonempty subset of its projection; all other frequent itemsets are then found with a depth-first search strategy. Experimental results show that the proposed algorithm is superior to Bit Table FI in execution efficiency and memory requirements, especially on dense databases.
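The abstract does not spell out the P Array layout, so the following Python sketch is only one possible reading of the two-step idea, not the authors' MFIPA implementation. It assumes that the projection of a frequent item is the set of frequent items occurring in every transaction that contains it, computed by intersecting per-item bit vectors in the spirit of Bit Table FI. The function names (`bit_vectors`, `support`, `projection`, `mine`) are hypothetical, supports are absolute counts, and the depth-first step is a plain enumeration without any further optimizations the paper may use.

```python
from itertools import combinations


def bit_vectors(transactions):
    """Vertical layout (assumed): bit t of vectors[item] is set when transaction t contains item."""
    vectors = {}
    for t, transaction in enumerate(transactions):
        for item in transaction:
            vectors[item] = vectors.get(item, 0) | (1 << t)
    return vectors


def support(vector):
    """Support = number of transactions covered = number of set bits."""
    return bin(vector).count("1")


def projection(item, vectors, frequent):
    """Frequent items present in every transaction that contains `item`.

    Assumption: j belongs to the projection iff (vec(item) & vec(j)) == vec(item),
    i.e. intersecting with j's bit vector removes no transaction.
    """
    v = vectors[item]
    return [j for j in frequent if j != item and (vectors[j] & v) == v]


def mine(transactions, min_sup):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_sup."""
    vectors = bit_vectors(transactions)
    frequent = sorted(i for i, v in vectors.items() if support(v) >= min_sup)
    result = {}

    # Step 1: {item} joined with any nonempty subset of its projection has
    # exactly the support of {item}, so these itemsets need no counting.
    for item in frequent:
        sup = support(vectors[item])
        result[frozenset([item])] = sup
        proj = projection(item, vectors, frequent)
        for r in range(1, len(proj) + 1):
            for subset in combinations(proj, r):
                result[frozenset((item,) + subset)] = sup

    # Step 2: depth-first extension over the bit vectors finds the remaining
    # frequent itemsets; supports already known from step 1 are reused.
    def dfs(prefix, prefix_vec, start):
        for k in range(start, len(frequent)):
            item = frequent[k]
            vec = prefix_vec & vectors[item]  # transactions containing prefix + item
            itemset = prefix | {item}
            if itemset not in result:
                sup = support(vec)
                if sup < min_sup:
                    continue  # anti-monotone pruning: no superset can be frequent
                result[itemset] = sup
            dfs(itemset, vec, k + 1)

    all_transactions = (1 << len(transactions)) - 1
    dfs(frozenset(), all_transactions, 0)
    return result


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"}, {"b", "d"}]
    for itemset, sup in sorted((sorted(s), n) for s, n in mine(db, 2).items()):
        print(itemset, sup)
```

Step 1 is exact under this reading: every transaction containing the item also contains every item of its projection, so adding any subset of the projection removes no transaction. In the small example, the projection of `c` is `['a']`, so `{a, c}` is reported with support 2 in step 1 without any counting, while `{a, b}` is found by the depth-first search.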