An efficient hash-based method for discovering the maximal frequent set
Authors: Don-Lin Yang, Ching-Ting Pan, Yeh-Ching Chung
Venue: 25th Annual International Computer Software and Applications Conference. COMPSAC 2001
Published: 2001-10-08
DOI: 10.1109/CMPSAC.2001.960661 (https://doi.org/10.1109/CMPSAC.2001.960661)
Citations: 21
Abstract
Association rule mining can be divided into two steps. The first step finds all frequent itemsets, that is, itemsets whose number of occurrences is greater than or equal to a user-specified threshold. The second step generates reliable association rules from the frequent itemsets found in the first step. Identifying all frequent itemsets in a large database dominates the overall cost of association rule mining. In this paper, we propose an efficient hash-based method, HMFS, for discovering the maximal frequent itemsets. HMFS combines the DHP (Direct Hashing and Pruning) and Pincer-Search algorithms, which yields two advantages. First, HMFS can, in general, reduce the number of database scans. Second, HMFS can filter out infrequent candidate itemsets and use the filtered itemsets to find the maximal frequent itemsets. Together, these advantages reduce the overall computing time of finding the maximal frequent itemsets. In addition, HMFS provides an efficient mechanism for constructing the maximal frequent candidate itemsets, which reduces the search space. We implemented HMFS along with the DHP and Pincer-Search algorithms on a Pentium III 800 MHz PC. The experimental results show that HMFS outperforms DHP and Pincer-Search in most test cases. In particular, it improves significantly on both when the database is large and the longest itemset is relatively long.
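The terms used in the abstract can be made concrete with a small sketch. The code below is not the HMFS method, whose details the abstract does not give. It is a plain level-wise search that borrows one idea commonly associated with DHP: during the first scan, every 2-item subset of each transaction is hashed into a bucket table, and a candidate pair is kept only if its bucket count reaches the threshold. Because a bucket count can never be smaller than the support of any pair hashed into it, this filter never discards a frequent pair. All function names, the bucket count, and the sample transactions are hypothetical, and items are assumed to be strings.

```python
from collections import Counter
from itertools import combinations
import zlib


def bucket_of(pair, num_buckets):
    """Deterministic bucket index for a sorted pair of string items."""
    return zlib.crc32("|".join(pair).encode()) % num_buckets


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support, num_buckets=7):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Scan 1: count single items and hash all 2-item subsets into buckets.
    item_counts = Counter()
    buckets = [0] * num_buckets
    for t in transactions:
        item_counts.update(t)
        for pair in combinations(sorted(t), 2):
            buckets[bucket_of(pair, num_buckets)] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    result = {frozenset([i]): item_counts[i] for i in frequent_items}

    # Hash filter: a pair survives only if its bucket count reaches the threshold.
    candidates = [
        frozenset(p)
        for p in combinations(frequent_items, 2)
        if buckets[bucket_of(p, num_buckets)] >= min_support
    ]

    k = 2
    while candidates:
        # One database scan per level to count the surviving candidates.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        result.update(level)

        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-item subsets are all frequent.
        k += 1
        unions = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            u for u in unions
            if all(frozenset(s) in level for s in combinations(u, k - 1))
        ]
    return result


def maximal_frequent_itemsets(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical sample data: five transactions, threshold of two occurrences.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"b", "d"},
    {"a", "c"},
]
frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_frequent_itemsets(frequent)
```

In this sample, nine itemsets are frequent but only {a, b, c} and {b, d} are maximal, which illustrates why reporting the maximal set is a compact summary: every other frequent itemset is a subset of one of them.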