Mining association rules: anti-skew algorithms

Proceedings 14th International Conference on Data Engineering Pub Date : 1998-02-23 DOI:10.1109/ICDE.1998.655811

Jun-Lin Lin, M. Dunham


Citations: 98

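Abstract

Mining association rules among items in a large database is recognized as one of the most important data mining problems. All previously proposed approaches require scanning the entire database at least twice, or almost twice, in the worst case. We propose several techniques that overcome the problem of data skew in basket data. These techniques reduce the maximum number of scans to fewer than two, and in most cases find all association rules in about one scan. Our algorithms use prior knowledge, collected during the mining process and/or via sampling, to further reduce the number of candidate itemsets and to identify false candidate itemsets at an earlier stage.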
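To make the terms "scan", "candidate itemset", and "false candidate" concrete, here is a minimal Python sketch of a generic two-scan, partition-style baseline. It is not the anti-skew algorithms of the paper, whose details the abstract does not give. The function names, the `max_size` cap, and the brute-force local counting are assumptions made purely for illustration.

```python
from itertools import combinations
from math import ceil


def local_frequent(partition, min_frac, max_size=3):
    """Return itemsets whose support within one partition is >= min_frac.

    Brute-force enumeration, adequate only for tiny illustrative inputs.
    """
    counts = {}
    for txn in partition:
        items = sorted(set(txn))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s for s, c in counts.items() if c >= min_frac * len(partition)}


def two_scan_mine(transactions, min_frac, n_parts=2, max_size=3):
    """Hypothetical two-scan baseline: local candidates, then a global count."""
    size = ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Scan 1: any globally frequent itemset must be frequent in at least
    # one partition, so the union of local results is a complete candidate set.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_frac, max_size)

    # Scan 2: count every candidate against the whole database.
    counts = dict.fromkeys(candidates, 0)
    for txn in transactions:
        present = set(txn)
        for cand in candidates:
            if present.issuperset(cand):
                counts[cand] += 1

    frequent = {s: c for s, c in counts.items() if c >= min_frac * len(transactions)}
    # False candidates: locally frequent somewhere, but not globally frequent.
    false_candidates = candidates - set(frequent)
    return frequent, false_candidates


# Skewed toy data: {a, b} is concentrated in the first half of the database.
baskets = [
    ["a", "b"], ["a", "b"], ["a", "b"], ["a", "c"],
    ["c", "d"], ["c", "d"], ["c"], ["d"],
]
frequent, false_candidates = two_scan_mine(baskets, min_frac=0.5, n_parts=2)
```

On skewed data like this toy example, itemsets concentrated in one partition become candidates in the first scan and are only rejected in the second. The abstract describes reducing such false candidates and detecting them earlier.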



