INFIN: An Efficient Algorithm for Fast Mining Frequent Itemsets

Shaopeng Wang, Yufei Wang, Chunkai Feng, ChaoYu Niu
Published in: 2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), 2021-07-16
DOI: 10.1109/PRML52754.2021.9520736 (https://doi.org/10.1109/PRML52754.2021.9520736)
Citations: 1

Abstract

negFIN is the current state-of-the-art algorithm for frequent itemset mining. It encodes the nodes of a prefix tree with a novel BMC (bitmap code) model, which is based on the bitmap representation of sets. Each node's code is a binary number with one bit per frequent item, stored as an integer. All key operations of negFIN are bitwise operations on these codes. The main problem with BMC is that the widest integer data type in current general-purpose compilers has 64 bits, so the encoding cannot work effectively when the number of frequent items exceeds 64. In this work, we propose B-BMC (block bitmap code), a more efficient encoding model that essentially divides a BMC into blocks of a given block size. To support B-BMC, we devise the B-BMC tree and the TNC (terminal node code) table as an alternative to the BMC tree of negFIN. Based on these two structures, we present INFIN (improved negFIN), an efficient algorithm for mining frequent itemsets. Our experiments show that B-BMC overcomes the drawback of BMC, and that INFIN is the most efficient in both time and space when the block size is 64 and the number of frequent items exceeds 64.
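The block idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the B-BMC tree or the TNC table, so neither is modeled here, and the names `encode_blocks`, `decode_blocks`, `is_subset`, and `BLOCK_MASK` are hypothetical. It assumes items are numbered from 0 and that bit `i % block_size` of block `i // block_size` marks item `i`. Python integers are arbitrary precision, so the 64-bit limit is only simulated by keeping every block within 64 bits.

```python
BLOCK_SIZE = 64  # the block size the abstract reports as most efficient
BLOCK_MASK = (1 << BLOCK_SIZE) - 1  # largest value a single block may hold


def encode_blocks(item_indices, num_items, block_size=BLOCK_SIZE):
    """Encode a set of item indices as a list of fixed-width bitmap blocks."""
    num_blocks = (num_items + block_size - 1) // block_size  # ceiling division
    blocks = [0] * num_blocks
    for i in item_indices:
        if not 0 <= i < num_items:
            raise ValueError(f"item index {i} out of range")
        # Item i sets bit (i % block_size) of block (i // block_size).
        blocks[i // block_size] |= 1 << (i % block_size)
    return blocks


def decode_blocks(blocks, block_size=BLOCK_SIZE):
    """Recover the sorted item indices from a list of bitmap blocks."""
    items = []
    for block_index, block in enumerate(blocks):
        for bit in range(block_size):
            if block & (1 << bit):
                items.append(block_index * block_size + bit)
    return items


def is_subset(a, b):
    """True if every item encoded in a is also encoded in b (block-wise AND)."""
    return all((x & y) == x for x, y in zip(a, b))


# 100 frequent items do not fit in one 64-bit code, so two blocks are needed.
code = encode_blocks([0, 63, 64, 99], num_items=100)
print(len(code), decode_blocks(code))
```

With 100 frequent items the itemset {0, 63, 64, 99} occupies two blocks, and set operations such as the subset test reduce to one bitwise operation per block.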