Index support for frequent itemset mining in a relational DBMS

Elena Baralis, T. Cerquitelli, S. Chiusano
DOI: 10.1109/ICDE.2005.80 (https://doi.org/10.1109/ICDE.2005.80)
Published in: 21st International Conference on Data Engineering (ICDE'05)
Publication date: 2005-04-05
Citations: 27

Abstract

Many efforts have been devoted to coupling data mining activities with relational DBMSs, but true integration into the relational DBMS kernel has rarely been achieved. This paper presents a novel indexing technique that represents transactions in a succinct form, appropriate for tightly integrating frequent itemset mining into a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, so that the index can be reused for mining itemsets with any support threshold. Furthermore, the stored information is structured to allow selective access to only those index blocks needed for the current extraction phase. The index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Experiments have been run on various datasets characterized by different data distributions. The execution time of the frequent itemset extraction task exploiting the index is always comparable with, and sometimes faster than, that of a C++ implementation of the FP-growth algorithm accessing data stored in a flat file.
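The idea of a complete representation that is built once and then mined at any support threshold can be illustrated with a small sketch. The code below is not the index structure described in the paper and does not use PostgreSQL; it swaps in a simple in-memory vertical layout (item to transaction-id set) with an Eclat-style search, purely as an assumption for illustration. The function names `build_index` and `frequent_itemsets` and the toy transactions are hypothetical.

```python
def build_index(transactions):
    """Map every item to the set of transaction ids that contain it.

    No support threshold is applied here, so the structure is complete
    and can be reused for any later threshold (illustrative stand-in,
    not the paper's index).
    """
    index = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            index.setdefault(item, set()).add(tid)
    return index


def frequent_itemsets(index, min_support):
    """Return {itemset tuple: support count} for an absolute min_support."""
    results = {}
    # Only entries for items that meet the threshold are touched.
    frequent_items = sorted(
        item for item, tids in index.items() if len(tids) >= min_support
    )

    def extend(prefix, prefix_tids, candidates):
        for i, item in enumerate(candidates):
            # Support of prefix + item is the size of the tid-set intersection.
            tids = index[item] if prefix_tids is None else prefix_tids & index[item]
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                results[itemset] = len(tids)
                extend(itemset, tids, candidates[i + 1:])

    extend((), None, frequent_items)
    return results


# Hypothetical toy data: the index is built once and queried twice.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
index = build_index(transactions)
high = frequent_itemsets(index, 4)
low = frequent_itemsets(index, 3)
```

Here the same `index` serves both queries: with a threshold of 4 only the single items qualify, while lowering it to 3 also yields the pairs, without rebuilding anything.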