在关系DBMS中对频繁项集挖掘的索引支持

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI:10.1109/ICDE.2005.80

Elena Baralis, T. Cerquitelli, S. Chiusano

{"title":"在关系DBMS中对频繁项集挖掘的索引支持","authors":"Elena Baralis, T. Cerquitelli, S. Chiusano","doi":"10.1109/ICDE.2005.80","DOIUrl":null,"url":null,"abstract":"Many efforts have been devoted to couple data mining activities with relational DBMSs, but a true integration into the relational DBMS kernel has been rarely achieved. This paper presents a novel indexing technique, which represents transactions in a succinct form, appropriate for tightly integrating frequent itemset mining in a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, in order to allow reusing the index for mining itemsets with any support threshold. Furthermore, an appropriate structure of the stored information has been devised, in order to allow a selective access of the index blocks necessary for the current extraction phase. The index has been implemented into the PostgreSQL open source DBMS and exploits its physical level access methods. Experiments have been run for various datasets, characterized by different data distributions. The execution time of the frequent itemset extraction task exploiting the index is always comparable with and sometime faster than a C++ implementation of the FP-growth algorithm accessing data stored on a flat file.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":"{\"title\":\"Index support for frequent itemset mining in a relational DBMS\",\"authors\":\"Elena Baralis, T. Cerquitelli, S. Chiusano\",\"doi\":\"10.1109/ICDE.2005.80\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many efforts have been devoted to couple data mining activities with relational DBMSs, but a true integration into the relational DBMS kernel has been rarely achieved. This paper presents a novel indexing technique, which represents transactions in a succinct form, appropriate for tightly integrating frequent itemset mining in a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, in order to allow reusing the index for mining itemsets with any support threshold. Furthermore, an appropriate structure of the stored information has been devised, in order to allow a selective access of the index blocks necessary for the current extraction phase. The index has been implemented into the PostgreSQL open source DBMS and exploits its physical level access methods. Experiments have been run for various datasets, characterized by different data distributions. The execution time of the frequent itemset extraction task exploiting the index is always comparable with and sometime faster than a C++ implementation of the FP-growth algorithm accessing data stored on a flat file.\",\"PeriodicalId\":297231,\"journal\":{\"name\":\"21st International Conference on Data Engineering (ICDE'05)\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-04-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"27\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"21st International Conference on Data Engineering (ICDE'05)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE.2005.80\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"21st International Conference on Data Engineering (ICDE'05)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2005.80","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 27

摘要

许多工作都致力于将数据挖掘活动与关系DBMS结合起来，但是很少实现与关系DBMS内核的真正集成。本文提出了一种新的索引技术，它以一种简洁的形式表示事务，适合于在关系DBMS中紧密集成频繁项集挖掘。数据表示是完整的，即没有强制执行支持阈值，以便允许重用索引来挖掘具有任何支持阈值的项集。此外，还设计了存储信息的适当结构，以便允许对当前提取阶段所需的索引块进行选择性访问。该索引已经在PostgreSQL开源DBMS中实现，并利用了它的物理层访问方法。实验已经运行了各种数据集，具有不同的数据分布特征。利用索引的频繁项集提取任务的执行时间总是与访问平面文件上存储的数据的fp增长算法的c++实现相当，有时甚至比它更快。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Index support for frequent itemset mining in a relational DBMS

Many efforts have been devoted to couple data mining activities with relational DBMSs, but a true integration into the relational DBMS kernel has been rarely achieved. This paper presents a novel indexing technique, which represents transactions in a succinct form, appropriate for tightly integrating frequent itemset mining in a relational DBMS. The data representation is complete, i.e., no support threshold is enforced, in order to allow reusing the index for mining itemsets with any support threshold. Furthermore, an appropriate structure of the stored information has been devised, in order to allow a selective access of the index blocks necessary for the current extraction phase. The index has been implemented into the PostgreSQL open source DBMS and exploits its physical level access methods. Experiments have been run for various datasets, characterized by different data distributions. The execution time of the frequent itemset extraction task exploiting the index is always comparable with and sometime faster than a C++ implementation of the FP-growth algorithm accessing data stored on a flat file.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

21st International Conference on Data Engineering (ICDE'05)

自引率

0.00%

发文量

期刊最新文献

Proactive caching for spatial queries in mobile environments MoDB: database system for synthesizing human motion Integrating data from disparate sources: a mass collaboration approach ViteX: a streaming XPath processing system Efficient data management on lightweight computing devices