Range cube: efficient cube computation by exploiting data correlation

Proceedings. 20th International Conference on Data Engineering. Pub Date: 2004-03-30. DOI: 10.1109/ICDE.2004.1320035

Ying Feng, D. Agrawal, A. E. Abbadi, Ahmed A. Metwally


Citations: 46

Abstract

Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. We introduce range cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, the range trie, is used to identify and compress correlation in attribute values, and to compress the input dataset, which effectively reduces the computational cost. The range cubing algorithm generates a compressed cube, called a range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value and is expressed as a tuple with the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. In experiments on a real dataset, range cubing runs in less than one thirtieth of the time of H-cubing while generating a range cube that takes less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
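To make the idea of cells sharing one aggregation value concrete, here is a minimal brute-force sketch in Python. It is not the range cubing algorithm and does not build a range trie. It only enumerates the full cube of a tiny table and groups cells that aggregate exactly the same input rows. The table, the column meanings (city, country, product, sales), and the function names are assumptions made for illustration. Grouping by identical row sets is a simplification that may not coincide with the ranges defined in the paper.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marker for an aggregated ("any value") dimension

def cells_of(dims):
    """Yield every cube cell that one tuple of dimension values contributes to."""
    n = len(dims)
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            yield tuple(dims[i] if i in kept else ALL for i in range(n))

def full_cube(rows):
    """Brute-force cube: SUM of the last column for every group-by cell."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        for cell in cells_of(dims):
            cube[cell] += measure
    return dict(cube)

def group_cells_by_rows(rows):
    """Group cells that are computed from exactly the same set of input rows."""
    cover = defaultdict(set)
    for idx, (*dims, _measure) in enumerate(rows):
        for cell in cells_of(dims):
            cover[cell].add(idx)
    groups = defaultdict(list)
    for cell, idxs in cover.items():
        groups[frozenset(idxs)].append(cell)
    return list(groups.values())

# Hypothetical input: city determines country, so the two columns are correlated.
rows = [
    ("Paris", "FR", "pen", 3),
    ("Paris", "FR", "ink", 5),
    ("Lyon",  "FR", "pen", 2),
    ("Rome",  "IT", "pen", 4),
]

cube = full_cube(rows)
groups = group_cells_by_rows(rows)

for group in groups:
    # Every cell in a group necessarily carries the same aggregate.
    print(cube[group[0]], sorted(group))
```

Because the city determines the country in this toy table, cells such as (Paris, *, *) and (Paris, FR, *) cover the same rows and therefore hold the same sum. Storing one entry per group rather than one per cell is the kind of saving the abstract attributes to exploiting data correlation.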

