
Proceedings 18th International Conference on Data Engineering: Latest Publications

SCADDAR: an efficient randomized technique to reorganize continuous media blocks
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994760
Ashish Goel, C. Shahabi, S. Yao, Roger Zimmermann
Scalable storage architectures allow for the addition of disks to increase storage capacity and/or bandwidth. In its general form, disk scaling also refers to disk removals when either capacity needs to be conserved or old disk drives are retired. Assuming random placement of blocks on multiple nodes of a continuous media server, our optimization objective is to redistribute a minimum number of media blocks after disk scaling. This objective should be met under two restrictions. First, uniform distribution and hence a balanced load should be ensured after redistribution. Second, the redistributed blocks should be retrieved at the normal mode of operation in one disk access and through low complexity computation. We propose a technique that meets the objective, while we prove that it also satisfies both restrictions. The SCADDAR approach is based on using a series of REMAP functions which can derive the location of a new block using only its original location as a basis.
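To make the remap idea concrete, here is a minimal Python sketch of a remap chain for disk additions only. The hash-based placement and the function names are our assumptions for illustration; the paper's actual REMAP functions also cover disk removals and are designed for provably balanced, low-complexity lookups.

```python
import hashlib

def pseudo_random_disk(block_id, operation, num_disks):
    # Hash the block id together with the scaling-operation number so each
    # operation draws an independent, uniformly distributed disk index.
    digest = hashlib.sha256(f"{block_id}:{operation}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_disks

def locate_block(block_id, disk_counts):
    # disk_counts[i] is the total number of disks after scaling operation i.
    disk = pseudo_random_disk(block_id, 0, disk_counts[0])
    for op in range(1, len(disk_counts)):
        prev_disks, cur_disks = disk_counts[op - 1], disk_counts[op]
        candidate = pseudo_random_disk(block_id, op, cur_disks)
        if candidate >= prev_disks:
            # With probability (cur - prev) / cur the block moves onto one of
            # the newly added disks, which keeps the distribution uniform while
            # moving close to the minimum number of blocks.
            disk = candidate
        # Otherwise the block stays where the previous operation left it.
    return disk

print(locate_block(block_id=42, disk_counts=[4, 6, 10]))
```

Only the block identifier and the history of disk counts are needed, so in this sketch a block's current location is computable in one pass without a directory lookup.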
Citations: 83
Processing reporting function views in a data warehouse environment
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994707
Wolfgang Lehner, W. Hümmer, L. Schlesinger
Reporting functions reflect a novel technique to formulate sequence-oriented queries in SQL. They extend the classical way of grouping and applying aggregation functions by additionally providing a column-based ordering, partitioning, and windowing mechanism. The application area of reporting functions ranges from simple ranking queries (TOP(n)-analyses) over cumulative (Year-To-Date-analyses) to sliding window queries. We discuss the problem of deriving reporting function queries from materialized reporting function views, which is one of the most important issues in efficiently processing queries in a data warehouse environment. Two different derivation algorithms, including their relational mappings are introduced and compared in a test scenario.
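As an illustration of the partition-order-window mechanism (not of the paper's view-derivation algorithms), the following Python sketch computes a sliding-window sum per partition; the record fields and the helper name are hypothetical.

```python
from itertools import groupby

def windowed_sum(rows, partition_key, order_key, value_key, window):
    # Mirrors SUM(value) OVER (PARTITION BY partition_key ORDER BY order_key
    # ROWS window-1 PRECEDING): order each partition, then aggregate a frame
    # of the current row plus the window-1 rows before it.
    rows = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    out = []
    for _, group in groupby(rows, key=lambda r: r[partition_key]):
        group = list(group)
        for i, row in enumerate(group):
            frame = group[max(0, i - window + 1): i + 1]
            out.append({**row, "window_sum": sum(r[value_key] for r in frame)})
    return out

sales = [{"store": "A", "month": 1, "amount": 10},
         {"store": "A", "month": 2, "amount": 15},
         {"store": "B", "month": 1, "amount": 7}]
# A window spanning the whole partition gives a Year-To-Date style running total.
print(windowed_sum(sales, "store", "month", "amount", window=12))
```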
Citations: 7
Exploring aggregate effect with weighted transcoding graphs for efficient cache replacement in transcoding proxies
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994752
Cheng-Yue Chang, Ming-Syan Chen
This paper explores the aggregate effect when caching multiple versions of the same Web object in the transcoding proxy. Explicitly, the aggregate profit from caching multiple versions of an object is not simply the sum of the profits from caching individual versions, but rather, depends on the transcoding relationships among them. Hence, to evaluate the profit from caching each version of an object efficiently, we devise the notion of a weighted transcoding graph and formulate a generalized profit function which explicitly considers the aggregate effect and several new emerging factors in the transcoding proxy. Based on the weighted transcoding graph and the generalized profit function, an innovative cache replacement algorithm for transcoding proxies is proposed in this paper. Experimental results show that the algorithm proposed consistently outperforms companion schemes in terms of the delay saving ratios and cache hit ratios.
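The following Python sketch illustrates, under our own simplified cost model, why the profit of caching a version set is not additive: a cached detailed version can also serve coarser versions through transcoding. The names and the cost model are assumptions; the paper's generalized profit function incorporates further factors.

```python
def aggregate_profit(cached, fetch_cost, transcode_cost, access_freq):
    # cached: versions kept in the proxy.
    # transcode_cost[(src, dst)]: edge weight in the weighted transcoding graph;
    #   a missing edge means dst cannot be derived from src.
    # access_freq[version]: request rate for each version of the object.
    profit = 0.0
    for version, freq in access_freq.items():
        serve_costs = [0.0 if src == version
                       else transcode_cost.get((src, version), float("inf"))
                       for src in cached]
        best = min(serve_costs, default=float("inf"))
        if best < fetch_cost:
            profit += freq * (fetch_cost - best)  # saving versus fetching from the origin
    return profit

graph = {("full", "mobile"): 2.0, ("full", "thumbnail"): 3.0}
freqs = {"full": 0.2, "mobile": 0.5, "thumbnail": 0.3}
print(aggregate_profit({"full"}, fetch_cost=10.0, transcode_cost=graph, access_freq=freqs))
```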
Citations: 30
Lossy reduction for very high dimensional data
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994783
C. Jermaine, E. Omiecinski
We consider the use of data reduction techniques for the problem of approximate query answering. We focus on applications for which accurate answers to selective queries are required, and for which the data are very high dimensional (having hundreds of attributes). We present a new data reduction method for this type of application, called the RS kernel. We demonstrate the effectiveness of this method for answering difficult, highly selective queries over high dimensional data using several real datasets.
Citations: 2
Condensed cube: an effective approach to reducing data cube size
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994705
Wei Wang, Hongjun Lu, Jianlin Feng, J. Yu
Pre-computed data cube facilitates OLAP (on-line analytical processing). It is well-known that data cube computation is an expensive operation. While most algorithms have been devoted to optimizing memory management and reducing computation costs, less work has addressed a fundamental issue: the size of a data cube is huge when a large base relation with a large number of attributes is involved. In this paper, we propose a new concept, called a condensed data cube. The condensed cube is of much smaller size than a complete non-condensed cube. More importantly, it is a fully pre-computed cube without compression, and, hence, it requires neither decompression nor further aggregation when answering queries. Several algorithms for computing a condensed cube are proposed. Results of experiments on the effectiveness of condensed data cube are presented, using both synthetic and real-world data. The results indicate that the proposed condensed cube can reduce both the cube size and therefore its computation time.
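The sketch below shows one way to picture the condensation idea, under our own simplifications: whenever a group in some cuboid is covered by a single base tuple, only a pointer to that tuple is stored, since the group's aggregates can be read straight off the tuple without decompression. It is an illustration, not the algorithms evaluated in the paper.

```python
from collections import defaultdict
from itertools import chain, combinations

def condensed_cube(rows, dims, measure):
    # Enumerate every cuboid (subset of grouping dimensions). Groups that
    # contain exactly one base row are "condensed" to the row's index; all
    # other groups are materialized as ordinary aggregate cells.
    condensed, materialized = {}, {}
    cuboids = chain.from_iterable(combinations(dims, k) for k in range(len(dims) + 1))
    for cuboid in cuboids:
        groups = defaultdict(list)
        for idx, row in enumerate(rows):
            groups[tuple(row[d] for d in cuboid)].append(idx)
        for key, members in groups.items():
            if len(members) == 1:
                condensed[(cuboid, key)] = members[0]
            else:
                materialized[(cuboid, key)] = sum(rows[i][measure] for i in members)
    return condensed, materialized
```

In sparse data most cells end up in the condensed dictionary, which is where the size reduction would come from in this toy model.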
Citations: 179
Demonstration: active asynchronous transaction management in high-autonomy federated environment using data agents: Global Change Master Directory v8.0
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994744
O. Bukhres, Srinivasan Sikkupparbathyam, K. Nagendra, Zina Ben-Miled, M. Areal, L. Olsen, Chris Gokey, David Kendig, Rosy Cordova, G. Major, J. Savage
The Global Change Master Directory (GCMD) is an Earth science information repository that specifically tracks research data on global climatic change. Building a directory of Earth science metadata that allows the exchange of metadata content among partner organizations is challenging due to the complex issues involved in supporting heterogeneous metadata schema, database schema, database implementation and platforms. This demonstration presents the design of the MD8 (Master Directory v8.0), which allows automated exchange of metadata content among Earth science collaborators through a proposed asynchronous distributed transaction protocol. Specifically, the demonstration focuses on the local data agent that captures local database updates and broadcasts them to other cooperating nodes asynchronously using an announcer.
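A rough sketch of the capture-and-broadcast pattern described above; the class and method names are hypothetical, and the peer transports are plain callables so the example stays independent of any particular messaging layer.

```python
import queue
import threading

class Announcer:
    """Asynchronously forwards locally captured updates to cooperating nodes."""

    def __init__(self, peers):
        self.peers = peers              # callables, one per cooperating node
        self.outbox = queue.Queue()
        threading.Thread(target=self._broadcast_loop, daemon=True).start()

    def capture(self, update):
        # Called by the local data agent whenever the local database changes;
        # returns immediately, so local transactions are never blocked.
        self.outbox.put(update)

    def _broadcast_loop(self):
        while True:
            update = self.outbox.get()
            for send in self.peers:     # deliver the update to every partner node
                send(update)
```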
Citations: 0
Structural joins: a primitive for efficient XML query pattern matching
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994704
S. Al-Khalifa, H. Jagadish, Nick Koudas, J. Patel, D. Srivastava, Yuqing Wu
XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We develop two families of structural join algorithms for this task: tree-merge and stack-tree. The tree-merge algorithms are a natural extension of traditional merge joins and the multi-predicate merge joins, while the stack-tree algorithms have no counterpart in traditional relational join processing. We present experimental results on a range of data and queries using the TIMBER native XML query engine built on top of SHORE. We show that while, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse. This behavior is explained by analytical results that demonstrate that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.
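A simplified stack-based sketch of the ancestor-descendant join, assuming each element carries a (start, end) region code and both inputs are sorted by start position. It omits the level test needed for parent-child matching and the output-ordering refinements of the stack-tree variants in the paper.

```python
def stack_tree_join(ancestors, descendants):
    # Each node is a (start, end) interval; in a tree encoding, two intervals
    # are either disjoint or properly nested, and containment means ancestry.
    result, stack = [], []
    ai, di = 0, 0
    while di < len(descendants):
        next_is_ancestor = ai < len(ancestors) and ancestors[ai][0] < descendants[di][0]
        nxt = ancestors[ai] if next_is_ancestor else descendants[di]
        # Ancestors that end before the next node starts can never match again.
        while stack and stack[-1][1] < nxt[0]:
            stack.pop()
        if next_is_ancestor:
            stack.append(ancestors[ai])
            ai += 1
        else:
            # Every ancestor still on the stack contains this descendant.
            result.extend((anc, descendants[di]) for anc in stack)
            di += 1
    return result

print(stack_tree_join([(1, 10), (2, 5)], [(3, 4), (6, 7)]))
# [((1, 10), (3, 4)), ((2, 5), (3, 4)), ((1, 10), (6, 7))]
```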
Citations: 890
Data reduction by partial preaggregation
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994787
P. Larson
Partial preaggregation is a simple data reduction operator that can be applied to aggregation queries. Whenever we group and aggregate on a column set G, we can preaggregate on any column set that functionally determines G. Preaggregation can be used, for example, to reduce the input size to a join. Regular aggregation reduces the input to one record per group. Partial preaggregation exploits the fact that preaggregation need not be complete-if multiple records happen to be output for a group, they will be combined into the same group by the final aggregation. This paper describes a straightforward hash-based algorithm for partial preaggregation, discusses where it can be applied, and derives a mathematical model for estimating the output size. The effectiveness of the technique and the accuracy of the model are shown on both artificial and real data. It is also shown how to reduce memory requirements by combining partial preaggregation with the input phase of a subsequent join or sort operator. Partial preaggregation has been implemented, in part, in Microsoft SQL Server.
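A straightforward hash-based sketch of the operator, assuming a fixed bound on the number of in-memory groups; passing new keys straight through once the table is full is one simple policy, not necessarily the one analyzed in the paper.

```python
def partial_preaggregate(rows, key_fn, value_fn, max_groups):
    # Combine rows into groups while the hash table fits in max_groups entries.
    # When a new key does not fit, emit that row partially aggregated instead of
    # growing the table, so the same key may appear several times in the output;
    # the final aggregation merges these partial groups into one record per group.
    table = {}
    for row in rows:
        key, value = key_fn(row), value_fn(row)
        if key in table:
            table[key] += value
        elif len(table) < max_groups:
            table[key] = value
        else:
            yield key, value
    yield from table.items()
```

Because every emitted record is already a (key, partial aggregate) pair, a downstream join sees fewer rows, and the final aggregation still produces exactly one record per group.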
Citations: 47
Multiple query optimization by cache-aware middleware using query teamwork
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994728
K. O'Gorman, D. Agrawal, A. E. Abbadi
Queries with common sequences of disk accesses can make maximal use of a buffer pool. We developed middleware to promote the necessary conditions in concurrent query streams, and achieved a speedup of 2.99 in executing a workload derived from the TPC-H benchmark.
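A toy illustration of the claim, not of the middleware itself: when two identical scans are teamed so they touch the same pages back to back, an LRU buffer pool turns the second stream's reads into hits, whereas running the streams one after the other yields none.

```python
from collections import OrderedDict

def lru_hit_ratio(accesses, capacity):
    # Simulate an LRU buffer pool and return the fraction of page hits.
    pool, hits = OrderedDict(), 0
    for page in accesses:
        if page in pool:
            hits += 1
            pool.move_to_end(page)
        else:
            if len(pool) >= capacity:
                pool.popitem(last=False)
            pool[page] = True
    return hits / len(accesses)

pages = list(range(100))
teamed = [p for p in pages for _ in range(2)]   # two queries read each page back to back
staggered = pages + pages                       # the second query starts after the first finishes
print(lru_hit_ratio(teamed, capacity=10))       # 0.5
print(lru_hit_ratio(staggered, capacity=10))    # 0.0
```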
Citations: 30
Specification-based data reduction in dimensional data warehouses
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994732
Janne Skyt, Christian S. Jensen, T. Pedersen
Presents a powerful and easy-to-use technique for aggregation-based data reduction that enables the gradual change of the data from being detailed to being increasingly aggregated. The technique enables huge storage gains while retaining the data that is essential to the users, and it preserves the ability to query original and reduced data in an integrated manner.
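One possible reading of specification-driven, gradual reduction, sketched with hypothetical field names: a specification maps the age of a fact to the time granularity at which it should be kept, and older facts are rolled up accordingly while recent ones stay detailed.

```python
import datetime as dt
from collections import defaultdict

def reduce_by_age(rows, today, specs):
    # specs maps a minimum age in days to a function that coarsens the date;
    # thresholds are applied in increasing order, so the coarsest applicable
    # granularity wins and rows younger than every threshold keep full detail.
    out = defaultdict(float)
    for row in rows:
        age = (today - row["date"]).days
        grain = row["date"]
        for min_age, coarsen in sorted(specs.items()):
            if age >= min_age:
                grain = coarsen(row["date"])
        out[(grain, row["product"])] += row["amount"]
    return dict(out)

specs = {90: lambda d: d.replace(day=1),              # older than ~3 months: monthly
         365: lambda d: d.replace(month=1, day=1)}    # older than a year: yearly
rows = [{"date": dt.date(2001, 3, 15), "product": "p1", "amount": 9.0}]
print(reduce_by_age(rows, today=dt.date(2002, 8, 7), specs=specs))
# {(datetime.date(2001, 1, 1), 'p1'): 9.0}
```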
Citations: 40