
Latest publications from the 22nd International Conference on Data Engineering (ICDE'06)

Faster In-Network Evaluation of Spatial Aggregation in Sensor Networks
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.70
Dina Q. Goldin
Spatial aggregation is an important class of queries for geo-aware spatial sensor database applications. Given a set of spatial regions, it involves the aggregation of dynamic sensor readings over each of these regions simultaneously. Nested spatial aggregation involves one more level of aggregation, combining these aggregates into a single aggregate value. We show that spatial aggregate values can often be computed in-network, rather than waiting until the partial aggregate records reach the root as is now the case. This decreases the amount of communication involved in query evaluation, thereby reducing the network's power consumption. We describe an algorithm that allows us to determine when an aggregate record for any spatial region is ready to be evaluated in-network, based on decorating the routing tree with region leader lists. We also identify several important scenarios, such as nested spatial aggregation and filtering predicates, in which the savings from our approach are expected to be particularly great.
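The readiness test is the core of the approach. The Python sketch below is illustrative only: the toy topology, the function names, and the simplifying assumption that routing-tree node ids double as sensor ids are ours, not the paper's. A node becomes a region's leader when its subtree covers all of the region's sensors but no single child subtree does, so the region's aggregate can be finalized there instead of at the root.

```python
# Minimal sketch: decide where in a routing tree each region's aggregate can
# be finalized. The lowest node whose subtree contains every sensor of the
# region is that region's "leader"; its aggregate need not travel to the root.
from collections import defaultdict

def decorate_with_leaders(tree, root, region_members):
    """tree: node -> list of children; region_members: region -> set of sensors.
    Returns node -> list of regions whose aggregate it can finalize."""
    leaders = defaultdict(list)

    def covered(node):
        # Sensors reachable in this subtree (node ids double as sensor ids here).
        s = {node}
        for child in tree.get(node, []):
            s |= covered(child)
        return s

    def visit(node):
        sub = covered(node)
        for region, members in region_members.items():
            child_covers = any(members <= covered(c) for c in tree.get(node, []))
            if members <= sub and not child_covers:
                leaders[node].append(region)  # lowest covering node: finalize here
        for child in tree.get(node, []):
            visit(child)

    visit(root)
    return leaders

tree = {0: [1, 2], 1: [3, 4], 2: [5]}
regions = {"R1": {3, 4}, "R2": {4, 5}}
print(dict(decorate_with_leaders(tree, 0, regions)))  # R1 at node 1, R2 at node 0
```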
Citations: 19
Technique for Optimal Adaptation of Time-Dependent Workflows with Security Constraints
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.156
Basit Shafiq, Arjmand Samuel, E. Bertino, A. Ghafoor
Distributed workflow-based systems are widely used in various application domains including e-commerce, digital government, healthcare, manufacturing and many others. Workflows in these application domains are not restricted to the administrative boundaries of a single organization [1]. The tasks in a workflow need to be performed in a certain order and are often subject to temporal constraints and dependencies [1, 2]. A key requirement for such workflow applications is to provide the right data to the right person at the right time. This requirement motivates dynamic adaptation of workflows to deal with changing environmental conditions and exceptions.
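As a small hypothetical illustration of such ordering and temporal constraints (the task names, windows, and schedule are invented, not from the paper), a schedule can be checked against per-task time windows and precedence dependencies:

```python
# Toy validation of a workflow schedule against temporal constraints:
# each task must run inside its time window, and precedence edges require
# one task to finish before another starts.
def violations(schedule, windows, precedence):
    """schedule: task -> (start, end); windows: task -> (earliest, latest);
    precedence: list of (before, after) pairs."""
    problems = []
    for task, (start, end) in schedule.items():
        lo, hi = windows[task]
        if start < lo or end > hi:
            problems.append(f"{task} runs outside its window [{lo}, {hi}]")
    for before, after in precedence:
        if schedule[before][1] > schedule[after][0]:
            problems.append(f"{before} must finish before {after} starts")
    return problems

schedule = {"approve": (0, 2), "disburse": (1, 3)}
windows = {"approve": (0, 4), "disburse": (2, 6)}
print(violations(schedule, windows, [("approve", "disburse")]))
```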
Citations: 0
Scalable Exploration of Physical Database Design
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.133
A. König, Shubha U. Nabar
Physical database design is critical to the performance of a large-scale DBMS. The corresponding automated design tuning tools need to select the best physical design from a large set of candidate designs quickly. However, for large workloads, evaluating the cost of each query in the workload for every candidate does not scale. To overcome this, we present a novel comparison primitive that only evaluates a fraction of the workload and provides an accurate estimate of the likelihood of selecting correctly. We show how to use this primitive to construct accurate and scalable selection procedures. Furthermore, we address the issue of ensuring that the estimates are conservative, even for highly skewed cost distributions. The proposed techniques are evaluated through a prototype implementation inside a commercial physical design tool.
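A minimal sketch of the sampling idea follows; the toy cost models, the crude normal-approximation confidence, and the function names are assumptions made here for illustration, not the paper's primitive, which additionally keeps the estimate conservative under skewed cost distributions.

```python
# Estimate, from a workload sample, whether candidate design A beats design B
# and with what confidence, instead of costing every query under every design.
import math
import random
import statistics

def compare_designs(workload, cost_a, cost_b, sample_size=100, seed=0):
    random.seed(seed)
    sample = random.sample(workload, min(sample_size, len(workload)))
    diffs = [cost_a(q) - cost_b(q) for q in sample]  # negative favors A
    mean = statistics.mean(diffs)
    stderr = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # Normal approximation: P(A is cheaper over the full workload).
    z = -mean / stderr if stderr > 0 else math.inf
    confidence = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return mean, confidence

workload = list(range(10_000))
mean, conf = compare_designs(workload,
                             cost_a=lambda q: q % 7,          # toy cost models
                             cost_b=lambda q: q % 7 + 0.5)
print(f"mean cost difference {mean:.2f}, P(A cheaper) ~ {conf:.3f}")
```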
Citations: 13
Taming Compliance with Sarbanes-Oxley Internal Controls Using Database Technology
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.155
R. Agrawal, Christopher M. Johnson, J. Kiernan, F. Leymann
The Sarbanes-Oxley Act instituted a series of corporate reforms to improve the accuracy and reliability of financial reporting. Sections 302 and 404 of the Act require SEC-reporting companies to implement internal controls over financial reporting, periodically assess the effectiveness of these internal controls, and certify the accuracy of their financial statements. We suggest that database technology can play an important role in assisting compliance with the internal control provisions of the Act. The core components of our solution include: (i) modeling of required workflows, (ii) active enforcement of control activities, (iii) auditing of actual workflows to verify compliance with internal controls, and (iv) discovery-driven OLAP to identify irregularities in financial data. We illustrate how the features of our solution fulfill Sarbanes-Oxley requirements using several real-life scenarios. In the process, we identify opportunities for new database research.
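Component (iii), auditing actual workflows against the modeled controls, can be pictured with a small hypothetical sketch; the transition table and event log below are invented for illustration.

```python
# Audit an event log for one case against the modeled control workflow:
# any step not allowed to follow its predecessor is flagged as an irregularity.
REQUIRED = {  # allowed next steps in the modeled workflow
    "create_invoice": {"approve_invoice"},
    "approve_invoice": {"pay_invoice"},
    "pay_invoice": set(),
}

def audit(log):
    for prev, step in zip(log, log[1:]):
        if step not in REQUIRED.get(prev, set()):
            yield f"step '{step}' after '{prev}' violates the control workflow"

# A payment issued without approval: the kind of violation an assessor must catch.
print(list(audit(["create_invoice", "pay_invoice"])))
```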
Citations: 90
Super-Scalar RAM-CPU Cache Compression
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.150
M. Zukowski, S. Héman, N. Nes, P. Boncz
High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even when high-end RAID storage systems are used. Compression can alleviate this bottleneck only if encoding and decoding speeds significantly exceed RAID I/O bandwidth. For this purpose, we propose three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs. We compare these algorithms with compression techniques used in (commercial) database and information retrieval systems. Our experiments on the MonetDB/X100 database system, using both DSM and PAX disk storage, show that these techniques strongly accelerate TPC-H performance to the point that the I/O bottleneck is eliminated.
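A much-simplified sketch of the patching idea behind PFOR follows. Real PFOR packs values into tight b-bit slots and threads exception positions through the unused slots; this version keeps plain Python lists so that only the encode/decode logic shows.

```python
# Patched frame-of-reference, simplified: values that fit in b bits are stored
# in the packed stream; outliers become "exceptions" patched back in on decode.
def pfor_encode(values, b):
    limit = 1 << b
    packed, exc_pos, exc_val = [], [], []
    for i, v in enumerate(values):
        if v < limit:
            packed.append(v)
        else:
            packed.append(0)          # placeholder in the b-bit stream
            exc_pos.append(i)
            exc_val.append(v)
    return packed, exc_pos, exc_val

def pfor_decode(packed, exc_pos, exc_val):
    out = list(packed)
    for i, v in zip(exc_pos, exc_val):
        out[i] = v                    # patch the exceptions back in
    return out

data = [3, 1, 4, 1, 500, 2, 6]
enc = pfor_encode(data, b=4)          # 4-bit slots, so 500 becomes an exception
assert pfor_decode(*enc) == data
```

Note the decode path: the main loop over the packed stream has no per-value branch, and exceptions are patched in a separate pass, which is the property that lets a real implementation keep a super-scalar CPU's pipelines full.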
Citations: 517
Clean Answers over Dirty Databases: A Probabilistic Approach
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.35
Periklis Andritsos, A. Fuxman, Renée J. Miller
The detection of duplicate tuples, corresponding to the same real-world entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on ad-hoc and often manual solutions. We propose a complementary approach that permits declarative query answering over duplicated data, where each duplicate is associated with a probability of being in the clean database. We rewrite queries over a database containing duplicates to return each answer with the probability that the answer is in the clean database. Our rewritten queries are sensitive to the semantics of duplication and help a user understand which query answers are most likely to be present in the clean database. The semantics that we adopt is independent of the way the probabilities are produced, but is able to effectively exploit them during query answering. In the absence of external knowledge that associates each database tuple with a probability, we offer a technique, based on tuple summaries, that automates this task. We experimentally study the performance of our rewritten queries. Our studies show that the rewriting does not introduce a significant overhead in query execution time. This work is done in the context of the ConQuer project at the University of Toronto, which focuses on the efficient management of inconsistent and dirty databases.
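The answer semantics can be illustrated with a toy computation (this is not ConQuer's SQL rewriting, just the probability arithmetic it implements): duplicate tuples form clusters, exactly one tuple per cluster is clean, and an answer's probability is the chance that the cluster's clean tuple satisfies the query.

```python
# Probability that each real-world entity (cluster) answers the query,
# given per-tuple probabilities of being the clean representative.
from collections import defaultdict

# (cluster_id, city, probability); probabilities within a cluster sum to 1.
customers = [
    ("c1", "Toronto",  0.7),
    ("c1", "Torontoo", 0.3),   # likely misspelling of the same entity
    ("c2", "Ottawa",   1.0),
]

def prob_answers(rows, predicate):
    scores = defaultdict(float)
    for cluster, city, p in rows:
        if predicate(city):
            scores[cluster] += p   # exactly one tuple per cluster is clean
    return dict(scores)

# P(entity c1 is an answer to "city = 'Toronto'") = 0.7
print(prob_answers(customers, lambda city: city == "Toronto"))
```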
Citations: 202
Approximating Aggregation Queries in Peer-to-Peer Networks
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.23
Benjamin Arai, Gautam Das, D. Gunopulos, V. Kalogeraki
Peer-to-peer databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large scale, ad-hoc analysis queries ― e.g., aggregation queries ― on these databases poses unique challenges. Exact solutions can be time consuming and difficult to implement given the distributed and dynamic nature of peer-to-peer databases. In this paper we present novel sampling-based techniques for approximate answering of ad-hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers, within each peer the data is often highly correlated, and moreover, even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach, based on random walks of the P2P graph as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.
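A minimal sketch of the two-phase idea follows, under simplifying assumptions: peers and blocks are treated uniformly, with none of the corrections for node degree or skewed per-peer data volumes that the paper's estimators provide.

```python
# Phase one: a random walk over the peer graph picks peers. Phase two: each
# chosen peer is sampled at block granularity, and the sample sum is scaled
# up to the population.
import random

def random_walk(graph, start, steps):
    node = start
    for _ in range(steps):
        node = random.choice(graph[node])   # long walks approximate a random peer
    return node

def approximate_sum(graph, peer_blocks, start, n_peers=30, blocks_per_peer=2,
                    walk_len=20):
    total, sampled = 0.0, 0
    for _ in range(n_peers):
        peer = random_walk(graph, start, walk_len)
        blocks = peer_blocks[peer]
        for block in random.sample(blocks, min(blocks_per_peer, len(blocks))):
            total += sum(block)
            sampled += len(block)
    n_total = sum(len(b) for blocks in peer_blocks.values() for b in blocks)
    return total / sampled * n_total        # scale sample sum to the population

random.seed(1)
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
peer_blocks = {p: [[p * 10 + i for i in range(5)] for _ in range(3)] for p in graph}
print(approximate_sum(graph, peer_blocks, start=0))
```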
Citations: 47
Space-efficient Relative Error Order Sketch over Data Streams
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.145
Ying Zhang, Xuemin Lin, Jian Xu, Flip Korn, Wei Wang
We consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee ε. Novel space-efficient and one-scan randomised techniques are developed. Our first randomised algorithm can guarantee such a relative error precision ε with confidence 1 − δ using O((1/ε²) log(1/δ) log(ε²N)) space, where N is the number of data elements seen so far in a data stream. Then, a new one-scan space compression technique is developed. Combined with the first randomised algorithm, the one-scan space compression technique yields another one-scan randomised algorithm whose space requirement is O((1/ε) log((1/ε) log(1/δ)) · log^(2+α)(εN) / (1 − 1/2^α)) for α > 0 on average, while the worst-case space remains O((1/ε²) log(1/δ) log(ε²N)). These results are immediately applicable to approximately computing quantiles over data streams with a relative error guarantee ε, and significantly improve the previous best space bound of O((1/ε³) log(1/δ) log N). Our extensive experiment results demonstrate that both techniques can support on-line computation against high-speed data streams.
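To make the bounds concrete, a back-of-the-envelope computation (constants ignored, base-2 logarithms assumed) plugs ε = 0.01, δ = 0.01, N = 10^9, and α = 1 into the three expressions above:

```python
# Rough magnitudes of the space bounds for eps=0.01, delta=0.01, N=1e9, alpha=1.
from math import log2

eps, delta, N, alpha = 0.01, 0.01, 10**9, 1.0

worst_case    = (1 / eps**2) * log2(1 / delta) * log2(eps**2 * N)
previous_best = (1 / eps**3) * log2(1 / delta) * log2(N)
average       = ((1 / eps) * log2((1 / eps) * log2(1 / delta))
                 * log2(eps * N) ** (2 + alpha) / (1 - 0.5 ** alpha))

print(f"worst case    ~ {worst_case:.3g}")     # ~ 1.1e6
print(f"previous best ~ {previous_best:.3g}")  # ~ 2.0e8
print(f"average       ~ {average:.3g}")        # ~ 2.4e7
```

Even at this crude level, the new algorithm's average space sits roughly an order of magnitude below the previous best bound.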
Citations: 26
Compiled Query Execution Engine using JVM
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.40
Jun Rao, H. Pirahesh, C. Mohan, G. Lohman
A conventional query execution engine in a database system essentially uses a SQL virtual machine (SVM) to interpret a dataflow tree in which each node is associated with a relational operator. During query evaluation, a single tuple at a time is processed and passed among the operators. Such a model is popular because of its efficiency for pipelined processing. However, since each operator is implemented statically, it has to be very generic in order to deal with all possible queries. Such generality tends to introduce significant runtime inefficiency, especially in the context of memory-resident systems, because the granularity of data processing (a tuple) is too small compared with the associated overhead. Another disadvantage in such an engine is that each operator code is compiled statically, so query-specific optimization cannot be applied. To improve runtime efficiency, we propose a compiled execution engine, which, for a given query, generates new query-specific code on the fly, and then dynamically compiles and executes the code. The Java platform makes our approach particularly interesting for several reasons: (1) modern Java Virtual Machines (JVM) have Just-In-Time (JIT) compilers that optimize code at runtime based on the execution pattern, a key feature that SVMs lack; (2) because of Java’s continued popularity, JVMs keep improving at a faster pace than SVMs, allowing us to exploit new advances in the Java runtime in the future; (3) Java is a dynamic language, which makes it convenient to load a piece of new code on the fly. In this paper, we develop both an interpreted and a compiled query execution engine in a relational, Java-based, in-memory database prototype, and perform an experimental study. Our experimental results on the TPC-H data set show that, despite both engines benefiting from JIT, the compiled engine runs on average about twice as fast as the interpreted one, and significantly faster than an in-memory commercial system, using SVM.
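The engines in the paper are Java-based; for consistency with the other sketches on this page, the Python toy below illustrates only the code-generation idea: emit source text specialized to one query, compile it once, and run the compiled function instead of interpreting a generic operator tree.

```python
# Generate and compile query-specific code at runtime, rather than walking a
# generic dataflow tree one tuple at a time.
def compile_query(predicate_src, column):
    src = (
        "def run(rows):\n"
        f"    return [row['{column}'] for row in rows if {predicate_src}]\n"
    )
    namespace = {}
    exec(compile(src, "<generated-query>", "exec"), namespace)
    return namespace["run"]

rows = [{"price": p, "qty": q} for p, q in [(5, 2), (12, 1), (30, 4)]]
# SELECT qty FROM rows WHERE price > 10, as one specialized compiled function.
run = compile_query("row['price'] > 10", "qty")
print(run(rows))  # [1, 4]
```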
Citations: 72
Updates Through Views: A New Hope
Pub Date: 2006-04-03 DOI: 10.1109/ICDE.2006.167
Y. Kotidis, D. Srivastava, Yannis Velegrakis
Database views are extensively used to represent unmaterialized tables. Applications rarely distinguish between a materialized base table and a virtual view; thus, they may issue update requests on the views. Since views are virtual, update requests on them need to be translated to updates on the base tables. Existing literature has shown the difficulty of translating view updates in a side-effect-free manner. To address this problem, we propose a novel approach that separates the data instance into a logical and a physical level. This separation allows us to achieve side-effect-free translations of any kind of update on the view. Furthermore, deletes on a view can be translated without affecting the base tables. We describe the implementation of the framework and present our experimental results.
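A toy sketch of that separation follows; the class and its mask representation are inventions for illustration, not the paper's implementation. A delete issued through the view is recorded at the logical level, so readers of the view stop seeing the row while the physical base table stays intact.

```python
# Logical vs. physical level: deletes through a view become logical masks,
# leaving the physical base table unmodified (and hence side-effect free).
class LogicalTable:
    def __init__(self, base_rows):
        self.base = base_rows          # physical level: never touched by view deletes
        self.masked = set()            # logical level: row ids hidden from readers

    def rows(self):
        return [r for r in self.base if r["id"] not in self.masked]

emp = LogicalTable([{"id": 1, "dept": "db"}, {"id": 2, "dept": "ir"}])
db_view = lambda t: [r for r in t.rows() if r["dept"] == "db"]

# Delete through the view: translate to logical masks, not base-table deletes.
for r in db_view(emp):
    emp.masked.add(r["id"])

print(db_view(emp))   # []  : the row is gone from the view
print(len(emp.base))  # 2   : the base table is untouched
```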
Citations: 27