
21st International Conference on Data Engineering (ICDE'05): Latest Publications

DUP: Dynamic-Tree Based Update Propagation in Peer-to-Peer Networks
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.52
Liangzhong Yin, G. Cao
In peer-to-peer networks, indices are used to map data id to nodes that host the data. The performance of data access can be improved by actively pushing indices to interested nodes. This paper proposes the Dynamic-tree based Update Propagation (DUP) scheme, which builds the update propagation tree to facilitate the propagation of indices. Because the update propagation tree only involves nodes that are essential for update propagation, the overhead of DUP is very small and the query latency is significantly reduced.
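The abstract does not give the tree-construction details; as a rough illustration of the general idea (pushing an index update only along a tree of interested peers), here is a minimal Python sketch. All class and method names are hypothetical, not taken from the paper.

```python
# Hypothetical sketch (names not from the paper): pushing an index update down
# a propagation tree that contains only the peers interested in a given data id.

class PeerNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.children = []   # interested peers reached through this node
        self.index = {}      # data id -> node currently hosting the data

    def subscribe(self, child):
        """Attach an interested peer beneath this node in the propagation tree."""
        self.children.append(child)

    def propagate(self, data_id, host):
        """Apply an index update locally, then push it to the whole subtree."""
        self.index[data_id] = host
        for child in self.children:
            child.propagate(data_id, host)

# Usage: the root learns a new mapping and fans it out only to interested peers.
root, a, b = PeerNode("root"), PeerNode("a"), PeerNode("b")
root.subscribe(a)
a.subscribe(b)
root.propagate("item-42", "peer-7")
assert b.index["item-42"] == "peer-7"
```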
Citations: 39
PnP: parallel and external memory iceberg cube computation
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.107
Ying Chen, F. Dehne, Todd Eavis, A. Rau-Chaplin
We present "Pipe 'n Prune" (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is very efficient for all of the following scenarios: (1) Sequential iceberg-cube queries. (2) External memory iceberg-cube queries. (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks.
Citations: 10
Fast approximate similarity search in extremely high-dimensional data sets
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.66
M. Houle, J. Sakuma
This paper introduces a practical index for approximate similarity queries of large multi-dimensional data sets: the spatial approximation sample hierarchy (SASH). A SASH is a multi-level structure of random samples, recursively constructed by building a SASH on a large randomly selected sample of data objects, and then connecting each remaining object to several of their approximate nearest neighbors from within the sample. Queries are processed by first locating approximate neighbors within the sample, and then using the pre-established connections to discover neighbors within the remainder of the data set. The SASH index relies on a pairwise distance measure, but otherwise makes no assumptions regarding the representation of the data. Experimental results are provided for query-by-example operations on protein sequence, image, and text data sets, including one consisting of more than 1 million vectors spanning more than 1.1 million terms - far in excess of what spatial search indices can handle efficiently. For sets of this size, the SASH can return a large proportion of the true neighbors roughly 2 orders of magnitude faster than sequential search.
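The full SASH is a recursively built, multi-level hierarchy; as a simplified, hedged stand-in for the core query idea (search a small random sample first, then follow the pre-established links from the best sample members), here is a two-level sketch with hypothetical names.

```python
# Simplified two-level stand-in for the query idea (the real SASH is a
# recursively built multi-level hierarchy): search a random sample first, then
# follow the links that non-sample objects established to their nearest sample
# members at build time.

import random

def build(objects, dist, sample_size, links_per_object):
    sample = random.sample(objects, min(sample_size, len(objects)))
    links = {id(s): [] for s in sample}
    for obj in objects:
        if any(obj is s for s in sample):
            continue
        # connect each remaining object to a few of its nearest sample members
        for s in sorted(sample, key=lambda s: dist(obj, s))[:links_per_object]:
            links[id(s)].append(obj)
    return sample, links

def query(q, k, sample, links, dist):
    # 1) locate approximate neighbours within the sample
    seeds = sorted(sample, key=lambda s: dist(q, s))[:k]
    # 2) expand through the pre-established connections, de-duplicating
    seen, candidates = set(), []
    for cand in seeds + [o for s in seeds for o in links[id(s)]]:
        if id(cand) not in seen:
            seen.add(id(cand))
            candidates.append(cand)
    return sorted(candidates, key=lambda c: dist(q, c))[:k]

points = [(random.random(), random.random()) for _ in range(1000)]
d = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
sample, links = build(points, d, sample_size=100, links_per_object=3)
print(query((0.5, 0.5), 5, sample, links, d))
```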
Citations: 119
NFMⁱ: an inner-domain network fault management system
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.94
Qingchun Jiang, R. Adaikkalavan, Sharma Chakravarthy
Network fault management has been an active research area for a long period of time because of its complexity, and the returns it generates for service providers. However, most fault management systems are currently custom-developed for a particular domain. As communication service providers continuously add greater capabilities and sophistication to their systems in order to meet demands of a growing user population, these systems have to manage a multi-layered network along with its built-in legacy logical processing procedure. Stream processing has been receiving a lot of attention to deal with applications that generate large amounts of data in real-time at varying input rates and to compute functions over multiple streams, such as network fault management. In this paper, we propose an integrated inter-domain network fault management system for such a multi-layered network based on data stream and event processing techniques. We discuss various components in our system and how data stream processing techniques are used to build a flexible system for a sophisticated real-world application. We further identify a number of important issues related to data stream processing during the course of the discussion of our proposed system, which will further extend the boundaries of data stream processing.
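The abstract stays at the architecture level; purely as an illustration of the kind of continuous stream computation such a system performs, here is a small sliding-window aggregation over a stream of fault events. The event schema, window policy, and threshold are assumptions, not taken from the paper.

```python
# Purely illustrative continuous computation over a stream of fault events:
# count alarms per network element within a sliding time window and flag the
# noisy ones.

from collections import deque, Counter

class SlidingWindowAlarmCount:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()        # (timestamp, element_id), oldest first
        self.counts = Counter()

    def on_event(self, timestamp, element_id):
        self.events.append((timestamp, element_id))
        self.counts[element_id] += 1
        # expire events that have fallen out of the window
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1

    def noisy_elements(self, threshold):
        return [e for e, c in self.counts.items() if c >= threshold]

w = SlidingWindowAlarmCount(window_seconds=60)
for t, element in [(0, "card-3"), (10, "card-3"), (30, "card-3"), (50, "link-1")]:
    w.on_event(t, element)
print(w.noisy_elements(threshold=2))   # ['card-3']
```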
Citations: 17
BOXes: efficient maintenance of order-based labeling for dynamic XML data
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.29
Adam Silberstein, Hao He, K. Yi, Jun Yang
Order-based element labeling for tree-structured XML data is an important technique in XML processing. It lies at the core of many fundamental XML operations such as containment join and twig matching. While labeling for static XML documents is well understood, less is known about how to maintain accurate labeling for dynamic XML documents, when elements and subtrees are inserted and deleted. Most existing approaches do not work well for arbitrary update patterns; they either produce unacceptably long labels or incur enormous relabeling costs. We present two novel I/O-efficient data structures, W-BOX and B-BOX that efficiently maintain labeling for large, dynamic XML documents. We show analytically and experimentally that both, despite consuming minimal amounts of storage, gracefully handle arbitrary update patterns without sacrificing lookup efficiency. The two structures together provide a nice tradeoff between update and lookup costs: W-BOX has logarithmic amortized update cost and constant worst-case lookup cost, while B-BOX has constant amortized update cost and logarithmic worst-case lookup cost. We further propose techniques to eliminate the lookup cost for read-heavy workloads.
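The abstract does not describe the internals of W-BOX or B-BOX; for background, here is a hypothetical sketch of the kind of naive gap-based labeling whose relabeling cost such structures aim to avoid. The gap size and relabeling policy below are arbitrary choices.

```python
# Background sketch (not W-BOX or B-BOX): a naive gap-based order labeling.
# Each element carries an integer label encoding document order; inserting
# between two neighbours takes the midpoint of their labels and, when the gap
# is exhausted, relabels the list.

def insert_between(labels, left_pos, gap=1 << 20):
    """Insert a new element immediately after position left_pos; return its label."""
    left = labels[left_pos]
    right = labels[left_pos + 1] if left_pos + 1 < len(labels) else left + 2 * gap
    if right - left > 1:
        new_label = (left + right) // 2              # room left: take the midpoint
    else:
        # gap exhausted: relabel everything with fresh, evenly spaced labels
        labels[:] = [(i + 1) * gap for i in range(len(labels))]
        return insert_between(labels, left_pos, gap)
    labels.insert(left_pos + 1, new_label)
    return new_label

labels = [1 << 20, 2 << 20, 3 << 20]   # three elements in document order
insert_between(labels, 0)
print(labels)                          # label order still reflects document order
```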
Citations: 54
SNAP: efficient snapshots for back-in-time execution
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.133
L. Shrira, Hao Xu
SNAP is a novel high-performance snapshot system for object storage systems. The goal is to provide a snapshot service that is efficient enough to permit "back-in-time" read-only activities to run against application-specified snapshots. Such activities are often impossible to run against rapidly evolving current state because of interference or because the required activity is determined in retrospect. A key innovation in SNAP is that it provides snapshots that are transactionally consistent, yet non-disruptive. Unlike earlier systems, we use novel in-memory data structures to ensure that frequent snapshots do not block applications from accessing the storage system, and do not cause unnecessary disk operations. SNAP takes a novel approach to dealing with snapshot meta-data using a new technique that supports both incremental meta-data creation and efficient meta-data reconstruction. We have implemented a SNAP prototype and analyzed its performance. Preliminary results show that providing snapshots for back-in-time activities has low impact on system performance even when snapshots are frequent.
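As generic background rather than the SNAP design itself, the sketch below shows a copy-on-write flavor of non-disruptive snapshots: a write made after a snapshot preserves only the pre-image of the affected page, so snapshot reads see the old state while current-state access never blocks. All names are hypothetical.

```python
# Generic copy-on-write illustration, not the SNAP design itself.

class Store:
    def __init__(self):
        self.pages = {}       # current state: page id -> bytes
        self.snapshots = []   # one dict per snapshot: page id -> pre-image

    def snapshot(self):
        self.snapshots.append({})        # populated lazily on first overwrite
        return len(self.snapshots) - 1

    def write(self, page_id, data):
        for snap in self.snapshots:
            # keep the pre-image for every snapshot that has not saved it yet
            if page_id not in snap and page_id in self.pages:
                snap[page_id] = self.pages[page_id]
        self.pages[page_id] = data

    def read_snapshot(self, snap_id, page_id):
        snap = self.snapshots[snap_id]
        return snap.get(page_id, self.pages.get(page_id))

s = Store()
s.write("p1", b"v1")
t = s.snapshot()
s.write("p1", b"v2")
print(s.read_snapshot(t, "p1"), s.pages["p1"])   # b'v1' b'v2'
```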
Citations: 22
Optimizing ETL processes in data warehouses
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.103
A. Simitsis, Panos Vassiliadis, T. Sellis
Extraction-transformation-loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Usually, these processes must be completed in a certain time window; thus, it is necessary to optimize their execution time. In this paper, we delve into the logical optimization of ETL processes, modeling it as a state-space search problem. We consider each ETL workflow as a state and fabricate the state space through a set of correct state transitions. Moreover, we provide algorithms towards the minimization of the execution cost of an ETL workflow.
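To make the state-space framing concrete, here is a minimal, generic search skeleton: a state is a workflow, transitions are correctness-preserving rewrites, and the search keeps the cheapest state found. The toy rewrite (swap adjacent operators) and cost function are placeholders, not the paper's transformation rules or cost model.

```python
# Minimal generic rendering of the state-space framing for workflow optimization.

from heapq import heappush, heappop

def optimize(initial_state, transitions, cost):
    """Exhaustive cheapest-first search over the reachable workflow states."""
    best_state, best_cost = initial_state, cost(initial_state)
    frontier = [(best_cost, 0, initial_state)]
    seen = {initial_state}
    counter = 1                                   # tie-breaker for the heap
    while frontier:
        c, _, state = heappop(frontier)
        if c < best_cost:
            best_state, best_cost = state, c
        for nxt in transitions(state):            # correctness-preserving rewrites
            if nxt not in seen:
                seen.add(nxt)
                heappush(frontier, (cost(nxt), counter, nxt))
                counter += 1
    return best_state, best_cost

def swaps(state):
    """Toy transition: exchange two adjacent operators in the workflow."""
    for i in range(len(state) - 1):
        yield state[:i] + (state[i + 1], state[i]) + state[i + 2:]

def toy_cost(state):
    """Toy cost: selective operators (here, 'sigma_*') should run as early as possible."""
    return sum(i for i, op in enumerate(state) if op.startswith("sigma"))

print(optimize(("join", "sigma_date", "surrogate_key"), swaps, toy_cost))
```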
Citations: 224
Web services and service-oriented architectures
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.154
G. Alonso, F. Casati
Web services, and more generally service-oriented architectures (SOAs), are emerging as the technologies and architectures of choice for implementing distributed systems and performing application integration within and across company boundaries. In this article we describe Web services from an evolutionary perspective, with an emphasis on their use for enterprise application integration and service-oriented architectures. The article also covers basic middleware problems and shows how the solutions to these problems have finally evolved into what we today call Web services.
Citations: 102
Full-fledged algebraic XPath processing in Natix
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.69
M. Brantner, S. Helmer, C. Kanne, G. Moerkotte
We present the first complete translation of XPath into an algebra, paving the way for a comprehensive, state-of-the-art XPath (and later on, XQuery) compiler based on algebraic optimization techniques. Our translation includes all XPath features such as nested expressions, position-based predicates and node-set functions. The translated algebraic expressions can be executed using the proven, scalable, iterator-based approach, as we demonstrate in form of a corresponding physical algebra in our native XML DBMS Natix. A first glance at performance results shows that even without further optimization of the expressions, we provide a competitive evaluation technique for XPath queries.
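As a toy illustration of compiling a path expression into a pipeline of algebraic operators evaluated iterator-style (nowhere near the paper's full XPath coverage), here is a sketch that handles only child and descendant steps with a name test; the operator and function names are invented.

```python
# Toy illustration: compile a path into a pipeline of step operators evaluated
# lazily over node sequences, roughly in the spirit of iterator-based algebras.

import xml.etree.ElementTree as ET

def child_step(name):
    def op(nodes):
        for n in nodes:
            for c in n:                 # direct children only
                if c.tag == name:
                    yield c
    return op

def descendant_step(name):
    def op(nodes):
        for n in nodes:
            for d in n.iter(name):      # all descendants with a matching tag
                if d is not n:
                    yield d
    return op

def compile_path(steps):
    """Compose step operators into one lazily evaluated pipeline."""
    def plan(nodes):
        for step in steps:
            nodes = step(nodes)
        return nodes
    return plan

doc = ET.fromstring("<a><b><c/></b><b/><d><b><c/></b></d></a>")
plan = compile_path([descendant_step("b"), child_step("c")])   # roughly //b/c
print([ET.tostring(n, encoding="unicode") for n in plan([doc])])
```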
Citations: 69
Range-efficient computation of F₀ over massive data streams
Pub Date: 2005-04-05 DOI: 10.1109/ICDE.2005.118
A. Pavan, S. Tirthapura
Efficient one-pass computation of F₀, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider the problem of efficiently estimating F₀ of a data stream where each element of the stream is an interval of integers. We present a randomized algorithm which gives an (ε, δ) approximation of F₀, with the following time complexity (n is the size of the universe of the items): (1) the amortized processing time per interval is O(log 1/δ log n/ε). (2) The time to answer a query for F₀ is O(log 1/δ). The workspace used is O(1/ε² log 1/δ log n) bits. Our algorithm improves upon a previous algorithm by Bar-Yossef, Kumar and Sivakumar (2002), which requires O(1/ε⁵ log 1/δ log⁵ n) processing time per item. Our algorithm can be used to compute the max-dominance norm of a stream of multiple signals, and significantly improves upon the current best bounds due to Cormode and Muthukrishnan (2003). This also provides efficient and novel solutions for data aggregation problems in sensor networks studied by Nath and Gibbons (2004) and Considine et al. (2004).
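For concreteness, the exact quantity the algorithm approximates is simply the size of the union of the intervals; the baseline sketch below computes it by interval merging. It stores every interval and is not range-efficient, which is exactly what the paper's one-pass, small-space algorithm avoids.

```python
# Exact, non-streaming baseline for reference: F₀ of a stream of integer
# intervals is the size of their union, computed here by interval merging.

def exact_f0(intervals):
    """intervals: iterable of (lo, hi) inclusive integer ranges."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)   # overlapping/adjacent: extend
        else:
            merged.append([lo, hi])
    return sum(hi - lo + 1 for lo, hi in merged)

print(exact_f0([(1, 10), (5, 12), (20, 20)]))        # 13 distinct integers
```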
Citations: 19