
Proceedings. 20th International Conference on Data Engineering: Latest Publications

Improving logging and recovery performance in Phoenix/App
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320021
R. Barga, Shimin Chen, D. Lomet
Phoenix/App supports software components whose states are made persistent across a system crash via redo recovery, replaying logged interactions. Our initial prototype force-logged all request/reply events resulting from intercomponent method calls and returns. We describe an enhanced prototype that implements: (i) log optimizations to improve normal execution performance; and (ii) checkpointing to improve recovery performance. Logging is reduced in two ways: (1) we only log information required to remove nondeterminism, and we only force the log when an event "commits" the state of the component to other parts of the system; (2) we introduce new component types that provide our enhanced system with more information, enabling further reduction in logging. To improve recovery performance, we save the values of the fields of a component to the log in an application "checkpoint". We describe the system elements that we exploit for these optimizations, and characterize the performance gains that result.
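To make the logging discipline concrete, here is a minimal Python sketch of the two ideas above: buffering log records and forcing the log only when an event commits component state to the rest of the system, plus an application checkpoint of field values. All names (ComponentLogger, record, checkpoint) are hypothetical illustrations, not the Phoenix/App implementation.

    import json
    import os

    class ComponentLogger:
        def __init__(self, log_path):
            self.log = open(log_path, "a")
            self.buffer = []                    # records buffered, not yet forced

        def record(self, event, commits_state=False):
            # Log only what is needed to remove nondeterminism; force the log
            # to stable storage only when the event exposes state externally.
            self.buffer.append(json.dumps(event))
            if commits_state:                   # e.g. a reply sent to another component
                self.log.write("\n".join(self.buffer) + "\n")
                self.buffer.clear()
                self.log.flush()
                os.fsync(self.log.fileno())     # the actual log force

        def checkpoint(self, fields):
            # Save the component's field values so redo recovery can start here
            # instead of replaying the whole interaction history.
            self.record({"type": "checkpoint", "state": fields}, commits_state=True)

Between commits, buffered records cost only memory writes, which is where the normal-execution savings come from in this simplified picture.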
Citations: 24
Priority mechanisms for OLTP and transactional Web applications
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320025
David T. McWherter, Bianca Schroeder, A. Ailamaki, Mor Harchol-Balter
Transactional workloads are a hallmark of modern OLTP and Web applications, ranging from electronic commerce and banking to online shopping. Often, the database at the core of these applications is the performance bottleneck. Given the limited resources available to the database, transaction execution times can vary wildly as transactions compete and wait for critical resources. As the competitor is "only a click away", valuable (high-priority) users must be ensured consistently good performance via QoS and transaction prioritization. This paper analyzes and proposes prioritization for transactional workloads in traditional database systems (DBMS). This work first performs a detailed bottleneck analysis of resource usage by transactional workloads on commercial and noncommercial DBMS (IBM DB2, PostgreSQL, Shore) under a range of configurations. Second, this work implements and evaluates the performance of several preemptive and nonpreemptive DBMS prioritization policies in PostgreSQL and Shore. The primary contributions of this work include (i) understanding the bottleneck resources in transactional DBMS workloads and (ii) a demonstration that prioritization in traditional DBMS can provide 2x-5x improvement for high-priority transactions using simple scheduling policies, without expense to low-priority transactions.
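As an illustration of the kind of simple scheduling policy the paper evaluates, the following Python sketch grants a released lock to the highest-priority waiter rather than the oldest one. It is a hypothetical nonpreemptive policy (the current holder is never aborted), not code from PostgreSQL or Shore.

    import heapq
    import itertools

    class PriorityLockQueue:
        def __init__(self):
            self.waiters = []                  # (priority, seqno, txn_id) min-heap
            self.seq = itertools.count()       # FIFO tie-break within a priority

        def wait(self, txn_id, priority):
            # Lower number = higher priority, e.g. 0 for "valuable" transactions.
            heapq.heappush(self.waiters, (priority, next(self.seq), txn_id))

        def grant_next(self):
            # Called on lock release: grant to the highest-priority waiter
            # instead of the head of a FIFO queue.
            if self.waiters:
                return heapq.heappop(self.waiters)[2]
            return None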
Citations: 109
On the integration of structure indexes and inverted lists
Pub Date : 2004-03-30 DOI: 10.1145/1007568.1007656
R. Kaushik, R. Krishnamurthy, J. Naughton, R. Ramakrishnan
Recently, there has been a great deal of interest in the development of techniques to evaluate path expressions over collections of XML documents. In general, these path expressions contain both structural and keyword components. Several methods have been proposed for processing path expressions over graph/tree-structured XML data. These methods can be classified into two broad classes. The first involves graph traversal where the input query is evaluated by traversing the data graph or some compressed representation. The other class involves information-retrieval style processing using inverted lists. In this framework, structure indexes have been proposed to be used as a substitute for graph traversal. Here, we focus on a subclass of CAS queries consisting of simple path expressions. We study algorithmic issues in integrating structure indexes with inverted lists for the evaluation of these queries, where we rank all documents that match the query and return the top k documents in order of relevance.
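A toy Python sketch of the integration being studied: a structure index maps a simple path to the documents containing it, an inverted list maps a keyword to scored documents, and the query intersects the two and returns the top k documents by relevance. The data layout and names here are hypothetical.

    import heapq

    # path -> ids of documents containing the path (from the structure index)
    structure_index = {"/book/title": {1, 2, 5}}
    # term -> {doc id: relevance score} (from the keyword inverted list)
    inverted_list = {"xml": {1: 0.9, 5: 0.4, 7: 0.2}}

    def top_k(path, term, k):
        docs = structure_index.get(path, set())
        postings = inverted_list.get(term, {})
        matches = [(score, doc) for doc, score in postings.items() if doc in docs]
        return heapq.nlargest(k, matches)      # top-k documents by relevance

    print(top_k("/book/title", "xml", 2))      # [(0.9, 1), (0.4, 5)]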
Citations: 179
Detection and correction of conflicting source updates for view maintenance
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320017
Songting Chen, Jun Chen, Xin Zhang, Elke A. Rundensteiner
Data integration over multiple heterogeneous data sources has become increasingly important for modern applications. The integrated data is usually stored in materialized views for high availability and better performance. Such views must be maintained after the data sources change. In a loosely-coupled and dynamic environment, such as the Data Grid, the sources may autonomously change not only their data but also their schema, query capabilities or semantics, which may consequently cause ongoing view maintenance to fail. We analyze the maintenance errors and classify them into different classes of dependencies. We then propose several dependency detection and correction algorithms to handle these new classes of concurrency. Our techniques are not tied to specific maintenance algorithms nor to a particular data model. To our knowledge, this is the first complete solution to the view maintenance concurrency problems for both data and schema changes. We have implemented the proposed solutions and experimentally evaluated the impact of anomalies on maintenance performance and trade-offs between different dependency detection algorithms.
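One way to picture dependency detection is with version counters on a source: if the source's data or schema version moves while a maintenance query runs, a dependency is flagged and either compensated or restarted. The Python sketch below is a hypothetical simplification of that idea, not the paper's algorithms.

    class Source:
        def __init__(self):
            self.data_version = 0     # bumped on every source data update
            self.schema_version = 0   # bumped on every source schema change

    def maintain_view(source, run_maintenance_query, compensate):
        d0, s0 = source.data_version, source.schema_version
        result = run_maintenance_query()
        if source.schema_version != s0:
            # A schema dependency: the query ran against a changed schema,
            # so its result cannot simply be patched after the fact.
            raise RuntimeError("schema changed concurrently; restart maintenance")
        if source.data_version != d0:
            # A data dependency: correct the result rather than recompute it.
            result = compensate(result, d0, source.data_version)
        return result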
Citations: 14
Lazy database replication with ordering guarantees
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320016
Khuzaima S. Daudjee, K. Salem
Lazy replication is a popular technique for improving the performance and availability of database systems. Although there are concurrency control techniques that guarantee serializability in lazy replication systems, these techniques result in undesirable transaction orderings. Since transactions may see stale data, they may be serialized in an order different from the one in which they were submitted. Strong serializability avoids such problems, but it is very costly to implement. We propose a generalized form of strong serializability that is suitable for use with lazy replication. In addition to having many of the advantages of strong serializability, it can be implemented more efficiently. We show how generalized strong serializability can be implemented in a lazy replication system, and we present the results of a simulation study that quantifies the strengths and limitations of the approach.
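A minimal Python sketch of one ordering guarantee in this spirit: a router remembers the log sequence number (LSN) of each session's last commit and directs that session's reads only to replicas that have applied it, so a session is never serialized before its own updates. The names and the LSN scheme are hypothetical simplifications, not the paper's generalized strong serializability.

    class Router:
        def __init__(self, replica_lsns):
            self.replica_lsns = replica_lsns   # replica name -> last applied LSN
            self.session_lsn = {}              # session -> LSN of its last commit

        def note_commit(self, session, commit_lsn):
            self.session_lsn[session] = commit_lsn

        def choose_replica(self, session):
            # Route reads only to replicas fresh enough to reflect the
            # session's own updates; stale replicas are skipped.
            need = self.session_lsn.get(session, 0)
            fresh = [r for r, lsn in self.replica_lsns.items() if lsn >= need]
            return fresh[0] if fresh else None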
Citations: 81
Range cube: efficient cube computation by exploiting data correlation
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320035
Ying Feng, D. Agrawal, A. E. Abbadi, Ahmed A. Metwally
Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. We introduce range cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, the range trie, is used to identify and compress correlation in attribute values, and to compress the input dataset so as to effectively reduce the computational cost. The range cubing algorithm generates a compressed cube, called a range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value, as a tuple which has the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-cubing, experiments on a real dataset show a running time of less than one thirtieth, while still generating a range cube that occupies less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
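The following toy Python sketch illustrates the core idea, though not the range trie itself: after enumerating the cells of a tiny COUNT cube, cells that share an aggregation value are stored once as a group instead of being materialized individually. It is a hypothetical illustration; the paper's range cube forms disjoint ranges far more compactly.

    from collections import defaultdict
    from itertools import combinations

    def cube_cells(rows, ndims):
        # Enumerate every group-by cell of a tiny dataset ("*" = rolled up),
        # aggregating with COUNT.
        cells = defaultdict(int)
        for row in rows:
            for k in range(ndims + 1):
                for subset in combinations(range(ndims), k):
                    key = tuple(row[d] if d in subset else "*" for d in range(ndims))
                    cells[key] += 1
        return cells

    def to_ranges(cells):
        # Store each aggregation value once, with the group of cells sharing
        # it, instead of one entry per cell.
        ranges = defaultdict(list)
        for cell, agg in cells.items():
            ranges[agg].append(cell)
        return dict(ranges)

    print(to_ranges(cube_cells([("a", "x"), ("a", "y")], ndims=2)))
    # {2: [('*', '*'), ('a', '*')], 1: [('*', 'x'), ('a', 'x'), ('*', 'y'), ('a', 'y')]}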
Citations: 46
A flexible infrastructure for gathering XML statistics and estimating query cardinality
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320085
J. Freire, Maya Ramanath, Lingzhi Zhang
A key component of XML data management systems is the result size estimator, which estimates the cardinalities of user queries. Estimated cardinalities are needed in a variety of tasks, including query optimization and cost-based storage design; and they can also be used to give users early feedback about the expected outcome of their queries. In contrast to previously proposed result estimators, which use specialized data structures and estimation algorithms, StatiX uses histograms to uniformly capture both the structural and value skew present in documents. The original version of StatiX was built as a proof of concept. With the goal of making the system publicly available, we have built StatiX++, a new and improved version of StatiX, which extends the original system in significant ways. In this demonstration, we show the key features of StatiX++.
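As a reminder of how histogram-based estimation works, here is a minimal Python sketch of an equi-width histogram and a range-predicate cardinality estimate under the usual uniformity assumption within buckets. It is a generic illustration, not StatiX code; StatiX additionally builds histograms over document structure, not just values.

    def build_histogram(values, nbuckets, lo, hi):
        width = (hi - lo) / nbuckets
        counts = [0] * nbuckets
        for v in values:
            b = min(int((v - lo) / width), nbuckets - 1)
            counts[b] += 1
        return counts, lo, width

    def estimate_range(hist, a, b):
        # Estimate how many values fall in [a, b], assuming values are
        # uniformly distributed within each bucket.
        counts, lo, width = hist
        total = 0.0
        for i, c in enumerate(counts):
            blo, bhi = lo + i * width, lo + (i + 1) * width
            overlap = max(0.0, min(b, bhi) - max(a, blo))
            total += c * overlap / width
        return total

    hist = build_histogram([1, 3, 5, 7, 9, 11], nbuckets=3, lo=0, hi=12)
    print(estimate_range(hist, 4, 8))   # 2.0, the estimated cardinality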
Citations: 1
Approximate aggregation techniques for sensor databases
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320018
Jeffrey Considine, Feifei Li, G. Kollios, J. Byers
In the emerging area of sensor-based systems, a significant challenge is to develop scalable, fault-tolerant methods to extract useful information from the data the sensors collect. An approach to this data management problem is the use of sensor database systems, exemplified by TinyDB and Cougar, which allow users to perform aggregation queries such as MIN, COUNT and AVG on a sensor network. Due to power and range constraints, centralized approaches are generally impractical, so most systems use in-network aggregation to reduce network traffic. However, these aggregation strategies become bandwidth-intensive when combined with the fault-tolerant, multipath routing methods often used in these environments. For example, duplicate-sensitive aggregates such as SUM cannot be computed exactly using substantially less bandwidth than explicit enumeration. To avoid this expense, we investigate the use of approximate in-network aggregation using small sketches. Our contributions are as follows: 1) we generalize well known duplicate-insensitive sketches for approximating COUNT to handle SUM, 2) we present and analyze methods for using sketches to produce accurate results with low communication and computation overhead, and 3) we present an extensive experimental validation of our methods.
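A minimal Python sketch of a duplicate-insensitive, Flajolet-Martin (FM) style counting sketch: insertion is idempotent, merging is a bitwise OR (so duplicates introduced by multipath routing collapse), and SUM is handled naively by treating a reading of value v as v distinct COUNT insertions. The paper's generalization avoids this O(v) insertion cost; the code below is only a hypothetical baseline.

    import hashlib

    NBITS = 32

    def _lsb(h):
        # Index of the least-significant set bit; geometrically distributed
        # under a well-mixed hash.
        return (h & -h).bit_length() - 1 if h else NBITS - 1

    def insert(sketch, item):
        h = int(hashlib.sha1(item.encode()).hexdigest(), 16)
        return sketch | (1 << _lsb(h))      # idempotent: duplicates are no-ops

    def merge(a, b):
        return a | b                        # multipath duplicates collapse on OR

    def estimate(sketch):
        r = 0
        while sketch & (1 << r):            # position of the first zero bit
            r += 1
        return (2 ** r) / 0.77351           # standard FM bias correction

    def sum_insert(sketch, reading_id, value):
        # Naive SUM: a reading of value v becomes v distinct COUNT insertions.
        for i in range(value):
            sketch = insert(sketch, f"{reading_id}#{i}")
        return sketch

Because merge is just OR, partial sketches can be combined along any number of redundant paths without double-counting, which is what makes the approach robust to the multipath routing described above.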
Citations: 628
Peering and querying e-catalog communities
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320076
B. Benatallah, Mohand-Said Hacid, Hye-young Paik, Christophe Rey, F. Toumani
More and more suppliers are offering access to their product or information portals (also called e-catalogs) via the Web. The key issue is how to efficiently integrate and query large, intricate, heterogeneous information sources such as e-catalogs. The traditional data integration approach, in which the development of an integrated schema requires understanding both the structure and the semantics of all source schemas, is hardly applicable because of the dynamic nature and size of the Web. We present WS-CatalogNet: a Web services based data sharing middleware infrastructure whose aim is to enhance the potential of e-catalogs by focusing on the scalability and flexibility of their sharing and access.
Citations: 8
Applications for expression data in relational database systems
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320031
D. Gawlick, Dmitry Lenkov, Aravind Yalamanchi, L. Chernobrod
Support for an expression data type in a relational database system allows conditional expressions to be stored as data in database tables and evaluated using SQL queries. In the context of this new capability, expressions can be interpreted as descriptions, queries, and filters, which significantly broadens the use of a relational database system to support new types of applications. The paper presents an overview of the expression data type, relates expressions to descriptions, queries, and filters, considers applications pertaining to information distribution, demand analysis, and task assignment, and shows how these applications can be easily supported with improved functionality.
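To see why stored expressions act as filters, here is a hypothetical Python sketch (not the DBMS feature itself): each row holds a predicate as a string, and a query evaluates every stored predicate against an incoming item, publish/subscribe style.

    # subscriber id -> stored conditional expression (the "expression data")
    subscriptions = [
        (101, "price < 50 and category == 'book'"),
        (102, "price >= 50"),
    ]

    def matching_subscribers(item):
        # Evaluate each stored expression against the incoming item, the way
        # a query with an expression predicate would filter rows.
        safe_globals = {"__builtins__": {}}    # keep eval's namespace restricted
        return [sid for sid, expr in subscriptions
                if eval(expr, safe_globals, dict(item))]

    print(matching_subscribers({"price": 30, "category": "book"}))   # [101]

This inverts the usual pattern: instead of a fixed query scanning stored data, stored conditions scan each arriving data item, which is what makes the information-distribution and demand-analysis applications mentioned above natural fits.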
Citations: 11