
21st International Conference on Data Engineering (ICDE'05): latest publications

Network-based problem detection for distributed systems
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.93
H. Kashima, T. Tsumura, T. Idé, Takahide Nogayama, R. Hirade, H. Etoh, T. Fukuda
We introduce a network-based problem detection framework for distributed systems, which includes a data-mining method for discovering dynamic dependencies among distributed services from transaction data collected from the network, and a novel problem detection method based on the discovered dependencies. From observed containments of transaction execution time periods, we estimate the probabilities of accidental and non-accidental containments, and build a competitive model for discovering direct dependencies using a model estimation method based on the online EM algorithm. Utilizing the discovered dependency information, we also propose a hierarchical problem detection framework, in which microscopic dependency information is combined with a macroscopic anomaly metric that monitors the behavior of the system as a whole. This is made possible by a network-based design that provides overall information about the system without any impact on performance.
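The containment evidence mentioned above can be made concrete with a short sketch. The following Python snippet is an illustration only: the record layout and function name are assumptions, and it does not reproduce the paper's competitive model or its online EM estimation; it merely counts how often one service's execution period contains another's, which is the kind of observation that model is estimated from.

```python
from collections import Counter
from itertools import permutations

def containment_counts(transactions):
    """Count interval containments between services.

    transactions: list of (service, start, end) tuples; a containment of B's
    period inside A's period is taken as evidence that A may invoke B.
    """
    counts = Counter()
    for (svc_a, s_a, e_a), (svc_b, s_b, e_b) in permutations(transactions, 2):
        if svc_a != svc_b and s_a <= s_b and e_b <= e_a:
            counts[(svc_a, svc_b)] += 1
    return counts

if __name__ == "__main__":
    txs = [("web", 0.0, 1.0), ("app", 0.1, 0.8), ("db", 0.2, 0.6),
           ("web", 2.0, 3.0), ("app", 2.1, 2.9), ("db", 2.3, 2.7)]
    for pair, n in containment_counts(txs).most_common(3):
        print(pair, n)   # e.g. ('web', 'app') 2, ('web', 'db') 2, ('app', 'db') 2
```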
{"title":"Network-based problem detection for distributed systems","authors":"H. Kashima, T. Tsumura, T. Idé, Takahide Nogayama, R. Hirade, H. Etoh, T. Fukuda","doi":"10.1109/ICDE.2005.93","DOIUrl":"https://doi.org/10.1109/ICDE.2005.93","url":null,"abstract":"We introduce a network-based problem detection framework for distributed systems, which includes a data-mining method for discovering dynamic dependencies among distributed services from transaction data collected from network, and a novel problem detection method based on the discovered dependencies. From observed containments of transaction execution time periods, we estimate the probabilities of accidental and non-accidental containments, and build a competitive model for discovering direct dependencies by using a model estimation method based on the online EM algorithm. Utilizing the discovered dependency information, we also propose a hierarchical problem detection framework, where microscopic dependency information is incorporated with a macroscopic anomaly metric that monitors the behavior of the system as a whole. This feature is made possible by employing a network-based design which provides overall information of the system without any impact on the performance.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131081618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
VLEI code: an efficient labeling method for handling XML documents in an RDB
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.153
K. Kobayashi, Wenxin Liang, D. Kobayashi, Akitsugu Watanabe, H. Yokota
A number of XML labeling methods have been proposed for storing XML documents in relational databases. However, they have a weak point: insertion operations. We propose the variable length endless insertable (VLEI) code and apply it to XML labeling to reduce the cost of insertion operations. The results of our experiments indicate that a combination of the VLEI code and Dewey order is effective for handling skewed insertions.
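The "endless insertable" property can be pictured without the actual VLEI encoding. The sketch below is a stand-in scheme assumed for illustration (the real VLEI code is a different, binary encoding): labels are decimal digit strings without trailing zeros, ordered lexicographically, and a new label can always be generated strictly between two existing ones, so insertions never force neighbors to be relabeled.

```python
from fractions import Fraction

def label_value(label: str) -> Fraction:
    # interpret the digit string "d1d2...dk" as the fraction 0.d1d2...dk
    return Fraction(int(label), 10 ** len(label))

def between(a: str, b: str) -> str:
    """Return a label strictly between a and b (a < b, no trailing '0' digits)."""
    mid = (label_value(a) + label_value(b)) / 2
    places = max(len(a), len(b)) + 1        # the midpoint needs at most one extra digit
    scaled = mid * 10 ** places
    assert scaled.denominator == 1          # mid is a terminating decimal by construction
    return str(scaled.numerator).rjust(places, "0").rstrip("0")

if __name__ == "__main__":
    print(between("1", "2"))    # '15'  -> '1'  < '15'  < '2'
    print(between("1", "15"))   # '125' -> '1'  < '125' < '15'
    print(between("09", "1"))   # '095' -> '09' < '095' < '1'
```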
{"title":"VLEI code: an efficient labeling method for handling XML documents in an RDB","authors":"K. Kobayashi, Wenxin Liang, D. Kobayashi, Akitsugu Watanabe, H. Yokota","doi":"10.1109/ICDE.2005.153","DOIUrl":"https://doi.org/10.1109/ICDE.2005.153","url":null,"abstract":"A number of XML labeling methods have been proposed to store XML documents in relational databases. However, they have a vulnerable point, in insertion operations. We propose the variable length endless insertable (VLEI) code and apply it to XML labeling to reduce the cost of insertion operations. Results of our experiments indicate that a combination of the VLEI code and Dewey order is effective for handling skewed insertions.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116275707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 31
SEA-CNN: scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.128
Xiaopeng Xiong, M. Mokbel, Walid G. Aref
Location-aware environments are characterized by a large number of objects and a large number of continuous queries. Both the objects and the continuous queries may change their locations over time. In this paper, we focus on continuous k-nearest neighbor queries (CKNN, for short). We present a new algorithm, termed SEA-CNN, for continuously answering a collection of concurrent CKNN queries. SEA-CNN has two important features: incremental evaluation and shared execution. SEA-CNN achieves both efficiency and scalability in the presence of a set of concurrent queries. Furthermore, SEA-CNN makes no assumptions about the movement of objects, e.g., the objects' velocities and the shapes of their trajectories, or about the mutability of the objects and/or the queries, i.e., moving or stationary queries issued on moving or stationary objects. We provide a theoretical analysis of SEA-CNN with respect to execution costs, memory requirements, and the effects of tunable parameters. Comprehensive experimentation shows that SEA-CNN is highly scalable and is more efficient in terms of both I/O and CPU costs than other R-tree-based CKNN techniques.
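A rough intuition for incremental evaluation (not SEA-CNN itself: the grid-based shared execution is not modelled, the query points are kept static here, and all names are invented) is that each continuous query keeps its current answer together with the distance to its k-th neighbor, and a moving object triggers re-evaluation only for queries whose answer circle is touched by the old or the new position.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

class CKNNMonitor:
    def __init__(self, objects, queries, k):
        self.objects = dict(objects)    # object id -> (x, y)
        self.queries = dict(queries)    # query id  -> (x, y)
        self.k = k
        self.answers = {qid: self._evaluate(qid) for qid in self.queries}

    def _evaluate(self, qid):
        centre = self.queries[qid]
        ranked = sorted(self.objects.items(), key=lambda kv: dist(kv[1], centre))
        return ranked[: self.k]         # current kNN answer: [(object id, position), ...]

    def _radius(self, qid):
        # distance from the query point to its current k-th nearest neighbor
        return dist(self.answers[qid][-1][1], self.queries[qid])

    def move_object(self, oid, new_pos):
        old_pos = self.objects[oid]
        self.objects[oid] = new_pos
        for qid, centre in self.queries.items():
            r = self._radius(qid)
            # the answer can only change if the move touches the answer circle
            if dist(old_pos, centre) <= r or dist(new_pos, centre) <= r:
                self.answers[qid] = self._evaluate(qid)
```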
{"title":"SEA-CNN: scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases","authors":"Xiaopeng Xiong, M. Mokbel, Walid G. Aref","doi":"10.1109/ICDE.2005.128","DOIUrl":"https://doi.org/10.1109/ICDE.2005.128","url":null,"abstract":"Location-aware environments are characterized by a large number of objects and a large number of continuous queries. Both the objects and continuous queries may change their locations over time. In this paper, we focus on continuous k-nearest neighbor queries (CKNN, for short). We present a new algorithm, termed SEA-CNN, for answering continuously a collection of concurrent CKNN queries. SEA-CNN has two important features: incremental evaluation and shared execution. SEA-CNN achieves both efficiency and scalability in the presence of a set of concurrent queries. Furthermore, SEA-CNN does not make any assumptions about the movement of objects, e.g., the objects velocities and shapes of trajectories, or about the mutability of the objects and/or the queries, i.e., moving or stationary queries issued on moving or stationary objects. We provide theoretical analysis of SEA-CNN with respect to the execution costs, memory requirements and effects of tunable parameters. Comprehensive experimentation shows that SEA-CNN is highly scalable and is more efficient in terms of both I/O and CPU costs in comparison to other R-tree-based CKNN techniques.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131822851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 336
Data Triage: an adaptive architecture for load shedding in TelegraphCQ
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.44
Frederick Reiss, J. Hellerstein
Many of the data sources used in stream query processing are known to exhibit bursty behavior. Data in a burst often has different characteristics than steady-state data, and therefore may be of particular interest. In this paper, we describe the Data Triage architecture that we are adding to TelegraphCQ to provide low latency results with good accuracy under such bursts.
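As a very rough illustration of the triage idea (not the actual TelegraphCQ architecture: the fixed per-window budget and the reservoir summary are assumptions made for this sketch), tuples are processed exactly while a processing budget lasts, and overflow tuples are shed into a small uniform sample from which approximate answers could later be derived.

```python
import random

class TriageWindow:
    def __init__(self, budget, reservoir_size, seed=0):
        self.budget = budget              # tuples we can afford to process exactly
        self.exact = []                   # tuples on the normal (exact) path
        self.reservoir = []               # summary of the shed tuples
        self.reservoir_size = reservoir_size
        self.shed_count = 0
        self.rng = random.Random(seed)

    def offer(self, tup):
        if len(self.exact) < self.budget:
            self.exact.append(tup)        # within budget: process exactly
            return
        # over budget: shed the tuple, keeping a uniform reservoir sample of the shed stream
        self.shed_count += 1
        if len(self.reservoir) < self.reservoir_size:
            self.reservoir.append(tup)
        else:
            j = self.rng.randrange(self.shed_count)
            if j < self.reservoir_size:
                self.reservoir[j] = tup
```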
{"title":"Data Triage: an adaptive architecture for load shedding in TelegraphCQ","authors":"Frederick Reiss, J. Hellerstein","doi":"10.1109/ICDE.2005.44","DOIUrl":"https://doi.org/10.1109/ICDE.2005.44","url":null,"abstract":"Many of the data sources used in stream query processing are known to exhibit bursty behavior. Data in a burst often has different characteristics than steady-state data, and therefore may be of particular interest. In this paper, we describe the Data Triage architecture that we are adding to TelegraphCQ to provide low latency results with good accuracy under such bursts.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"94 19","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131879689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 78
Vectorizing and querying large XML repositories
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.150
P. Buneman, Byron Choi, W. Fan, R. Hutchison, Robert Mann, Stratis Viglas
Vertical partitioning is a well-known technique for optimizing query performance in relational databases. An extreme form of this technique, which we call vectorization, is to store each column separately. We use a generalization of vectorization as the basis for a native XML store. The idea is to decompose an XML document into a set of vectors that contain the data values and a compressed skeleton that describes the structure. In order to query this representation and produce results in the same vectorized format, we consider a practical fragment of XQuery and introduce the notion of query graphs and a novel graph reduction algorithm that allows us to leverage relational optimization techniques as well as to avoid unnecessary loading of data vectors and decompression of skeletons. A preliminary experimental study based on scientific and synthetic XML data repositories on the order of gigabytes supports the claim that these techniques are scalable and have the potential to provide performance comparable with established relational database technology.
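The decomposition can be pictured with a small sketch (illustration only: the paper's compressed skeleton, query graphs, and graph reduction are not reproduced, and the function and placeholder names are assumptions). It pulls the text values of an XML document into one vector per root-to-leaf path and leaves a value-free skeleton behind.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def vectorize(xml_text):
    root = ET.fromstring(xml_text)
    vectors = defaultdict(list)            # path -> vector of data values

    def strip(elem, path):
        here = f"{path}/{elem.tag}"
        if elem.text and elem.text.strip():
            vectors[here].append(elem.text.strip())
            elem.text = "#"                # placeholder left in the skeleton
        for child in elem:
            strip(child, here)

    strip(root, "")
    skeleton = ET.tostring(root, encoding="unicode")
    return skeleton, dict(vectors)

if __name__ == "__main__":
    doc = ("<catalog><book><title>TAOCP</title><price>150</price></book>"
           "<book><title>SICP</title><price>60</price></book></catalog>")
    skeleton, vectors = vectorize(doc)
    print(skeleton)   # value-free structure with '#' placeholders
    print(vectors)    # {'/catalog/book/title': ['TAOCP', 'SICP'], '/catalog/book/price': ['150', '60']}
```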
{"title":"Vectorizing and querying large XML repositories","authors":"P. Buneman, Byron Choi, W. Fan, R. Hutchison, Robert Mann, Stratis Viglas","doi":"10.1109/ICDE.2005.150","DOIUrl":"https://doi.org/10.1109/ICDE.2005.150","url":null,"abstract":"Vertical partitioning is a well-known technique for optimizing query performance in relational databases. An extreme form of this technique, which we call vectorization, is to store each column separately. We use a generalization of vectorization as the basis for a native XML store. The idea is to decompose an XML document into a set of vectors that contain the data values and a compressed skeleton that describes the structure. In order to query this representation and produce results in the same vectorized format, we consider a practical fragment of XQuery and introduce the notion of query graphs and a novel graph reduction algorithm that allows us to leverage relational optimization techniques as well as to reduce the unnecessary loading of data vectors and decompression of skeletons. A preliminary experimental study based on some scientific and synthetic XML data repositories in the order of gigabytes supports the claim that these techniques are scalable and have the potential to provide performance comparable with established relational database technology.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132385164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 63
Spatiotemporal annotation graph (STAG): a data model for composite digital objects
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.136
S. Yamini, Amarnath Gupta
In this demonstration, we present a database over complex documents that, in addition to structured text content, also contain update information, annotations, and embedded objects. We propose a new data model called spatiotemporal annotation graphs (STAG) for a database of composite digital objects, and present a system with a query language for querying such a database efficiently and effectively. The particular application to be demonstrated is a database over annotated MS Word and PowerPoint presentations with embedded multimedia objects.
{"title":"Spatiotemporal annotation graph (STAG): a data model for composite digital objects","authors":"S. Yamini, Amarnath Gupta","doi":"10.1109/ICDE.2005.136","DOIUrl":"https://doi.org/10.1109/ICDE.2005.136","url":null,"abstract":"In this demonstration, we present a database over complex documents, which, in addition to a structured text content, also has update information, annotations, and embedded objects. We propose a new data model called spatiotemporal annotation graphs (STAG) for a database of composite digital objects and present a system that shows a query language to efficiently and effectively query such database. The particular application to be demonstrated is a database over annotated MS Word and PowerPoint presentations with embedded multimedia objects.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127297992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Scrutinizing frequent pattern discovery performance
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.127
Osmar R Zaiane, Mohammad El-Hajj, Yi Li, S. Luk
Benchmarking technical solutions is as important as the solutions themselves. Yet many fields still lack any kind of rigorous evaluation. Performance benchmarking has always been an important issue in databases and has played a significant role in the development, deployment, and adoption of technologies. To help assess the myriad algorithms for frequent itemset mining, we built an open framework and testbed to analytically study the performance of different algorithms and their implementations, and to contrast their achievements under different data characteristics, different conditions, and different types of patterns to discover, together with their constraints. This facilitates reporting consistent and reproducible performance results under known conditions.
{"title":"Scrutinizing frequent pattern discovery performance","authors":"Osmar R Zaiane, Mohammad El-Hajj, Yi Li, S. Luk","doi":"10.1109/ICDE.2005.127","DOIUrl":"https://doi.org/10.1109/ICDE.2005.127","url":null,"abstract":"Benchmarking technical solutions is as important as the solutions themselves. Yet many fields still lack any type of rigorous evaluation. Performance benchmarking has always been an important issue in databases and has played a significant role in the development, deployment and adoption of technologies. To help assessing the myriad algorithms for frequent itemset mining, we built an open framework and testbed to analytically study the performance of different algorithms and their implementations, and contrast their achievements given different data characteristics, different conditions, and different types of patterns to discover and their constraints. This facilitates reporting consistent and reproducible performance results using known conditions.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127412607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Maintaining implicated statistics in constrained environments
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.84
Yannis Sismanis, N. Roussopoulos
Aggregated information regarding implicated entities is critical for online applications like network management, traffic characterization, or identifying patterns of resource consumption. Recently there has been a flurry of research on online aggregation over streams (such as quantiles, hot items, and hierarchical heavy hitters), but surprisingly the problem of summarizing implicated information in stream data has received no attention. As an example, consider an IP network and the implication source → destination. Flash crowds - such as those that follow major sporting events (like the Olympics) or seek information about catastrophic events - or denial-of-service attacks direct a large volume of traffic from a huge number of sources to a very small number of destinations. In this paper we present novel randomized algorithms for monitoring such implications under constraints on both memory and processing power, for environments like network routers. Our experiments demonstrate improvements of several factors over straightforward approaches.
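The resource-constrained setting can be illustrated with a conventional bounded-memory frequency sketch. The snippet below applies the standard Space-Saving algorithm to (source, destination) pairs so that heavy source → destination implications surface while only a fixed number of counters is kept; it is a baseline for the same constraints, not the randomized algorithms proposed in the paper, and the class and method names are assumptions.

```python
class SpaceSaving:
    """Standard Space-Saving sketch over (source, destination) pairs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counters = {}                # (src, dst) -> estimated frequency

    def observe(self, src, dst):
        key = (src, dst)
        if key in self.counters:
            self.counters[key] += 1
        elif len(self.counters) < self.capacity:
            self.counters[key] = 1
        else:
            # evict the minimum counter and let the newcomer inherit its count
            victim = min(self.counters, key=self.counters.get)
            self.counters[key] = self.counters.pop(victim) + 1

    def heavy_hitters(self, top=10):
        return sorted(self.counters.items(), key=lambda kv: -kv[1])[:top]
```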
{"title":"Maintaining implicated statistics in constrained environments","authors":"Yannis Sismanis, N. Roussopoulos","doi":"10.1109/ICDE.2005.84","DOIUrl":"https://doi.org/10.1109/ICDE.2005.84","url":null,"abstract":"Aggregated information regarding implicated entities is critical for online applications like network management, traffic characterization or identifying patters of resource consumption. Recently there has been a flurry of research for online aggregation on streams (like quantiles, hot items, hierarchical heavy hitters) but surprisingly the problem of summarizing implicated information in stream data has received no attention. As an example, consider an IP-network and the implication source /spl rarr/ destination. Flash crowds - such as those that follow recent sport events (like the Olympics) or seek information regarding catastrophic events - or denial of service attacks direct a large volume of traffic from a huge number of sources to a very small number of destinations. In this paper we present novel randomized algorithms for monitoring such implications with constraints in both memory and processing power for environments like network routers. Our experiments demonstrate several factors of improvements over straightforward approaches.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129028057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
The XML stream query processor SPEX
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.141
François Bry, Fatih Coşkun, S. Durmaz, Tim Furche, Dan Olteanu, Markus Spannagel
Data streams are an emerging technology for data dissemination in cases where the data throughput or size makes it infeasible to rely on the conventional approach of storing the data before processing it. SPEX evaluates XPath queries against XML data streams. SPEX is built upon formal frameworks for (1) rewriting XPath queries into equivalent XPath queries without reverse axes and (2) correct query evaluation with polynomial combined complexity using networks of pushdown transducers. Such transducers are simple, independent, and can be connected in a flexible manner, allowing not only easy extensions but also extensive query optimization. Querying XML streams with SPEX consists of four steps: first, the input XPath query is rewritten into an XPath query without reverse axes. Second, the forward XPath query is compiled into a logical query plan that abstracts away details of the concrete XPath syntax. Then, a physical query plan is generated by extending the logical query plan with operators for determining and collecting answers. In the last step, the XML stream is processed continuously with the physical query plan, and the output stream conveying the answers to the original query is generated progressively.
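A toy single-pass matcher hints at this streaming style of evaluation, although it handles only a child-axis-only forward path and none of SPEX's rewriting, plan generation, or pushdown-transducer networks; the function name and event-driven loop are assumptions made for this sketch.

```python
import io
import xml.etree.ElementTree as ET

def stream_match(xml_text, path):
    """Return the text of elements reached by a child-axis path, e.g. /catalog/book/title."""
    steps = [s for s in path.split("/") if s]
    stack, matches = [], []
    events = ET.iterparse(io.BytesIO(xml_text.encode("utf-8")), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            stack.append(elem.tag)
        else:                               # "end": the element's content is complete
            if stack == steps:
                matches.append(elem.text)
            stack.pop()
            elem.clear()                    # discard processed content, stream-style
    return matches

if __name__ == "__main__":
    doc = ("<catalog><book><title>TAOCP</title></book>"
           "<book><title>SICP</title></book></catalog>")
    print(stream_match(doc, "/catalog/book/title"))   # ['TAOCP', 'SICP']
```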
{"title":"The XML stream query processor SPEX","authors":"François Bry, Fatih Coşkun, S. Durmaz, Tim Furche, Dan Olteanu, Markus Spannagel","doi":"10.1109/ICDE.2005.141","DOIUrl":"https://doi.org/10.1109/ICDE.2005.141","url":null,"abstract":"Data streams are an emerging technology for data dissemination in cases where the data throughput or size makes it unfeasible to rely on the conventional approach based on storing the data before processing it. SPEX evaluates XPath queries against XML data streams. SPEX is built upon formal frameworks for (1) rewriting XPath queries into equivalent XPath queries without reverse axes and (2) correct query evaluation with polynomial combined complexity using networks of pushdown transducers. Such transducers are simple, independent, and can be connected in a flexible manner, thus allowing not only easy extensions but also extensive query optimization. Querying XML streams with SPEX consists in four steps: first, the input XPath query is rewritten into an XPath query without reverse axes. Second, the forward XPath query is compiled into a logical query plan abstracting out details of the concrete XPath syntax. Then, a physical query plan is generated by extending the logical query plan with operators for determination and collection of answers. In the last step, the XML stream is processed continuously with the physical query plan, and the output stream conveying the answers to the original query is generated progressively.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129099049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
RDF aggregate queries and views
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.121
E. Hung, Yu Deng, V. S. Subrahmanian
Resource description framework (RDF) is a rapidly expanding Web standard. RDF databases attempt to track the massive amounts of Web data and services available. In this paper, we study the problem of aggregate queries. We develop an algorithm to compute answers to aggregate queries over RDF databases, and algorithms to maintain views involving those aggregates. Though RDF data can be stored in a standard relational DBMS (and hence standard relational aggregate query and view maintenance methods can be executed on it), we show experimentally that our algorithms, which operate directly on the RDF representation, exhibit significantly superior performance.
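To make an aggregate query over RDF concrete, the sketch below (illustrative only: the triple data and names are invented, and the paper's evaluation and view-maintenance algorithms are not reproduced) groups triples on the object of a chosen predicate and aggregates the matching subjects, i.e., a COUNT ... GROUP BY over triples.

```python
from collections import defaultdict

def aggregate(triples, predicate, agg=len):
    """Group (s, p, o) triples with p == predicate by o, aggregating the subjects."""
    groups = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            groups[o].append(s)
    return {group: agg(members) for group, members in groups.items()}

if __name__ == "__main__":
    triples = [
        ("paper1", "publishedIn", "ICDE05"),
        ("paper2", "publishedIn", "ICDE05"),
        ("paper3", "publishedIn", "VLDB05"),
    ]
    # COUNT(paper) GROUP BY venue
    print(aggregate(triples, "publishedIn"))   # {'ICDE05': 2, 'VLDB05': 1}
```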
{"title":"RDF aggregate queries and views","authors":"E. Hung, Yu Deng, V. S. Subrahmanian","doi":"10.1109/ICDE.2005.121","DOIUrl":"https://doi.org/10.1109/ICDE.2005.121","url":null,"abstract":"Resource description framework (RDF) is a rapidly expanding Web standard. RDF databases attempt to track the massive amounts of Web data and services available. In this paper, we study the problem of aggregate queries. We develop an algorithm to compute answers to aggregate queries over RDF databases and algorithms to maintain views involving those aggregates. Though RDF data can be stored in a standard relational DBMS (and hence we can execute standard relational aggregate queries and view maintenance methods on them), we show experimentally that our algorithms that operate directly on the RDF representation exhibit significantly superior performance.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129020495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27