Multiresolution indexing of XML for frequent queries

Proceedings. 20th International Conference on Data Engineering. Published 2004-03-30. DOI: 10.1109/ICDE.2004.1320037

Hao He, Jun Yang


Citations: 84

Abstract

XML and other types of semistructured data are typically represented by a labeled directed graph. To speed up path expression queries over the graph, a variety of structural indexes have been proposed. They usually work by partitioning nodes in the data graph into equivalence classes and storing equivalence classes as index nodes. The A(k)-index introduces the concept of local bisimilarity for partitioning, allowing a trade-off between index size and query-answering power. However, all index nodes in an A(k)-index share the same local similarity k, so the index cannot exploit the fact that a workload may contain path expressions of different lengths, or that different parts of the data graph may have different local similarity requirements. To overcome these limitations, we propose M(k)- and M*(k)-indexes. The basic M(k)-index is workload-aware: like the previously proposed D(k)-index, it allows different index nodes to have different local similarity requirements, providing finer partitioning only for parts of the data graph targeted by longer path expressions. Unlike the D(k)-index, the M(k)-index is never over-refined for irrelevant index or data nodes. However, the workload-aware feature still incurs over-refinement due to over-qualified parent index nodes. Moreover, fine partitions penalize the performance of short path expressions. To solve these problems, we further propose the M*(k)-index. An M*(k)-index consists of a collection of indexes whose nodes are organized in a partition hierarchy, allowing successively coarser partitioning information to coexist with the finest partitioning information required. Experiments show that our indexes outperform previously proposed indexes in terms of both index size and query performance.
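The partitioning idea described above can be illustrated with a minimal Python sketch of iterative refinement. Level 0 groups data nodes by label. Each further round splits a block by the set of blocks that a node's parents fall into, so with a uniform k two nodes end up in the same block exactly when they are k-bisimilar on incoming paths, which corresponds to an A(k)-index. The per-node requirement map `req`, the assumed rule that a parent needs one level less than its child (`propagate_requirements`), all function names, and the sample graph are hypothetical. This is not the M(k)- or M*(k)-index construction from the paper, only a sketch of why letting k vary per node can avoid refining nodes that no query targets.

```python
from collections.abc import Hashable, Iterable, Mapping


def relabel(keys: Mapping[int, Hashable]) -> dict[int, int]:
    """Map each node's refinement key to a dense integer block id."""
    ids: dict[Hashable, int] = {}
    return {v: ids.setdefault(key, len(ids)) for v, key in keys.items()}


def propagate_requirements(parents: Mapping[int, Iterable[int]],
                           req: Mapping[int, int]) -> dict[int, int]:
    """Assumed rule (hypothetical): a node needing local similarity k
    needs its parents partitioned at level k - 1."""
    req = dict(req)
    changed = True
    while changed:  # terminates: values only increase and never exceed max(req)
        changed = False
        for v, ps in parents.items():
            for p in ps:
                if req[v] - 1 > req[p]:
                    req[p] = req[v] - 1
                    changed = True
    return req


def partition(labels: Mapping[int, str],
              parents: Mapping[int, Iterable[int]],
              req: Mapping[int, int]) -> dict[int, int]:
    """Refine label classes; node v is split by parent blocks for req[v] rounds.

    With req[v] == k for every node this is plain k-bisimilarity (A(k)).
    """
    req = propagate_requirements(parents, req)
    block = relabel(labels)  # level 0: same label <=> same block
    for i in range(1, max(req.values(), default=0) + 1):
        keys = {}
        for v in labels:
            if req[v] >= i:
                # Split v's block by the set of blocks its parents fall into.
                parent_blocks = frozenset(block[p] for p in parents.get(v, ()))
                keys[v] = (block[v], parent_blocks)
            else:
                # Nodes whose requirement is already met are not split among themselves.
                keys[v] = (block[v], None)
        block = relabel(keys)
    return block


# Hypothetical sample graph: an XML-like tree with node ids 0..8.
LABELS = {0: "site", 1: "people", 2: "auction", 3: "person", 4: "person",
          5: "name", 6: "name", 7: "email", 8: "email"}
PARENTS = {1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4], 7: [3], 8: [4]}

if __name__ == "__main__":
    a2 = partition(LABELS, PARENTS, {v: 2 for v in LABELS})         # uniform A(2)
    mk = partition(LABELS, PARENTS, {v: 0 for v in LABELS} | {5: 2})  # per-node k
    print(len(set(a2.values())), len(set(mk.values())))  # 9 8
```

On this sample, the uniform A(2) partition separates the two `email` nodes even though no path expression needs them, giving 9 blocks. A requirement of 2 on only the `name` node under `people` splits just the `person` and `name` blocks on that path, giving 8 blocks with both `email` nodes still in one block.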



