Counting twig matches in a tree

Proceedings 17th International Conference on Data Engineering. Pub Date: 2001-04-02. DOI: 10.1109/ICDE.2001.914874

Zhiyuan Chen, H. Jagadish, Flip Korn, Nick Koudas, S. Muthukrishnan, R. Ng, D. Srivastava

Citations: 139

Abstract

We describe efficient algorithms for accurately estimating the number of matches of a small node-labeled tree, i.e. a twig, in a large node-labeled tree, using a summary data structure. This problem is of interest for queries on XML and other hierarchical data, both to provide query feedback and to support cost-based query optimization. Our summary data structure scalably represents approximate frequency information about twiglets (i.e. small twigs) in the data tree. Given a twig query, the number of matches is estimated by creating a set of query twiglets and combining two complementary approaches: set hashing, used to estimate the number of matches of each query twiglet, and maximal overlap, used to combine the query twiglet estimates into an estimate for the twig query. We propose several estimation algorithms that apply these approaches to query twiglets formed using variations of different twiglet decomposition techniques. We present an extensive experimental evaluation using several real XML data sets and a variety of twig queries. Our results demonstrate that accurate and robust estimates can be achieved, even with limited space.
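
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not spell out the algorithms, so everything below is an assumption made for illustration. The sketch assumes that each path in the summary is associated with the set of data-node ids at which that path is rooted, that a two-branch twiglet matches at the nodes common to both sets, and that this intersection size is estimated from min-hash signatures (a standard form of set hashing). The function `combine_overlapping` shows a simple conditional-independence rule in the spirit of maximal overlap. All function names, the choice of `blake2b`, and the signature length `k` are hypothetical.

```python
import hashlib


def hash_value(seed: int, item: int) -> int:
    """Deterministic 64-bit hash of `item` under the hash function numbered `seed`."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def signature(items, k: int = 256):
    """Min-hash signature: the minimum hash value of a non-empty set under k hash functions."""
    return [min(hash_value(seed, x) for x in items) for seed in range(k)]


def resemblance(sig_a, sig_b) -> float:
    """Fraction of agreeing signature positions; estimates |A ∩ B| / |A ∪ B|."""
    agree = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return agree / len(sig_a)


def estimate_intersection(size_a: int, size_b: int, sig_a, sig_b) -> float:
    """Estimate |A ∩ B| from the set sizes and signatures.

    With r = |A ∩ B| / |A ∪ B| and |A ∪ B| = |A| + |B| - |A ∩ B|,
    solving for the intersection gives r * (|A| + |B|) / (1 + r).
    """
    r = resemblance(sig_a, sig_b)
    return r * (size_a + size_b) / (1 + r)


def combine_overlapping(count_left: float, count_right: float, count_overlap: float) -> float:
    """Combine two twiglet estimates that share a common sub-twig.

    Assumes the two twiglets are independent given the shared part, so the
    combined estimate is count_left * count_right / count_overlap.
    """
    if count_overlap == 0:
        return 0.0
    return count_left * count_right / count_overlap


if __name__ == "__main__":
    # Hypothetical node-id sets: nodes rooting path p1 and nodes rooting path p2.
    roots_p1 = set(range(1000))
    roots_p2 = set(range(500, 1500))
    sig1, sig2 = signature(roots_p1), signature(roots_p2)
    print("estimated nodes rooting both paths:",
          round(estimate_intersection(len(roots_p1), len(roots_p2), sig1, sig2)))
    print("combined estimate:", combine_overlapping(20, 30, 60))
```

Only the fixed-length signatures and the set sizes need to be stored, which is what makes an approach of this kind attractive when the summary must fit in limited space. Accuracy improves as `k` grows, at the cost of a larger summary.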
