Counting twig matches in a tree

Zhiyuan Chen, H. Jagadish, Flip Korn, Nick Koudas, S. Muthukrishnan, R. Ng, D. Srivastava
Proceedings 17th International Conference on Data Engineering, 2001-04-02
DOI: 10.1109/ICDE.2001.914874 (https://doi.org/10.1109/ICDE.2001.914874)
Citations: 139

Abstract

We describe efficient algorithms for accurately estimating the number of matches of a small node-labeled tree, i.e. a twig, in a large node-labeled tree, using a summary data structure. This problem is of interest for queries on XML and other hierarchical data, both for providing query feedback and for cost-based query optimization. Our summary data structure scalably represents approximate frequency information about twiglets (i.e. small twigs) in the data tree. Given a twig query, the number of matches is estimated by creating a set of query twiglets and combining two complementary approaches: set hashing, used to estimate the number of matches of each query twiglet, and maximal overlap, used to combine the query twiglet estimates into an estimate for the twig query. We propose several estimation algorithms that apply these approaches to query twiglets formed using variations on different twiglet decomposition techniques. We present an extensive experimental evaluation using several real XML data sets, with a variety of twig queries. Our results demonstrate that accurate and robust estimates can be achieved, even with limited space.
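To make the set hashing idea concrete, here is a minimal sketch of generic min-wise hashing in Python. It is not the paper's algorithm. It assumes, purely for illustration, that each path in a twiglet is associated with the set of data-node identifiers at which it is rooted, so that the number of nodes matching both paths is the size of the intersection of two such sets. The function names (`signature`, `resemblance`, `intersection_size`) and the choice of 128 hash functions are hypothetical.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def signature(items, num_hashes: int = 128):
    """Min-hash signature: for each seed, the smallest hash value over the set."""
    items = [str(x) for x in items]
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def resemblance(sig_a, sig_b) -> float:
    """Estimate |A ∩ B| / |A ∪ B| as the fraction of agreeing signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def intersection_size(sig_a, size_a: int, sig_b, size_b: int) -> float:
    """Estimate |A ∩ B| from the signatures and the two exact set sizes.

    With r = |A ∩ B| / |A ∪ B| and |A ∪ B| = |A| + |B| - |A ∩ B|,
    solving for the intersection gives |A ∩ B| = r (|A| + |B|) / (1 + r).
    """
    r = resemblance(sig_a, sig_b)
    return r * (size_a + size_b) / (1 + r)


if __name__ == "__main__":
    # Hypothetical node-id sets for two paths that share a root label.
    nodes_a = range(0, 1000)
    nodes_b = range(500, 1500)
    sig_a, sig_b = signature(nodes_a), signature(nodes_b)
    print(intersection_size(sig_a, 1000, sig_b, 1000))
```

The signatures have a fixed length regardless of how large the underlying sets are, which is what makes this kind of summary attractive when space is limited; the estimate's accuracy improves as the number of hash functions grows.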