Exploiting correlated attributes in acquisitional query processing

A. Deshpande, Carlos Guestrin, W. Hong, S. Madden
Published in: 21st International Conference on Data Engineering (ICDE'05), 2005-04-05
DOI: 10.1109/ICDE.2005.63 (https://doi.org/10.1109/ICDE.2005.63)
Citations: 138

Abstract

Sensor networks and other distributed information systems (such as the Web) must frequently access data that has a high per-attribute acquisition cost, in terms of energy, latency, or computational resources. When executing queries that contain several predicates over such expensive attributes, we observe that it can be beneficial to use correlations to automatically introduce low-cost attributes whose observation will allow the query processor to better estimate the selectivity of these expensive predicates. In particular, we show how to build conditional plans that branch into one or more sub-plans, each with a different ordering for the expensive query predicates, based on the runtime observation of low-cost attributes. We frame the problem of constructing the optimal conditional plan for a given user query and set of candidate low-cost attributes as an optimization problem. We describe an exponential-time algorithm for finding such optimal plans, and describe a polynomial-time heuristic for identifying conditional plans that perform well in practice. We also show how to compactly model the conditional probability distributions needed to identify correlations and build these plans. We evaluate our algorithms against several real-world sensor-network data sets, showing several-times performance increases for a variety of queries versus traditional optimization techniques.
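
The idea of a conditional plan can be illustrated with a small sketch. This is not the paper's algorithm: it is a brute-force toy, assuming a made-up sample of readings, a single cheap attribute (`hour`), one hand-picked split point, and hypothetical acquisition costs. It compares the best fixed predicate ordering against a plan that first observes the cheap attribute and then chooses a predicate ordering per branch.

```python
from itertools import permutations

# Hypothetical sample: "hour" is cheap to observe; "light" and "temp" are expensive.
READINGS = [
    {"hour": 1, "light": 3, "temp": 14},
    {"hour": 2, "light": 5, "temp": 15},
    {"hour": 3, "light": 2, "temp": 13},
    {"hour": 4, "light": 8, "temp": 21},
    {"hour": 11, "light": 600, "temp": 25},
    {"hour": 12, "light": 800, "temp": 27},
    {"hour": 13, "light": 700, "temp": 26},
    {"hour": 14, "light": 90, "temp": 24},
]

# Query: light < 100 AND temp > 20 (assumed predicates and costs).
PREDICATES = {"light": lambda v: v < 100, "temp": lambda v: v > 20}
COSTS = {"light": 10.0, "temp": 10.0}
CHEAP_COST = 1.0  # assumed cost of observing "hour"


def sequential_cost(order, rows):
    """Average cost of evaluating predicates in `order`, stopping at the first failure."""
    total = 0.0
    for row in rows:
        for name in order:
            total += COSTS[name]  # pay to acquire the attribute
            if not PREDICATES[name](row[name]):
                break  # tuple rejected; later attributes are never acquired
    return total / len(rows)


def best_order(rows):
    """Exhaustively pick the cheapest predicate ordering for `rows`."""
    return min(permutations(PREDICATES), key=lambda o: sequential_cost(o, rows))


def build_conditional_plan(rows, cheap_attr, split):
    """Branch on cheap_attr < split and choose an ordering for each branch."""
    low = [r for r in rows if r[cheap_attr] < split]
    high = [r for r in rows if r[cheap_attr] >= split]
    return {
        "low": best_order(low or rows),    # fall back to all rows if a branch is empty
        "high": best_order(high or rows),
    }


def conditional_cost(plan, rows, cheap_attr, split):
    """Average cost of the conditional plan, including the cheap observation."""
    total = 0.0
    for row in rows:
        branch = "low" if row[cheap_attr] < split else "high"
        total += CHEAP_COST + sequential_cost(plan[branch], [row])
    return total / len(rows)


fixed_cost = sequential_cost(best_order(READINGS), READINGS)
plan = build_conditional_plan(READINGS, "hour", 6)
cond_cost = conditional_cost(plan, READINGS, "hour", 6)
print("fixed ordering cost:", fixed_cost)
print("conditional plan:", plan, "cost:", cond_cost)
```

On this made-up sample the conditional plan checks `temp` first at night and `light` first during the day, averaging 13.5 cost units per tuple against 16.25 for the best fixed ordering. The exhaustive search over orderings is exponential in the number of predicates, which is consistent with the abstract's motivation for a polynomial-time heuristic.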