Pando:增强数据跳跃与逻辑数据分区

Sivaprasad Sudhir, Wenbo Tao, N. Laptev, Cyrille Habis, Michael J. Cafarella, S. Madden
{"title":"Pando:增强数据跳跃与逻辑数据分区","authors":"Sivaprasad Sudhir, Wenbo Tao, N. Laptev, Cyrille Habis, Michael J. Cafarella, S. Madden","doi":"10.14778/3598581.3598601","DOIUrl":null,"url":null,"abstract":"With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of a workload of queries run over the data, have been proposed to tune the physical layout of data to reduce the number of blocks accessed. The effectiveness of these methods at skipping blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant opportunities in skipping blocks because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. Across a range of benchmark and real-world workloads, Pando attains up to 2.8X reduction in the number of blocks scanned and up to 2.3X speedup in end-to-end query execution time over the state-of-the-art techniques.","PeriodicalId":20467,"journal":{"name":"Proc. VLDB Endow.","volume":"20 1","pages":"2316-2329"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Pando: Enhanced Data Skipping with Logical Data Partitioning\",\"authors\":\"Sivaprasad Sudhir, Wenbo Tao, N. Laptev, Cyrille Habis, Michael J. Cafarella, S. Madden\",\"doi\":\"10.14778/3598581.3598601\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of a workload of queries run over the data, have been proposed to tune the physical layout of data to reduce the number of blocks accessed. The effectiveness of these methods at skipping blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant opportunities in skipping blocks because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. Across a range of benchmark and real-world workloads, Pando attains up to 2.8X reduction in the number of blocks scanned and up to 2.3X speedup in end-to-end query execution time over the state-of-the-art techniques.\",\"PeriodicalId\":20467,\"journal\":{\"name\":\"Proc. VLDB Endow.\",\"volume\":\"20 1\",\"pages\":\"2316-2329\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proc. VLDB Endow.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14778/3598581.3598601\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. VLDB Endow.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3598581.3598601","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

对于大量的数据,快速检索与查询相关的数据对于实现高性能至关重要。现代基于云的数据库系统经常将数据划分为块,并使用各种技术在查询执行期间跳过不相关的块。已经提出了几种算法(通常基于在数据上运行的查询工作负载的历史属性)来调优数据的物理布局,以减少访问的块数量。这些跳过块的方法的有效性取决于存储的元数据以及物理数据布局与查询的对齐程度。现有的自动物理数据库设计工作错过了跳过块的重要机会,因为它忽略了工作负载中显示强烈相关结果的逻辑谓词。在本文中,我们提出了Pando,它通过使用关联感知逻辑分区通知物理布局决策,从而比过去的方法实现更好的块跳转。在一系列基准测试和实际工作负载中,Pando的扫描块数量减少了2.8倍,端到端查询执行时间加快了2.3倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Pando: Enhanced Data Skipping with Logical Data Partitioning
With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of a workload of queries run over the data, have been proposed to tune the physical layout of data to reduce the number of blocks accessed. The effectiveness of these methods at skipping blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant opportunities in skipping blocks because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. Across a range of benchmark and real-world workloads, Pando attains up to 2.8X reduction in the number of blocks scanned and up to 2.3X speedup in end-to-end query execution time over the state-of-the-art techniques.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Cryptographically Secure Private Record Linkage Using Locality-Sensitive Hashing Utility-aware Payment Channel Network Rebalance Relational Query Synthesis ⋈ Decision Tree Learning Billion-Scale Bipartite Graph Embedding: A Global-Local Induced Approach Query Refinement for Diversity Constraint Satisfaction
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1