Patched Multi-Key Partitioning for Robust Query Performance
Authors: Steffen Kläbe, K. Sattler
Published in: Advances in database technology : proceedings. International Conference on Extending Database Technology, volume 9 1, pages 324-336
Publication date: 2023-01-01
DOI: https://doi.org/10.48786/edbt.2023.26
Citations: 0
Abstract
Data partitioning is key to parallel query processing in modern analytical database systems. Choosing the right partitioning key for a given dataset is difficult and crucial for query performance. Real-world data warehouses contain a large number of tables connected in complex schemas, resulting in an overwhelming number of partition key candidates. In this paper, we present patched multi-key partitioning, an approach that allows multiple partition keys to be defined simultaneously without data replication. The key idea is to map the relational table partitioning problem to a graph partitioning problem so that existing graph partitioning algorithms can be used to find connected components in the data, while exceptions to the partitioning (patches) are maintained separately. We show that patched multi-key partitioning offers opportunities for achieving robust query performance, i.e., reaching reasonably good performance for many queries instead of optimal performance for only a few.
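The mapping from table partitioning to graph partitioning can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single table with two candidate partition keys, models every key value as a node and every row as an edge between its two key values, and substitutes a size-capped union-find for a real graph partitioner. The function name `patched_multikey_partition`, the `max_component_rows` cap, and the greedy placement of components are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def patched_multikey_partition(rows, num_partitions, max_component_rows):
    """Illustrative sketch (assumed heuristic, not the paper's method).

    rows: list of (key_a, key_b) tuples.
    Returns (row_part, patches): a dict mapping regular row indices to a
    partition id, and a list of row indices kept aside as patches.
    """
    parent = {}
    size = defaultdict(int)  # number of rows per component root

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    regular, patches = [], []
    for idx, (a, b) in enumerate(rows):
        # Tag values with their column so equal values in different
        # columns stay distinct nodes.
        ra, rb = find(("a", a)), find(("b", b))
        if ra == rb:
            size[ra] += 1
            regular.append(idx)
        elif size[ra] + size[rb] + 1 <= max_component_rows:
            parent[rb] = ra  # merge the two components
            size[ra] += size[rb] + 1
            regular.append(idx)
        else:
            # Merging would exceed the cap: keep the row as an exception.
            patches.append(idx)

    # Greedy placement: largest component first, onto the least-loaded partition.
    roots = {find(("a", rows[i][0])) for i in regular}
    load = [0] * num_partitions
    comp_to_part = {}
    for root in sorted(roots, key=lambda r: (-size[r], r)):
        target = load.index(min(load))
        comp_to_part[root] = target
        load[target] += size[root]

    row_part = {i: comp_to_part[find(("a", rows[i][0]))] for i in regular}
    return row_part, patches


rows = [(1, 10), (1, 11), (2, 11), (3, 12), (3, 13), (2, 12)]
row_part, patches = patched_multikey_partition(
    rows, num_partitions=2, max_component_rows=3
)
print(row_part)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
print(patches)   # [5]
```

In this sketch, all regular rows that share a value in either key column end up in the same partition, so both columns behave as partition keys for those rows without replicating any data. Row 5 links the two components and would force them to merge, so it is kept as a patch that a query engine would have to handle separately.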