Lookup Tables: Fine-Grained Partitioning for Distributed Databases

2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.26

Aubrey Tatarowicz, C. Curino, E. Jones, S. Madden


Citations: 48

Abstract

The standard way to get linear scaling in a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this partitioning results in each query executing at just one node, which avoids the overhead of distributed transactions and allows nodes to be added without increasing the amount of required coordination. For some applications, simple strategies such as hashing on the primary key provide this property. Unfortunately, for many applications, including social networking and order fulfillment, many-to-many relationships cause simple strategies to produce a large fraction of distributed queries. What is needed instead is a fine-grained partitioning, where related individual tuples (e.g., cliques of friends) are co-located in the same partition. Maintaining such a fine-grained partitioning requires the database to store a large amount of metadata about which partition each tuple resides in. We call such metadata a lookup table, and present the design of a data distribution layer that efficiently stores these tables and maintains them in the presence of inserts, deletes, and updates. We show that such tables can provide scalability for several difficult-to-partition database workloads, including Wikipedia, Twitter, and TPC-E. Our implementation provides 40% to 300% better performance on these workloads than either simple range or hash partitioning and shows greater potential for further scale-out.
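
To make the idea of a lookup table concrete, here is a minimal sketch of a router that maps individual tuple keys to partitions and falls back to modulo hashing for keys with no explicit entry. It is an illustration only, not the paper's implementation: the class name `LookupTableRouter`, its methods, the in-memory dictionary, and the hashing fallback are all assumptions, since the abstract does not describe how the tables are stored or maintained.

```python
from typing import Dict, Iterable, Set


class LookupTableRouter:
    """Toy router (hypothetical): per-tuple key -> partition mapping."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # The lookup table itself: one entry per explicitly placed tuple.
        self.table: Dict[int, int] = {}

    def default_partition(self, key: int) -> int:
        # Assumed fallback for keys without an entry: simple modulo hashing.
        return key % self.num_partitions

    def assign(self, key: int, partition: int) -> None:
        # Called on insert or update to record where a tuple lives.
        if not 0 <= partition < self.num_partitions:
            raise ValueError("partition out of range")
        self.table[key] = partition

    def remove(self, key: int) -> None:
        # Called on delete; a missing key is ignored.
        self.table.pop(key, None)

    def partition_of(self, key: int) -> int:
        return self.table.get(key, self.default_partition(key))

    def partitions_for(self, keys: Iterable[int]) -> Set[int]:
        # The set of nodes a query touching these keys must visit.
        return {self.partition_of(k) for k in keys}


router = LookupTableRouter(num_partitions=4)
clique = [1, 2, 3]  # e.g., user ids of a group of friends

# With hashing alone, a query over the clique spans several partitions.
nodes_with_hashing = router.partitions_for(clique)

# Co-locating the clique through the lookup table makes it single-node.
for user_id in clique:
    router.assign(user_id, 0)
nodes_with_lookup = router.partitions_for(clique)

print(len(nodes_with_hashing), len(nodes_with_lookup))
```

In this toy setup the query over the clique touches three partitions under hashing and one after the explicit placement, which is the single-node property the abstract describes. A real system would also need to store the table compactly and keep it consistent across routers, which this sketch does not attempt.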
