Lookup Tables: Fine-Grained Partitioning for Distributed Databases

Aubrey Tatarowicz, C. Curino, E. Jones, S. Madden
{"title":"Lookup Tables: Fine-Grained Partitioning for Distributed Databases","authors":"Aubrey Tatarowicz, C. Curino, E. Jones, S. Madden","doi":"10.1109/ICDE.2012.26","DOIUrl":null,"url":null,"abstract":"The standard way to get linear scaling in a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this partitioning will result in each query being executed at just one node, to avoid the overheads of distributed transactions and allow nodes to be added without increasing the amount of required coordination. For some applications, simple strategies, such as hashing on primary key, provide this property. Unfortunately, for many applications, including social networking and order-fulfillment, many-to-many relationships cause simple strategies to result in a large fraction of distributed queries. Instead, what is needed is a fine-grained partitioning, where related individual tuples (e.g., cliques of friends) are co-located together in the same partition. Maintaining such a fine-grained partitioning requires the database to store a large amount of metadata about which partition each tuple resides in. We call such metadata a lookup table, and present the design of a data distribution layer that efficiently stores these tables and maintains them in the presence of inserts, deletes, and updates. We show that such tables can provide scalability for several difficult to partition database workloads, including Wikipedia, Twitter, and TPC-E. Our implementation provides 40% to 300% better performance on these workloads than either simple range or hash partitioning and shows greater potential for further scale-out.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 28th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2012.26","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 48

Abstract

The standard way to get linear scaling in a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this partitioning will result in each query being executed at just one node, to avoid the overheads of distributed transactions and allow nodes to be added without increasing the amount of required coordination. For some applications, simple strategies, such as hashing on primary key, provide this property. Unfortunately, for many applications, including social networking and order-fulfillment, many-to-many relationships cause simple strategies to result in a large fraction of distributed queries. Instead, what is needed is a fine-grained partitioning, where related individual tuples (e.g., cliques of friends) are co-located together in the same partition. Maintaining such a fine-grained partitioning requires the database to store a large amount of metadata about which partition each tuple resides in. We call such metadata a lookup table, and present the design of a data distribution layer that efficiently stores these tables and maintains them in the presence of inserts, deletes, and updates. We show that such tables can provide scalability for several difficult to partition database workloads, including Wikipedia, Twitter, and TPC-E. Our implementation provides 40% to 300% better performance on these workloads than either simple range or hash partitioning and shows greater potential for further scale-out.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
查找表:分布式数据库的细粒度分区
在分布式OLTP DBMS中获得线性扩展的标准方法是跨多个节点对数据进行水平分区。理想情况下,这种分区将导致每个查询只在一个节点上执行,以避免分布式事务的开销,并允许在不增加所需协调量的情况下添加节点。对于某些应用程序,一些简单的策略,比如对主键进行散列,就提供了这个属性。不幸的是,对于许多应用程序,包括社交网络和订单履行,多对多关系会导致简单的策略产生大量的分布式查询。相反,需要的是细粒度分区,其中相关的单个元组(例如,朋友团)在同一分区中共同定位在一起。维护这种细粒度分区需要数据库存储关于每个元组所在分区的大量元数据。我们将这样的元数据称为查找表,并介绍了数据分发层的设计,该层可以有效地存储这些表,并在存在插入、删除和更新时维护它们。我们展示了这样的表可以为几个难以分区的数据库工作负载提供可伸缩性,包括Wikipedia、Twitter和TPC-E。我们的实现在这些工作负载上提供了比简单范围分区或散列分区好40%到300%的性能,并显示出进一步向外扩展的更大潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Keyword Query Reformulation on Structured Data Accuracy-Aware Uncertain Stream Databases Extracting Analyzing and Visualizing Triangle K-Core Motifs within Networks Project Daytona: Data Analytics as a Cloud Service Automatic Extraction of Structured Web Data with Domain Knowledge
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1