Locality-sensitive operators for parallel main-memory database clusters

Wolf Rödiger, Tobias Mühlbauer, Philipp Unterbrunner, Angelika Reiser, A. Kemper, Thomas Neumann
{"title":"Locality-sensitive operators for parallel main-memory database clusters","authors":"Wolf Rödiger, Tobias Mühlbauer, Philipp Unterbrunner, Angelika Reiser, A. Kemper, Thomas Neumann","doi":"10.1109/ICDE.2014.6816684","DOIUrl":null,"url":null,"abstract":"The growth in compute speed has outpaced the growth in network bandwidth over the last decades. This has led to an increasing performance gap between local and distributed processing. A parallel database cluster thus has to maximize the locality of query processing. A common technique to this end is to co-partition relations to avoid expensive data shuffling across the network. However, this is limited to one attribute per relation and is expensive to maintain in the face of updates. Other attributes often exhibit a fuzzy co-location due to correlations with the distribution key but current approaches do not leverage this. In this paper, we introduce locality-sensitive data shuffling, which can dramatically reduce the amount of network communication for distributed operators such as join and aggregation. We present four novel techniques: (i) optimal partition assignment exploits locality to reduce the network phase duration; (ii) communication scheduling avoids bandwidth underutilization due to cross traffic; (iii) adaptive radix partitioning retains locality during data repartitioning and handles value skew gracefully; and (iv) selective broadcast reduces network communication in the presence of extreme value skew or large numbers of duplicates. We present comprehensive experimental results, which show that our techniques can improve performance by up to factor of 5 for fuzzy co-location and a factor of 3 for inputs with value skew.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"61","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 30th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2014.6816684","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 61

Abstract

The growth in compute speed has outpaced the growth in network bandwidth over the last decades. This has led to an increasing performance gap between local and distributed processing. A parallel database cluster thus has to maximize the locality of query processing. A common technique to this end is to co-partition relations to avoid expensive data shuffling across the network. However, this is limited to one attribute per relation and is expensive to maintain in the face of updates. Other attributes often exhibit a fuzzy co-location due to correlations with the distribution key but current approaches do not leverage this. In this paper, we introduce locality-sensitive data shuffling, which can dramatically reduce the amount of network communication for distributed operators such as join and aggregation. We present four novel techniques: (i) optimal partition assignment exploits locality to reduce the network phase duration; (ii) communication scheduling avoids bandwidth underutilization due to cross traffic; (iii) adaptive radix partitioning retains locality during data repartitioning and handles value skew gracefully; and (iv) selective broadcast reduces network communication in the presence of extreme value skew or large numbers of duplicates. We present comprehensive experimental results, which show that our techniques can improve performance by up to factor of 5 for fuzzy co-location and a factor of 3 for inputs with value skew.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用于并行主存数据库集群的位置敏感操作符
在过去的几十年里,计算速度的增长已经超过了网络带宽的增长。这导致本地处理和分布式处理之间的性能差距越来越大。因此,并行数据库集群必须最大化查询处理的局部性。实现这一目的的常用技术是共分区关系,以避免在网络上进行昂贵的数据变换。但是,这仅限于每个关系的一个属性,并且在面对更新时维护成本很高。由于与分布键的相关性,其他属性通常表现出模糊的共定位,但目前的方法没有利用这一点。在本文中,我们引入了位置敏感的数据变换,它可以大大减少分布式操作(如连接和聚合)的网络通信量。我们提出了四种新技术:(i)最优分区分配利用局部性来减少网络相位持续时间;(ii)通信调度避免了由于交叉流量导致的带宽利用率不足;(iii)自适应基数分区在数据重分区期间保持局域性,并优雅地处理值倾斜;(iv)选择性广播减少了存在极端值偏差或大量重复的网络通信。我们提供了全面的实验结果,表明我们的技术可以将模糊共定位的性能提高5倍,对于具有值偏差的输入可以提高3倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Managing uncertainty in spatial and spatio-temporal data Locality-sensitive operators for parallel main-memory database clusters KnowLife: A knowledge graph for health and life sciences We can learn your #hashtags: Connecting tweets to explicit topics A demonstration of MNTG - A web-based road network traffic generator
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1