Elastic Pipelining in an In-Memory Database Cluster

Li Wang, Minqi Zhou, Zhenjie Zhang, Y. Yang, Aoying Zhou, D. Bitton
{"title":"Elastic Pipelining in an In-Memory Database Cluster","authors":"Li Wang, Minqi Zhou, Zhenjie Zhang, Y. Yang, Aoying Zhou, D. Bitton","doi":"10.1145/2882903.2882904","DOIUrl":null,"url":null,"abstract":"An in-memory database cluster consists of multiple interconnected nodes with a large capacity of RAM and modern multi-core CPUs. As a conventional query processing strategy, pipelining remains a promising solution for in-memory parallel database systems, as it avoids expensive intermediate result materialization and parallelizes the data processing among nodes. However, to fully unleash the power of pipelining in a cluster with multi-core nodes, it is crucial for the query optimizer to generate good query plans with appropriate intra-node parallelism, in order to maximize CPU and network bandwidth utilization. A suboptimal plan, on the contrary, causes load imbalance in the pipelines and consequently degrades the query performance. Parallelism assignment optimization at compile time is nearly impossible, as the workload in each node is affected by numerous factors and is highly dynamic during query evaluation. To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. It is achieved with the adoption of new elastic iterator model and a fully optimized dynamic scheduler. The elastic iterator model generally upgrades traditional iterator model with new dynamic multi-core execution adjustment capability. And the dynamic scheduler efficiently provisions CPU cores to query execution segments in the pipelines based on the light-weight measurements on the operators. Extensive experiments on real and synthetic (TPC-H) data show that our proposal achieves almost full CPU utilization on typical decision-making analytical queries, outperforming state-of-the-art open-source systems by a huge margin.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"134 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2882903.2882904","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

An in-memory database cluster consists of multiple interconnected nodes, each with a large amount of RAM and modern multi-core CPUs. As a conventional query processing strategy, pipelining remains a promising solution for in-memory parallel database systems, as it avoids expensive materialization of intermediate results and parallelizes data processing among nodes. However, to fully unleash the power of pipelining in a cluster of multi-core nodes, it is crucial for the query optimizer to generate query plans with appropriate intra-node parallelism, so as to maximize CPU and network bandwidth utilization. A suboptimal plan, on the contrary, causes load imbalance in the pipelines and consequently degrades query performance. Optimizing parallelism assignment at compile time is nearly impossible, as the workload on each node is affected by numerous factors and is highly dynamic during query evaluation. To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. It is achieved through a new elastic iterator model and a fully optimized dynamic scheduler. The elastic iterator model extends the traditional iterator model with the ability to adjust multi-core execution dynamically, and the dynamic scheduler efficiently provisions CPU cores to query execution segments in the pipelines based on lightweight measurements of the operators. Extensive experiments on real and synthetic (TPC-H) data show that our proposal achieves almost full CPU utilization on typical decision-making analytical queries, outperforming state-of-the-art open-source systems by a huge margin.
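
To make the two mechanisms concrete, below is a minimal, self-contained C++ sketch of the general idea described in the abstract, not the paper's actual implementation: an operator whose worker-thread count can be raised or lowered while it runs (the role of the elastic iterator), driven by a toy scheduler that reacts to a lightweight measurement (here, the length of the operator's input queue). All names (ElasticOperator, set_target_workers, and so on) are hypothetical.

```cpp
// elastic_pipeline_sketch.cpp -- hypothetical illustration, not the paper's code.
#include <atomic>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A query-execution segment whose intra-node parallelism (number of worker
// threads) can be changed while it is running.
class ElasticOperator {
public:
    explicit ElasticOperator(std::queue<std::vector<int>> input)
        : input_(std::move(input)) {}

    // Called by the scheduler (from one thread) to raise or lower the degree
    // of parallelism. Growing spawns new workers; shrinking is cooperative.
    void set_target_workers(int n) {
        target_workers_ = n;
        for (int live = live_workers_.load(); live < n; ++live) {
            ++live_workers_;
            workers_.emplace_back([this] { run(); });
        }
    }

    // Lightweight measurement the scheduler bases its decisions on.
    std::size_t pending_batches() {
        std::lock_guard<std::mutex> lk(mu_);
        return input_.size();
    }

    void join_all() {
        for (auto& t : workers_) t.join();
    }

    long long result() const { return sum_.load(); }

private:
    void run() {
        for (;;) {
            // Cooperative shrink: retire at a batch boundary if there are
            // more live workers than the scheduler currently wants.
            int live = live_workers_.load();
            if (live > target_workers_.load() &&
                live_workers_.compare_exchange_strong(live, live - 1)) {
                return;
            }
            std::vector<int> batch;
            {
                std::lock_guard<std::mutex> lk(mu_);
                if (input_.empty()) break;   // all input consumed
                batch = std::move(input_.front());
                input_.pop();
            }
            long long local = 0;
            for (int v : batch) local += v;  // stand-in for real operator work
            sum_ += local;
        }
        --live_workers_;
    }

    std::queue<std::vector<int>> input_;   // input batches from upstream
    std::mutex mu_;
    std::vector<std::thread> workers_;
    std::atomic<int> target_workers_{0};
    std::atomic<int> live_workers_{0};
    std::atomic<long long> sum_{0};
};

int main() {
    // 64 batches of 10,000 ones; the correct total is 640,000.
    std::queue<std::vector<int>> batches;
    for (int i = 0; i < 64; ++i) batches.push(std::vector<int>(10000, 1));

    ElasticOperator op(std::move(batches));
    op.set_target_workers(1);

    // Toy dynamic scheduler: grant more cores while the backlog is large,
    // take them back once it shrinks.
    for (int step = 0; step < 20; ++step) {
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
        op.set_target_workers(op.pending_batches() > 16 ? 4 : 1);
    }
    op.join_all();
    std::cout << "sum = " << op.result() << "\n";
}
```

Note the design choice in this sketch: surplus workers retire cooperatively at batch boundaries rather than being preempted, which reflects the general pattern of adjusting parallelism only at safe points in an iterator-style pipeline.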