Coordinated Resource Management for Large Scale Interactive Data Query Systems

Wei Yan, Yuan Xue
Published in: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 677-686
Publication date: 2015-05-04
DOI: 10.1109/CCGrid.2015.149 (https://doi.org/10.1109/CCGrid.2015.149)
Citations: 1

Abstract

Interactive ad hoc data query over massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) are built and deployed to support SQL-like queries over distributed and partitioned data in a cluster environment. As a result, the execution of each query is converted into a set of coordinated tasks, including data retrieval, intermediate result computation and transfer, and result aggregation. To support a high request rate of concurrent interactive queries, coordinated management of multiple resources (e.g., bandwidth, CPU, memory) of the cluster environment is critical. In this paper, we investigate this resource management problem using a utility-based optimization framework. Our goal is to optimize resource utilization and maintain fairness among different types of queries. We present a price-based algorithm that achieves this optimization objective. We implement our algorithm in the open source Impala system and conduct a set of experiments in a cluster environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution can increase the aggregate utility by at least 15.4% compared with a simple fair resource sharing mechanism, and by 63.5% compared with the FIFO resource management mechanism.
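
The abstract does not give the utility functions or the price update rule, so the following is only a minimal sketch of the general idea behind price-based, utility-maximizing allocation across several resources. It is not the authors' algorithm. It assumes a textbook dual-decomposition scheme with logarithmic (proportionally fair) utilities; the function name `price_based_allocation`, its parameters, and the example numbers are all hypothetical.

```python
from typing import List, Tuple


def price_based_allocation(
    weights: List[float],
    demand: List[List[float]],
    capacity: List[float],
    step: float = 0.01,
    iterations: int = 5000,
    max_rate: float = 1e6,
) -> Tuple[List[float], List[float]]:
    """Illustrative price-based allocation (assumed model, not the paper's).

    weights[i]   : weight of query class i in the utility w_i * log(x_i)
    demand[i][r] : amount of resource r consumed per unit rate of class i
    capacity[r]  : total amount of resource r available in the cluster
    Returns (rates, prices) after a fixed number of iterations.
    """
    n_classes = len(weights)
    n_resources = len(capacity)
    prices = [1.0] * n_resources
    rates = [0.0] * n_classes

    for _ in range(iterations):
        # Each query class independently picks the rate that maximizes
        # w_i * log(x_i) - x_i * (total price of the resources it uses).
        for i in range(n_classes):
            cost = sum(demand[i][r] * prices[r] for r in range(n_resources))
            rates[i] = max_rate if cost <= 0.0 else min(max_rate, weights[i] / cost)

        # Each resource raises its price when over-subscribed and lowers it
        # (never below zero) when under-used.
        for r in range(n_resources):
            load = sum(demand[i][r] * rates[i] for i in range(n_classes))
            prices[r] = max(0.0, prices[r] + step * (load - capacity[r]))

    return rates, prices


if __name__ == "__main__":
    # Two equally weighted query classes sharing one resource of capacity 10.
    rates, prices = price_based_allocation(
        weights=[1.0, 1.0],
        demand=[[1.0], [1.0]],
        capacity=[10.0],
    )
    print("rates:", rates)
    print("prices:", prices)
```

In this single-resource example the two equally weighted classes settle near an even split of about 5 units each. With several resources, the same loop lets the price of whichever resource is the bottleneck rise, which is the sense in which the allocation is coordinated across resources.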