面向大规模交互式数据查询系统的协同资源管理

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2015-05-04 DOI:10.1109/CCGrid.2015.149

Wei Yan, Yuan Xue

{"title":"面向大规模交互式数据查询系统的协同资源管理","authors":"Wei Yan, Yuan Xue","doi":"10.1109/CCGrid.2015.149","DOIUrl":null,"url":null,"abstract":"Interactive ad hoc data query over massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) are built and deployed to support SQL-like queries over distributed and partitioned data in a clustering environment. As a result, the execution of each query is converted into a set of coordinated tasks including data retrieval, intermediate result computation and transfer, and result aggregation. To support high request rate of concurrent interactive queries, coordinated management of multiple resources (e.g., bandwidth, CPU, memory) of the cluster environment is critical. In this paper, we investigate this resource management problem using an utility-based optimization framework. Our goal is to optimize the resource utilization, and maintain fairness among different types of queries. We present a price-based algorithm which achieves this optimization objective. We implement our algorithm in the open source Impala system and conduct a set of experiments in a clustering environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution can increase the aggregate utility by at least 15.4% compared with simple fair resource share mechanism, and 63.5% compared with the FIFO resource management mechanism.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"39 7","pages":"677-686"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Coordinated Resource Management for Large Scale Interactive Data Query Systems\",\"authors\":\"Wei Yan, Yuan Xue\",\"doi\":\"10.1109/CCGrid.2015.149\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Interactive ad hoc data query over massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) are built and deployed to support SQL-like queries over distributed and partitioned data in a clustering environment. As a result, the execution of each query is converted into a set of coordinated tasks including data retrieval, intermediate result computation and transfer, and result aggregation. To support high request rate of concurrent interactive queries, coordinated management of multiple resources (e.g., bandwidth, CPU, memory) of the cluster environment is critical. In this paper, we investigate this resource management problem using an utility-based optimization framework. Our goal is to optimize the resource utilization, and maintain fairness among different types of queries. We present a price-based algorithm which achieves this optimization objective. We implement our algorithm in the open source Impala system and conduct a set of experiments in a clustering environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution can increase the aggregate utility by at least 15.4% compared with simple fair resource share mechanism, and 63.5% compared with the FIFO resource management mechanism.\",\"PeriodicalId\":6664,\"journal\":{\"name\":\"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing\",\"volume\":\"39 7\",\"pages\":\"677-686\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGrid.2015.149\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2015.149","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

对海量数据集的交互式临时数据查询最近获得了显著的关注。大规模并行数据查询和分析框架(例如，Dremel, Impala)被构建和部署，以支持在集群环境中对分布式和分区数据进行类似sql的查询。因此，每个查询的执行被转换为一组协调的任务，包括数据检索、中间结果计算和传输以及结果聚合。为了支持并发交互查询的高请求率，集群环境的多种资源(例如带宽、CPU、内存)的协调管理至关重要。在本文中，我们使用基于效用的优化框架来研究这个资源管理问题。我们的目标是优化资源利用率，并在不同类型的查询之间保持公平性。我们提出了一种基于价格的算法来实现这一优化目标。我们在开源的Impala系统中实现了我们的算法，并使用TPC-DS工作负载在集群环境中进行了一组实验。实验结果表明，与简单的公平资源共享机制相比，我们的协调资源管理方案可使总效用至少提高15.4%，与先进先出资源管理机制相比，可使总效用提高63.5%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Coordinated Resource Management for Large Scale Interactive Data Query Systems

Interactive ad hoc data query over massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) are built and deployed to support SQL-like queries over distributed and partitioned data in a clustering environment. As a result, the execution of each query is converted into a set of coordinated tasks including data retrieval, intermediate result computation and transfer, and result aggregation. To support high request rate of concurrent interactive queries, coordinated management of multiple resources (e.g., bandwidth, CPU, memory) of the cluster environment is critical. In this paper, we investigate this resource management problem using an utility-based optimization framework. Our goal is to optimize the resource utilization, and maintain fairness among different types of queries. We present a price-based algorithm which achieves this optimization objective. We implement our algorithm in the open source Impala system and conduct a set of experiments in a clustering environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution can increase the aggregate utility by at least 15.4% compared with simple fair resource share mechanism, and 63.5% compared with the FIFO resource management mechanism.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

自引率

0.00%

发文量