Autonomic Resource Management for Program Orchestration in Large-Scale Data Analysis

Masahiro Tanaka, K. Taura, Kentaro Torisawa
{"title":"大规模数据分析中程序编排的自主资源管理","authors":"Masahiro Tanaka, K. Taura, Kentaro Torisawa","doi":"10.1109/IPDPS.2017.89","DOIUrl":null,"url":null,"abstract":"Large-scale data analysis applications are becoming more and more prevalent in a wide variety of areas. These applications are composed of many currently available programs called analysis components. Thousands of analysis component processes are orchestrated on many compute nodes. This paper proposes a novel self-tuning framework for optimizing an application's throughput in large-scale data analysis. One challenge is developing efficient orchestration that takes into account the diversity of analysis components and the varying performances of compute nodes. In our previous work, we achieved such an orchestration to a certain degree by introducing our own middleware, which wraps each analysis component as a remote procedure call (RPC) service. The middleware also pools the processes to reduce startup overhead, which is a serious obstacle to achieving high throughput. This work tackles the remaining task of tuning the size of the analysis components' process pools to maximize the application's throughput. This is challenging because analysis components differ drastically in turnaround times and memory footprints. The size of the process pool for each type of analysis component should be set by giving consideration to these properties as well as the constraints on both the memory capacity and the processor core counts. In this work, we formulate this task as a linear programming problem and obtain the optimal pool sizes by solving it. Compared to our previous work, we significantly improved the scalability of our framework by reformulating the performance model to work on hundreds of heterogeneous nodes. We also extended the service allocation mechanism to manage the computational load on each compute node and reduce communication overhead. The experimental results show that our approach is scalable to thousands of analysis component processes running on 200 compute nodes across three clusters. Moreover, our approach significantly reduces memory footprint.","PeriodicalId":209524,"journal":{"name":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Autonomic Resource Management for Program Orchestration in Large-Scale Data Analysis\",\"authors\":\"Masahiro Tanaka, K. Taura, Kentaro Torisawa\",\"doi\":\"10.1109/IPDPS.2017.89\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large-scale data analysis applications are becoming more and more prevalent in a wide variety of areas. These applications are composed of many currently available programs called analysis components. Thousands of analysis component processes are orchestrated on many compute nodes. This paper proposes a novel self-tuning framework for optimizing an application's throughput in large-scale data analysis. One challenge is developing efficient orchestration that takes into account the diversity of analysis components and the varying performances of compute nodes. In our previous work, we achieved such an orchestration to a certain degree by introducing our own middleware, which wraps each analysis component as a remote procedure call (RPC) service. 
The middleware also pools the processes to reduce startup overhead, which is a serious obstacle to achieving high throughput. This work tackles the remaining task of tuning the size of the analysis components' process pools to maximize the application's throughput. This is challenging because analysis components differ drastically in turnaround times and memory footprints. The size of the process pool for each type of analysis component should be set by giving consideration to these properties as well as the constraints on both the memory capacity and the processor core counts. In this work, we formulate this task as a linear programming problem and obtain the optimal pool sizes by solving it. Compared to our previous work, we significantly improved the scalability of our framework by reformulating the performance model to work on hundreds of heterogeneous nodes. We also extended the service allocation mechanism to manage the computational load on each compute node and reduce communication overhead. The experimental results show that our approach is scalable to thousands of analysis component processes running on 200 compute nodes across three clusters. Moreover, our approach significantly reduces memory footprint.\",\"PeriodicalId\":209524,\"journal\":{\"name\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2017.89\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2017.89","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Large-scale data analysis applications are becoming increasingly prevalent in a wide variety of areas. These applications are composed of many existing programs called analysis components, and thousands of analysis component processes are orchestrated on many compute nodes. This paper proposes a novel self-tuning framework for optimizing an application's throughput in large-scale data analysis. One challenge is developing efficient orchestration that takes into account the diversity of analysis components and the varying performance of compute nodes. In our previous work, we achieved such orchestration to a certain degree by introducing our own middleware, which wraps each analysis component as a remote procedure call (RPC) service. The middleware also pools the processes to reduce startup overhead, which is a serious obstacle to achieving high throughput. This work tackles the remaining task: tuning the sizes of the analysis components' process pools to maximize the application's throughput. This is challenging because analysis components differ drastically in turnaround times and memory footprints, so the pool size for each type of analysis component must be set with consideration of these properties as well as the constraints on both memory capacity and processor core counts. We formulate this task as a linear programming problem and obtain the optimal pool sizes by solving it. Compared to our previous work, we significantly improved the scalability of our framework by reformulating the performance model to work on hundreds of heterogeneous nodes. We also extended the service allocation mechanism to manage the computational load on each compute node and reduce communication overhead. The experimental results show that our approach scales to thousands of analysis component processes running on 200 compute nodes across three clusters. Moreover, our approach significantly reduces the memory footprint.
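The abstract does not spell out the optimization model, so the following is only a minimal sketch of the kind of linear program it describes, assuming a single compute node with memory capacity M and K processor cores. Here n_c denotes the pool size for analysis component type c, t_c its mean turnaround time per request, m_c the memory footprint of one of its processes, and T the application's throughput. These symbols are our own illustration rather than the paper's notation, and the paper's actual model additionally covers hundreds of heterogeneous nodes.

```latex
\begin{align*}
\max_{T,\,n_c}\quad & T \\
\text{s.t.}\quad
  & T \le n_c / t_c \quad \forall c & \text{(a pipeline is no faster than its slowest stage)} \\
  & \textstyle\sum_c m_c\, n_c \le M & \text{(memory capacity)} \\
  & \textstyle\sum_c n_c \le K & \text{(processor core count)} \\
  & n_c \ge 0 \quad \forall c
\end{align*}
```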
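Under the same illustrative assumptions, such a linear program can be solved with an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog; the component profiles, memory capacity, and core count are invented for illustration, and since an LP yields continuous pool sizes, a real deployment would still have to round them to integers (or solve an integer program instead).

```python
# A minimal sketch of the pool-sizing LP described in the abstract,
# solved with scipy.optimize.linprog. The component data below is
# invented for illustration; the paper's actual model also spans
# hundreds of heterogeneous nodes, which this single-node sketch omits.
from scipy.optimize import linprog

# Hypothetical per-component profiles: mean turnaround time (s) and
# per-process memory footprint (GB) for each analysis component type.
components = {
    "tokenizer": {"time": 0.05, "mem": 0.5},
    "parser":    {"time": 0.80, "mem": 4.0},
    "ner":       {"time": 0.30, "mem": 2.0},
}
MEM_CAPACITY = 64.0   # GB of RAM on the node (assumed)
CORE_COUNT = 16       # processor cores on the node (assumed)

names = list(components)
C = len(names)
# Decision variables: x = [T, n_1, ..., n_C], where T is the pipeline
# throughput and n_c the pool size of component c. Maximize T, which
# linprog expresses as minimizing -T.
c_obj = [-1.0] + [0.0] * C

A_ub, b_ub = [], []
# A pool of n_c processes with service time t_c sustains n_c / t_c
# requests per second, and the pipeline cannot run faster than its
# slowest stage:  T - n_c / t_c <= 0  for every component c.
for i, name in enumerate(names):
    row = [1.0] + [0.0] * C
    row[1 + i] = -1.0 / components[name]["time"]
    A_ub.append(row)
    b_ub.append(0.0)
# Total memory of all pooled processes must fit in RAM.
A_ub.append([0.0] + [components[n]["mem"] for n in names])
b_ub.append(MEM_CAPACITY)
# At most one process per core (a simplifying assumption).
A_ub.append([0.0] + [1.0] * C)
b_ub.append(CORE_COUNT)

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * (1 + C), method="highs")
print(f"throughput: {res.x[0]:.2f} requests/s")
for i, name in enumerate(names):
    print(f"  pool size for {name}: {res.x[1 + i]:.2f}")
```

For the invented numbers above, the core-count constraint binds, and each component receives a pool roughly proportional to its turnaround time, matching the intuition that slow components need more processes to keep up with fast ones.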