Resource Configuration Tuning for Stream Data Processing Systems via Bayesian Optimization

International Journal of Intelligent Computing and Cybernetics (IF 2.2; Q3; Computer Science, Cybernetics); Pub Date: 2022-10-06; DOI: 10.34133/2022/9820424
Shixin Huang, Chao Chen, Gangya Zhu, Jinhan Xin, Z. Wang, Kai Hwang, Zhibin Yu
Citations: 0

Abstract

Stream data processing systems are becoming increasingly popular in the big data era. Systems such as Apache Flink typically provide a number (e.g., 30) of configuration parameters to flexibly specify the amount of resources (e.g., CPU cores and memory) allocated to tasks. These parameters significantly affect task performance. However, it is hard to manually tune them for optimal performance when an unknown program runs on a given cluster. An automatic and fast resource configuration tuning approach is therefore desired. To this end, we propose to leverage Bayesian optimization to automatically tune the resource configurations of stream data processing systems. We first select a machine learning model, Random Forest, to construct accurate performance models for a stream data processing program. We then use the Bayesian optimization (BO) algorithm, along with the performance models, to iteratively search for the optimal configurations of a stream data processing program. Experimental results show that our approach improves the 99th-percentile tail latency by a factor of 2.62× on average and up to 5.26× overall. Furthermore, our approach improves throughput by a factor of 1.05× on average and up to 1.21× overall.
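
The loop described in the abstract (fit a Random Forest performance model, then let Bayesian optimization pick the next configuration to try) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: the two-parameter search space (CPU cores, memory in GB), the synthetic `measure_latency` function that stands in for actually running a job on a cluster, the small hand-written bagged-tree forest, and the lower-confidence-bound acquisition rule are all hypothetical choices made only for illustration.

```python
import random
import statistics

# Hypothetical search space (assumption, not from the paper): cores x memory in GB.
SPACE = [(cores, mem) for cores in range(1, 9) for mem in range(1, 17)]


def measure_latency(cfg):
    """Synthetic stand-in for running the job and measuring tail latency (ms)."""
    cores, mem = cfg
    return 400.0 / cores + 300.0 / mem + 6.0 * cores + 2.0 * mem


def build_tree(rows, rng, depth=0, max_depth=4, min_rows=2):
    """Grow one regression tree on rows of (cfg, latency) using random split candidates."""
    mean_y = statistics.fmean(y for _, y in rows)
    if depth >= max_depth or len(rows) < 2 * min_rows:
        return mean_y  # leaf: predict the mean latency of the rows that reached it
    best = None
    for _ in range(8):  # evaluate a few random (feature, threshold) pairs
        f = rng.randrange(2)
        t = rng.choice(rows)[0][f]
        left = [r for r in rows if r[0][f] <= t]
        right = [r for r in rows if r[0][f] > t]
        if len(left) < min_rows or len(right) < min_rows:
            continue
        sse = sum(len(p) * statistics.pvariance([y for _, y in p]) for p in (left, right))
        if best is None or sse < best[0]:
            best = (sse, f, t, left, right)
    if best is None:
        return mean_y
    _, f, t, left, right = best
    return (f, t,
            build_tree(left, rng, depth + 1, max_depth, min_rows),
            build_tree(right, rng, depth + 1, max_depth, min_rows))


def predict_tree(tree, cfg):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if cfg[f] <= t else high
    return tree


def fit_forest(rows, rng, n_trees=30):
    """Bagging: each tree is trained on a bootstrap resample of the observations."""
    return [build_tree([rng.choice(rows) for _ in rows], rng) for _ in range(n_trees)]


def predict_forest(forest, cfg):
    """Return the mean prediction and the spread across trees (used as uncertainty)."""
    preds = [predict_tree(tree, cfg) for tree in forest]
    return statistics.fmean(preds), statistics.pstdev(preds)


def tune(n_init=8, n_iter=20, kappa=1.5, seed=0):
    """Model-based search: refit the forest, then evaluate the most promising config."""
    rng = random.Random(seed)
    tried = {cfg: measure_latency(cfg) for cfg in rng.sample(SPACE, n_init)}
    for _ in range(n_iter):
        forest = fit_forest(list(tried.items()), rng)

        def lower_confidence_bound(cfg):
            mean, std = predict_forest(forest, cfg)
            return mean - kappa * std  # low predicted latency or high uncertainty

        candidates = [cfg for cfg in SPACE if cfg not in tried]
        nxt = min(candidates, key=lower_confidence_bound)
        tried[nxt] = measure_latency(nxt)
    best = min(tried, key=tried.get)
    return best, tried[best], tried


if __name__ == "__main__":
    best_cfg, best_latency, history = tune()
    print("best (cores, mem_gb):", best_cfg, "latency_ms:", round(best_latency, 2))
```

In a real deployment, `measure_latency` would be replaced by submitting the job with the candidate configuration and reading the measured tail latency, which is the expensive step that the surrogate model is meant to economize. The `kappa` parameter trades off exploiting configurations the model predicts to be fast against exploring configurations where the trees disagree.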
Source journal metrics: CiteScore 6.80; self-citation rate 4.70%; articles published: 26