并行流处理对工作负载偏差和方差

Junhua Fang, Rong Zhang, T. Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu
{"title":"并行流处理对工作负载偏差和方差","authors":"Junhua Fang, Rong Zhang, T. Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu","doi":"10.1145/3078597.3078613","DOIUrl":null,"url":null,"abstract":"Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. It is likely to generate poor balancing performance when workload variance occurs on the incoming data stream. This paper presents a new key-based workload partitioning framework, with practical algorithms to support dynamic workload assignment for stateful operators. The framework combines hash-based and explicit key-based routing strategies for workload distribution, which specifies the destination worker threads for a handful of keys and assigns the other keys with the hash function. When short-term distribution fluctuations occur to the incoming data stream, the system adaptively updates the routing table containing the chosen keys, in order to rebalance the workload with minimal migration overhead within the stateful operator. We formulate the rebalance operation as an optimization problem, with multiple objectives on minimizing state migration costs, controlling the size of the routing table and breaking workload imbalance among worker threads. Despite of the NP-hardness nature behind the optimization formulation, we carefully investigate and justify the heuristics behind key (re)routing and state migration, to facilitate fast response to workload variance with ignorable cost to the normal processing in the distributed system. Empirical studies on synthetic data and real-world stream applications validate the usefulness of our proposals.","PeriodicalId":436194,"journal":{"name":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":"{\"title\":\"Parallel Stream Processing Against Workload Skewness and Variance\",\"authors\":\"Junhua Fang, Rong Zhang, T. Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu\",\"doi\":\"10.1145/3078597.3078613\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. It is likely to generate poor balancing performance when workload variance occurs on the incoming data stream. This paper presents a new key-based workload partitioning framework, with practical algorithms to support dynamic workload assignment for stateful operators. The framework combines hash-based and explicit key-based routing strategies for workload distribution, which specifies the destination worker threads for a handful of keys and assigns the other keys with the hash function. When short-term distribution fluctuations occur to the incoming data stream, the system adaptively updates the routing table containing the chosen keys, in order to rebalance the workload with minimal migration overhead within the stateful operator. We formulate the rebalance operation as an optimization problem, with multiple objectives on minimizing state migration costs, controlling the size of the routing table and breaking workload imbalance among worker threads. Despite of the NP-hardness nature behind the optimization formulation, we carefully investigate and justify the heuristics behind key (re)routing and state migration, to facilitate fast response to workload variance with ignorable cost to the normal processing in the distributed system. Empirical studies on synthetic data and real-world stream applications validate the usefulness of our proposals.\",\"PeriodicalId\":436194,\"journal\":{\"name\":\"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"38\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3078597.3078613\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3078597.3078613","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

摘要

基于键的工作负载分区是并行流处理引擎中使用的一种常用策略,它支持在逻辑运算符中的工作线程上有效地分配键值元组。当传入数据流上发生工作负载变化时,可能会产生较差的平衡性能。本文提出了一种新的基于密钥的工作负载划分框架,并提供了实用的算法来支持有状态操作符的动态工作负载分配。该框架结合了基于哈希和显式基于键的路由策略,用于工作负载分发,该策略为少量键指定目标工作线程,并使用哈希函数分配其他键。当传入数据流出现短期分布波动时,系统会自适应地更新包含所选键的路由表,以便在有状态操作符内以最小的迁移开销重新平衡工作负载。我们将rebalance操作描述为一个优化问题,具有最小化状态迁移成本、控制路由表大小和打破工作线程之间工作负载不平衡等多个目标。尽管优化公式背后具有np -硬度性质,但我们仔细研究并证明了密钥(重)路由和状态迁移背后的启发式方法,以便在分布式系统中以可忽略的正常处理成本快速响应工作负载变化。对合成数据和现实世界流应用的实证研究验证了我们的建议的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Parallel Stream Processing Against Workload Skewness and Variance
Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. It is likely to generate poor balancing performance when workload variance occurs on the incoming data stream. This paper presents a new key-based workload partitioning framework, with practical algorithms to support dynamic workload assignment for stateful operators. The framework combines hash-based and explicit key-based routing strategies for workload distribution, which specifies the destination worker threads for a handful of keys and assigns the other keys with the hash function. When short-term distribution fluctuations occur to the incoming data stream, the system adaptively updates the routing table containing the chosen keys, in order to rebalance the workload with minimal migration overhead within the stateful operator. We formulate the rebalance operation as an optimization problem, with multiple objectives on minimizing state migration costs, controlling the size of the routing table and breaking workload imbalance among worker threads. Despite of the NP-hardness nature behind the optimization formulation, we carefully investigate and justify the heuristics behind key (re)routing and state migration, to facilitate fast response to workload variance with ignorable cost to the normal processing in the distributed system. Empirical studies on synthetic data and real-world stream applications validate the usefulness of our proposals.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Deep Learning in Cancer and Infectious Disease: Novel Driver Problems for Future HPC Architecture LetGo: A Lightweight Continuous Framework for HPC Applications Under Failures Explaining Wide Area Data Transfer Performance IOGP: An Incremental Online Graph Partitioning Algorithm for Distributed Graph Databases Better Safe than Sorry: Grappling with Failures of In-Memory Data Analytics Frameworks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1