Modeling Distributed Stream Processing Systems Under Heavy Workload

Muhammad Mudassar Qureshi, Hanhua Chen, Hai Jin
Published in: 2019 International Conference on Cyberworlds (CW), 2019-10-01
DOI: 10.1109/CW.2019.00024 (https://doi.org/10.1109/CW.2019.00024)

Abstract

Big data applications play a significant role in diverse fields. Distributed Stream Processing Engines (DSPEs) are widely used to support real-time applications efficiently. Partitioning algorithms split data streams across multiple nodes so that they can be processed in parallel for better performance. Aggregation cost is an important factor when processing stateful streaming applications with such partitioning algorithms, because it strongly affects performance when the final result is produced. However, the impact of aggregation cost on stream processing has not been discussed comprehensively in the existing literature. We use performance modeling to identify the importance of aggregation cost when the workload is high. We implement the performance model on a multi-node cluster to predict the same behavior as on the single-resource performance model. We demonstrate that, when the workload is high and both kinds of application run on the same DSPE, a stateful streaming application needs more resources than a stateless one, and our experimental results confirm this. Further experimental results show that performance modeling may help predict the maximum workload that can be processed on a DSPE, and that increasing the parallelism level is not guaranteed to improve the performance of streaming applications.
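To make the notion of aggregation cost concrete, the following is a minimal toy sketch, not the performance model from the paper. It assumes a stateful word count whose stream is split round-robin across workers, so each key's state can end up on several workers and must be merged at the end. The function names `shuffle_partition` and `aggregate`, and the use of the merge count as a stand-in for aggregation cost, are illustrative assumptions.

```python
from collections import Counter

def shuffle_partition(stream, n_workers):
    """Round-robin partitioning: balances load but may split a key's state
    across several workers (hypothetical helper for illustration)."""
    partials = [Counter() for _ in range(n_workers)]
    for i, key in enumerate(stream):
        partials[i % n_workers][key] += 1
    return partials

def aggregate(partials):
    """Merge per-worker partial counts into the final result.
    Returns the final counts and the number of merge operations,
    used here as a rough proxy for aggregation cost."""
    final = Counter()
    merges = 0
    for partial in partials:
        for key, count in partial.items():
            final[key] += count
            merges += 1
    return final, merges

stream = ["a", "b", "a", "c", "a", "b", "a", "a"]
for n_workers in (1, 2, 4):
    final, merges = aggregate(shuffle_partition(stream, n_workers))
    print(n_workers, dict(final), merges)
```

In this toy setting the final counts are identical for every worker count, while the number of merge operations grows as parallelism increases. A stateless operator would need no such merge step, which is one intuition for why added parallelism does not automatically help a stateful application.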