Dynamically Improving Resiliency to Timing Errors for Stream Processing Workloads

Geoffrey Phi C. Tran, J. Walters, S. Crago
2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), published 2017-12-01
DOI: 10.1109/PDCAT.2017.00080 (https://doi.org/10.1109/PDCAT.2017.00080)
Citation count: 2

Abstract

Large-scale data processing paradigms, such as stream processing, are widespread in academic and corporate workloads. These environments are commonly subject to real-time requirements, such as latency and throughput, and to resiliency requirements against node or network failures. These requirements have generally been approached as separate problems. Intermittent timing delays caused by factors such as garbage collection can further complicate the management of stream processing workloads, and insufficient resource allocations can also lead to poor performance. Currently, these applications are tuned manually, and we show that improper configuration can greatly affect performance. It has been reported that even 100 ms of added latency in online sales platforms can potentially result in lower sales. In this paper we propose Dynamo, a framework and monitor that implements a methodology for addressing both the performance and timing error problems by increasing the resiliency of stream processing frameworks to timing delays. Dynamo autonomously adjusts the resource allocation using a performance profile generated through application profiling. It partitions an application’s allocated resources into active and passive partitions that are dynamically adjusted to match the application’s multi-modal behavior. The distribution of resources determines the amount of computation that Dynamo can duplicate and process redundantly, thereby reducing the probability of timing errors that affect a tuple’s total execution time. In our experiments, we observed a reduction in the number of tuples with missed deadlines. Our results show that Dynamo consistently improves resiliency to timing errors across a range of occurrence rates. Furthermore, we show that the reduction in missed deadlines grows with the amount of spare resources, reaching 71.40% in the best case.
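The intuition behind redundant processing can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not taken from the paper: each copy of a tuple is delayed independently with a fixed probability, the earliest copy to finish determines the tuple's completion time, and a tuple misses its deadline only if every copy is delayed. Under those assumptions the miss probability with k copies is roughly p^k for a per-copy delay probability p. The function name `miss_rate` and all parameter values are hypothetical and do not describe Dynamo's actual implementation.

```python
import random


def miss_rate(num_tuples, replicas, delay_prob, base_ms, delay_ms, deadline_ms, seed=0):
    """Estimate the fraction of tuples that miss their deadline.

    Assumption (illustrative only): each replica is delayed independently
    with probability `delay_prob`, and the first replica to finish wins.
    """
    rng = random.Random(seed)
    missed = 0
    for _ in range(num_tuples):
        # Completion time is the minimum over all redundant copies.
        finish = min(
            base_ms + (delay_ms if rng.random() < delay_prob else 0.0)
            for _ in range(replicas)
        )
        if finish > deadline_ms:
            missed += 1
    return missed / num_tuples


if __name__ == "__main__":
    # Hypothetical workload: 10 ms of work, 50 ms stalls, 20 ms deadline.
    for k in (1, 2, 3):
        rate = miss_rate(100_000, k, 0.1, 10.0, 50.0, 20.0)
        print(f"replicas={k} miss_rate={rate:.4f}")
```

In this toy model, adding copies lowers the miss rate only as far as spare resources allow more duplication, which matches the abstract's observation that the improvement grows with the amount of spare resources.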