Improving Short Job Latency Performance in Hybrid Job Schedulers with Dice

Wei Zhou, K. White, Hongfeng Yu
{"title":"Improving Short Job Latency Performance in Hybrid Job Schedulers with Dice","authors":"Wei Zhou, K. White, Hongfeng Yu","doi":"10.1145/3337821.3337851","DOIUrl":null,"url":null,"abstract":"It is common to find a mixture of both long batch jobs and latency-sensitive short jobs in enterprise data centers. Recently hybrid job schedulers emerge as attractive alternatives of conventional centralized job schedulers. In this paper, we conduct trace-driven experiments to study the job-completion-delay performance of two representative hybrid job schedulers (Hawk and Eagle), and find that short jobs still encounter long latency issues due to fluctuating bursty nature of workloads. To this end, we propose Dice, a general performance optimization framework for hybrid job schedulers, to alleviate the high job-completion-delay problem of short jobs. Dice is composed of two simple yet effective techniques: Elastic Sizing and Opportunistic Preemption. Both Elastic Sizing and Opportunistic Preemption keep track of the task waiting times of short jobs. When the mean task waiting time of short jobs is high, Elastic Sizing dynamically and adaptively increases the short partition size to prioritize short jobs over long jobs. On the other hand, Opportunistic Preemption preempts resources from long tasks running in the general partition on demand, so as to mitigate the \"head-of-line\" blocking problem of short jobs. We enhance the two schedulers with Dice and evaluate Dice performance improvement in our prototype implementation. Experiment results show that Dice achieves 50.9%, 54.5%, and 43.5% improvement on 50th-percentile, 75th-percentile, and 90th-percentile job completion delays of short jobs in Hawk respectively, as well as 33.2%, 74.1%, and 85.3% improvement on those in Eagle respectively under the Google trace, at low performance costs to long jobs.","PeriodicalId":405273,"journal":{"name":"Proceedings of the 48th International Conference on Parallel Processing","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 48th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3337821.3337851","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

It is common to find a mixture of both long batch jobs and latency-sensitive short jobs in enterprise data centers. Recently hybrid job schedulers emerge as attractive alternatives of conventional centralized job schedulers. In this paper, we conduct trace-driven experiments to study the job-completion-delay performance of two representative hybrid job schedulers (Hawk and Eagle), and find that short jobs still encounter long latency issues due to fluctuating bursty nature of workloads. To this end, we propose Dice, a general performance optimization framework for hybrid job schedulers, to alleviate the high job-completion-delay problem of short jobs. Dice is composed of two simple yet effective techniques: Elastic Sizing and Opportunistic Preemption. Both Elastic Sizing and Opportunistic Preemption keep track of the task waiting times of short jobs. When the mean task waiting time of short jobs is high, Elastic Sizing dynamically and adaptively increases the short partition size to prioritize short jobs over long jobs. On the other hand, Opportunistic Preemption preempts resources from long tasks running in the general partition on demand, so as to mitigate the "head-of-line" blocking problem of short jobs. We enhance the two schedulers with Dice and evaluate Dice performance improvement in our prototype implementation. Experiment results show that Dice achieves 50.9%, 54.5%, and 43.5% improvement on 50th-percentile, 75th-percentile, and 90th-percentile job completion delays of short jobs in Hawk respectively, as well as 33.2%, 74.1%, and 85.3% improvement on those in Eagle respectively under the Google trace, at low performance costs to long jobs.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用Dice改进混合作业调度器的短作业延迟性能
在企业数据中心中,经常会发现长批作业和对延迟敏感的短作业的混合。最近,混合作业调度器作为传统集中式作业调度器的有吸引力的替代品而出现。本文通过跟踪驱动实验研究了两种具有代表性的混合作业调度器(Hawk和Eagle)的作业完成-延迟性能,发现由于工作负载的突发性波动,短作业仍然会遇到长延迟问题。为此,我们提出了混合作业调度器的通用性能优化框架Dice,以缓解短作业的高作业完成延迟问题。Dice由两种简单而有效的技术组成:弹性大小和机会主义抢占。弹性分级和机会抢占都跟踪短作业的任务等待时间。当短作业的平均任务等待时间较长时,弹性分级可以动态自适应地增加短作业的分区大小,使短作业优先于长作业。另一方面,机会主义抢占从一般分区中运行的长任务中按需抢占资源,以缓解短作业的“排队”阻塞问题。我们用Dice增强了这两个调度器,并在我们的原型实现中评估了Dice的性能改进。实验结果表明,在Google跟踪下,Dice在Hawk中短作业的第50百分位、第75百分位和第90百分位作业完成延迟分别提高了50.9%、54.5%和43.5%,在Eagle中分别提高了33.2%、74.1%和85.3%,且对长作业的性能成本较低。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Express Link Placement for NoC-Based Many-Core Platforms Cartesian Collective Communication Artemis A Specialized Concurrent Queue for Scheduling Irregular Workloads on GPUs diBELLA: Distributed Long Read to Long Read Alignment
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1