QoS-Aware Scheduling of Heterogeneous Servers for Inference in Deep Neural Networks

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. Pub Date: 2017-11-06. DOI: 10.1145/3132847.3133045

Zhou Fang, Tong Yu, O. Mengshoel, Rajesh K. Gupta


Citations: 30

Abstract

Deep neural networks (DNNs) are popular in diverse fields such as computer vision and natural language processing, and DNN inference is emerging as a service provided by cloud computing environments. However, cloud-hosted DNN inference faces new challenges in scheduling workloads for the best Quality of Service (QoS), because QoS depends on batch size, model complexity, and resource allocation. This paper represents the QoS metric as a utility function of response delay and inference accuracy. We first propose a simple and effective heuristic approach that keeps response delay low while satisfying the requirement on processing throughput. We then describe an advanced deep reinforcement learning (RL) approach that learns to schedule from experience. The RL scheduler is trained to maximize QoS, using a set of system statuses as the input to the RL policy model. Our approach performs scheduling actions only when there are free GPUs, thereby reducing scheduling overhead compared with common RL schedulers that run at every time step. We evaluate the schedulers on a simulation platform and demonstrate the advantages of RL over heuristics.
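To make the idea of a QoS utility over delay and accuracy concrete, here is a minimal Python sketch. It is not the paper's heuristic or its RL scheduler. The linear-decay utility, the `deadline` parameter, the affine batch-latency model, and all names (`Model`, `utility`, `batch_latency`, `schedule`) are assumptions made for illustration only. The sketch shares just two ideas with the abstract: QoS is scored from response delay and inference accuracy, and a scheduling decision is made only when a GPU is free.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Model:
    name: str
    accuracy: float          # assumed inference accuracy in [0, 1]
    base_latency: float      # assumed fixed cost per batch, in seconds
    per_item_latency: float  # assumed marginal cost per request, in seconds


def utility(delay: float, accuracy: float, deadline: float = 1.0) -> float:
    """Hypothetical QoS utility: accuracy discounted linearly by delay,
    dropping to zero once the delay reaches the deadline."""
    if delay >= deadline:
        return 0.0
    return accuracy * (1.0 - delay / deadline)


def batch_latency(model: Model, batch_size: int) -> float:
    """Assumed affine latency model: fixed cost plus per-request cost."""
    return model.base_latency + model.per_item_latency * batch_size


def schedule(queue_waits: List[float], models: List[Model],
             free_gpus: int, max_batch: int = 32) -> Optional[Tuple[Model, int]]:
    """Pick a (model, batch size) pair for the oldest queued requests.

    Decisions are made only when a GPU is free; otherwise return None.
    The score is the summed utility of the requests placed in the batch;
    requests left in the queue are ignored (a simplification).
    """
    if free_gpus <= 0 or not queue_waits:
        return None
    best: Optional[Tuple[Model, int]] = None
    best_score = -1.0
    for model in models:
        for b in range(1, min(max_batch, len(queue_waits)) + 1):
            latency = batch_latency(model, b)
            score = sum(utility(w + latency, model.accuracy)
                        for w in queue_waits[:b])
            if score > best_score:
                best, best_score = (model, b), score
    return best


# Two hypothetical models: a fast, less accurate one and a slow, accurate one.
small = Model("small", accuracy=0.7, base_latency=0.01, per_item_latency=0.005)
large = Model("large", accuracy=0.9, base_latency=0.05, per_item_latency=0.02)

# queue_waits holds how long each queued request has already waited (seconds).
choice = schedule([0.10, 0.05], [small, large], free_gpus=1)
```

Under these assumed numbers, requests that have waited only briefly are better served by the more accurate model, while requests close to the deadline favor the faster model. This is the kind of delay and accuracy trade-off the utility formulation is meant to capture.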
