CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters

Wei Zhang, Kaihua Fu, Ningxin Zheng, Quan Chen, Chao Li, Wenli Zheng, M. Guo
{"title":"CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters","authors":"Wei Zhang, Kaihua Fu, Ningxin Zheng, Quan Chen, Chao Li, Wenli Zheng, M. Guo","doi":"10.1109/ICCD53106.2021.00056","DOIUrl":null,"url":null,"abstract":"Emerging latency-critical (LC) services often have both CPU and GPU stages (e.g. DNN-assisted services) and require short response latency. Co-locating best-effort (BE) applications on the both CPU side and GPU side with the LC service improves resource utilization. However, resource contention often results in the QoS violation of LC services. We therefore present CHARM, a collaborative host-accelerator resource management system. CHARM ensures the required QoS target of DNN-assisted LC services, while maximizing the resource utilization of both the host and accelerator. CHARM is comprised of a BE-aware QoS target allocator, a unified heterogeneous resource manager, and a collaborative accelerator-side QoS compensator. The QoS target allocator determines the time limit of an LC service running on the host side and the accelerator side. The resource manager allocates the shared resources on both host side and accelerator side. The QoS compensator allocates more resources to the LC service to speed up its execution, if it runs slower than expected. 
Experimental results on an Nvidia GPU RTX 2080Ti show that CHARM improves the resource utilization by 43.2%, while ensuring the required QoS target compared with state-of-the-art solutions.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 39th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD53106.2021.00056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4

Abstract

Emerging latency-critical (LC) services often have both CPU and GPU stages (e.g., DNN-assisted services) and require short response latency. Co-locating best-effort (BE) applications with the LC service on both the CPU and GPU sides improves resource utilization. However, resource contention often results in QoS violations of LC services. We therefore present CHARM, a collaborative host-accelerator resource management system. CHARM ensures the required QoS target of DNN-assisted LC services while maximizing the resource utilization of both the host and the accelerator. CHARM comprises a BE-aware QoS target allocator, a unified heterogeneous resource manager, and a collaborative accelerator-side QoS compensator. The QoS target allocator determines the time budget of an LC service on the host side and the accelerator side. The resource manager allocates the shared resources on both sides. The QoS compensator allocates more resources to the LC service to speed up its execution if it runs slower than expected. Experimental results on an Nvidia RTX 2080Ti GPU show that CHARM improves resource utilization by 43.2% over state-of-the-art solutions while ensuring the required QoS target.
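The abstract's division of labor — a target allocator that splits the end-to-end latency budget between host and accelerator stages, and a compensator that grants the LC service more resources when a stage runs slower than its budget — can be illustrated with a minimal sketch. All function names, thresholds, and step sizes below are hypothetical, not taken from the paper; they only show the feedback-loop shape of the idea.

```python
# Hypothetical sketch of CHARM's allocate-then-compensate loop.
# Names, fractions, and step sizes are illustrative assumptions,
# not the paper's actual algorithm.

def split_qos_target(total_budget_ms: float, host_fraction: float):
    """Divide the end-to-end latency target between the CPU (host)
    stage and the GPU (accelerator) stage."""
    host_budget = total_budget_ms * host_fraction
    return host_budget, total_budget_ms - host_budget

def compensate(measured_ms: float, budget_ms: float, lc_share: float,
               step: float = 0.1, max_share: float = 1.0):
    """Grow the LC service's share of a contended resource when it
    misses its stage budget; shrink it (returning capacity to BE jobs)
    when it has comfortable slack."""
    if measured_ms > budget_ms:
        lc_share = min(max_share, lc_share + step)
    elif measured_ms < 0.8 * budget_ms:  # slack threshold is an assumption
        lc_share = max(step, lc_share - step)
    return lc_share

# Usage: a 100 ms end-to-end target, 40% budgeted to the host stage.
host_budget, gpu_budget = split_qos_target(100.0, 0.4)
share = 0.5                      # LC's initial share of the GPU
share = compensate(measured_ms=70.0, budget_ms=gpu_budget, lc_share=share)
```

The key design point the abstract implies is that compensation is reactive: BE jobs keep whatever capacity the LC service does not need, and resources shift back to the LC service only when measured latency threatens the QoS target.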