Kelp: QoS for Accelerated Machine Learning Systems

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). Publication date: 2019-02-01. DOI: 10.1109/HPCA.2019.00036

Haishan Zhu, David Lo, Liqun Cheng, R. Govindaraju, Parthasarathy Ranganathan, M. Erez

Citations: 15

Abstract

Development and deployment of machine learning (ML) accelerators in Warehouse Scale Computers (WSCs) demand significant capital investment and engineering effort. However, even though heavy computation can be offloaded to the accelerators, applications often depend on the host system for various supporting tasks. As a result, contention for host resources, such as memory bandwidth, can substantially erode the performance and efficiency gains of accelerators. The impact of performance interference is further amplified in distributed learning, which has become increasingly common as model sizes continue to grow. In this work, we study the performance of four production ML workloads on three accelerator platforms. Our experiments show that these workloads are highly sensitive to host memory bandwidth contention, which can cause an average performance degradation of 40% when left unmanaged. To tackle this problem, we design and implement Kelp, a software runtime that isolates high-priority accelerated ML tasks from memory resource interference. We evaluate Kelp with both production and artificial aggressor workloads, and compare its effectiveness with previously proposed solutions. Our evaluation shows that Kelp effectively mitigates the performance degradation of the accelerated tasks and improves performance by 24% on average. Compared to previous work, Kelp reduces the performance degradation of ML tasks by 7% and improves system efficiency by 17%. Our results further expose opportunities for future architecture designs.
