Poly: Efficient Heterogeneous System and Application Management for Interactive Applications

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2019-02-01 DOI:10.1109/HPCA.2019.00038

Shuo Wang, Yun Liang, Wei Zhang

{"title":"Poly: Efficient Heterogeneous System and Application Management for Interactive Applications","authors":"Shuo Wang, Yun Liang, Wei Zhang","doi":"10.1109/HPCA.2019.00038","DOIUrl":null,"url":null,"abstract":"QoS-sensitive workloads, common in warehousescale datacenters, require a guaranteed stable tail latency percentile response latency) of the service. Unfortunately, the system load (e.g., RPS) fluctuates drastically during daily datacenter operations. In order to meet the maximum system RPS requirement, datacenter tends to overprovision the hardware accelerators, which makes the datacenter underutilized.Therefore, the throughput and energy efficiency scaling of the current accelerator-outfitted datacenter are very expensive for QoS-sensitive workloads. To overcome this challenge, this work introduces Poly, an OpenCL based heterogeneous system optimization framework that targets to improve the overall throughput scalability and energy proportionality while guaranteeing the QoS by efficiently utilizing GPUs and FPGAs based accelerators within datacenter. Poly is mainly composed of two phases. At compile-time, Poly automatically captures the parallel patterns in the applications and explores a comprehensive design space within and across parallel patterns. At runtime, Poly relies on a runtime kernel scheduler to judiciously make the scheduling decisions to accommodate the dynamic latency and throughput requirements. Experiments using a variety of cloud QoS-sensitive applications show that Poly improves the energy proportionality by 23%(17%) without sacrificing the QoS compared to the state-of-the-art GPU (FPGA) solution, respectively. Keywords-Heterogeneous; GPU; FPGA; Performance Optimization;","PeriodicalId":102050,"journal":{"name":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2019.00038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

Abstract

QoS-sensitive workloads, common in warehousescale datacenters, require a guaranteed stable tail latency percentile response latency) of the service. Unfortunately, the system load (e.g., RPS) fluctuates drastically during daily datacenter operations. In order to meet the maximum system RPS requirement, datacenter tends to overprovision the hardware accelerators, which makes the datacenter underutilized.Therefore, the throughput and energy efficiency scaling of the current accelerator-outfitted datacenter are very expensive for QoS-sensitive workloads. To overcome this challenge, this work introduces Poly, an OpenCL based heterogeneous system optimization framework that targets to improve the overall throughput scalability and energy proportionality while guaranteeing the QoS by efficiently utilizing GPUs and FPGAs based accelerators within datacenter. Poly is mainly composed of two phases. At compile-time, Poly automatically captures the parallel patterns in the applications and explores a comprehensive design space within and across parallel patterns. At runtime, Poly relies on a runtime kernel scheduler to judiciously make the scheduling decisions to accommodate the dynamic latency and throughput requirements. Experiments using a variety of cloud QoS-sensitive applications show that Poly improves the energy proportionality by 23%(17%) without sacrificing the QoS compared to the state-of-the-art GPU (FPGA) solution, respectively. Keywords-Heterogeneous; GPU; FPGA; Performance Optimization;

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

高效异构系统和交互式应用程序管理

对qos敏感的工作负载(在仓库级数据中心中很常见)需要保证稳定的服务尾部延迟(响应延迟百分比)。不幸的是，系统负载(例如RPS)在日常数据中心操作期间波动很大。为了满足最大的系统RPS需求，数据中心往往会过度配置硬件加速器，从而导致数据中心利用率不足。因此，当前配备加速器的数据中心的吞吐量和能效扩展对于qos敏感的工作负载来说是非常昂贵的。为了克服这一挑战，本工作引入了Poly，一个基于OpenCL的异构系统优化框架，旨在通过有效利用数据中心内基于gpu和fpga的加速器来提高整体吞吐量可扩展性和能量比例性，同时保证QoS。Poly主要由两相组成。在编译时，Poly自动捕获应用程序中的并行模式，并在并行模式内部和跨并行模式探索一个全面的设计空间。在运行时，Poly依赖运行时内核调度器明智地做出调度决策，以适应动态延迟和吞吐量需求。使用各种云QoS敏感应用的实验表明，与最先进的GPU (FPGA)解决方案相比，Poly在不牺牲QoS的情况下分别将能量比例提高了23%(17%)。Keywords-Heterogeneous;GPU;FPGA;性能优化;

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量

期刊最新文献

Machine Learning at Facebook: Understanding Inference at the Edge Understanding the Future of Energy Efficiency in Multi-Module GPUs POWERT Channels: A Novel Class of Covert CommunicationExploiting Power Management Vulnerabilities The Accelerator Wall: Limits of Chip Specialization Featherlight Reuse-Distance Measurement