Scheduling challenges and opportunities in integrated CPU+GPU processors

2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia) Pub Date : 2016-10-01 DOI:10.1145/2993452.2994307

K. Dev, S. Reda


Citations: 16

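Abstract

Heterogeneous processors that integrate architecturally different devices (CPU and GPU) on the same die provide good performance and energy efficiency for a wide range of workloads. However, they also create challenges and opportunities in scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping. In this paper, we first provide detailed infrared imaging results that show the impact of mapping decisions on the thermal and power profiles of CPU+GPU processors. Furthermore, we observe that runtime conditions, such as power and CPU load from traditional workloads, also affect the mapping decision. To exploit these observations, we propose techniques to characterize OpenCL kernel workloads at run time and map them to the appropriate device under time-varying physical conditions (i.e., the chip power limit) and CPU load conditions, in particular the number of CPU cores available to the OpenCL kernel. We implement our dynamic scheduler on a real CPU+GPU processor and evaluate it using various OpenCL benchmarks. Compared to the state-of-the-art kernel-level scheduling method, the proposed scheduler provides up to 31% and 10% improvements in runtime and energy, respectively.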
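The abstract does not spell out the scheduler's decision rule, so the following is only a minimal sketch of the general idea it describes: picking the CPU or the GPU for a kernel given a chip power limit and the number of CPU cores currently free. Everything here is an assumption made for illustration and not the authors' method: the `Estimate` record, the `choose_device` function, the inverse scaling of CPU runtime with free cores, and the fallback to the lowest-power device when nothing fits under the limit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Hypothetical per-device estimate for one kernel (not from the paper)."""
    device: str        # "cpu" or "gpu"
    runtime_s: float   # estimated kernel runtime in seconds
    power_w: float     # estimated average chip power in watts


def scale_cpu_estimate(cpu: Estimate, total_cores: int, free_cores: int) -> Estimate:
    """Adjust the CPU estimate for the cores actually available.

    Illustrative assumption: runtime scales inversely with the number of
    free cores, and power is left unchanged. With no free cores the CPU
    is treated as unusable (infinite runtime).
    """
    if free_cores <= 0:
        return Estimate(cpu.device, math.inf, cpu.power_w)
    return Estimate(cpu.device, cpu.runtime_s * total_cores / free_cores, cpu.power_w)


def choose_device(cpu: Estimate, gpu: Estimate, power_cap_w: float,
                  total_cores: int, free_cores: int) -> str:
    """Return the device name with the shortest runtime under the power limit."""
    candidates = [scale_cpu_estimate(cpu, total_cores, free_cores), gpu]
    # Devices that can run the kernel at all under the current CPU load.
    runnable = [e for e in candidates if math.isfinite(e.runtime_s)]
    # Devices whose estimated power also respects the current chip power limit.
    feasible = [e for e in runnable if e.power_w <= power_cap_w]
    if feasible:
        return min(feasible, key=lambda e: e.runtime_s).device
    # Assumed fallback: nothing fits the limit, so take the lowest-power option.
    return min(runnable, key=lambda e: e.power_w).device


if __name__ == "__main__":
    cpu = Estimate("cpu", runtime_s=2.0, power_w=30.0)
    gpu = Estimate("gpu", runtime_s=1.5, power_w=45.0)
    for cap in (50.0, 40.0):
        print(cap, choose_device(cpu, gpu, cap, total_cores=4, free_cores=4))
```

Re-evaluating `choose_device` whenever the power limit or the free-core count changes is what would make such a mapping dynamic; how the estimates themselves are obtained at run time is the part the paper addresses and this sketch leaves open.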


