GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping

Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. Pub Date: 2015-03-14. DOI: 10.1145/2731186.2731192

Jens Kehne, Jonathan Metter, Frank Bellosa


Citations: 38

Abstract

Over the last few years, GPUs have been finding their way into cloud computing platforms, allowing users to benefit from the performance of GPUs at low cost. However, a large portion of the cloud's cost advantage traditionally stems from oversubscription: Cloud providers rent out more resources to their customers than are actually available, expecting that the customers will not actually use all of the promised resources. For GPU memory, this oversubscription is difficult due to the lack of support for demand paging in current GPUs. Therefore, recent approaches to enabling oversubscription of GPU memory resort to software scheduling of GPU kernels -- which has been shown to induce significant runtime overhead in applications even if sufficient GPU memory is available -- to ensure that data is present on the GPU when referenced. In this paper, we present GPUswap, a novel approach to enabling oversubscription of GPU memory that does not rely on software scheduling of GPU kernels. GPUswap uses the GPU's ability to access system RAM directly to extend the GPU's own memory. To that end, GPUswap transparently relocates data from the GPU to system RAM in response to memory pressure. GPUswap ensures that all data is permanently accessible to the GPU and thus allows applications to submit commands to the GPU directly at any time, without the need for software scheduling. Experiments with our prototype implementation show that GPU applications can still execute even with only 20 MB of GPU memory available. In addition, while software scheduling suffers from permanent overhead even with sufficient GPU memory available, our approach executes GPU applications with native performance.
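The relocation idea described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the class names (`Buffer`, `ToyGpuMemory`), the megabyte granularity, and the largest-first victim selection are all assumptions made for illustration, since the abstract does not say how GPUswap chooses which data to move. The point it shows is that a buffer relocated to system RAM stays GPU-accessible, so an allocation under memory pressure still succeeds without any kernel scheduling.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    name: str
    size_mb: int
    # "vram" or "sysram"; in this model both remain accessible to the GPU.
    location: str = "vram"


@dataclass
class ToyGpuMemory:
    """Hypothetical model of GPU memory that spills into system RAM."""
    vram_capacity_mb: int
    buffers: list = field(default_factory=list)

    def vram_used_mb(self) -> int:
        return sum(b.size_mb for b in self.buffers if b.location == "vram")

    def allocate(self, name: str, size_mb: int) -> Buffer:
        # A request larger than all of VRAM goes straight to system RAM.
        if size_mb > self.vram_capacity_mb:
            buf = Buffer(name, size_mb, "sysram")
            self.buffers.append(buf)
            return buf

        # Memory pressure: how many MB must leave VRAM for the request to fit.
        shortfall = self.vram_used_mb() + size_mb - self.vram_capacity_mb

        # Assumed policy for this sketch: relocate the largest buffers first.
        victims = sorted(
            (b for b in self.buffers if b.location == "vram"),
            key=lambda b: b.size_mb,
            reverse=True,
        )
        for victim in victims:
            if shortfall <= 0:
                break
            victim.location = "sysram"  # relocated, not discarded
            shortfall -= victim.size_mb

        buf = Buffer(name, size_mb, "vram")
        self.buffers.append(buf)
        return buf


if __name__ == "__main__":
    gpu = ToyGpuMemory(vram_capacity_mb=20)
    gpu.allocate("a", 12)
    gpu.allocate("b", 6)
    gpu.allocate("c", 10)  # triggers relocation of an existing buffer
    for b in gpu.buffers:
        print(b.name, b.size_mb, b.location)
```

In this model every allocation succeeds; the cost of oversubscription shows up only as buffers residing in slower system RAM, which mirrors the abstract's claim that applications keep running with very little GPU memory available.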
