GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping

Jens Kehne, Jonathan Metter, Frank Bellosa
{"title":"GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping","authors":"Jens Kehne, Jonathan Metter, Frank Bellosa","doi":"10.1145/2731186.2731192","DOIUrl":null,"url":null,"abstract":"Over the last few years, GPUs have been finding their way into cloud computing platforms, allowing users to benefit from the performance of GPUs at low cost. However, a large portion of the cloud's cost advantage traditionally stems from oversubscription: Cloud providers rent out more resources to their customers than are actually available, expecting that the customers will not actually use all of the promised resources. For GPU memory, this oversubscription is difficult due to the lack of support for demand paging in current GPUs. Therefore, recent approaches to enabling oversubscription of GPU memory resort to software scheduling of GPU kernels -- which has been shown to induce significant runtime overhead in applications even if sufficient GPU memory is available -- to ensure that data is present on the GPU when referenced. In this paper, we present GPUswap, a novel approach to enabling oversubscription of GPU memory that does not rely on software scheduling of GPU kernels. GPUswap uses the GPU's ability to access system RAM directly to extend the GPU's own memory. To that end, GPUswap transparently relocates data from the GPU to system RAM in response to memory pressure. GPUswap ensures that all data is permanently accessible to the GPU and thus allows applications to submit commands to the GPU directly at any time, without the need for software scheduling. Experiments with our prototype implementation show that GPU applications can still execute even with only 20 MB of GPU memory available. In addition, while software scheduling suffers from permanent overhead even with sufficient GPU memory available, our approach executes GPU applications with native performance.","PeriodicalId":186972,"journal":{"name":"Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2731186.2731192","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

Abstract

Over the last few years, GPUs have been finding their way into cloud computing platforms, allowing users to benefit from the performance of GPUs at low cost. However, a large portion of the cloud's cost advantage traditionally stems from oversubscription: Cloud providers rent out more resources to their customers than are actually available, expecting that the customers will not actually use all of the promised resources. For GPU memory, this oversubscription is difficult due to the lack of support for demand paging in current GPUs. Therefore, recent approaches to enabling oversubscription of GPU memory resort to software scheduling of GPU kernels -- which has been shown to induce significant runtime overhead in applications even if sufficient GPU memory is available -- to ensure that data is present on the GPU when referenced. In this paper, we present GPUswap, a novel approach to enabling oversubscription of GPU memory that does not rely on software scheduling of GPU kernels. GPUswap uses the GPU's ability to access system RAM directly to extend the GPU's own memory. To that end, GPUswap transparently relocates data from the GPU to system RAM in response to memory pressure. GPUswap ensures that all data is permanently accessible to the GPU and thus allows applications to submit commands to the GPU directly at any time, without the need for software scheduling. Experiments with our prototype implementation show that GPU applications can still execute even with only 20 MB of GPU memory available. In addition, while software scheduling suffers from permanent overhead even with sufficient GPU memory available, our approach executes GPU applications with native performance.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
GPUswap:通过透明交换开启GPU内存超订阅功能
在过去的几年里,gpu已经找到了进入云计算平台的方式,允许用户以低成本从gpu的性能中受益。然而,云计算的很大一部分成本优势传统上源于超额订购:云计算提供商向客户出租的资源比实际可用的资源要多,他们期望客户实际上不会使用所有承诺的资源。对于GPU内存,由于当前GPU缺乏对需求分页的支持,这种过度订阅是困难的。因此,最近启用GPU内存超额订阅的方法诉诸于GPU内核的软件调度——即使有足够的GPU内存可用,也会在应用程序中引起显著的运行时开销——以确保数据在引用时出现在GPU上。在本文中,我们提出了GPUswap,一种新的方法来实现GPU内存的超额订阅,而不依赖于GPU内核的软件调度。GPUswap使用GPU直接访问系统RAM的能力来扩展GPU自己的内存。为此,GPUswap透明地将数据从GPU重新定位到系统RAM,以响应内存压力。GPUswap确保所有数据都可以永久访问GPU,从而允许应用程序在任何时候直接向GPU提交命令,而无需软件调度。用我们的原型实现进行的实验表明,即使GPU内存只有20 MB可用,GPU应用程序仍然可以执行。此外,即使有足够的GPU内存可用,软件调度也会受到永久开销的影响,我们的方法可以以本机性能执行GPU应用程序。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Supporting High Performance Molecular Dynamics in Virtualized Clusters using IOMMU, SR-IOV, and GPUDirect Migration of Web Applications with Seamless Execution A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters HeteroVisor: Exploiting Resource Heterogeneity to Enhance the Elasticity of Cloud Platforms PEMU: A Pin Highly Compatible Out-of-VM Dynamic Binary Instrumentation Framework
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1