DymGPU: Dynamic Memory Management for Sharing GPUs in Virtualized Clouds

2018 IEEE 3rd International Workshops on Foundations and Applications of Self* Systems (FAS*W). Pub Date: 2018-09-01. DOI: 10.1109/FAS-W.2018.00025

Younghun Park, Minwoo Gu, Sun-Mi Yoo, Youngjae Kim, Sungyong Park


Citations: 3

Abstract

gVirt is a full GPU virtualization technique for Intel's integrated GPUs that alleviates the problems of other GPU virtualization techniques such as API remoting and direct pass-through. The original gVirt is known to have an inherent scalability limitation on the number of simultaneous virtual machines (VMs). gScale solved this problem by letting the VMs share a global graphics memory space and copying the entries of each VM's private graphics translation table (GTT) to the physical GTT at every GPU context switch. However, gScale still suffers from a large overhead of copying entries between the private GTTs and the physical GTT, and the overhead grows when the global graphics memory ranges allocated to different VMs overlap. In this paper, we identify the copy overhead incurred at GPU context switches as the major performance bottleneck and propose a dynamic memory management scheme, called DymGPU, that provides two memory allocation algorithms: size-based and utilization-based. The size-based algorithm allocates memory space according to the memory size required by each VM, whereas the utilization-based algorithm also considers the GPU utilization of each VM. DymGPU is dynamic in the sense that the global graphics memory space used by each VM is rearranged at runtime by periodically checking for idle VMs and measuring the GPU utilization of each runnable VM. We have implemented the proposed approach in gVirt and confirmed that it reduces GPU context switch time by up to 53% and improves the overall performance of various GPU applications by up to 39%.
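The abstract does not spell out the two allocation algorithms, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes a shared space of `total` abstract units, a greedy placement that wraps back to offset 0 (and therefore overlaps earlier ranges) once the space is full, and a priority order of "largest first" for the size-based variant and "busiest first" for the utilization-based variant. The names `VM`, `place`, `size_based`, `utilization_based`, and `overlap` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    size: int           # graphics memory needed, in arbitrary units (assumed)
    utilization: float  # recent GPU utilization in [0, 1] (assumed metric)


def place(vms, total, key):
    """Greedily give each VM a range [start, end) in the order set by `key`.

    While free space remains, ranges are exclusive. A VM that no longer
    fits wraps back to offset 0 and so overlaps earlier ranges.
    This wrap-around policy is an assumption made for illustration.
    """
    placement = {}
    cursor = 0
    for vm in sorted(vms, key=key):
        if vm.size > total:
            raise ValueError(f"{vm.name} needs more than the shared space")
        if cursor + vm.size > total:
            cursor = 0  # no exclusive room left: start overlapping
        placement[vm.name] = (cursor, cursor + vm.size)
        cursor += vm.size
    return placement


def size_based(vms, total):
    # Assumed ordering: VMs with the largest memory demand are placed first.
    return place(vms, total, key=lambda vm: -vm.size)


def utilization_based(vms, total):
    # Assumed ordering: VMs with the highest GPU utilization are placed first.
    return place(vms, total, key=lambda vm: -vm.utilization)


def overlap(a, b):
    """Number of units shared by two half-open ranges (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))
```

With three hypothetical VMs `A` (size 5, utilization 0.9), `B` (size 3, utilization 0.1), and `C` (size 2, utilization 0.8) in a space of 8 units, the size-based order leaves the two busy VMs `A` and `C` overlapping by 2 units, while the utilization-based order shifts the overlap onto the mostly idle `B`. If overlapped entries are what must be copied at a context switch, as the abstract suggests, fewer entries would be copied when the frequently scheduled VMs do not overlap each other.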

