PEP: proactive checkpointing for efficient preemption on GPUs

Chen Li, Andrew Zigerelli, Jun Yang, Yang Guo
{"title":"PEP: proactive checkpointing for efficient preemption on GPUs","authors":"Chen Li, Andrew Zigerelli, Jun Yang, Yang Guo","doi":"10.1109/DAC.2018.8465929","DOIUrl":null,"url":null,"abstract":"The demand for multitasking GPUs increases whenever the GPU may be shared by multiple applications, either spatially or temporally. This requires that GPUs can be preempted and switch context to a new application while already executing one. Unlike CPUs, context switching in GPUs is prohibitively expensive due to the large context states to swap out. There have been a number of efforts on reducing the overhead of preemption, through reducing the context sizes or overlapping context switching with execution. All those techniques are reactive approaches, meaning that context switching occurs when the preemption request arrives.In this paper, we propose a proactive mechanism to reduce the latency of preemption. We observe that kernel execution is almost always preceded by known commands in both CUDA and OpenCL implementations. Hence, a preemption can be anticipated before the actual request arrives. We study such lead time and develop a prediction scheme to perform an early state saving. When the actual preemption is invoked, an incremental update relative to the previous saved state is performed, much like the conventional checkpointing mechanism. This design effectively reduces the stall time of the preempting kernel due to context switching by 58.6%. Moreover, through careful handling of the saved state, we can also reduce the overall size of saved state by an average of 23.3%, compared with a full context switching.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"21 1","pages":"114:1-114:6"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Design Automation Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DAC.2018.8465929","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The demand for multitasking GPUs increases whenever the GPU may be shared by multiple applications, either spatially or temporally. This requires that a GPU can be preempted and switch context to a new application while already executing one. Unlike on CPUs, context switching on GPUs is prohibitively expensive due to the large context state that must be swapped out. There have been a number of efforts to reduce the overhead of preemption, either by reducing the context size or by overlapping context switching with execution. All of those techniques are reactive approaches, meaning that context switching begins only when the preemption request arrives. In this paper, we propose a proactive mechanism to reduce the latency of preemption. We observe that kernel execution is almost always preceded by known commands in both CUDA and OpenCL implementations. Hence, a preemption can be anticipated before the actual request arrives. We study this lead time and develop a prediction scheme that performs an early state save. When the actual preemption is invoked, an incremental update relative to the previously saved state is performed, much like a conventional checkpointing mechanism. This design reduces the stall time of the preempting kernel due to context switching by 58.6%. Moreover, through careful handling of the saved state, we also reduce the overall size of the saved state by an average of 23.3% compared with a full context switch.
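To make the two-phase save concrete, below is a minimal host-side sketch of the idea. It is an illustration, not the authors' implementation: the class and command names (ProactiveCheckpointer, Cmd), the fixed region granularity, and the dirty-bit tracking are all hypothetical stand-ins for the hardware/runtime mechanism the paper describes. The sketch assumes that setup commands (host-to-device copies, kernel-argument setup) from another context predict an imminent launch, triggering a full early save; the actual preemption request then copies only the regions dirtied since that save.

```cpp
#include <bitset>
#include <cstdint>
#include <cstdio>
#include <cstring>

constexpr std::size_t kRegions    = 64;    // context split into trackable regions (illustrative)
constexpr std::size_t kRegionSize = 4096;  // bytes per region (illustrative)

// In-flight context of the currently running kernel.
struct GpuContext {
    uint8_t state[kRegions][kRegionSize];  // register file / shared-memory image
    std::bitset<kRegions> dirty;           // regions written since the last save
};

// Off-chip checkpoint of that context.
struct Checkpoint {
    uint8_t saved[kRegions][kRegionSize];
    bool valid = false;
};

// Commands seen on the host-to-GPU command stream. In both CUDA and OpenCL,
// a kernel launch is almost always preceded by setup commands such as these.
enum class Cmd { MemcpyH2D, SetKernelArgs, LaunchKernel, PreemptRequest };

class ProactiveCheckpointer {
public:
    // Invoked for each command that belongs to a context other than the one
    // currently running on the GPU.
    void onCommand(Cmd cmd, GpuContext& running, Checkpoint& ckpt) {
        switch (cmd) {
        case Cmd::MemcpyH2D:
        case Cmd::SetKernelArgs:
            // Lead time: a launch (and hence a preemption) is likely imminent,
            // so save the full context off the critical path.
            earlySave(running, ckpt);
            break;
        case Cmd::PreemptRequest:
            // Critical path: copy only what changed since the early save.
            incrementalSave(running, ckpt);
            break;
        default:
            break;
        }
    }

private:
    void earlySave(GpuContext& ctx, Checkpoint& ckpt) {
        std::memcpy(ckpt.saved, ctx.state, sizeof(ctx.state));
        ctx.dirty.reset();  // this snapshot becomes the baseline
        ckpt.valid = true;
    }

    void incrementalSave(GpuContext& ctx, Checkpoint& ckpt) {
        if (!ckpt.valid) {         // misprediction: no early save happened,
            earlySave(ctx, ckpt);  // fall back to a full (reactive) save
            return;
        }
        std::size_t copied = 0;
        for (std::size_t r = 0; r < kRegions; ++r) {
            if (ctx.dirty.test(r)) {  // only regions written after the early save
                std::memcpy(ckpt.saved[r], ctx.state[r], kRegionSize);
                ++copied;
            }
        }
        std::printf("incremental save: %zu of %zu regions copied\n", copied, kRegions);
    }
};
```

Because only the dirty regions are copied while the preempting kernel waits, the bulk of the context movement is hidden in the lead time; this is the mechanism behind the reported 58.6% reduction in stall time, and tracking which state is still live is what enables the 23.3% smaller saved footprint.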