Parallel memory defragmentation on a GPU

R. Veldema, M. Philippsen
{"title":"Parallel memory defragmentation on a GPU","authors":"R. Veldema, M. Philippsen","doi":"10.1145/2247684.2247693","DOIUrl":null,"url":null,"abstract":"High-throughput memory management techniques such as malloc/free or mark-and-sweep collectors often exhibit memory fragmentation leaving allocated objects interspersed with free memory holes. Memory defragmentation removes such holes by moving objects around in memory so that they become adjacent (compaction) and holes can be merged (coalesced) to form larger holes. However, known defragmentation techniques are slow. This paper presents a parallel solution to best-effort partial defragmentation that makes use of all available cores. The solution not only speeds up defragmentation times significantly, but it also scales for many simple cores. It can therefore even be implemented on a GPU.\n One problem with compaction is that it requires all references to moved objects to be retargeted to point to their new locations. This paper further improves existing work by a better identification of the parts of the heap that contain references to objects moved by the compactor and only processes these parts to find the references that are then retargeted in parallel.\n To demonstrate the performance of the new memory defragmentation algorithm on many-core processors, we show its performance on a modern GPU. Parallelization speeds up compaction 40 times and coalescing up to 32 times. After compaction, our algorithm only needs to process 2%--4% of the total heap to retarget references.","PeriodicalId":130040,"journal":{"name":"Workshop on Memory System Performance and Correctness","volume":"447 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Memory System Performance and Correctness","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2247684.2247693","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

High-throughput memory management techniques such as malloc/free or mark-and-sweep collectors often exhibit memory fragmentation, leaving allocated objects interspersed with free memory holes. Memory defragmentation removes such holes by moving objects around in memory so that they become adjacent (compaction) and holes can be merged (coalesced) to form larger holes. However, known defragmentation techniques are slow. This paper presents a parallel solution to best-effort partial defragmentation that makes use of all available cores. The solution not only shortens defragmentation times significantly, but also scales to many simple cores. It can therefore even be implemented on a GPU. One problem with compaction is that it requires all references to moved objects to be retargeted to point to their new locations. This paper further improves on existing work by better identifying the parts of the heap that contain references to objects moved by the compactor; only these parts are processed to find the references, which are then retargeted in parallel. To demonstrate the new memory defragmentation algorithm on many-core processors, we evaluate its performance on a modern GPU. Parallelization speeds up compaction 40-fold and coalescing up to 32-fold. After compaction, our algorithm only needs to process 2%–4% of the total heap to retarget references.
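To make the two phases described above concrete, here is a minimal CUDA sketch: one kernel copies live objects to their compacted positions in parallel (one thread per object), and a second kernel retargets, in parallel, only those heap slots previously identified as holding references to moved objects. This is not the authors' implementation: it compacts into a separate to-space (the paper works on the heap in place), the new offsets are assumed to come from a prefix sum over live-object sizes computed beforehand, and all names (ObjDesc, compact_kernel, retarget_kernel) are illustrative.

```cuda
// Illustrative sketch only, not the paper's algorithm: compaction into a
// separate to-space plus retargeting of pre-identified reference slots.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// One descriptor per live object. new_off is assumed to be precomputed,
// e.g. as an exclusive prefix sum over the live object sizes.
struct ObjDesc {
    uint32_t old_off;   // word offset of the object in the fragmented heap
    uint32_t size;      // object size in 32-bit words
    uint32_t new_off;   // word offset after compaction
};

// Each thread copies one live object to its compacted position. Copying into
// a separate to-space sidesteps the overlap hazards that an in-place,
// sliding compaction has to handle.
__global__ void compact_kernel(const uint32_t* from, uint32_t* to,
                               const ObjDesc* objs, int n_objs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_objs) return;
    ObjDesc d = objs[i];
    for (uint32_t w = 0; w < d.size; ++w)   // word-by-word copy for clarity
        to[d.new_off + w] = from[d.old_off + w];
}

// Each thread rewrites one heap slot that was previously marked as holding a
// reference (stored as an old word offset) to a moved object, looking up the
// new offset by binary search over descriptors sorted by old_off. Scanning
// only the marked slots mirrors the paper's point that just a small fraction
// of the heap has to be touched after compaction.
__global__ void retarget_kernel(uint32_t* heap, const uint32_t* slot_idx,
                                int n_slots, const ObjDesc* objs, int n_objs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_slots) return;
    uint32_t old_ref = heap[slot_idx[i]];
    int lo = 0, hi = n_objs - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (objs[mid].old_off == old_ref) {
            heap[slot_idx[i]] = objs[mid].new_off;
            return;
        }
        if (objs[mid].old_off < old_ref) lo = mid + 1; else hi = mid - 1;
    }
}

int main() {
    // Tiny fragmented heap: object A in words [0,2), a hole in [2,4),
    // object B in words [4,6). A's second word holds a reference to B (old
    // offset 4). Slot 1 of the compacted heap is the marked reference slot.
    uint32_t h_heap[8]  = {11, 4, 0, 0, 22, 23, 0, 0};
    ObjDesc  h_objs[2]  = {{0, 2, 0}, {4, 2, 2}};  // new_off = prefix sum of sizes
    uint32_t h_slots[1] = {1};

    uint32_t *d_from, *d_to, *d_slots;  ObjDesc *d_objs;
    cudaMalloc(&d_from, sizeof(h_heap));   cudaMalloc(&d_to, sizeof(h_heap));
    cudaMalloc(&d_objs, sizeof(h_objs));   cudaMalloc(&d_slots, sizeof(h_slots));
    cudaMemcpy(d_from, h_heap, sizeof(h_heap), cudaMemcpyHostToDevice);
    cudaMemcpy(d_objs, h_objs, sizeof(h_objs), cudaMemcpyHostToDevice);
    cudaMemcpy(d_slots, h_slots, sizeof(h_slots), cudaMemcpyHostToDevice);

    compact_kernel<<<1, 32>>>(d_from, d_to, d_objs, 2);
    retarget_kernel<<<1, 32>>>(d_to, d_slots, 1, d_objs, 2);
    cudaMemcpy(h_heap, d_to, sizeof(h_heap), cudaMemcpyDeviceToHost);

    // Expect "11 2 22 23": B now starts at word 2 and A's reference follows it.
    printf("compacted heap: %u %u %u %u\n",
           h_heap[0], h_heap[1], h_heap[2], h_heap[3]);
    cudaFree(d_from); cudaFree(d_to); cudaFree(d_objs); cudaFree(d_slots);
    return 0;
}
```

Restricting the retargeting pass to a precomputed list of reference-holding slots is what lets the algorithm touch only a small fraction of the heap after compaction (2%–4% in the paper's measurements).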