Runtime Memory Allocation in a Heterogeneous Reconfigurable Platform

V. Sima, K. Bertels
{"title":"Runtime Memory Allocation in a Heterogeneous Reconfigurable Platform","authors":"V. Sima, K. Bertels","doi":"10.1109/ReConFig.2009.38","DOIUrl":null,"url":null,"abstract":"In this paper, we present a runtime memory allocation algorithm, that aims to substantially reduce the overhead caused by shared-memory accesses by allocating memory directly in the local scratch pad memories. We target a heterogeneous platform, with a complex memory hierarchy. Using special instrumentation, we determine what memory areas are used in functions that could run on different processing elements, like, for example a reconfigurable logic array. Based on profile information, the programmer annotates some functions as candidates for accelerated execution. Then, an algorithm decides the best allocation, taking into account the various processing elements and special scratch pad memories of the heterogeneous platform. Tests are performed on our prototype platform, a Virtex ML410 with Linux operating system, containing a PowerPC processor and a Xilinx FPGA, implementing the MOLEN programming paradigm. We test the algorithm using both state of the art H.264 video encoder as well as other synthetic applications. The performance improvement for the H.264 application is 14% compared to the software only version while the overhead is less than 1% of the application execution time. This improvement is the optimal improvement that can be obtained by optimizing the memory allocation. For the synthetic applications the results are within 5% of the optimum.","PeriodicalId":325631,"journal":{"name":"2009 International Conference on Reconfigurable Computing and FPGAs","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2009-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Reconfigurable Computing and FPGAs","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ReConFig.2009.38","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

In this paper, we present a runtime memory allocation algorithm, that aims to substantially reduce the overhead caused by shared-memory accesses by allocating memory directly in the local scratch pad memories. We target a heterogeneous platform, with a complex memory hierarchy. Using special instrumentation, we determine what memory areas are used in functions that could run on different processing elements, like, for example a reconfigurable logic array. Based on profile information, the programmer annotates some functions as candidates for accelerated execution. Then, an algorithm decides the best allocation, taking into account the various processing elements and special scratch pad memories of the heterogeneous platform. Tests are performed on our prototype platform, a Virtex ML410 with Linux operating system, containing a PowerPC processor and a Xilinx FPGA, implementing the MOLEN programming paradigm. We test the algorithm using both state of the art H.264 video encoder as well as other synthetic applications. The performance improvement for the H.264 application is 14% compared to the software only version while the overhead is less than 1% of the application execution time. This improvement is the optimal improvement that can be obtained by optimizing the memory allocation. For the synthetic applications the results are within 5% of the optimum.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
异构可重构平台中的运行时内存分配
在本文中,我们提出了一种运行时内存分配算法,该算法旨在通过直接在本地暂存存储器中分配内存来大幅减少共享内存访问带来的开销。我们的目标是一个异构平台,具有复杂的内存层次结构。使用特殊的检测工具,我们确定可以在不同处理元素上运行的函数中使用哪些内存区域,例如可重构逻辑数组。基于概要信息,程序员将一些函数注释为加速执行的候选函数。然后,考虑到异构平台的各种处理元素和特殊的刮记板存储器,确定最佳分配算法。在我们的原型平台上进行了测试,该平台是带有Linux操作系统的Virtex ML410,包含PowerPC处理器和Xilinx FPGA,实现了MOLEN编程范例。我们使用最先进的H.264视频编码器以及其他合成应用程序来测试算法。与纯软件版本相比,H.264应用程序的性能提高了14%,而开销不到应用程序执行时间的1%。这种改进是通过优化内存分配所能获得的最优改进。对于合成应用,结果在最佳的5%以内。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
PCIREX: A Fast Prototyping Platform for TMR Dynamically Reconfigurable Systems On the Implementation of Central Pattern Generators for Periodic Rhythmic Locomotion Runtime Temporal Partitioning Assembly to Reduce FPGA Reconfiguration Time Protecting the NOEKEON Cipher against SCARE Attacks in FPGAs by Using Dynamic Implementations A Dynamically Reconfigurable Platform for Fixed-Point FIR Filters
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1