A configurable logic architecture for dynamic hardware/software partitioning

Roman L. Lysecky, F. Vahid
{"title":"A configurable logic architecture for dynamic hardware/software partitioning","authors":"Roman L. Lysecky, F. Vahid","doi":"10.1109/DATE.2004.1268892","DOIUrl":null,"url":null,"abstract":"In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic - an approach we call warp processing. The configurable logic place and route step is the most computationally intensive part of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. In contrast, dynamic partitioning requires place and route to execute in just seconds and on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by specifically focusing on the goal of software kernel speedup when designing the FPGA architecture, rather than on the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory, and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. Yet, we show that we obtain speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) when partitioning even just one loop, which are comparable to commercial tools and fabrics. Thus, our configurable logic architecture represents a good candidate for platforms that will support dynamic hardware/software partitioning, and enables ultra-fast desktop tools for hardware/software partitioning, and even for fast configurable logic design in general.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"73","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DATE.2004.1268892","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 73

Abstract

In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic - an approach we call warp processing. The configurable logic place and route step is the most computationally intensive part of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. In contrast, dynamic partitioning requires place and route to execute in just seconds and on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by specifically focusing on the goal of software kernel speedup when designing the FPGA architecture, rather than on the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory, and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. Yet, we show that we obtain speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) when partitioning even just one loop, which are comparable to commercial tools and fabrics. Thus, our configurable logic architecture represents a good candidate for platforms that will support dynamic hardware/software partitioning, and enables ultra-fast desktop tools for hardware/software partitioning, and even for fast configurable logic design in general.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用于动态硬件/软件分区的可配置逻辑体系结构
在之前的工作中,我们展示了让处理器动态分区其正在执行的软件的好处和可行性,这样关键的软件内核就可以透明地分区,作为硬件协处理器在可配置逻辑上执行——我们将这种方法称为warp处理。可配置逻辑位置和路由步骤是此类硬件/软件分区中计算最密集的部分,通常在功能强大的桌面处理器上运行数分钟或数小时。相比之下,动态分区需要在几秒钟内在一个精简的嵌入式处理器上执行位置和路由。因此,我们设计了一个专门用于动态硬件/软件分区的可配置逻辑体系结构。通过流行的基准测试实验,我们表明,通过在设计FPGA架构时特别关注软件内核加速的目标,而不是更一般的ASIC原型目标,我们可以比流行的商业工具映射到商业可配置逻辑快50倍,使用少10000倍的数据内存和少1000倍的代码内存。然而,我们表明,即使只划分一个循环,我们也可以获得速度提升(平均2倍,最多4倍)和能源节约(平均33%,最高74%),这与商业工具和结构相当。因此,我们的可配置逻辑架构代表了支持动态硬件/软件分区的平台的一个很好的候选平台,并且支持用于硬件/软件分区的超快速桌面工具,甚至支持一般的快速可配置逻辑设计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
RTL power optimisation: concepts, tools and design experiences [Tutorial] Reliable design: a system perspective [Tutorial] Evaluation of a refinement-driven systemC/spl trade/-based design flow The coming of age of reconfigurable computing-potentials and challenges of a new technology [Tutorial] Breaking the synchronous barrier for systems-on-chip communication and synchronisation [Tutorial]
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1