快速共享片上存储器架构，用于高效的CGRAs混合计算

2013 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2013-03-18 DOI:10.5555/2485288.2485662

Jongeun Lee, Yeonghun Jeong, Sungsok Seo

{"title":"快速共享片上存储器架构，用于高效的CGRAs混合计算","authors":"Jongeun Lee, Yeonghun Jeong, Sungsok Seo","doi":"10.5555/2485288.2485662","DOIUrl":null,"url":null,"abstract":"While Coarse-Grained Reconfigurable Architectures (CGRAs) are very efficient at handling regular, compute-intensive loops, their weakness at control-intensive processing and the need for frequent reconfiguration require another processor, for which usually a main processor is used. To minimize the overhead arising in such collaborative execution, we integrate a dedicated sequential processor (SP) with a reconfigurable array (RA), where the crucial problem is how to share the memory between SP and RA while keeping the SP's memory access latency very short. We present a detailed architecture, control, and program example of our approach, focusing on our optimized on-chip shared memory organization between SP and RA. Our preliminary results demonstrate that our optimized memory architecture is very effective in reducing kernel execution times (23.5% compared to a more straightforward alternative), and our approach can reduce the RA control overhead and other sequential code execution time in kernels significantly, resulting in up to 23.1% reduction in kernel execution time, compared to the conventional system using the main processor for sequential code execution.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"44 1","pages":"1575-1578"},"PeriodicalIF":0.0000,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Fast shared on-chip memory architecture for efficient hybrid computing with CGRAs\",\"authors\":\"Jongeun Lee, Yeonghun Jeong, Sungsok Seo\",\"doi\":\"10.5555/2485288.2485662\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"While Coarse-Grained Reconfigurable Architectures (CGRAs) are very efficient at handling regular, compute-intensive loops, their weakness at control-intensive processing and the need for frequent reconfiguration require another processor, for which usually a main processor is used. To minimize the overhead arising in such collaborative execution, we integrate a dedicated sequential processor (SP) with a reconfigurable array (RA), where the crucial problem is how to share the memory between SP and RA while keeping the SP's memory access latency very short. We present a detailed architecture, control, and program example of our approach, focusing on our optimized on-chip shared memory organization between SP and RA. Our preliminary results demonstrate that our optimized memory architecture is very effective in reducing kernel execution times (23.5% compared to a more straightforward alternative), and our approach can reduce the RA control overhead and other sequential code execution time in kernels significantly, resulting in up to 23.1% reduction in kernel execution time, compared to the conventional system using the main processor for sequential code execution.\",\"PeriodicalId\":6310,\"journal\":{\"name\":\"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"volume\":\"44 1\",\"pages\":\"1575-1578\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5555/2485288.2485662\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5555/2485288.2485662","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

虽然粗粒度可重构架构(CGRAs)在处理常规的计算密集型循环方面非常有效，但它们在控制密集型处理方面的弱点和频繁重新配置的需要需要另一个处理器，通常使用主处理器。为了最大限度地减少这种协同执行中产生的开销，我们将专用顺序处理器(SP)与可重构阵列(RA)集成在一起，其中的关键问题是如何在SP和RA之间共享内存，同时保持SP的内存访问延迟非常短。我们给出了我们方法的详细架构、控制和程序示例，重点介绍了我们优化的SP和RA之间的片上共享内存组织。我们的初步结果表明，我们优化的内存架构在减少内核执行时间方面非常有效(与更直接的替代方案相比为23.5%)，并且我们的方法可以显著减少内核中的RA控制开销和其他顺序代码执行时间，与使用主处理器进行顺序代码执行的传统系统相比，内核执行时间最多减少23.1%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fast shared on-chip memory architecture for efficient hybrid computing with CGRAs

While Coarse-Grained Reconfigurable Architectures (CGRAs) are very efficient at handling regular, compute-intensive loops, their weakness at control-intensive processing and the need for frequent reconfiguration require another processor, for which usually a main processor is used. To minimize the overhead arising in such collaborative execution, we integrate a dedicated sequential processor (SP) with a reconfigurable array (RA), where the crucial problem is how to share the memory between SP and RA while keeping the SP's memory access latency very short. We present a detailed architecture, control, and program example of our approach, focusing on our optimized on-chip shared memory organization between SP and RA. Our preliminary results demonstrate that our optimized memory architecture is very effective in reducing kernel execution times (23.5% compared to a more straightforward alternative), and our approach can reduce the RA control overhead and other sequential code execution time in kernels significantly, resulting in up to 23.1% reduction in kernel execution time, compared to the conventional system using the main processor for sequential code execution.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)

自引率

0.00%

发文量