{"title":"一个粗粒度动态可重构多媒体加速器的设计与实现","authors":"Hung K. Nguyen, Xuan-Tu Tran","doi":"10.1145/3543544","DOIUrl":null,"url":null,"abstract":"This article proposes and implements a Coarse-grained dynamically Reconfigurable Architecture, named Reconfigurable Multimedia Accelerator (REMAC). REMAC architecture is driven by the pipelined multi-instruction-multi-data execution model for exploiting multi-level parallelism of the computation-intensive loops in multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop can execute concurrently in the pipelining fashion by the temporal overlapping of the configuration fetch, execution, and store processes as much as possible. To address the huge bandwidth required by parallel processing units, REMAC architecture is proposed to efficiently exploit the abundant data locality in the kernel loops to decrease data access bandwidth while increase the efficiency of pipelined execution. In addition, a novel architecture of dedicated hierarchy data memory system is proposed to increase data reuse between iterations and make data always available for parallel operation of RPU. The proposed architecture was modeled at RTL using VHDL language. Several benchmark applications were mapped onto REMAC to validate the high-flexibility and high-performance of the architecture and prove that it is appropriate for a wide set of multimedia applications. The experimental results show that REMAC's performance is better than Xilinx Virtex-II, ADRES, REMUS-II, and TI C64+ DSP.","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator\",\"authors\":\"Hung K. Nguyen, Xuan-Tu Tran\",\"doi\":\"10.1145/3543544\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article proposes and implements a Coarse-grained dynamically Reconfigurable Architecture, named Reconfigurable Multimedia Accelerator (REMAC). REMAC architecture is driven by the pipelined multi-instruction-multi-data execution model for exploiting multi-level parallelism of the computation-intensive loops in multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop can execute concurrently in the pipelining fashion by the temporal overlapping of the configuration fetch, execution, and store processes as much as possible. To address the huge bandwidth required by parallel processing units, REMAC architecture is proposed to efficiently exploit the abundant data locality in the kernel loops to decrease data access bandwidth while increase the efficiency of pipelined execution. In addition, a novel architecture of dedicated hierarchy data memory system is proposed to increase data reuse between iterations and make data always available for parallel operation of RPU. The proposed architecture was modeled at RTL using VHDL language. Several benchmark applications were mapped onto REMAC to validate the high-flexibility and high-performance of the architecture and prove that it is appropriate for a wide set of multimedia applications. The experimental results show that REMAC's performance is better than Xilinx Virtex-II, ADRES, REMUS-II, and TI C64+ DSP.\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2022-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3543544\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3543544","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
摘要
本文提出并实现了一种粗粒度的动态可重构体系结构,称为可重构多媒体加速器(Reconfigurable Multimedia Accelerator, REMAC)。REMAC架构由流水线式多指令多数据执行模型驱动,利用多媒体应用中计算密集型循环的多级并行性。REMAC的可重构处理单元(reconfigurable processing unit, RPU)的新颖架构允许内核循环的多个迭代以流水线方式并发执行,方法是尽可能多地在时间上重叠配置获取、执行和存储进程。针对并行处理单元对带宽的需求,提出了REMAC架构,有效利用内核循环中丰富的数据局部性,在降低数据访问带宽的同时提高流水线执行效率。此外,提出了一种新的专用层次数据存储系统架构,以增加迭代之间的数据重用,并使数据始终可用于RPU的并行操作。采用VHDL语言在RTL上对所提出的体系结构进行了建模。将几个基准测试应用程序映射到REMAC上,以验证该体系结构的高灵活性和高性能,并证明它适用于广泛的多媒体应用程序。实验结果表明,REMAC的性能优于Xilinx Virtex-II、ADRES、REMUS-II和TI C64+ DSP。
Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator
This article proposes and implements a Coarse-grained dynamically Reconfigurable Architecture, named Reconfigurable Multimedia Accelerator (REMAC). REMAC architecture is driven by the pipelined multi-instruction-multi-data execution model for exploiting multi-level parallelism of the computation-intensive loops in multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop can execute concurrently in the pipelining fashion by the temporal overlapping of the configuration fetch, execution, and store processes as much as possible. To address the huge bandwidth required by parallel processing units, REMAC architecture is proposed to efficiently exploit the abundant data locality in the kernel loops to decrease data access bandwidth while increase the efficiency of pipelined execution. In addition, a novel architecture of dedicated hierarchy data memory system is proposed to increase data reuse between iterations and make data always available for parallel operation of RPU. The proposed architecture was modeled at RTL using VHDL language. Several benchmark applications were mapped onto REMAC to validate the high-flexibility and high-performance of the architecture and prove that it is appropriate for a wide set of multimedia applications. The experimental results show that REMAC's performance is better than Xilinx Virtex-II, ADRES, REMUS-II, and TI C64+ DSP.