Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

2008 International Conference on Parallel Architectures and Compilation Techniques (PACT). Published: 2008-10-25. DOI: 10.1145/1454115.1454140

Hyunchul Park, Kevin Fan, S. Mahlke, Taewook Oh, Heeseok Kim, Hong-Seok Kim


Citations: 195

Abstract

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs consist of an array of function units and register files, often organized as a two-dimensional grid. The most difficult challenge in deploying CGRAs is compiler scheduling technology that can efficiently map software implementations of compute-intensive loops onto the array. Traditional schedulers focus on the placement of operations in time and space. With CGRAs, the challenge of placement is compounded by the need to explicitly route operands from producers to consumers. To systematically attack this problem, we take an edge-centric approach to modulo scheduling that focuses on the routing problem as its primary objective. With edge-centric modulo scheduling (EMS), placement is a by-product of the routing process, and the schedule is developed by routing each edge in the dataflow graph. Routing cost metrics provide the scheduler with a global perspective to guide selection. Experiments on a wide variety of compute-intensive loops from the multimedia domain show that EMS improves throughput by 25% over traditional iterative modulo scheduling, and achieves 98% of the throughput of simulated annealing techniques at a fraction of the compilation time.
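The idea that placement falls out of routing can be illustrated with a small sketch. The code below is not the EMS algorithm from the paper; it is a toy model built on stated assumptions: a mesh of function units (FUs) in which a value can be held in place or forwarded to an adjacent FU in one cycle, a modulo reservation table represented as a set of `(position, time mod II)` slots, and a plain breadth-first search standing in for the paper's routing cost metrics. All names (`route_edge`, `can_host`, `busy`) are hypothetical.

```python
from collections import deque


def neighbors(pos, rows, cols):
    """Yield pos itself (hold the value) and its in-bounds mesh neighbours."""
    r, c = pos
    yield pos
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def slot_on_path(parent, state, slot, ii):
    """True if the partial route ending at `state` already claims `slot`."""
    while state is not None:
        pos, t = state
        if (pos, t % ii) == slot:
            return True
        state = parent[state]
    return False


def route_edge(src, src_time, can_host, busy, ii, rows, cols, max_latency=8):
    """Route one dataflow edge outward from an already-placed producer.

    `busy` holds occupied (position, time mod II) slots, including the
    producer's own slot. The search expands through free slots one cycle
    at a time; the first free slot on an FU that can execute the consumer
    becomes the consumer's placement. Returns (routing_slots, consumer)
    or None if nothing is found within `max_latency` cycles.
    """
    start = (src, src_time)
    parent = {start: None}
    queue = deque([start])
    while queue:
        pos, t = queue.popleft()
        if t - src_time >= max_latency:
            continue
        for nxt in neighbors(pos, rows, cols):
            state = (nxt, t + 1)
            slot = (nxt, (t + 1) % ii)
            if state in parent or slot in busy:
                continue
            # A route longer than II cycles must not collide with itself.
            if slot_on_path(parent, (pos, t), slot, ii):
                continue
            parent[state] = (pos, t)
            if can_host(nxt):
                path, cur = [], parent[state]
                while cur != start:
                    path.append(cur)
                    cur = parent[cur]
                path.reverse()
                return path, state
            queue.append(state)
    return None


# Toy 1x3 array with II = 2; only the right-most FU can run the consumer.
busy = {((0, 0), 0)}  # slot taken by the producer
result = route_edge((0, 0), 0, lambda p: p == (0, 2), busy, ii=2, rows=1, cols=3)
print(result)  # ([((0, 1), 1)], ((0, 2), 2))
```

In this toy run the consumer ends up on FU `(0, 2)` at cycle 2 only because that is where the route from the producer terminates, with FU `(0, 1)` used as a routing hop at cycle 1. A real scheduler would commit the returned slots to the reservation table, weigh alternative routes with cost metrics, and handle operations with several incoming edges, none of which is modelled here.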
