芯片多处理器的应用映射

2008 45th ACM/IEEE Design Automation Conference Pub Date : 2008-06-08 DOI:10.1145/1391469.1391628

Guangyu Chen, Feihui Li, S. Son, M. Kandemir

{"title":"芯片多处理器的应用映射","authors":"Guangyu Chen, Feihui Li, S. Son, M. Kandemir","doi":"10.1145/1391469.1391628","DOIUrl":null,"url":null,"abstract":"The problem attacked in this paper is one of automatically mapping an application onto a network-on-chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fashion. The proposed compiler approach has four major steps: task scheduling, processor mapping, data mapping, and packet routing. In the first step, the application code is parallelized and the resulting parallel threads are assigned to virtual processors. The second step implements a virtual processor-to-physical processor mapping. The goal of this mapping is to ensure that the threads that are expected to communicate frequently with each other are assigned to neighboring processors as much as possible. In the third step, data elements are mapped to memories attached to CMP nodes. The main objective of this mapping is to place a given data item into a node which is close to the nodes that access it. The last step of our approach determines the paths (between memories and processors) for data to travel in an energy efficient manner. In this paper, we describe the compiler algorithms we implemented in detail and present an experimental evaluation of the framework. In our evaluation, we test our entire framework as well as the impact of omitting some of its steps. This experimental analysis clearly shows that the proposed framework reduces energy consumption of our applications significantly (27.41% on average over a pure performance oriented application mapping strategy) as a result of improved locality of data accesses.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"95","resultStr":"{\"title\":\"Application mapping for chip multiprocessors\",\"authors\":\"Guangyu Chen, Feihui Li, S. Son, M. Kandemir\",\"doi\":\"10.1145/1391469.1391628\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The problem attacked in this paper is one of automatically mapping an application onto a network-on-chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fashion. The proposed compiler approach has four major steps: task scheduling, processor mapping, data mapping, and packet routing. In the first step, the application code is parallelized and the resulting parallel threads are assigned to virtual processors. The second step implements a virtual processor-to-physical processor mapping. The goal of this mapping is to ensure that the threads that are expected to communicate frequently with each other are assigned to neighboring processors as much as possible. In the third step, data elements are mapped to memories attached to CMP nodes. The main objective of this mapping is to place a given data item into a node which is close to the nodes that access it. The last step of our approach determines the paths (between memories and processors) for data to travel in an energy efficient manner. In this paper, we describe the compiler algorithms we implemented in detail and present an experimental evaluation of the framework. In our evaluation, we test our entire framework as well as the impact of omitting some of its steps. This experimental analysis clearly shows that the proposed framework reduces energy consumption of our applications significantly (27.41% on average over a pure performance oriented application mapping strategy) as a result of improved locality of data accesses.\",\"PeriodicalId\":412696,\"journal\":{\"name\":\"2008 45th ACM/IEEE Design Automation Conference\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"95\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 45th ACM/IEEE Design Automation Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1391469.1391628\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 45th ACM/IEEE Design Automation Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1391469.1391628","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 95

摘要

本文研究的问题是以位置感知的方式将应用程序自动映射到基于片上网络(NoC)的芯片多处理器(CMP)体系结构上。提出的编译器方法有四个主要步骤:任务调度、处理器映射、数据映射和数据包路由。在第一步中，将应用程序代码并行化，并将生成的并行线程分配给虚拟处理器。第二步实现虚拟处理器到物理处理器的映射。这种映射的目标是确保将预期频繁相互通信的线程尽可能多地分配给相邻的处理器。在第三步中，将数据元素映射到附加到CMP节点的内存。此映射的主要目标是将给定的数据项放置到靠近访问它的节点的节点中。我们方法的最后一步决定了数据以一种节能的方式传输的路径(存储器和处理器之间)。在本文中，我们详细描述了我们实现的编译算法，并给出了该框架的实验评估。在我们的评估中，我们测试了整个框架以及省略其中一些步骤的影响。这个实验分析清楚地表明，由于改进了数据访问的局域性，所提出的框架显着降低了应用程序的能耗(比纯面向性能的应用程序映射策略平均降低了27.41%)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Application mapping for chip multiprocessors

The problem attacked in this paper is one of automatically mapping an application onto a network-on-chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fashion. The proposed compiler approach has four major steps: task scheduling, processor mapping, data mapping, and packet routing. In the first step, the application code is parallelized and the resulting parallel threads are assigned to virtual processors. The second step implements a virtual processor-to-physical processor mapping. The goal of this mapping is to ensure that the threads that are expected to communicate frequently with each other are assigned to neighboring processors as much as possible. In the third step, data elements are mapped to memories attached to CMP nodes. The main objective of this mapping is to place a given data item into a node which is close to the nodes that access it. The last step of our approach determines the paths (between memories and processors) for data to travel in an energy efficient manner. In this paper, we describe the compiler algorithms we implemented in detail and present an experimental evaluation of the framework. In our evaluation, we test our entire framework as well as the impact of omitting some of its steps. This experimental analysis clearly shows that the proposed framework reduces energy consumption of our applications significantly (27.41% on average over a pure performance oriented application mapping strategy) as a result of improved locality of data accesses.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2008 45th ACM/IEEE Design Automation Conference

自引率

0.00%

发文量