Application mapping for chip multiprocessors

Guangyu Chen, Feihui Li, S. Son, M. Kandemir
Published in the 2008 45th ACM/IEEE Design Automation Conference (2008-06-08). DOI: 10.1145/1391469.1391628
Citations: 95

Abstract

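This paper addresses the problem of automatically mapping an application onto a network-on-chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fashion. The proposed compiler approach has four major steps: task scheduling, processor mapping, data mapping, and packet routing. In the first step, the application code is parallelized and the resulting parallel threads are assigned to virtual processors. The second step maps virtual processors to physical processors, with the goal of placing threads that are expected to communicate frequently with each other on neighboring processors whenever possible. In the third step, data elements are mapped to the memories attached to CMP nodes; the main objective is to place each data item in a node that is close to the nodes that access it. The last step determines energy-efficient paths for data traveling between memories and processors. We describe the compiler algorithms we implemented in detail and present an experimental evaluation of the framework, testing both the entire framework and the impact of omitting some of its steps. This analysis shows that, as a result of improved data-access locality, the proposed framework significantly reduces the energy consumption of our applications: by 27.41% on average compared with a purely performance-oriented application mapping strategy.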
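The abstract states the goal of processor mapping (frequently communicating threads on neighboring processors) but not the algorithm. The sketch below is a hypothetical greedy heuristic, not the paper's method. It assumes a 2D mesh NoC, a symmetric communication-volume matrix `comm` between virtual processors, and hop count (Manhattan distance) as the cost of a transfer; the names `greedy_map` and `mapping_cost` are illustrative.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two mesh nodes under minimal routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(comm, placement):
    """Sum of communication volume x hop distance over all virtual-processor pairs.

    Assumes comm is symmetric (comm[i][j] == comm[j][i]).
    """
    n = len(comm)
    return sum(comm[i][j] * manhattan(placement[i], placement[j])
               for i in range(n) for j in range(i + 1, n))


def greedy_map(comm, rows, cols):
    """Hypothetical greedy virtual-to-physical processor mapping.

    Virtual processors with the most total traffic are placed first; each one
    goes to the free mesh node that minimizes its traffic-weighted distance to
    the virtual processors already placed.
    """
    n = len(comm)
    if n > rows * cols:
        raise ValueError("more virtual processors than mesh nodes")
    free = sorted(product(range(rows), range(cols)))  # sorted for deterministic ties
    order = sorted(range(n), key=lambda v: -sum(comm[v]))
    placement = {}
    for v in order:
        best = min(free, key=lambda node: sum(
            comm[v][u] * manhattan(node, placement[u]) for u in placement))
        placement[v] = best
        free.remove(best)
    return placement


# Two pairs of heavily communicating virtual processors: (0, 1) and (2, 3).
comm = [[0, 10, 1, 0],
        [10, 0, 0, 1],
        [1, 0, 0, 10],
        [0, 1, 10, 0]]
placement = greedy_map(comm, 2, 2)
print(placement, mapping_cost(comm, placement))
# prints {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)} 22
```

Both heavy pairs end up one hop apart. Swapping the placement so each heavy pair sits on a diagonal raises the cost to 42. A greedy pass like this costs roughly O(n²·N) for n virtual processors and N nodes but is not optimal in general.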
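The data-mapping step places each data item near the nodes that access it. A similarly hedged sketch, assuming per-processor access counts for each item are known (for example, from compiler analysis), processor locations come from the previous step, and each node's memory holds a fixed number of items: place each item, hottest first, at the node with free capacity that minimizes access-weighted hop distance. The capacity model, the hottest-first order, and the name `map_data` are assumptions for illustration.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two mesh nodes under minimal routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_data(accesses, proc_node, rows, cols, capacity):
    """Hypothetical locality-aware data placement.

    accesses:  {item: {processor: access_count}}
    proc_node: {processor: (row, col)} from the processor-mapping step
    capacity:  maximum number of items each node's memory can hold
    """
    nodes = sorted(product(range(rows), range(cols)))
    load = {node: 0 for node in nodes}
    home = {}
    # Most frequently accessed items choose their node first.
    for item in sorted(accesses, key=lambda d: -sum(accesses[d].values())):
        candidates = [node for node in nodes if load[node] < capacity]
        if not candidates:
            raise ValueError("not enough memory capacity for all items")
        best = min(candidates, key=lambda node: sum(
            count * manhattan(node, proc_node[p])
            for p, count in accesses[item].items()))
        home[item] = best
        load[best] += 1
    return home


proc_node = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
accesses = {"A": {0: 8, 1: 2}, "B": {3: 5}, "C": {0: 1, 3: 1}}
print(map_data(accesses, proc_node, 2, 2, capacity=1))
# prints {'A': (0, 0), 'B': (1, 1), 'C': (0, 1)}
```

Item C is accessed equally by processors at opposite corners, so every node is equally good for it and the tie-break simply takes the first remaining candidate.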
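The abstract says only that the last step chooses energy-efficient paths between memories and processors. One hypothetical way to sketch this: assume a 2D mesh, minimal (shortest-hop) routing, and a per-bit energy cost for each link that may differ with link load or power state. A dynamic program can then pick the cheapest of all shortest-hop paths. The cost function `example_energy` and the name `route_min_energy` are illustrative only, not the paper's routing algorithm.

```python
def route_min_energy(src, dst, link_energy):
    """Lowest-energy minimal (shortest-hop) path between two 2D mesh nodes.

    link_energy(a, b) returns a hypothetical per-bit energy for the link a -> b.
    Minimal paths only step toward dst, so a dynamic program over the
    rectangle spanned by src and dst finds the cheapest one.
    Returns (energy_per_bit, path).
    """
    (r0, c0), (r1, c1) = src, dst
    dr = 1 if r1 >= r0 else -1
    dc = 1 if c1 >= c0 else -1
    nr, nc = abs(r1 - r0), abs(c1 - c0)
    # best[i][j] = (energy, path) for node (r0 + i*dr, c0 + j*dc)
    best = [[None] * (nc + 1) for _ in range(nr + 1)]
    best[0][0] = (0.0, [src])
    for i in range(nr + 1):
        for j in range(nc + 1):
            if i == 0 and j == 0:
                continue
            here = (r0 + i * dr, c0 + j * dc)
            options = []
            if i > 0:  # arrive with a row step
                e, path = best[i - 1][j]
                options.append((e + link_energy(path[-1], here), path + [here]))
            if j > 0:  # arrive with a column step
                e, path = best[i][j - 1]
                options.append((e + link_energy(path[-1], here), path + [here]))
            best[i][j] = min(options, key=lambda t: t[0])
    return best[nr][nc]


def example_energy(a, b):
    """Hypothetical cost model: links along row 0 cost 5x more per bit."""
    return 5.0 if a[0] == 0 and b[0] == 0 else 1.0


energy, path = route_min_energy((0, 0), (1, 2), example_energy)
print(energy, path)
# prints 3.0 [(0, 0), (1, 0), (1, 1), (1, 2)]
```

Multiplying the returned per-bit energy by a message's size in bits gives its link energy under this model. Restricting the search to minimal paths keeps the hop count equal to that of dimension-ordered (XY) routing while still leaving a choice of which links carry the traffic.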