分布式共享最后一级缓存的多核pareto最优功率和缓存感知任务映射

Martin Rapp, A. Pathania, J. Henkel
{"title":"分布式共享最后一级缓存的多核pareto最优功率和缓存感知任务映射","authors":"Martin Rapp, A. Pathania, J. Henkel","doi":"10.1145/3218603.3218630","DOIUrl":null,"url":null,"abstract":"Two factors primarily affect performance of multi-threaded tasks on many-core processors with both shared and physically distributed Last-Level Cache (LLC): the power budget associated with a certain task mapping that aims to guarantee thermally safe operation and the non-uniform LLC access latency of threads running on different cores. Spatially distributing threads across the many-core increases the power budget, but unfortunately also increases the associated LLC latency. On the other side, mapping more threads to cores near the center of the many-core decreases the LLC latency, but unfortunately also decreases the power budget. Consequently, both metrics (LLC latency and power budget) cannot be simultaneously optimal, which leads to a Pareto-optimization that has formerly not been exploited. We are the first to present a run-time task mapping algorithm called PCMap that exploits this trade-off. Our approach results in up to 8.6% reduction in the average task response time accompanied by a reduction of up to 8.5% in the energy consumption compared to the state-of-the-art.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Pareto-Optimal Power- and Cache-Aware Task Mapping for Many-Cores with Distributed Shared Last-Level Cache\",\"authors\":\"Martin Rapp, A. Pathania, J. Henkel\",\"doi\":\"10.1145/3218603.3218630\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Two factors primarily affect performance of multi-threaded tasks on many-core processors with both shared and physically distributed Last-Level Cache (LLC): the power budget associated with a certain task mapping that aims to guarantee thermally safe operation and the non-uniform LLC access latency of threads running on different cores. Spatially distributing threads across the many-core increases the power budget, but unfortunately also increases the associated LLC latency. On the other side, mapping more threads to cores near the center of the many-core decreases the LLC latency, but unfortunately also decreases the power budget. Consequently, both metrics (LLC latency and power budget) cannot be simultaneously optimal, which leads to a Pareto-optimization that has formerly not been exploited. We are the first to present a run-time task mapping algorithm called PCMap that exploits this trade-off. Our approach results in up to 8.6% reduction in the average task response time accompanied by a reduction of up to 8.5% in the energy consumption compared to the state-of-the-art.\",\"PeriodicalId\":20456,\"journal\":{\"name\":\"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3218603.3218630\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3218603.3218630","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

摘要

有两个因素主要影响多核处理器上多线程任务的性能,包括共享和物理分布式的最后一级缓存(Last-Level Cache, LLC):与特定任务映射相关的功率预算(旨在保证热安全操作)和运行在不同核上的线程的不均匀LLC访问延迟。跨多核空间分布线程增加了功率预算,但不幸的是也增加了相关的LLC延迟。另一方面,将更多的线程映射到多核中心附近的核可以减少LLC延迟,但不幸的是也会减少功耗预算。因此,两个指标(LLC延迟和功率预算)不能同时优化,这导致了以前未被利用的帕累托优化。我们首先提出了一种称为PCMap的运行时任务映射算法,该算法利用了这种权衡。与最先进的方法相比,我们的方法使平均任务响应时间减少了8.6%,同时能耗减少了8.5%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Pareto-Optimal Power- and Cache-Aware Task Mapping for Many-Cores with Distributed Shared Last-Level Cache
Two factors primarily affect performance of multi-threaded tasks on many-core processors with both shared and physically distributed Last-Level Cache (LLC): the power budget associated with a certain task mapping that aims to guarantee thermally safe operation and the non-uniform LLC access latency of threads running on different cores. Spatially distributing threads across the many-core increases the power budget, but unfortunately also increases the associated LLC latency. On the other side, mapping more threads to cores near the center of the many-core decreases the LLC latency, but unfortunately also decreases the power budget. Consequently, both metrics (LLC latency and power budget) cannot be simultaneously optimal, which leads to a Pareto-optimization that has formerly not been exploited. We are the first to present a run-time task mapping algorithm called PCMap that exploits this trade-off. Our approach results in up to 8.6% reduction in the average task response time accompanied by a reduction of up to 8.5% in the energy consumption compared to the state-of-the-art.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Adiabatic and Clock-Powered Circuits Power Macro-Models for High-Level Power Estimation Stand-By Power Reduction for SRAM Memories Leakage in CMOS Nanometric Technologies Evolution of Deep Submicron Bulk and SOI Technologies
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1