Dark silicon and the end of multicore scaling

H. Esmaeilzadeh, Emily R. Blem, Renee St. Amant, K. Sankaralingam, D. Burger
{"title":"Dark silicon and the end of multicore scaling","authors":"H. Esmaeilzadeh, Emily R. Blem, Renee St. Amant, K. Sankaralingam, D. Burger","doi":"10.1145/2000064.2000108","DOIUrl":null,"url":null,"abstract":"Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.","PeriodicalId":340732,"journal":{"name":"2011 38th Annual International Symposium on Computer Architecture (ISCA)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"364","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 38th Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2000064.2000108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 364

Abstract

Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
暗硅和多核缩放的终结
自2005年以来,处理器设计人员增加了核心数量,以利用摩尔定律扩展,而不是专注于单核性能。Dennard缩放的失败(向多核部件的转变在一定程度上是对Dennard缩放的回应)可能很快就会限制多核缩放,就像单核缩放已经被限制一样。本文通过结合设备扩展、单核扩展和多核扩展来模拟多核扩展限制,以衡量未来五代技术中一组并行工作负载的加速潜力。对于设备缩放,我们使用ITRS预测和一组更保守的设备缩放参数。为了模拟单核缩放,我们结合了来自150多个处理器的测量结果,得出了面积/性能和功率/性能的帕累托最优边界。最后,为了对多核缩放进行建模,我们建立了一个详细的性能上限和核心功耗下限的性能模型。我们研究的多核设计包括具有对称、非对称、动态和组合拓扑的单线程类cpu和大规模线程类gpu的多核芯片组织。该研究表明,无论芯片组织和拓扑结构如何,多核扩展在一定程度上都受到了计算社区广泛认可的功率限制。即使在22nm工艺中(距离现在只有一年的时间),固定尺寸芯片的21%必须关闭电源,而在8nm工艺中,这个数字增长到50%以上。到2024年,在常用的并行工作负载上只能实现7.9倍的平均加速,与每代性能翻倍的目标相差近24倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Crafting a usable microkernel, processor, and I/O system with strict and provable information flow security Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators Scalable power control for many-core architectures running multi-threaded applications Virtualizing performance asymmetric multi-core systems DBAR: An efficient routing algorithm to support multiple concurrent applications in networks-on-chip
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1