功率限制和暗硅挑战多核的未来

IF 1.8 4区计算机科学 Q2 COMPUTER SCIENCE, THEORY & METHODS ACM Transactions on Computer Systems Pub Date : 2012-08-01 DOI:10.1145/2324876.2324879

H. Esmaeilzadeh, Emily R. Blem, R. S. Amant, K. Sankaralingam, D. Burger

{"title":"功率限制和暗硅挑战多核的未来","authors":"H. Esmaeilzadeh, Emily R. Blem, R. S. Amant, K. Sankaralingam, D. Burger","doi":"10.1145/2324876.2324879","DOIUrl":null,"url":null,"abstract":"Since 2004, processor designers have increased core counts to exploit Moore’s Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads for the topologies we study, leaving a nearly 24-fold gap from a target of doubled performance per generation.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"1 1","pages":"11:1-11:27"},"PeriodicalIF":1.8000,"publicationDate":"2012-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"59","resultStr":"{\"title\":\"Power Limitations and Dark Silicon Challenge the Future of Multicore\",\"authors\":\"H. Esmaeilzadeh, Emily R. Blem, R. S. Amant, K. Sankaralingam, D. Burger\",\"doi\":\"10.1145/2324876.2324879\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Since 2004, processor designers have increased core counts to exploit Moore’s Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads for the topologies we study, leaving a nearly 24-fold gap from a target of doubled performance per generation.\",\"PeriodicalId\":50918,\"journal\":{\"name\":\"ACM Transactions on Computer Systems\",\"volume\":\"1 1\",\"pages\":\"11:1-11:27\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2012-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"59\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Computer Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/2324876.2324879\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Computer Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/2324876.2324879","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 59

摘要

自2004年以来，处理器设计人员增加了核心数量，以利用摩尔定律的缩放，而不是专注于单核性能。Dennard缩放的失败(向多核部件的转变在一定程度上是对Dennard缩放的回应)可能很快就会限制多核缩放，就像单核缩放已经被限制一样。本文通过结合设备扩展、单核扩展和多核扩展来模拟多核扩展限制，以衡量未来五代技术中一组并行工作负载的加速潜力。对于设备缩放，我们使用ITRS预测和一组更保守的设备缩放参数。为了模拟单核缩放，我们结合了来自150多个处理器的测量结果，得出了面积/性能和功率/性能的帕累托最优边界。最后，为了对多核缩放进行建模，我们建立了一个详细的性能上限和核心功耗下限的性能模型。我们研究的多核设计包括具有对称、非对称、动态和组合拓扑的单线程类cpu和大规模线程类gpu的多核芯片组织。该研究表明，无论芯片组织和拓扑结构如何，多核扩展在一定程度上都受到了计算社区广泛认可的功率限制。即使在22nm工艺中(距离现在只有一年的时间)，固定尺寸芯片的21%必须关闭电源，而在8nm工艺中，这个数字增长到50%以上。到2024年，对于我们所研究的拓扑，在常用的并行工作负载上只能实现7.9倍的平均加速，距离每代性能翻倍的目标还有近24倍的差距。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Power Limitations and Dark Silicon Challenge the Future of Multicore

Since 2004, processor designers have increased core counts to exploit Moore’s Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads for the topologies we study, leaving a nearly 24-fold gap from a target of doubled performance per generation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Computer Systems 工程技术-计算机：理论方法

CiteScore

4.00

自引率

0.00%

发文量

审稿时长

1 months

期刊介绍： ACM Transactions on Computer Systems (TOCS) presents research and development results on the design, implementation, analysis, evaluation, and use of computer systems and systems software. The term "computer systems" is interpreted broadly and includes operating systems, systems architecture and hardware, distributed systems, optimizing compilers, and the interaction between systems and computer networks. Articles appearing in TOCS will tend either to present new techniques and concepts, or to report on experiences and experiments with actual systems. Insights useful to system designers, builders, and users will be emphasized. TOCS publishes research and technical papers, both short and long. It includes technical correspondence to permit commentary on technical topics and on previously published papers.