ILP and TLP in shared memory applications: A limit study

Ehsan Fatehi, Paul V. Gratz
{"title":"ILP and TLP in shared memory applications: A limit study","authors":"Ehsan Fatehi, Paul V. Gratz","doi":"10.1145/2628071.2628093","DOIUrl":null,"url":null,"abstract":"With the breakdown of Dennard scaling, future processor designs will be at the mercy of power limits as Chip MultiProcessor (CMP) designs scale out to many-cores. It is critical, therefore, that future CMPs be optimally designed in terms of performance efficiency with respect to power. A characterization analysis of future workloads is imperative to ensure maximum returns of performance per Watt consumed. Hence, a detailed analysis of emerging workloads is necessary to understand their characteristics with respect to hardware in terms of power and performance tradeoffs. In this paper, we conduct a limit study simultaneously analyzing the two dominant forms of parallelism exploited by modern computer architectures: Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP). This study gives insights into the upper bounds of performance that future architectures can achieve. Furthermore it identifies the bottlenecks of emerging workloads. To the best of our knowledge, our work is the first study that combines the two forms of parallelism into one study with modern applications. We evaluate the PARSEC multithreaded benchmark suite using a specialized trace-driven simulator. We make several contributions describing the high-level behavior of next-generation applications. For example, we show these applications contain up to a factor of 929× more ILP than what is currently being extracted from real machines. We then show the effects of breaking the application into increasing numbers of threads (exploiting TLP), instruction window size, realistic branch prediction, realistic memory latency, and thread dependencies on exploitable ILP. Our examination shows that theses benchmarks differed vastly from one another. As a result, we expect no single, homogeneous, micro-architecture will work optimally for all, arguing for reconfigurable, heterogeneous designs.","PeriodicalId":263670,"journal":{"name":"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)","volume":"5 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2628071.2628093","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

With the breakdown of Dennard scaling, future processor designs will be at the mercy of power limits as Chip MultiProcessor (CMP) designs scale out to many cores. It is critical, therefore, that future CMPs be designed for maximum performance efficiency with respect to power. A characterization of future workloads is imperative to ensure the best return in performance per Watt consumed. Hence, a detailed analysis of emerging workloads is necessary to understand their characteristics with respect to hardware in terms of power and performance tradeoffs. In this paper, we conduct a limit study that simultaneously analyzes the two dominant forms of parallelism exploited by modern computer architectures: Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP). This study gives insight into the upper bounds of performance that future architectures can achieve, and it identifies the bottlenecks of emerging workloads. To the best of our knowledge, our work is the first study to combine these two forms of parallelism in a single analysis of modern applications. We evaluate the PARSEC multithreaded benchmark suite using a specialized trace-driven simulator. We make several contributions describing the high-level behavior of next-generation applications. For example, we show that these applications contain up to 929× more ILP than is currently being extracted by real machines. We then show the effects on exploitable ILP of breaking the application into increasing numbers of threads (exploiting TLP), instruction window size, realistic branch prediction, realistic memory latency, and thread dependencies. Our examination shows that these benchmarks differ vastly from one another. As a result, we expect no single, homogeneous micro-architecture will work optimally for all of them, arguing for reconfigurable, heterogeneous designs.
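To illustrate the kind of dataflow-limit analysis such a study performs (this is a minimal sketch under simplifying assumptions, not the authors' simulator), the snippet below estimates an idealized ILP upper bound from an instruction trace: each instruction issues as soon as its register operands are ready, assuming an unbounded instruction window, perfect branch prediction, and unit latency. The trace format and function name are hypothetical.

```python
from collections import defaultdict

def ilp_limit(trace):
    """Idealized dataflow-limit ILP estimate.

    `trace` is an iterable of (dest_regs, src_regs) tuples -- a
    hypothetical trace format chosen for illustration. Each instruction
    issues the cycle after all of its source registers are produced,
    with unlimited window, perfect branch prediction, and unit latency.
    """
    ready = defaultdict(int)   # cycle at which each register's value becomes available
    critical_path = 0
    count = 0
    for dests, srcs in trace:
        issue = max((ready[r] for r in srcs), default=0)  # wait only on true data dependences
        finish = issue + 1                                # unit-latency assumption
        for r in dests:
            ready[r] = finish
        critical_path = max(critical_path, finish)
        count += 1
    return count / critical_path if critical_path else 0.0

# Usage: a chain of three dependent ops plus three independent ops.
trace = [(["r1"], []), (["r2"], ["r1"]), (["r3"], ["r2"]),
         (["r4"], []), (["r5"], []), (["r6"], [])]
print(ilp_limit(trace))   # 6 instructions over a 3-cycle critical path -> 2.0
```

Constraining this idealized schedule (finite window, imperfect branch prediction, realistic memory latency, inter-thread dependences) is what moves the bound from the dataflow limit toward what real machines achieve.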