{"title":"GPU的热情正在消失吗?","authors":"C. Trinitis","doi":"10.1109/HPCSim.2012.6266945","DOIUrl":null,"url":null,"abstract":"In recent years, there has been quite a hype on porting compute intensive kernels to GPUs, claiming impressive speedups of sometimes up to more than 100. However, looking at a number of compute intensive applications that have been investigated at TUM, the outcome looks slightly different. In addition, the overhead for porting applications to GPUs, or, more generally speaking, to accelerators, need be taken into consideration. As both very promising and very disappointing results can be obtained on accelerators (depending on the application), as usual the community is divided into GPU enthusiasts on the one hand and GPU opponents on the other hand. In both industrial and academic practice, the question arises what to do with existing compute intensive applications (often numerical simulation codes) that have existed for years or even decades, and which are treated as “never change a running system” code. Basically, these can be divided into three categories: - Code that should not be touched as it most likely will no longer run if anything will be modified (complete rewrite required if it is to run efficiently) - Code where compute intensive parts can be rewritten (partial rewrite required), and - Code that can easily be ported to new programming paradigms (easy adapting possible). Given the fact that CPUs integrate more and more features known from accelerators, one could conclude that most codes would fall into the third category, as the required porting effort seems to be shrinking and compilers are constantly improving. However, although features like automatic parallelization can be carried out with compilers, tuning by hand coding or using hardware specific programming paradigms still outperforms generic approaches. 
As GPU enthusiasts are mainly keen on using CUDA (with some of them moving to OpenCL), GPU opponents claim that by hardcore optimization of compute intensive numerical code, CPUs can reach equal or even better results than accelerators, hence taking vector units operating on AVX registers as on chip accelerators. In order to satisfy both CPU and accelerator programmers, it is still not clear which programming interface will eventually turn out to become a de facto standard. Next to GPUs by NVIDIA and AMD, another interesting approach in the accelerator world is Intel's MIC architecture, with a couple of supercomputing projects already being built around this architecture. As it is based on the x86 ISA including the full tool chain from compilers to debuggers to performance analysis tools, MIC aims at minimizing porting effort to accelerators from the programmer's point of view. The talk will present examples from high performance computing that fall into the three abovementioned categories, and how these code examples have been adapted to modern processor and accelerator architectures.","PeriodicalId":428764,"journal":{"name":"2012 International Conference on High Performance Computing & Simulation (HPCS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Is GPU enthusiasm vanishing?\",\"authors\":\"C. Trinitis\",\"doi\":\"10.1109/HPCSim.2012.6266945\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, there has been quite a hype on porting compute intensive kernels to GPUs, claiming impressive speedups of sometimes up to more than 100. However, looking at a number of compute intensive applications that have been investigated at TUM, the outcome looks slightly different. 
In addition, the overhead for porting applications to GPUs, or, more generally speaking, to accelerators, need be taken into consideration. As both very promising and very disappointing results can be obtained on accelerators (depending on the application), as usual the community is divided into GPU enthusiasts on the one hand and GPU opponents on the other hand. In both industrial and academic practice, the question arises what to do with existing compute intensive applications (often numerical simulation codes) that have existed for years or even decades, and which are treated as “never change a running system” code. Basically, these can be divided into three categories: - Code that should not be touched as it most likely will no longer run if anything will be modified (complete rewrite required if it is to run efficiently) - Code where compute intensive parts can be rewritten (partial rewrite required), and - Code that can easily be ported to new programming paradigms (easy adapting possible). Given the fact that CPUs integrate more and more features known from accelerators, one could conclude that most codes would fall into the third category, as the required porting effort seems to be shrinking and compilers are constantly improving. However, although features like automatic parallelization can be carried out with compilers, tuning by hand coding or using hardware specific programming paradigms still outperforms generic approaches. As GPU enthusiasts are mainly keen on using CUDA (with some of them moving to OpenCL), GPU opponents claim that by hardcore optimization of compute intensive numerical code, CPUs can reach equal or even better results than accelerators, hence taking vector units operating on AVX registers as on chip accelerators. In order to satisfy both CPU and accelerator programmers, it is still not clear which programming interface will eventually turn out to become a de facto standard. 
Next to GPUs by NVIDIA and AMD, another interesting approach in the accelerator world is Intel's MIC architecture, with a couple of supercomputing projects already being built around this architecture. As it is based on the x86 ISA including the full tool chain from compilers to debuggers to performance analysis tools, MIC aims at minimizing porting effort to accelerators from the programmer's point of view. The talk will present examples from high performance computing that fall into the three abovementioned categories, and how these code examples have been adapted to modern processor and accelerator architectures.\",\"PeriodicalId\":428764,\"journal\":{\"name\":\"2012 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCSim.2012.6266945\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCSim.2012.6266945","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

In recent years, there has been considerable hype around porting compute-intensive kernels to GPUs, with claimed speedups sometimes exceeding 100×. However, looking at a number of compute-intensive applications investigated at TUM, the outcome looks somewhat different. In addition, the overhead of porting applications to GPUs, or, more generally, to accelerators, needs to be taken into consideration. Since accelerators can yield both very promising and very disappointing results, depending on the application, the community is, as usual, divided into GPU enthusiasts on the one hand and GPU opponents on the other. In both industrial and academic practice, the question arises of what to do with existing compute-intensive applications (often numerical simulation codes) that have existed for years or even decades and are treated as "never change a running system" code. Basically, these can be divided into three categories:

- code that should not be touched, as it will most likely no longer run if anything is modified (a complete rewrite is required if it is to run efficiently);
- code whose compute-intensive parts can be rewritten (a partial rewrite is required); and
- code that can easily be ported to new programming paradigms (easy adaptation is possible).

Given that CPUs integrate more and more features known from accelerators, one could conclude that most codes fall into the third category, as the required porting effort seems to be shrinking and compilers are constantly improving. However, although features like automatic parallelization can be carried out by compilers, tuning by hand coding or using hardware-specific programming paradigms still outperforms generic approaches.
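The gap between a 100× kernel speedup and the overall application speedup can be made concrete with Amdahl's law. The following sketch is not from the talk; it is the standard textbook argument for why porting only the compute-intensive parts of a legacy code often disappoints:

```python
# Amdahl's law: overall speedup when only a fraction p of the runtime
# is accelerated by a factor s (the kernel-only speedup).
def effective_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Even a 100x kernel speedup yields modest overall gains when the
# accelerated fraction of the application is limited.
for p in (0.5, 0.8, 0.95):
    print(f"p={p:.2f}: overall speedup = {effective_speedup(p, 100):.1f}x")
```

With 80% of the runtime accelerated 100×, the application as a whole runs only about 4.8× faster, which is closer to the mixed results reported above than to the headline numbers.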
GPU enthusiasts are mainly keen on using CUDA (with some of them moving to OpenCL), while GPU opponents claim that, with aggressive optimization of compute-intensive numerical code, CPUs can match or even beat accelerators, treating the vector units operating on AVX registers as on-chip accelerators. It is still not clear which programming interface will eventually become a de facto standard that satisfies both CPU and accelerator programmers. Next to GPUs by NVIDIA and AMD, another interesting approach in the accelerator world is Intel's MIC architecture, with a couple of supercomputing projects already being built around it. As it is based on the x86 ISA, including the full tool chain from compilers to debuggers to performance analysis tools, MIC aims at minimizing the porting effort to accelerators from the programmer's point of view. The talk will present examples from high performance computing that fall into the three abovementioned categories, and show how these code examples have been adapted to modern processor and accelerator architectures.
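The kind of data-parallel kernel at stake in the CUDA-versus-AVX debate can be illustrated with SAXPY (y = a·x + y). The example below is a hypothetical illustration, not code from the talk: the same element-wise expression can be lowered to a CUDA kernel on a GPU or, as here via NumPy, to vectorized loops that the compiler maps onto the CPU's SIMD (e.g. AVX) units:

```python
import numpy as np

# SAXPY: y = a*x + y, a typical compute-intensive kernel of the kind
# ported to GPUs or mapped onto CPU vector units. NumPy evaluates the
# expression element-wise over whole arrays, so the inner loop runs in
# compiled, vectorizable code rather than in the Python interpreter.
def saxpy(a, x, y):
    return a * x + y

x = np.arange(4, dtype=np.float32)   # [0, 1, 2, 3]
y = np.ones(4, dtype=np.float32)     # [1, 1, 1, 1]
print(saxpy(2.0, x, y))              # [1. 3. 5. 7.]
```

Whether such a kernel runs faster on an accelerator or on hand-tuned CPU vector units is precisely the application-dependent question the abstract raises.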