编译器、体系结构和人工优化对多处理器CFD性能的协同效应

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI:10.1145/224170.224426

M. Kuba, C. Polychronopoulos, K. Gallivan

{"title":"编译器、体系结构和人工优化对多处理器CFD性能的协同效应","authors":"M. Kuba, C. Polychronopoulos, K. Gallivan","doi":"10.1145/224170.224426","DOIUrl":null,"url":null,"abstract":"This paper discusses the comprehensive performance profiling, improvement and benchmarking of a Computational Fluid Dynamics code, one of the Grand Challenge applications, on three popular multiprocessors. In the process of analyzing performance we considered language, compiler, architecture, and algorithmic changes and quantified each of them and their incremental contribution to bottom-line performance. We demonstrate that parallelization alone cannot result in significant gains if the granularity of parallel threads and the effect of parallelization on data locality are not taken into account. Unlike benchmarking studies that often focus on the performance or effectiveness of parallelizing compilers on specific loop kernels, we used the entire CFD code to measure the global effectiveness of compilers and parallel architectures. We probed the performance bottlenecks in each case and derived solutions which eliminate or neutralize the performance inhibiting factors. The major conclusion of our work is that overall performance is extremely sensitive to the synergetic effects of compiler optimizations, algorithmic and code tuning, and architectural idiosyncrasies.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Synergetic Effect of Compiler, Architecture, and Manual Optimizations on the Performance of CFD on Multiprocessors\",\"authors\":\"M. Kuba, C. Polychronopoulos, K. Gallivan\",\"doi\":\"10.1145/224170.224426\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper discusses the comprehensive performance profiling, improvement and benchmarking of a Computational Fluid Dynamics code, one of the Grand Challenge applications, on three popular multiprocessors. In the process of analyzing performance we considered language, compiler, architecture, and algorithmic changes and quantified each of them and their incremental contribution to bottom-line performance. We demonstrate that parallelization alone cannot result in significant gains if the granularity of parallel threads and the effect of parallelization on data locality are not taken into account. Unlike benchmarking studies that often focus on the performance or effectiveness of parallelizing compilers on specific loop kernels, we used the entire CFD code to measure the global effectiveness of compilers and parallel architectures. We probed the performance bottlenecks in each case and derived solutions which eliminate or neutralize the performance inhibiting factors. The major conclusion of our work is that overall performance is extremely sensitive to the synergetic effects of compiler optimizations, algorithmic and code tuning, and architectural idiosyncrasies.\",\"PeriodicalId\":269909,\"journal\":{\"name\":\"Proceedings of the IEEE/ACM SC95 Conference\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-12-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the IEEE/ACM SC95 Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/224170.224426\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE/ACM SC95 Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/224170.224426","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

本文讨论了大挑战应用程序之一的计算流体动力学代码在三种流行的多处理器上的综合性能分析、改进和基准测试。在分析性能的过程中，我们考虑了语言、编译器、架构和算法的变化，并量化了每一个变化以及它们对底线性能的增量贡献。我们证明，如果不考虑并行线程的粒度和并行化对数据局部性的影响，单独的并行化不能带来显著的收益。与通常关注特定循环内核上并行编译器的性能或有效性的基准测试研究不同，我们使用整个CFD代码来测量编译器和并行架构的全局有效性。我们探讨了每种情况下的性能瓶颈，并推导了消除或抵消性能抑制因素的解决方案。我们工作的主要结论是，总体性能对编译器优化、算法和代码调优以及架构特性的协同效应极其敏感。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

The Synergetic Effect of Compiler, Architecture, and Manual Optimizations on the Performance of CFD on Multiprocessors

This paper discusses the comprehensive performance profiling, improvement and benchmarking of a Computational Fluid Dynamics code, one of the Grand Challenge applications, on three popular multiprocessors. In the process of analyzing performance we considered language, compiler, architecture, and algorithmic changes and quantified each of them and their incremental contribution to bottom-line performance. We demonstrate that parallelization alone cannot result in significant gains if the granularity of parallel threads and the effect of parallelization on data locality are not taken into account. Unlike benchmarking studies that often focus on the performance or effectiveness of parallelizing compilers on specific loop kernels, we used the entire CFD code to measure the global effectiveness of compilers and parallel architectures. We probed the performance bottlenecks in each case and derived solutions which eliminate or neutralize the performance inhibiting factors. The major conclusion of our work is that overall performance is extremely sensitive to the synergetic effects of compiler optimizations, algorithmic and code tuning, and architectural idiosyncrasies.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the IEEE/ACM SC95 Conference

自引率

0.00%

发文量