{"title":"编译器、体系结构和人工优化对多处理器CFD性能的协同效应","authors":"M. Kuba, C. Polychronopoulos, K. Gallivan","doi":"10.1145/224170.224426","DOIUrl":null,"url":null,"abstract":"This paper discusses the comprehensive performance profiling, improvement and benchmarking of a Computational Fluid Dynamics code, one of the Grand Challenge applications, on three popular multiprocessors. In the process of analyzing performance we considered language, compiler, architecture, and algorithmic changes and quantified each of them and their incremental contribution to bottom-line performance. We demonstrate that parallelization alone cannot result in significant gains if the granularity of parallel threads and the effect of parallelization on data locality are not taken into account. Unlike benchmarking studies that often focus on the performance or effectiveness of parallelizing compilers on specific loop kernels, we used the entire CFD code to measure the global effectiveness of compilers and parallel architectures. We probed the performance bottlenecks in each case and derived solutions which eliminate or neutralize the performance inhibiting factors. The major conclusion of our work is that overall performance is extremely sensitive to the synergetic effects of compiler optimizations, algorithmic and code tuning, and architectural idiosyncrasies.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Synergetic Effect of Compiler, Architecture, and Manual Optimizations on the Performance of CFD on Multiprocessors\",\"authors\":\"M. Kuba, C. Polychronopoulos, K. Gallivan\",\"doi\":\"10.1145/224170.224426\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper discusses the comprehensive performance profiling, improvement and benchmarking of a Computational Fluid Dynamics code, one of the Grand Challenge applications, on three popular multiprocessors. In the process of analyzing performance we considered language, compiler, architecture, and algorithmic changes and quantified each of them and their incremental contribution to bottom-line performance. We demonstrate that parallelization alone cannot result in significant gains if the granularity of parallel threads and the effect of parallelization on data locality are not taken into account. Unlike benchmarking studies that often focus on the performance or effectiveness of parallelizing compilers on specific loop kernels, we used the entire CFD code to measure the global effectiveness of compilers and parallel architectures. We probed the performance bottlenecks in each case and derived solutions which eliminate or neutralize the performance inhibiting factors. The major conclusion of our work is that overall performance is extremely sensitive to the synergetic effects of compiler optimizations, algorithmic and code tuning, and architectural idiosyncrasies.\",\"PeriodicalId\":269909,\"journal\":{\"name\":\"Proceedings of the IEEE/ACM SC95 Conference\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-12-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the IEEE/ACM SC95 Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/224170.224426\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE/ACM SC95 Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/224170.224426","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Synergetic Effect of Compiler, Architecture, and Manual Optimizations on the Performance of CFD on Multiprocessors
This paper discusses the comprehensive performance profiling, improvement and benchmarking of a Computational Fluid Dynamics code, one of the Grand Challenge applications, on three popular multiprocessors. In the process of analyzing performance we considered language, compiler, architecture, and algorithmic changes and quantified each of them and their incremental contribution to bottom-line performance. We demonstrate that parallelization alone cannot result in significant gains if the granularity of parallel threads and the effect of parallelization on data locality are not taken into account. Unlike benchmarking studies that often focus on the performance or effectiveness of parallelizing compilers on specific loop kernels, we used the entire CFD code to measure the global effectiveness of compilers and parallel architectures. We probed the performance bottlenecks in each case and derived solutions which eliminate or neutralize the performance inhibiting factors. The major conclusion of our work is that overall performance is extremely sensitive to the synergetic effects of compiler optimizations, algorithmic and code tuning, and architectural idiosyncrasies.