Bo Qiao, Oliver Reiche, M. A. Özkan, J. Teich, Frank Hannig
{"title":"高效并行减少gpu与Hipacc","authors":"Bo Qiao, Oliver Reiche, M. A. Özkan, J. Teich, Frank Hannig","doi":"10.1145/3378678.3391885","DOIUrl":null,"url":null,"abstract":"Hipacc is a domain-specific language for ease of programming image processing applications on hardware accelerators such as GPUs. It relieves the burden of manually porting algorithms to hardware for developers with the help of domain- and architecture-specific knowledge. One fundamental operation in image processing is reduction. Global reduction operators are the building blocks of many widely used algorithms, including image normalization, similarity estimation, etc. This paper presents an efficient approach to perform parallel reductions on GPUs with Hipacc. Our proposed approach benefits from the continuous effort of performance and programmability improvement by hardware vendors, for example, by utilizing the latest low-level primitives from Nvidia. Results show our approach achieves a speedup of up to 3.43 over an existing Hipacc implementation with traditional optimization methods, and a speedup of up to 9.02 over an implementation using the Thrust library from Nvidia.","PeriodicalId":383191,"journal":{"name":"Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Efficient parallel reduction on GPUs with Hipacc\",\"authors\":\"Bo Qiao, Oliver Reiche, M. A. Özkan, J. Teich, Frank Hannig\",\"doi\":\"10.1145/3378678.3391885\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hipacc is a domain-specific language for ease of programming image processing applications on hardware accelerators such as GPUs. It relieves the burden of manually porting algorithms to hardware for developers with the help of domain- and architecture-specific knowledge. One fundamental operation in image processing is reduction. Global reduction operators are the building blocks of many widely used algorithms, including image normalization, similarity estimation, etc. This paper presents an efficient approach to perform parallel reductions on GPUs with Hipacc. Our proposed approach benefits from the continuous effort of performance and programmability improvement by hardware vendors, for example, by utilizing the latest low-level primitives from Nvidia. Results show our approach achieves a speedup of up to 3.43 over an existing Hipacc implementation with traditional optimization methods, and a speedup of up to 9.02 over an implementation using the Thrust library from Nvidia.\",\"PeriodicalId\":383191,\"journal\":{\"name\":\"Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3378678.3391885\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3378678.3391885","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hipacc is a domain-specific language for ease of programming image processing applications on hardware accelerators such as GPUs. It relieves the burden of manually porting algorithms to hardware for developers with the help of domain- and architecture-specific knowledge. One fundamental operation in image processing is reduction. Global reduction operators are the building blocks of many widely used algorithms, including image normalization, similarity estimation, etc. This paper presents an efficient approach to perform parallel reductions on GPUs with Hipacc. Our proposed approach benefits from the continuous effort of performance and programmability improvement by hardware vendors, for example, by utilizing the latest low-level primitives from Nvidia. Results show our approach achieves a speedup of up to 3.43 over an existing Hipacc implementation with traditional optimization methods, and a speedup of up to 9.02 over an implementation using the Thrust library from Nvidia.