Authors: Zhu Junfeng, C. Gang, Zhang Keliang, Wu Baifeng
DOI: 10.1109/ICSAI.2012.6223155 (https://doi.org/10.1109/ICSAI.2012.6223155)
Published: 2012-05-19
Venue (as listed in the record): IEEE International Conference on Systems Biology : [proceedings]
Publication type: Journal Article
An experimental GPU global memory performance estimation and optimization
The enormous computational power of modern graphics processing units (GPUs) has led to their wide use in general-purpose applications. However, manually developing high-performance parallel code for GPUs remains very challenging. To improve GPGPU application performance through efficient use of GPU global memory, we extend the polyhedral model to capture the memory access patterns in source programs. We determine whether global memory accesses are coalesced, and we estimate the memory performance of a GPGPU kernel with the aim of eliminating uncoalesced global memory accesses. Experimental results on two applications show that the presented global memory performance model estimates their global memory performance relatively accurately, and that the presented global memory optimization methods can significantly improve performance.
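To make the coalesced versus uncoalesced distinction concrete, here is a minimal sketch of how an affine access pattern might be classified. It is not the authors' model: the warp size of 32, the 128-byte transaction segment, and the function names `transactions_per_warp` and `is_coalesced` are all assumptions chosen for illustration.

```python
WARP_SIZE = 32        # threads per warp (assumed for illustration)
SEGMENT_BYTES = 128   # size of one global memory transaction (assumed)


def transactions_per_warp(stride, offset=0, elem_bytes=4):
    """Count the distinct memory segments touched by one warp.

    Thread t is assumed to access element index stride * t + offset,
    i.e. an affine function of the thread id.
    """
    segments = {
        ((stride * t + offset) * elem_bytes) // SEGMENT_BYTES
        for t in range(WARP_SIZE)
    }
    return len(segments)


def is_coalesced(stride, offset=0, elem_bytes=4):
    """Treat an access as coalesced if the warp touches the fewest possible segments."""
    # Ceiling division: segments needed if the warp's data were packed and aligned.
    minimum = (WARP_SIZE * elem_bytes + SEGMENT_BYTES - 1) // SEGMENT_BYTES
    return transactions_per_warp(stride, offset, elem_bytes) == minimum


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print(stride, transactions_per_warp(stride), is_coalesced(stride))
```

Under these assumptions a unit-stride, aligned access needs a single transaction per warp, while a stride of 32 elements needs one transaction per thread. That is the kind of pattern an optimization pass would try to eliminate, for example by changing the data layout or staging data through shared memory.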