A Survey of GPGPU Parallel Processing Architecture Performance Optimization

Shiwei Jia, Z. Tian, Yueyuan Ma, Chenglu Sun, Yimen Zhang, Yuming Zhang
{"title":"A Survey of GPGPU Parallel Processing Architecture Performance Optimization","authors":"Shiwei Jia, Z. Tian, Yueyuan Ma, Chenglu Sun, Yimen Zhang, Yuming Zhang","doi":"10.1109/icisfall51598.2021.9627400","DOIUrl":null,"url":null,"abstract":"General purpose graphic processor unit (GPGPU) supports various applications' execution in different fields with high-performance computing capability due to its powerful parallel processing architecture. However, GPGPU parallel processing architecture also has the “memory wall” issue. When memory access in application is intensive or irregular, memory resource competition occurs and then degrade the performance of memory system. In addition, with multithreads' requirement for different on-chip resources such as register and warp slot being inconsistant, as well as the branch divergence irregular computing applications, the development of thread level parallelism (TLP) is severely restrited. Due to the restrictions of memory access and TLP, the acceleration capability of GPGPU large-scale parallel processing architecture has not been developed effectively. Alleviating memory resource contention and improving TLP is the performance optimization hotspot for current GPGPU architecture. In this paper we research how memory access optimization and TLP improvement could contribute to the optimization of parallel processing architecture performance. First we find that memory access optimization could be accomplished by three ways: reducing the number of global memory access, improving memory access latency hiding capability and optimizing cache subsystem performance. Then in order to improve TLP, optimizing thread allocation scheme, developing data approximation and redundancy, as well as compacting branch divergence, researches of these three aspects are surveyed. We also analyze the working mechanism, advantages and challenges of each research. At the end, we suggest the direction of future GPGPU parallel processing architecture optimization.","PeriodicalId":240142,"journal":{"name":"2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icisfall51598.2021.9627400","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The general-purpose graphics processing unit (GPGPU) supports the execution of a wide range of applications across different fields with high-performance computing capability, thanks to its powerful parallel processing architecture. However, the GPGPU parallel processing architecture also suffers from the "memory wall" problem. When an application's memory accesses are intensive or irregular, contention for memory resources occurs and degrades the performance of the memory system. In addition, because multiple threads place inconsistent demands on different on-chip resources such as registers and warp slots, and because irregular computing applications exhibit branch divergence, the exploitation of thread-level parallelism (TLP) is severely restricted. Due to these restrictions on memory access and TLP, the acceleration capability of the GPGPU's large-scale parallel processing architecture has not been exploited effectively. Alleviating memory resource contention and improving TLP are therefore the performance optimization hotspots for current GPGPU architectures. In this paper, we study how memory access optimization and TLP improvement can contribute to optimizing parallel processing architecture performance. First, we find that memory access optimization can be accomplished in three ways: reducing the number of global memory accesses, improving the ability to hide memory access latency, and optimizing cache subsystem performance. Then, to improve TLP, we survey research along three lines: optimizing thread allocation schemes, exploiting data approximation and redundancy, and compacting branch divergence. We also analyze the working mechanism, advantages, and challenges of each line of research. Finally, we suggest directions for future GPGPU parallel processing architecture optimization.
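As a concrete illustration of the first category above, reducing the number of global memory accesses, the sketch below shows a common shared-memory tiling pattern for matrix multiplication on a GPGPU. It is a minimal example of the general technique, not code from the surveyed paper; the kernel name, tile width, and the assumption that the matrix dimension is a multiple of the tile width are illustrative choices.

```cuda
#include <cuda_runtime.h>

#define TILE 16  // illustrative tile width, matching a 16x16 thread block

// In a naive kernel, each thread reads 2*N values from global memory.
// Here, threads in a block cooperatively stage TILE x TILE tiles of A and B
// in shared memory, so each global element is loaded once per tile and then
// reused TILE times, cutting global memory traffic by roughly a factor of TILE.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Assumes N is a multiple of TILE, to keep the sketch short.
    for (int t = 0; t < N / TILE; ++t) {
        // One coalesced global load per thread per tile.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();

        // Inner product over the tile, served entirely from shared memory.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    C[row * N + col] = acc;
}
```

A matching launch configuration would be, for example, `dim3 block(TILE, TILE); dim3 grid(N / TILE, N / TILE); matmul_tiled<<<grid, block>>>(dA, dB, dC, N);`, where `dA`, `dB`, and `dC` are device allocations of size N*N floats. The same reuse idea underlies many of the global-memory-reduction schemes discussed in the survey.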