弹性管道:解决GPU片上共享内存库冲突

ACM International Conference on Computing Frontiers Pub Date : 2011-05-03 DOI:10.1145/2016604.2016608

C. Gou, G. Gaydadjiev

{"title":"弹性管道:解决GPU片上共享内存库冲突","authors":"C. Gou, G. Gaydadjiev","doi":"10.1145/2016604.2016608","DOIUrl":null,"url":null,"abstract":"One of the major problems with the GPU on-chip shared memory is bank conflicts. We observed that the throughput of the GPU processor core is often constrained neither by the shared memory bandwidth, nor by the shared memory latency (as long as it stays constant), but is rather due to the varied latencies caused by memory bank conflicts. This results in conflicts at the writeback stage of the in-order pipeline and pipeline stalls, thus degrading system throughput. Based on this observation, we investigate and propose a novel elastic pipeline design that minimizes the negative impact of on-chip memory bank conflicts on system throughput, by decoupling bank conflicts from pipeline stalls. Simulation results show that our proposed elastic pipeline together with the co-designed bank-conflict aware warp scheduling reduces the pipeline stalls by up to 64.0% (with 42.3% on average) and improves the overall performance by up to 20.7% (on average 13.3%) for our benchmark applications, at trivial hardware overhead.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Elastic pipeline: addressing GPU on-chip shared memory bank conflicts\",\"authors\":\"C. Gou, G. Gaydadjiev\",\"doi\":\"10.1145/2016604.2016608\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One of the major problems with the GPU on-chip shared memory is bank conflicts. We observed that the throughput of the GPU processor core is often constrained neither by the shared memory bandwidth, nor by the shared memory latency (as long as it stays constant), but is rather due to the varied latencies caused by memory bank conflicts. This results in conflicts at the writeback stage of the in-order pipeline and pipeline stalls, thus degrading system throughput. Based on this observation, we investigate and propose a novel elastic pipeline design that minimizes the negative impact of on-chip memory bank conflicts on system throughput, by decoupling bank conflicts from pipeline stalls. Simulation results show that our proposed elastic pipeline together with the co-designed bank-conflict aware warp scheduling reduces the pipeline stalls by up to 64.0% (with 42.3% on average) and improves the overall performance by up to 20.7% (on average 13.3%) for our benchmark applications, at trivial hardware overhead.\",\"PeriodicalId\":430420,\"journal\":{\"name\":\"ACM International Conference on Computing Frontiers\",\"volume\":\"97 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-05-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM International Conference on Computing Frontiers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2016604.2016608\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2016604.2016608","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 20

摘要

GPU片上共享内存的主要问题之一是库冲突。我们观察到GPU处理器核心的吞吐量通常既不受共享内存带宽的限制，也不受共享内存延迟的限制(只要它保持不变)，而是由于内存库冲突引起的各种延迟。这将导致有序管道回写阶段的冲突和管道停滞，从而降低系统吞吐量。基于这一观察，我们研究并提出了一种新的弹性管道设计，通过将存储库冲突与管道失速解耦，将片上存储库冲突对系统吞吐量的负面影响降至最低。仿真结果表明，我们提出的弹性管道以及共同设计的感知银行冲突的warp调度在我们的基准应用程序中减少了高达64.0%(平均42.3%)的管道失速，并将总体性能提高了高达20.7%(平均13.3%)，而硬件开销很小。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Elastic pipeline: addressing GPU on-chip shared memory bank conflicts

One of the major problems with the GPU on-chip shared memory is bank conflicts. We observed that the throughput of the GPU processor core is often constrained neither by the shared memory bandwidth, nor by the shared memory latency (as long as it stays constant), but is rather due to the varied latencies caused by memory bank conflicts. This results in conflicts at the writeback stage of the in-order pipeline and pipeline stalls, thus degrading system throughput. Based on this observation, we investigate and propose a novel elastic pipeline design that minimizes the negative impact of on-chip memory bank conflicts on system throughput, by decoupling bank conflicts from pipeline stalls. Simulation results show that our proposed elastic pipeline together with the co-designed bank-conflict aware warp scheduling reduces the pipeline stalls by up to 64.0% (with 42.3% on average) and improves the overall performance by up to 20.7% (on average 13.3%) for our benchmark applications, at trivial hardware overhead.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM International Conference on Computing Frontiers

自引率

0.00%

发文量