Packet Pump: Overcoming Network Bottleneck in On-Chip Interconnects for GPGPUs*
Authors: Xianwei Cheng, Yang Zhao, Hui Zhao, Yuan Xie
Venue: 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1-6
Published: 2018-06-24
DOI: https://doi.org/10.1145/3195970.3196087
Citations: 8
Abstract
To fully exploit the parallel processing power of GPGPUs, on-chip interconnects must provide bandwidth-efficient data communication. GPGPUs exhibit a many-to-few-to-many traffic pattern, which makes the routers connected to memory controllers the network bottleneck. The inefficient design of conventional routers causes long queues of packets to be blocked at memory controllers and thus greatly constrains network bandwidth. In this work, we employ heterogeneous design techniques and propose a novel decoupled architecture for routers connected to memory controllers. To further improve performance, we propose two techniques called Injection Virtual Circuit and Memory-aware Adaptive Routing. We show that our scheme can effectively eliminate the NoC bottleneck and improve performance by 78% on average.
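The many-to-few-to-many pattern can be illustrated with a small counting sketch. It is not taken from the paper: the function name `reply_load`, the round-robin address interleaving, and the figures of 56 cores and 8 memory controllers are all assumptions chosen for illustration. It only shows that when many cores send requests to a few controllers, each controller must inject many more reply packets than any single core injects requests.

```python
from collections import Counter


def reply_load(num_cores: int, num_mcs: int, requests_per_core: int = 1) -> Counter:
    """Count the replies each memory controller (MC) must inject.

    Assumption: requests are spread round-robin across MCs, and every
    request produces exactly one reply from the MC that received it.
    """
    load = Counter()
    for core in range(num_cores):
        for r in range(requests_per_core):
            mc = (core + r) % num_mcs  # hypothetical interleaving scheme
            load[mc] += 1
    return load


# Hypothetical configuration: 56 compute cores, 8 memory controllers.
load = reply_load(num_cores=56, num_mcs=8)
for mc, replies in sorted(load.items()):
    print(f"MC {mc}: injects {replies} replies (each core injected 1 request)")
```

Under these assumptions every controller has to inject 56 / 8 = 7 replies for each round of single requests, so the injection side of a router attached to a memory controller sees several times the load of a router attached to a core.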