Hao Wang, ChangMin Park, Gyungsu Byun, Jung Ho Ahn, N. Kim
{"title":"Alloy: Parallel-serial memory channel architecture for single-chip heterogeneous processor systems","authors":"Hao Wang, ChangMin Park, Gyungsu Byun, Jung Ho Ahn, N. Kim","doi":"10.1109/HPCA.2015.7056041","DOIUrl":null,"url":null,"abstract":"A single-chip heterogeneous processor integrates both CPU and GPU on the same chip, demanding higher memory bandwidth. However, the current parallel interface (e.g., DDR3) can increase neither the number of (memory) channels nor the bit rate of the channels without paying high package and power costs. In contrast, the high-speed serial interface (HSI) can offer much higher bandwidth for the same number of pins and lower power consumption for the same bandwidth than the parallel interface. This allows us to integrate more channels under a pin and/or package power constraint but at the cost of longer latency for memory accesses and higher static energy consumption in particular for idle channels. In this paper, we first provide a deep understanding of recent HSI exhibiting very distinct characteristics from past serial interfaces in terms of bit rate, latency, energy per bit transfer, and static power consumption. To overcome the limitation of using only parallel or serial interfaces, we second propose a hybrid memory channel architecture-Alloy consisting of low-latency parallel and high-bandwidth serial channels. Alloy is assisted by our two proposed techniques: (i), a memory channel partitioning technique adoptively maps physical (memory) pages of latency-sensitive (CPU) and bandwidth-consuming (GPU) applications to parallel and serial channels, respectively, and (ii) a power management technique reduces the static energy consumption of idle serial channels. On average, Alloy provides 21% and 32% higher performance for CPU and GPU, respectively, while consuming total memory interface energy comparable to the baseline parallel channel architecture for diverse mixes of co-running CPU and GPU applications.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"707 1","pages":"296-308"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2015.7056041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
A single-chip heterogeneous processor integrates both CPU and GPU on the same chip, demanding higher memory bandwidth. However, the current parallel interface (e.g., DDR3) can increase neither the number of (memory) channels nor the bit rate of the channels without paying high package and power costs. In contrast, the high-speed serial interface (HSI) can offer much higher bandwidth for the same number of pins and lower power consumption for the same bandwidth than the parallel interface. This allows us to integrate more channels under a pin and/or package power constraint but at the cost of longer latency for memory accesses and higher static energy consumption in particular for idle channels. In this paper, we first provide a deep understanding of recent HSI exhibiting very distinct characteristics from past serial interfaces in terms of bit rate, latency, energy per bit transfer, and static power consumption. To overcome the limitation of using only parallel or serial interfaces, we second propose a hybrid memory channel architecture-Alloy consisting of low-latency parallel and high-bandwidth serial channels. Alloy is assisted by our two proposed techniques: (i), a memory channel partitioning technique adoptively maps physical (memory) pages of latency-sensitive (CPU) and bandwidth-consuming (GPU) applications to parallel and serial channels, respectively, and (ii) a power management technique reduces the static energy consumption of idle serial channels. On average, Alloy provides 21% and 32% higher performance for CPU and GPU, respectively, while consuming total memory interface energy comparable to the baseline parallel channel architecture for diverse mixes of co-running CPU and GPU applications.