集成CPU-GPU架构的异构缓存层次管理

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI:10.1109/HPEC.2019.8916239

Hao Wen, W. Zhang

{"title":"集成CPU-GPU架构的异构缓存层次管理","authors":"Hao Wen, W. Zhang","doi":"10.1109/HPEC.2019.8916239","DOIUrl":null,"url":null,"abstract":"Unlike the traditional CPU-GPU heterogeneous architecture where CPU and GPU have separate DRAM and memory address space, current heterogeneous CPU-GPU architectures integrate CPU and GPU in the same die and share the same last level cache (LLC) and memory. For the two-level cache hierarchy in which CPU and GPU have their own private L1 caches but share the LLC, conflict misses in the LLC between CPU and GPU may degrade both CPU and GPU performance. In addition, how the CPU and GPU memory requests flows (write back flow from L1 and cache fill flow from main memory) are managed may impact the performance. In this work, we study three different cache requests flow management policies. The first policy is selective GPU LLC fill, which selectively fills the GPU requests in the LLC. The second policy is selective GPU L1 write back, which selectively writes back GPU blocks in L1 cache to L2 cache. The final policy is a hybrid policy that combines the first two, and selectively replaces CPU blocks in the LLC. Our experimental results indicate that the third policy is the best of these three. On average, it can improve the CPU performance by about 10%, with the highest CPU performance improvement of 22%, with 0.8% averaged GPU performance overhead.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Heterogeneous Cache Hierarchy Management for Integrated CPU-GPU Architecture\",\"authors\":\"Hao Wen, W. Zhang\",\"doi\":\"10.1109/HPEC.2019.8916239\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Unlike the traditional CPU-GPU heterogeneous architecture where CPU and GPU have separate DRAM and memory address space, current heterogeneous CPU-GPU architectures integrate CPU and GPU in the same die and share the same last level cache (LLC) and memory. For the two-level cache hierarchy in which CPU and GPU have their own private L1 caches but share the LLC, conflict misses in the LLC between CPU and GPU may degrade both CPU and GPU performance. In addition, how the CPU and GPU memory requests flows (write back flow from L1 and cache fill flow from main memory) are managed may impact the performance. In this work, we study three different cache requests flow management policies. The first policy is selective GPU LLC fill, which selectively fills the GPU requests in the LLC. The second policy is selective GPU L1 write back, which selectively writes back GPU blocks in L1 cache to L2 cache. The final policy is a hybrid policy that combines the first two, and selectively replaces CPU blocks in the LLC. Our experimental results indicate that the third policy is the best of these three. On average, it can improve the CPU performance by about 10%, with the highest CPU performance improvement of 22%, with 0.8% averaged GPU performance overhead.\",\"PeriodicalId\":184253,\"journal\":{\"name\":\"2019 IEEE High Performance Extreme Computing Conference (HPEC)\",\"volume\":\"61 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE High Performance Extreme Computing Conference (HPEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPEC.2019.8916239\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916239","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

与传统的CPU-GPU异构架构不同，CPU和GPU有单独的DRAM和内存地址空间，当前的异构CPU-GPU架构将CPU和GPU集成在同一个die中，并共享相同的最后一级缓存(LLC)和内存。对于CPU和GPU各自拥有私有L1缓存但共享LLC的两级缓存结构，CPU和GPU之间的LLC中的冲突缺失可能会降低CPU和GPU的性能。此外，CPU和GPU内存请求流(来自L1的写回流和来自主存的缓存填充流)的管理方式可能会影响性能。在这项工作中，我们研究了三种不同的缓存请求流管理策略。第一种策略是选择性GPU LLC填充，有选择地填充LLC中的GPU请求。第二种策略是选择性GPU L1回写，有选择地将L1缓存中的GPU块回写到L2缓存中。最后一种策略是前两种策略的混合策略，并有选择地替换LLC中的CPU块。我们的实验结果表明，第三种策略是这三种策略中最好的。平均而言，它可以将CPU性能提高约10%，最高CPU性能提高22%，平均GPU性能开销为0.8%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Heterogeneous Cache Hierarchy Management for Integrated CPU-GPU Architecture

Unlike the traditional CPU-GPU heterogeneous architecture where CPU and GPU have separate DRAM and memory address space, current heterogeneous CPU-GPU architectures integrate CPU and GPU in the same die and share the same last level cache (LLC) and memory. For the two-level cache hierarchy in which CPU and GPU have their own private L1 caches but share the LLC, conflict misses in the LLC between CPU and GPU may degrade both CPU and GPU performance. In addition, how the CPU and GPU memory requests flows (write back flow from L1 and cache fill flow from main memory) are managed may impact the performance. In this work, we study three different cache requests flow management policies. The first policy is selective GPU LLC fill, which selectively fills the GPU requests in the LLC. The second policy is selective GPU L1 write back, which selectively writes back GPU blocks in L1 cache to L2 cache. The final policy is a hybrid policy that combines the first two, and selectively replaces CPU blocks in the LLC. Our experimental results indicate that the third policy is the best of these three. On average, it can improve the CPU performance by about 10%, with the highest CPU performance improvement of 22%, with 0.8% averaged GPU performance overhead.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE High Performance Extreme Computing Conference (HPEC)

自引率

0.00%

发文量