{"title":"Exploiting eDRAM bandwidth with data prefetching: simulation and measurements","authors":"V. Salapura, J. Brunheroto, F. Redígolo, A. Gara","doi":"10.1109/ICCD.2007.4601945","DOIUrl":null,"url":null,"abstract":"Compared to conventional SRAM, embedded DRAM (eDRAM) offers power, bandwidth and density advantages for large on-chip cache memories. However, eDRAM suffers from comparatively slower access times than conventional SRAM arrays. To hide eDRAM access latencies, the Blue Gene/L(&) supercomputer implements small private prefetch caches. We present an exploration of design trade-offs for the prefetch D-cache for eDRAM. We use full system simulation to consider operating system impact. We validate our modeling environment by comparing our simulation results to measurements on actual Blue Gene systems. Actual execution times also include any system effects not modeled in our performance simulator, and confirm the selection of simulation parameters included in the model. Our experiments show that even small prefetch caches with wide lines efficiently capture spatial locality in many applications. Our 2kB private prefetch caches reduce execution time by 10% on average, effectively hiding the latency of the eDRAM-based memory system.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"85 1","pages":"504-511"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 25th International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2007.4601945","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Compared to conventional SRAM, embedded DRAM (eDRAM) offers power, bandwidth and density advantages for large on-chip cache memories. However, eDRAM suffers from comparatively slower access times than conventional SRAM arrays. To hide eDRAM access latencies, the Blue Gene/L(&) supercomputer implements small private prefetch caches. We present an exploration of design trade-offs for the prefetch D-cache for eDRAM. We use full system simulation to consider operating system impact. We validate our modeling environment by comparing our simulation results to measurements on actual Blue Gene systems. Actual execution times also include any system effects not modeled in our performance simulator, and confirm the selection of simulation parameters included in the model. Our experiments show that even small prefetch caches with wide lines efficiently capture spatial locality in many applications. Our 2kB private prefetch caches reduce execution time by 10% on average, effectively hiding the latency of the eDRAM-based memory system.