{"title":"LEMap:通过配置文件引导的虚拟地址转换控制大型芯片多处理器缓存中的泄漏","authors":"Jugash Chandarlapati, Mainak Chaudhuri","doi":"10.1109/ICCD.2007.4601934","DOIUrl":null,"url":null,"abstract":"The emerging trend of larger number of cores or processors on a single chip in the server, desktop, and mobile notebook platforms necessarily demands larger amount of on-chip last level cache. However, larger caches threaten to dramatically increase the leakage power as the industry moves into deeper sub-micron technology. In this paper, with the aim of reducing leakage energy we introduce LEMap (low energy map), a novel virtual address translation scheme to control the set of physical pages mapped to each bank of a large multi-banked non-uniform access L2 cache shared across all the cores. Combination of profiling, a simple off-line clustering algorithm, and a new flavor of Irix-style application-directed page placement system call maps the virtual pages that are accessed in the L2 cache roughly together onto the same region of the cache. Thus LEMap makes the access windows of the pages mapped to a region roughly identical and increases the average idle time of a region. As a result, powering down a region after the last access to the clusters of the corresponding virtual pages saves a much bigger amount of L2 cache energy compared to a usual virtual address translation scheme that is oblivious to access patterns. Our execution-driven simulation of an eight-core chip-multiprocessor with a 16 MB shared L2 cache using a 65 nm process on eight shared memory parallel applications drawn from SPLASH-2, SPEC OMP, and DIS suites shows that LEMap, on average, saves 7% of total energy, 50% of L2 cache energy, and 52% of L2 cache power while suffering from a 3% loss in performance compared to a baseline system that employs drowsy cells as well as region power-down without access clustering.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"s1-15 1","pages":"423-430"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"LEMap: Controlling leakage in large chip-multiprocessor caches via profile-guided virtual address translation\",\"authors\":\"Jugash Chandarlapati, Mainak Chaudhuri\",\"doi\":\"10.1109/ICCD.2007.4601934\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The emerging trend of larger number of cores or processors on a single chip in the server, desktop, and mobile notebook platforms necessarily demands larger amount of on-chip last level cache. However, larger caches threaten to dramatically increase the leakage power as the industry moves into deeper sub-micron technology. In this paper, with the aim of reducing leakage energy we introduce LEMap (low energy map), a novel virtual address translation scheme to control the set of physical pages mapped to each bank of a large multi-banked non-uniform access L2 cache shared across all the cores. Combination of profiling, a simple off-line clustering algorithm, and a new flavor of Irix-style application-directed page placement system call maps the virtual pages that are accessed in the L2 cache roughly together onto the same region of the cache. Thus LEMap makes the access windows of the pages mapped to a region roughly identical and increases the average idle time of a region. As a result, powering down a region after the last access to the clusters of the corresponding virtual pages saves a much bigger amount of L2 cache energy compared to a usual virtual address translation scheme that is oblivious to access patterns. Our execution-driven simulation of an eight-core chip-multiprocessor with a 16 MB shared L2 cache using a 65 nm process on eight shared memory parallel applications drawn from SPLASH-2, SPEC OMP, and DIS suites shows that LEMap, on average, saves 7% of total energy, 50% of L2 cache energy, and 52% of L2 cache power while suffering from a 3% loss in performance compared to a baseline system that employs drowsy cells as well as region power-down without access clustering.\",\"PeriodicalId\":6306,\"journal\":{\"name\":\"2007 25th International Conference on Computer Design\",\"volume\":\"s1-15 1\",\"pages\":\"423-430\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 25th International Conference on Computer Design\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2007.4601934\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 25th International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2007.4601934","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LEMap: Controlling leakage in large chip-multiprocessor caches via profile-guided virtual address translation
The emerging trend of larger number of cores or processors on a single chip in the server, desktop, and mobile notebook platforms necessarily demands larger amount of on-chip last level cache. However, larger caches threaten to dramatically increase the leakage power as the industry moves into deeper sub-micron technology. In this paper, with the aim of reducing leakage energy we introduce LEMap (low energy map), a novel virtual address translation scheme to control the set of physical pages mapped to each bank of a large multi-banked non-uniform access L2 cache shared across all the cores. Combination of profiling, a simple off-line clustering algorithm, and a new flavor of Irix-style application-directed page placement system call maps the virtual pages that are accessed in the L2 cache roughly together onto the same region of the cache. Thus LEMap makes the access windows of the pages mapped to a region roughly identical and increases the average idle time of a region. As a result, powering down a region after the last access to the clusters of the corresponding virtual pages saves a much bigger amount of L2 cache energy compared to a usual virtual address translation scheme that is oblivious to access patterns. Our execution-driven simulation of an eight-core chip-multiprocessor with a 16 MB shared L2 cache using a 65 nm process on eight shared memory parallel applications drawn from SPLASH-2, SPEC OMP, and DIS suites shows that LEMap, on average, saves 7% of total energy, 50% of L2 cache energy, and 52% of L2 cache power while suffering from a 3% loss in performance compared to a baseline system that employs drowsy cells as well as region power-down without access clustering.