Anastasios Psarras, Junghee Lee, Pavlos M. Mattheakis, C. Nicopoulos, G. Dimitrakopoulos
{"title":"一种低功耗的片上网络架构,用于基于磁片的芯片多处理器","authors":"Anastasios Psarras, Junghee Lee, Pavlos M. Mattheakis, C. Nicopoulos, G. Dimitrakopoulos","doi":"10.1145/2902961.2903010","DOIUrl":null,"url":null,"abstract":"Technology scaling of tiled-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect; even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs, and cycle-accurate full-system simulations corroborate the significant power and latency improvements.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"A low-power network-on-chip architecture for tile-based chip multi-processors\",\"authors\":\"Anastasios Psarras, Junghee Lee, Pavlos M. Mattheakis, C. Nicopoulos, G. Dimitrakopoulos\",\"doi\":\"10.1145/2902961.2903010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Technology scaling of tiled-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect; even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs, and cycle-accurate full-system simulations corroborate the significant power and latency improvements.\",\"PeriodicalId\":407054,\"journal\":{\"name\":\"2016 International Great Lakes Symposium on VLSI (GLSVLSI)\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 International Great Lakes Symposium on VLSI (GLSVLSI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2902961.2903010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2902961.2903010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A low-power network-on-chip architecture for tile-based chip multi-processors
Technology scaling of tiled-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect; even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs, and cycle-accurate full-system simulations corroborate the significant power and latency improvements.