{"title":"A Latency-Efficient Router Architecture for CMP Systems","authors":"Antoni Roca, J. Flich, F. Silla, J. Duato","doi":"10.1109/DSD.2010.42","DOIUrl":null,"url":null,"abstract":"As technology advances, the number of cores in Chip Multi Processor systems (CMPs) and Multi Processor Systems-on-Chips (MPSoCs) keeps increasing. Current test chips and products reach tens of cores, and it is expected to reach hundreds of cores in the near future. Such complexity demands for an efficient network-on-chip (NoC). The common choice to build such networks is the 2D mesh topology (as it matches the regular tile-based design) and the Dimension-Order Routing (DOR) algorithm (because its simplicity). The network in such systems must provide sustained throughput and ultra low latencies. One of the key components in the network is the router, and thus, it plays a major role when designing for such performance levels. In this paper we propose a new pipelined router design focused in reducing the router latency. As a first step we identify the router components that take most of the critical path, and thus limit the router frequency. In particular, the arbiter is the one limiting the performance of the router. Based on this fact, we simplify the arbiter logic by using multiple smaller arbiters. The initial set of requests in the initial arbiter is then distributed over the smaller arbiters that operate in parallel. With this design procedure, and with a proper internal router organization, different router architectures are evolved. All of them enable the use of smaller arbiters in parallel by replicating ports and assuming the use of the DOR algorithm. The net result of such changes is a faster router. Preliminary results demonstrate a router latency reduction ranging from 10% to 21% with an increase of the router area. Network latency is reduced in a range from 11% to 15%.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSD.2010.42","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
As technology advances, the number of cores in Chip Multi Processor systems (CMPs) and Multi Processor Systems-on-Chips (MPSoCs) keeps increasing. Current test chips and products reach tens of cores, and it is expected to reach hundreds of cores in the near future. Such complexity demands for an efficient network-on-chip (NoC). The common choice to build such networks is the 2D mesh topology (as it matches the regular tile-based design) and the Dimension-Order Routing (DOR) algorithm (because its simplicity). The network in such systems must provide sustained throughput and ultra low latencies. One of the key components in the network is the router, and thus, it plays a major role when designing for such performance levels. In this paper we propose a new pipelined router design focused in reducing the router latency. As a first step we identify the router components that take most of the critical path, and thus limit the router frequency. In particular, the arbiter is the one limiting the performance of the router. Based on this fact, we simplify the arbiter logic by using multiple smaller arbiters. The initial set of requests in the initial arbiter is then distributed over the smaller arbiters that operate in parallel. With this design procedure, and with a proper internal router organization, different router architectures are evolved. All of them enable the use of smaller arbiters in parallel by replicating ports and assuming the use of the DOR algorithm. The net result of such changes is a faster router. Preliminary results demonstrate a router latency reduction ranging from 10% to 21% with an increase of the router area. Network latency is reduced in a range from 11% to 15%.