Addressing queuing bottlenecks at high speeds

S. Sushanth Kumar, J. Turner, P. Crowley
{"title":"解决高速排队瓶颈问题","authors":"S. Sushanth Kumar, J. Turner, P. Crowley","doi":"10.1109/CONECT.2005.7","DOIUrl":null,"url":null,"abstract":"Modern routers and switch fabrics can have hundreds of input and output ports running at up to 10 Gb/s; 40 Gb/s systems are starting to appear. At these rates, the performance of the buffering and queuing subsystem becomes a significant bottleneck. In high performance routers with more than a few queues, packet buffering is typically implemented using DRAM for data storage and a combination of off-chip and on-chip SRAM for storing the linked-list nodes and packet length, and the queue headers, respectively. This paper focuses on the performance bottlenecks associated with the use of off-chip SRAM. We show how the combination of implicit buffer pointers and multi-buffer list nodes can dramatically reduce the impact of buffering and queuing subsystem on queuing performance. We also show how combining it with coarse-grained scheduling can improve the performance of fair queuing algorithms, while also reducing the amount of off-chip memory and bandwidth needed. These techniques can reduce the amount of SRAM needed to hold the list nodes by a factor of 10 at the cost of about 10% wastage of the DRAM space, assuming an aggregation degree of 16.","PeriodicalId":148282,"journal":{"name":"13th Symposium on High Performance Interconnects (HOTI'05)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Addressing queuing bottlenecks at high speeds\",\"authors\":\"S. Sushanth Kumar, J. Turner, P. Crowley\",\"doi\":\"10.1109/CONECT.2005.7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern routers and switch fabrics can have hundreds of input and output ports running at up to 10 Gb/s; 40 Gb/s systems are starting to appear. At these rates, the performance of the buffering and queuing subsystem becomes a significant bottleneck. In high performance routers with more than a few queues, packet buffering is typically implemented using DRAM for data storage and a combination of off-chip and on-chip SRAM for storing the linked-list nodes and packet length, and the queue headers, respectively. This paper focuses on the performance bottlenecks associated with the use of off-chip SRAM. We show how the combination of implicit buffer pointers and multi-buffer list nodes can dramatically reduce the impact of buffering and queuing subsystem on queuing performance. We also show how combining it with coarse-grained scheduling can improve the performance of fair queuing algorithms, while also reducing the amount of off-chip memory and bandwidth needed. 
These techniques can reduce the amount of SRAM needed to hold the list nodes by a factor of 10 at the cost of about 10% wastage of the DRAM space, assuming an aggregation degree of 16.\",\"PeriodicalId\":148282,\"journal\":{\"name\":\"13th Symposium on High Performance Interconnects (HOTI'05)\",\"volume\":\"128 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-08-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"13th Symposium on High Performance Interconnects (HOTI'05)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CONECT.2005.7\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"13th Symposium on High Performance Interconnects (HOTI'05)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONECT.2005.7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7

Abstract

Modern routers and switch fabrics can have hundreds of input and output ports running at up to 10 Gb/s; 40 Gb/s systems are starting to appear. At these rates, the performance of the buffering and queuing subsystem becomes a significant bottleneck. In high-performance routers with more than a few queues, packet buffering is typically implemented using DRAM for data storage and a combination of off-chip and on-chip SRAM for storing the linked-list nodes and packet lengths, and the queue headers, respectively. This paper focuses on the performance bottlenecks associated with the use of off-chip SRAM. We show how the combination of implicit buffer pointers and multi-buffer list nodes can dramatically reduce the impact of the buffering and queuing subsystem on queuing performance. We also show how combining it with coarse-grained scheduling can improve the performance of fair queuing algorithms, while also reducing the amount of off-chip memory and bandwidth needed. These techniques can reduce the amount of SRAM needed to hold the list nodes by a factor of 10 at the cost of about 10% wastage of the DRAM space, assuming an aggregation degree of 16.
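
To make the two ideas named in the abstract concrete, the following is a minimal C sketch, not the authors' implementation, of a multi-buffer list node with implicit buffer pointers, assuming an aggregation degree of 16 and fixed-size DRAM buffers. All names here (mbuf_node, AGG_DEGREE, BUF_SIZE, dram_addr) and the field layout are hypothetical illustrations, not taken from the paper.

/*
 * Illustrative sketch (not the paper's code) of multi-buffer list nodes
 * with implicit buffer pointers.  Assumptions: aggregation degree 16,
 * fixed-size DRAM packet buffers.
 */
#include <stdint.h>
#include <stdio.h>

#define AGG_DEGREE  16          /* buffers described by one SRAM list node (assumed) */
#define BUF_SIZE    2048        /* bytes per DRAM packet buffer (assumed)            */

/*
 * Multi-buffer list node kept in off-chip SRAM.  One node covers
 * AGG_DEGREE consecutive DRAM buffers, so the SRAM holds roughly
 * 1/AGG_DEGREE as many next-pointers as a one-node-per-buffer list.
 */
struct mbuf_node {
    uint32_t next;                  /* index of the next multi-buffer node          */
    uint16_t len[AGG_DEGREE];       /* packet length stored per buffer slot         */
    uint8_t  count;                 /* slots in use; a partially filled tail node   */
                                    /* leaves some DRAM buffers unused              */
};

/*
 * Implicit buffer pointer: the DRAM address of slot `slot` of node `node_idx`
 * is computed from the node index instead of being stored in the node.
 */
static inline uint64_t dram_addr(uint32_t node_idx, unsigned slot)
{
    return ((uint64_t)node_idx * AGG_DEGREE + slot) * (uint64_t)BUF_SIZE;
}

int main(void)
{
    /* Example: the 6th buffer slot of SRAM node 3 maps to this DRAM address. */
    printf("DRAM address = %llu\n", (unsigned long long)dram_addr(3, 5));
    return 0;
}

In this layout, off-chip SRAM stores one next-pointer per 16 buffers rather than one per buffer, which is consistent with the order-of-magnitude SRAM reduction the abstract cites; the roughly 10% DRAM wastage presumably corresponds to aggregates that are only partially filled, such as the tail node of a queue tracked by count above.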