为可编程和高效的网络查找设计和实现PLUG架构

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) Pub Date : 2010-09-11 DOI:10.1145/1854273.1854316

Amit Kumar, Lorenzo De Carli, Sung Jin Kim, M. Kruijf, K. Sankaralingam, Cristian Estan, S. Jha

{"title":"为可编程和高效的网络查找设计和实现PLUG架构","authors":"Amit Kumar, Lorenzo De Carli, Sung Jin Kim, M. Kruijf, K. Sankaralingam, Cristian Estan, S. Jha","doi":"10.1145/1854273.1854316","DOIUrl":null,"url":null,"abstract":"This paper proposes a new architecture called Pipelined LookUp Grid (PLUG) that can perform data structure lookups in network processing. PLUGs are programmable and through simplicity achieve power efficiency. We draw upon the insights that data structure lookups have natural structure that can be statically determined and exploited. The PLUG execution model transforms data-structure lookups into pipelined stages of computation and associates small code-blocks with data. The PLUG architecture is a tiled architecture with each tile consisting predominantly of SRAMs, a lightweight no-buffering router, and an array of lightweight computation cores. Using a principle of fixed delays in the execution model, the architecture is contention-free and completely statically scheduled thus achieving high energy efficiency. The architecture enables rapid deployment of new network protocols and generalizes as a data-structure accelerator. This paper describes the PLUG architecture, the compiler, and evaluates our RTL prototype PLUG chip synthesized on a 55nm technology library. We evaluate six diverse high-end network processing workloads including IPv4, IPv6, and Ethernet forwarding. We show that at a 55nm technology, a 16-tile PLUG occupies 58mm2, provides 4MB on-chip storage, and sustains a clock frequency of 1 GHz. This translates to 1 billion lookups per second, a latency of 18ns to 219ns, and average power less than 1 watt.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Design and implementation of the PLUG architecture for programmable and efficient network lookups\",\"authors\":\"Amit Kumar, Lorenzo De Carli, Sung Jin Kim, M. Kruijf, K. Sankaralingam, Cristian Estan, S. Jha\",\"doi\":\"10.1145/1854273.1854316\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a new architecture called Pipelined LookUp Grid (PLUG) that can perform data structure lookups in network processing. PLUGs are programmable and through simplicity achieve power efficiency. We draw upon the insights that data structure lookups have natural structure that can be statically determined and exploited. The PLUG execution model transforms data-structure lookups into pipelined stages of computation and associates small code-blocks with data. The PLUG architecture is a tiled architecture with each tile consisting predominantly of SRAMs, a lightweight no-buffering router, and an array of lightweight computation cores. Using a principle of fixed delays in the execution model, the architecture is contention-free and completely statically scheduled thus achieving high energy efficiency. The architecture enables rapid deployment of new network protocols and generalizes as a data-structure accelerator. This paper describes the PLUG architecture, the compiler, and evaluates our RTL prototype PLUG chip synthesized on a 55nm technology library. We evaluate six diverse high-end network processing workloads including IPv4, IPv6, and Ethernet forwarding. We show that at a 55nm technology, a 16-tile PLUG occupies 58mm2, provides 4MB on-chip storage, and sustains a clock frequency of 1 GHz. This translates to 1 billion lookups per second, a latency of 18ns to 219ns, and average power less than 1 watt.\",\"PeriodicalId\":422461,\"journal\":{\"name\":\"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1854273.1854316\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1854273.1854316","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

本文提出了一种新的结构，称为管道查找网格(pipeline LookUp Grid, PLUG)，它可以在网络处理中执行数据结构查找。插头是可编程的，通过简单实现功率效率。我们认为数据结构查找具有可以静态确定和利用的自然结构。PLUG执行模型将数据结构查找转换为计算的流水线阶段，并将小代码块与数据关联起来。PLUG架构是一种分层架构，每个分层主要由sram、一个轻量级无缓冲路由器和一组轻量级计算核心组成。在执行模型中使用固定延迟原则，该体系结构是无争用的，并且完全是静态调度的，从而实现了高能效。该体系结构能够快速部署新的网络协议，并作为数据结构加速器进行推广。本文介绍了PLUG的架构、编译器，并对我们在55nm工艺库上合成的RTL原型PLUG芯片进行了评估。我们评估了六种不同的高端网络处理工作负载，包括IPv4、IPv6和以太网转发。我们表明，在55nm技术下，16块PLUG占用58mm2，提供4MB片上存储，并保持1ghz的时钟频率。这意味着每秒查找10亿次，延迟为18ns到219ns，平均功耗低于1瓦。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Design and implementation of the PLUG architecture for programmable and efficient network lookups

This paper proposes a new architecture called Pipelined LookUp Grid (PLUG) that can perform data structure lookups in network processing. PLUGs are programmable and through simplicity achieve power efficiency. We draw upon the insights that data structure lookups have natural structure that can be statically determined and exploited. The PLUG execution model transforms data-structure lookups into pipelined stages of computation and associates small code-blocks with data. The PLUG architecture is a tiled architecture with each tile consisting predominantly of SRAMs, a lightweight no-buffering router, and an array of lightweight computation cores. Using a principle of fixed delays in the execution model, the architecture is contention-free and completely statically scheduled thus achieving high energy efficiency. The architecture enables rapid deployment of new network protocols and generalizes as a data-structure accelerator. This paper describes the PLUG architecture, the compiler, and evaluates our RTL prototype PLUG chip synthesized on a 55nm technology library. We evaluate six diverse high-end network processing workloads including IPv4, IPv6, and Ethernet forwarding. We show that at a 55nm technology, a 16-tile PLUG occupies 58mm2, provides 4MB on-chip storage, and sustains a clock frequency of 1 GHz. This translates to 1 billion lookups per second, a latency of 18ns to 219ns, and average power less than 1 watt.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)

自引率

0.00%

发文量

期刊最新文献

Reducing task creation and termination overhead in explicitly parallel programs An intra-tile cache set balancing scheme NUcache: A multicore cache organization based on Next-Use distance Towards a science of parallel programming Discovering and understanding performance bottlenecks in transactional applications