Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Pub Date: 2015-02-22. DOI: 10.1145/2684746.2689068

Ren Chen, Sruja Siriyal, V. Prasanna


Citations: 67

Abstract

Parallel sorting networks are widely employed in hardware implementations for sorting due to their high data parallelism and low control overhead. In this paper, we propose an energy and memory efficient mapping methodology for implementing a bitonic sorting network on FPGA. Using this methodology, the proposed sorting architecture can be built for a given data parallelism while supporting continuous data streams. We propose a streaming permutation network (SPN) by "folding" the classic Clos network. We prove that the SPN is programmable to realize all the interconnection patterns in the bitonic sorting network. A low cost design for sorting with minimal resource usage is obtained by reusing one SPN. We also demonstrate a high throughput design by trading off area for performance. With a data parallelism of p (2 ≤ p ≤ N/log₂ N), the high throughput design sorts an N-key sequence with latency O(N/p) and throughput (number of keys sorted per cycle) O(p), and uses O(N) memory. This achieves optimal memory efficiency (defined as the ratio of throughput to the amount of on-chip memory used by the design) of O(p/N). Another noteworthy feature of the high throughput design is that only single-port memory, rather than dual-port memory, is required for processing continuous data streams. This results in a 50% reduction in memory consumption. Post place-and-route results show that our architecture demonstrates a 1.3x to 1.6x improvement in energy efficiency and 1.5x to 5.3x better memory efficiency compared with the state-of-the-art designs.
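For readers unfamiliar with the bitonic sorting network that the abstract refers to, the following is a minimal software sketch of its compare-exchange schedule. It is an illustration only, not the paper's SPN-based FPGA architecture: the function names `bitonic_sort` and `num_stages` are hypothetical, and the code assumes the number of keys N is a power of two. Each pass of the inner `while` loop corresponds to one stage of the network, in which N/2 independent compare-exchange operations pair key i with key i XOR j. The changing distance j from stage to stage is the kind of interconnection pattern that a hardware design has to realize.

```python
def num_stages(n):
    """Number of compare-exchange stages in a bitonic network for n keys.

    Assumes n is a power of two; the count is log2(n) * (log2(n) + 1) / 2.
    """
    log_n = n.bit_length() - 1
    return log_n * (log_n + 1) // 2


def bitonic_sort(keys):
    """Return the keys in ascending order using the bitonic network schedule.

    Illustrative sketch: a plain sequential simulation of the network,
    assuming len(keys) is a power of two.
    """
    n = len(keys)
    if n & (n - 1):
        raise ValueError("number of keys must be a power of two")
    a = list(keys)
    k = 2
    while k <= n:            # merge phase: builds sorted runs of length k
        j = k // 2
        while j >= 1:        # one network stage: compare keys at distance j
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Direction alternates between blocks of size k.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
    print(num_stages(8))
```

Because the comparisons within a stage are independent, a hardware implementation can perform them in parallel. The sketch above executes them one at a time and says nothing about the paper's memory or energy results.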



