Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA

Ren Chen, Sruja Siriyal, V. Prasanna
{"title":"Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA","authors":"Ren Chen, Sruja Siriyal, V. Prasanna","doi":"10.1145/2684746.2689068","DOIUrl":null,"url":null,"abstract":"Parallel sorting networks are widely employed in hardware implementations for sorting due to their high data parallelism and low control overhead. In this paper, we propose an energy and memory efficient mapping methodology for implementing bitonic sorting network on FPGA. Using this methodology, the proposed sorting architecture can be built for a given data parallelism while supporting continuous data streams. We propose a streaming permutation network (SPN) by \"folding\" the classic Clos network. We prove that the SPN is programmable to realize all the interconnection patterns in the bitonic sorting network. A low cost design for sorting with minimal resource usage is obtained by reusing one SPN . We also demonstrate a high throughput design by trading off area for performance. With a data parallelism of p (2 ≤ p ≤ N/ log2 N), the high throughput design sorts an N-key sequence with latency O(N/p), throughput (# of keys sorted per cycle) O(p) and uses O(N) memory. This achieves optimal memory efficiency (defined as the ratio of throughput to the amount of on-chip memory used by the design) of O(p/N). Another noteworthy feature of the high throughput design is that only single-port memory rather than dual-port memory is required for processing continuous data streams. This results in 50% reduction in memory consumption. Post place-and-route results show that our architecture demonstrates 1.3x ∼1.6x improvment in energy efficiency and 1.5x ∼ 5.3x better memory efficiency compared with the state-of-the-art designs.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"104 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2684746.2689068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 67

Abstract

Parallel sorting networks are widely employed in hardware implementations for sorting due to their high data parallelism and low control overhead. In this paper, we propose an energy and memory efficient mapping methodology for implementing bitonic sorting network on FPGA. Using this methodology, the proposed sorting architecture can be built for a given data parallelism while supporting continuous data streams. We propose a streaming permutation network (SPN) by "folding" the classic Clos network. We prove that the SPN is programmable to realize all the interconnection patterns in the bitonic sorting network. A low cost design for sorting with minimal resource usage is obtained by reusing one SPN . We also demonstrate a high throughput design by trading off area for performance. With a data parallelism of p (2 ≤ p ≤ N/ log2 N), the high throughput design sorts an N-key sequence with latency O(N/p), throughput (# of keys sorted per cycle) O(p) and uses O(N) memory. This achieves optimal memory efficiency (defined as the ratio of throughput to the amount of on-chip memory used by the design) of O(p/N). Another noteworthy feature of the high throughput design is that only single-port memory rather than dual-port memory is required for processing continuous data streams. This results in 50% reduction in memory consumption. Post place-and-route results show that our architecture demonstrates 1.3x ∼1.6x improvment in energy efficiency and 1.5x ∼ 5.3x better memory efficiency compared with the state-of-the-art designs.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
FPGA上双音排序的能量和内存效率映射
并行排序网络由于其高数据并行性和低控制开销而广泛应用于硬件实现中。在本文中,我们提出了一种在FPGA上实现双声排序网络的能量和内存高效映射方法。使用这种方法,可以为给定的数据并行性构建所建议的排序体系结构,同时支持连续数据流。通过对经典Clos网络的“折叠”,提出了一种流置换网络(SPN)。证明了SPN是可编程的,可以实现双元分拣网络中所有的互连模式。通过重用一个SPN,可以获得资源使用最少的低成本排序设计。我们还通过权衡面积和性能来演示高吞吐量设计。在数据并行度为p(2≤p≤N/ log2n)的情况下,高吞吐量设计对N个键序列进行排序,延迟为O(N/p),吞吐量(每个周期排序的键数)为O(p),并且使用O(N)内存。这实现了0 (p/N)的最佳内存效率(定义为吞吐量与设计使用的片上内存数量的比率)。高吞吐量设计的另一个值得注意的特点是,处理连续数据流只需要单端口内存,而不需要双端口内存。这将导致内存消耗减少50%。放置和路由后的结果表明,与最先进的设计相比,我们的架构的能源效率提高了1.3倍~ 1.6倍,内存效率提高了1.5倍~ 5.3倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
REPROC: A Dynamically Reconfigurable Architecture for Symmetric Cryptography (Abstract Only) Session details: Technical Session 1: Computer-aided Design Energy-Efficient Discrete Signal Processing with Field Programmable Analog Arrays (FPAAs) Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA Impact of Memory Architecture on FPGA Energy Consumption
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1