On-chip memory efficient data layout for 2D FFT on 3D memory integrated FPGA

Shreyas G. Singapura, R. Kannan, V. Prasanna
Published in: 2016 IEEE High Performance Extreme Computing Conference (HPEC), 2016-09-01
DOI: 10.1109/HPEC.2016.7761606 (https://doi.org/10.1109/HPEC.2016.7761606)
Citations: 3

Abstract

3D memories are becoming a viable solution to the memory wall problem and to the bandwidth requirements of memory-intensive applications. However, the high bandwidth provided by 3D memories does not translate into a proportional increase in performance for all applications. For an application such as 2D FFT, which has strided access patterns, the data layout in memory has a significant impact on the total execution time of the implementation. In this paper, we present a data layout for 2D FFT on 3D memory integrated FPGA that is both on-chip memory efficient and throughput-optimal. Our data layout ensures that consecutive accesses to 3D memory are sufficiently interleaved among layers and vaults to absorb the latency due to activation overheads for both sequential (Row FFT) and strided (Column FFT) accesses. For an N × N FFT problem with c columns in a 3D memory bank row, the current state-of-the-art implementation on 3D memory requires O(√c · N) on-chip memory to reduce the strided accesses and achieve maximum bandwidth. For the same problem size and memory parameters, our proposed data layout optimizes the throughput of both the Row FFT and Column FFT phases of 2D FFT with O(N) on-chip memory and without decreasing the memory bandwidth, thereby achieving a √c× reduction in on-chip memory. On architectures with limited on-chip memory, our data layout achieves a 2× to 4× improvement in execution time compared with the state-of-the-art 2D FFT implementation on 3D memory.
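To make the access-pattern problem concrete, the following is a minimal sketch of why strided (Column FFT) accesses are costly under a plain row-major layout. It is an illustrative model only, not the data layout proposed in the paper: the parameter names (n, c, vaults), the one-open-row activation model, and the round-robin vault mapping are all assumptions made for this example.

```python
# Illustrative model only: the names n, c, vaults and the round-robin vault
# mapping are assumptions for this sketch, not the layout from the paper.

def dram_row(addr: int, c: int) -> int:
    """DRAM row index of a linear element address when each row holds c elements."""
    return addr // c

def activations(addresses, c: int) -> int:
    """Count row activations for an access stream, keeping one row open at a time."""
    count = 0
    open_row = None
    for addr in addresses:
        row = dram_row(addr, c)
        if row != open_row:
            count += 1
            open_row = row
    return count

def row_access(n: int, r: int):
    """Addresses of matrix row r in a row-major n x n layout (sequential)."""
    return [r * n + j for j in range(n)]

def column_access(n: int, col: int):
    """Addresses of matrix column col in a row-major n x n layout (stride n)."""
    return [i * n + col for i in range(n)]

def vault_of(addr: int, c: int, vaults: int) -> int:
    """Hypothetical mapping: consecutive DRAM rows rotate across vaults."""
    return dram_row(addr, c) % vaults

n, c, vaults = 64, 16, 4
row_acts = activations(row_access(n, 0), c)      # n / c activations
col_acts = activations(column_access(n, 0), c)   # one activation per element
col_vaults = {vault_of(a, c, vaults) for a in column_access(n, 0)}
print(row_acts, col_acts, sorted(col_vaults))    # prints: 4 64 [0]
```

In this toy model, reading one matrix row costs n / c activations, while reading one column costs an activation on every access, and with the naive rotation every column access also lands on the same vault, so the activation latency cannot be overlapped. This is the kind of behaviour that a layout interleaving consecutive accesses among layers and vaults is meant to avoid.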