Dramaton: A Near-DRAM Accelerator for Large Number Theoretic Transforms

IF 1.4 | Zone 3 (Computer Science) | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | IEEE Computer Architecture Letters | Pub Date: 2024-03-27 | DOI: 10.1109/LCA.2024.3381452

Yongmo Park; Subhankar Pal; Aporva Amarnath; Karthik Swaminathan; Wei D. Lu; Alper Buyuktosunoglu; Pradip Bose

Citations: 0

Abstract

Despite the rising popularity of post-quantum cryptographic schemes, realizing practical implementations for real-world applications remains a major challenge. A key bottleneck in such schemes is fetching and processing large polynomials in the Number Theoretic Transform (NTT), which makes non-von Neumann paradigms, such as near-memory processing, a viable option. We therefore propose a novel near-DRAM NTT accelerator design called Dramaton. We also introduce a conflict-free mapping algorithm that enables Dramaton to process large NTTs with minimal hardware overhead using a fixed-permutation network. Dramaton achieves a 5–207× latency speedup over the state of the art and a 97× improvement in energy-delay product (EDP) over a recent near-memory NTT accelerator.
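For readers unfamiliar with the transform named in the abstract, the following is a minimal software sketch of a radix-2 NTT and its use for polynomial multiplication. It is not Dramaton's hardware design or its mapping algorithm. The modulus `MOD`, the primitive root `ROOT`, and the function names `ntt` and `poly_mul` are assumptions chosen for illustration and do not come from the paper. Each butterfly stage in this sketch pairs elements at a different stride, so every stage touches the whole array.

```python
# Illustrative parameters (assumptions, not from the paper):
# 998244353 = 119 * 2^23 + 1 is prime, and 3 is a primitive root modulo it.
MOD = 998244353
ROOT = 3


def ntt(values, invert=False):
    """Iterative radix-2 NTT over Z_MOD; len(values) must be a power of two."""
    a = list(values)
    n = len(a)

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) butterfly stages; the pairing stride doubles each stage.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1

    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def poly_mul(f, g):
    """Multiply two coefficient lists modulo MOD using the NTT."""
    out_len = len(f) + len(g) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    product = ntt([x * y % MOD for x, y in zip(fa, ga)], invert=True)
    return product[:out_len]
```

As a usage example, `poly_mul([1, 2, 3], [4, 5])` multiplies 1 + 2x + 3x² by 4 + 5x and returns the coefficients `[4, 13, 22, 15]`.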


Source journal

IEEE Computer Architecture Letters (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 4.60

Self-citation rate: 4.30%

About the journal: IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.
