Accelerating Montgomery Modulo Multiplication for Redundant Radix-64k Number System on the FPGA Using Dual-Port Block RAMs

2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing. Pub Date: 2008-12-17. DOI: 10.1109/EUC.2008.30

K. Shigemoto, K. Kawakami, K. Nakano


Citations: 16

Abstract

The main contribution of this paper is to present hardware algorithms for the redundant radix-2^r number system on the FPGA that accelerate Montgomery modulo multiplication of many-bit numbers, which has applications in security systems such as RSA encryption and decryption. Quite surprisingly, our hardware algorithm for Montgomery modulo multiplication of two dr-bit numbers completes in only d+1 clock cycles. Since most FPGAs have 18-bit multipliers and 18k-bit block RAMs, it makes sense to let r=16. Our hardware algorithm for Montgomery modulo multiplication of 256-bit numbers runs in only 17 clock cycles using the redundant radix-64k (i.e., radix-2^16) number system. The experimental results for the Xilinx Virtex-II Pro Family FPGA XC2VP100-6 show that the clock frequency of our circuit is independent of d. Further, the hardware algorithm for 1024-bit Montgomery modulo multiplication using the redundant number system is 3 times faster than that using the conventional number system. Also, for 256-bit Montgomery modulo multiplication, our hardware algorithm runs in 0.322 μs, while a previously known implementation runs in 1.22 μs, even though our implementation uses fewer than half the slices.
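To make the operation in the abstract concrete, the following is a minimal software sketch of textbook digit-serial Montgomery multiplication in radix 2^r with r=16. It is not the paper's hardware algorithm: it uses ordinary (non-redundant) integers, and the function name `montgomery_multiply` and its parameters are assumptions chosen for illustration. It only shows why a dr-bit multiplication naturally decomposes into d digit steps.

```python
# Textbook digit-serial Montgomery multiplication in radix 2**r.
# Illustrative sketch only: NOT the paper's redundant-number hardware algorithm.
# Requires Python 3.8+ for pow(m, -1, base).

R_BITS = 16  # r = 16, i.e. radix 64k, as chosen in the abstract


def montgomery_multiply(x, y, m, d, r=R_BITS):
    """Return x * y * 2**(-d*r) mod m.

    Assumes m is odd, m < 2**(d*r), and 0 <= x, y < m.
    """
    base = 1 << r
    mask = base - 1
    # m_prime = -m^{-1} mod 2**r; the inverse exists because m is odd.
    m_prime = (-pow(m, -1, base)) & mask

    acc = 0
    for i in range(d):
        # Take the i-th radix-2**r digit of x and accumulate x_i * y.
        x_i = (x >> (i * r)) & mask
        acc += x_i * y
        # Choose q so that acc + q*m is divisible by 2**r.
        q = ((acc & mask) * m_prime) & mask
        # Exact division by the radix: drop the low (all-zero) digit.
        acc = (acc + q * m) >> r

    # acc stays below 2*m throughout, so one conditional subtraction suffices.
    if acc >= m:
        acc -= m
    return acc
```

For 256-bit operands the loop runs d = 16 digit iterations. How the paper maps the computation onto d+1 = 17 clock cycles using the redundant representation and dual-port block RAMs is not described in the abstract, so it is not modeled here.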

