FELIX: FPGA-Based Scalable and Lightweight Accelerator for Large Integer Extended GCD
Samuel Coulon, Tianyou Bao, Jiafeng Xie
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, published 2024-07-10
DOI: 10.1109/TVLSI.2024.3417016
Citations: 0
Abstract
The extended greatest common divisor (XGCD) computation is a critical component of various cryptographic applications and algorithms, including both pre- and postquantum cryptosystems. In addition to the greatest common divisor (GCD) of two integers, the XGCD also produces Bézout coefficients $b_{a}$ and $b_{b}$, which satisfy $\mathrm{GCD}(a,b) = a \times b_{a} + b \times b_{b}$. Computing the XGCD for large integers is of particular interest. Most recently, XGCD computation between 6479-bit integers has been required for solving Nth-degree truncated polynomial ring unit (NTRU) trapdoors in Falcon, a postquantum digital signature scheme selected by the National Institute of Standards and Technology (NIST). To date, the literature has focused primarily on software implementations of XGCD. The few existing high-performance hardware architectures require significant hardware resources and may be impractical to deploy, while the lightweight architectures suffer from poor performance. To fill this research gap, this work proposes a novel FPGA-based scalable and lightweight accelerator for large integer XGCD (FELIX). First, a new algorithm suited to scalable and lightweight XGCD computation is proposed. Next, a hardware accelerator (FELIX) is presented in both constant- and variable-time versions. Finally, a thorough evaluation demonstrates the efficiency of the proposed FELIX. In certain configurations, FELIX achieves an 81% lower equivalent area-time product (eATP) than the state-of-the-art design for 1024-bit integers, and a 95% reduction in latency compared with software for 6479-bit integers (the Falcon parameter set), with reasonable resource usage. Overall, the proposed FELIX is highly efficient, scalable, lightweight, and suitable for very large integer computation; to the best of our knowledge, it is the first such XGCD accelerator in the literature.
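The abstract defines the XGCD through the Bézout identity but does not describe FELIX's own algorithm. As an illustration only, the Python sketch below shows two textbook ways to obtain $b_{a}$ and $b_{b}$: the iterative extended Euclidean algorithm, and the binary extended GCD, which replaces long division with shifts, additions, subtractions and parity tests (operations commonly regarded as hardware-friendly). The function names `xgcd` and `binary_xgcd`, the sample inputs and the random seed are hypothetical choices made for this example. Neither function is FELIX's method, and neither runs in constant time, since both branch on the input values.

```python
import math
import random


def xgcd(a: int, b: int) -> tuple[int, int, int]:
    """Iterative extended Euclidean algorithm for a, b >= 0.

    Returns (g, b_a, b_b) with g == gcd(a, b) == a*b_a + b*b_b.
    """
    # Invariants: old_r == a*old_s + b*old_t and r == a*s + b*t.
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r  # one long division per iteration
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def binary_xgcd(x: int, y: int) -> tuple[int, int, int]:
    """Binary extended GCD for x, y > 0 (shifts, adds, subtracts, parity tests).

    Returns (g, b_x, b_y) with g == gcd(x, y) == x*b_x + y*b_y.
    Not constant time: the control flow depends on the input values.
    """
    if x <= 0 or y <= 0:
        raise ValueError("binary_xgcd expects positive integers")
    # Strip the common power of two; it is shifted back into the result.
    k = 0
    while x % 2 == 0 and y % 2 == 0:
        x >>= 1
        y >>= 1
        k += 1
    # Invariants (for the reduced x, y): u == A*x + B*y and v == C*x + D*y.
    u, v = x, y
    A, B, C, D = 1, 0, 0, 1
    while u != 0:
        while u % 2 == 0:
            u >>= 1
            if A % 2 == 0 and B % 2 == 0:
                A, B = A // 2, B // 2
            else:
                # Adding (y, -x) keeps A*x + B*y unchanged and makes both even.
                A, B = (A + y) // 2, (B - x) // 2
        while v % 2 == 0:
            v >>= 1
            if C % 2 == 0 and D % 2 == 0:
                C, D = C // 2, D // 2
            else:
                C, D = (C + y) // 2, (D - x) // 2
        # Both u and v are odd here: subtract the smaller from the larger.
        if u >= v:
            u, A, B = u - v, A - C, B - D
        else:
            v, C, D = v - u, C - A, D - B
    # v is the gcd of the reduced pair; shifting by k restores gcd(x, y) for the
    # original inputs, and C, D remain valid Bezout coefficients for them.
    return v << k, C, D


if __name__ == "__main__":
    print(xgcd(240, 46))         # (2, -9, 47)
    print(binary_xgcd(240, 46))  # (2, 14, -73)
    # Random 6479-bit operands, the size the abstract cites for Falcon.
    rng = random.Random(2024)
    a = rng.getrandbits(6479) | (1 << 6478)
    b = rng.getrandbits(6479) | (1 << 6478)
    for g, s, t in (xgcd(a, b), binary_xgcd(a, b)):
        assert g == math.gcd(a, b) and a * s + b * t == g
    print("Bezout identity holds for both 6479-bit results")
```

The two functions return different but equally valid coefficient pairs for the same inputs, since Bézout coefficients are not unique. Python's arbitrary-precision integers hide the multi-word arithmetic that a hardware design must schedule explicitly; how FELIX partitions that work across limbs or cycles is not stated in the abstract.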
About the journal:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems-integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and system-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.