{"title":"Falic: An FPGA-Based Multi-Scalar Multiplication Accelerator for Zero-Knowledge Proof","authors":"Yongkui Yang;Zhenyan Lu;Jingwei Zeng;Xingguo Liu;Xuehai Qian;Zhibin Yu","doi":"10.1109/TC.2024.3449121","DOIUrl":null,"url":null,"abstract":"In this paper, we propose Falic, a novel FPGA-based accelerator to accelerate multi-scalar multiplication (MSM), the most time-consuming phase of zk-SNARK proof generation. Falic innovates three techniques. First, it leverages globally asynchronous locally synchronous (GALS) strategy to build multiple small and lightweight MSM cores to parallelize the independent inner product computation on different portions of the scalar vector and point vector. Second, each MSM core contains just one large-integer modular multiplier (LIMM) that is multiplexed to perform the point additions (PADDs) generated during MSM. We strike a balance between the throughput and hardware cost by batching the appropriate number of PADDs and selecting the computation graph of PADD with proper parallelism degree. Finally, the performance is further improved by a simple cache structure that enables the computation reuse. We implement Falic on two different FPGAs with different hardware resources, i.e., the Xilinx U200 and Xilinx U250. Compared to the prior FPGA-based accelerator, Falic improves the MSM throughput by \n<inline-formula><tex-math>$3.9\\boldsymbol{\\times}$</tex-math></inline-formula>\n. Experimental results also show that Falic achieves a throughput speedup of up to \n<inline-formula><tex-math>$1.62\\boldsymbol{\\times}$</tex-math></inline-formula>\n and saves as much as \n<inline-formula><tex-math>$8.5\\boldsymbol{\\times}$</tex-math></inline-formula>\n energy compared to an RTX 2080Ti GPU.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 12","pages":"2791-2804"},"PeriodicalIF":3.6000,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10644105/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
In this paper, we propose Falic, a novel FPGA-based accelerator to accelerate multi-scalar multiplication (MSM), the most time-consuming phase of zk-SNARK proof generation. Falic innovates three techniques. First, it leverages globally asynchronous locally synchronous (GALS) strategy to build multiple small and lightweight MSM cores to parallelize the independent inner product computation on different portions of the scalar vector and point vector. Second, each MSM core contains just one large-integer modular multiplier (LIMM) that is multiplexed to perform the point additions (PADDs) generated during MSM. We strike a balance between the throughput and hardware cost by batching the appropriate number of PADDs and selecting the computation graph of PADD with proper parallelism degree. Finally, the performance is further improved by a simple cache structure that enables the computation reuse. We implement Falic on two different FPGAs with different hardware resources, i.e., the Xilinx U200 and Xilinx U250. Compared to the prior FPGA-based accelerator, Falic improves the MSM throughput by
$3.9\boldsymbol{\times}$
. Experimental results also show that Falic achieves a throughput speedup of up to
$1.62\boldsymbol{\times}$
and saves as much as
$8.5\boldsymbol{\times}$
energy compared to an RTX 2080Ti GPU.
期刊介绍:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.