PriorMSM：高效的多乘法加速架构

IF 2.2 4区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-07-12 DOI:10.1145/3678006

Changxu Liu, Hao Zhou, Patrick Dai, Li Shang, Fan Yang

{"title":"PriorMSM：高效的多乘法加速架构","authors":"Changxu Liu, Hao Zhou, Patrick Dai, Li Shang, Fan Yang","doi":"10.1145/3678006","DOIUrl":null,"url":null,"abstract":"\n Multi-Scalar Multiplication (MSM) is a computationally intensive task that operates on elliptic curves based on\n GF\n (\n P\n ). It is commonly used in Zero-knowledge proof (ZKP), where it accounts for a significant portion of the computation time required for proof generation. In this paper, we present PriorMSM, an efficient acceleration architecture for MSM. We propose a Priority-based Scheduling Mechanism (PBSM) based on a multi-FIFOs and multi-banks architecture to accelerate the implementation of MSM. By increasing the pairing success rate of internal points, PBSM reduces the number of bubbles in the pipeline of point addition (PADD), consequently improving the data throughput of the pipeline. We also introduce an advanced parallel bucket aggregation algorithm, leveraging PADD’s fully pipelined characteristics to significantly accelerate the implementation of bucket aggregation. We perform a sensitivity analysis on the crucial parameter, window size, in MSM. The results indicate that the window size of the MSM significantly impacts its latency. Area-Time Product (ATP) metric is introduced to guide the selection of the optimal window size, balancing the performance and cost for practical applications of subsequent MSM implementations. PriorMSM is evaluated using the TSMC 28nm process. It achieves a maximum speedup of 10.9 × compared to the previous custom hardware implementations and a maximum speedup of 3.9 × compared to the GPU implementations.\n","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PriorMSM: An Efficient Acceleration Architecture for Multi-Scalar Multiplication\",\"authors\":\"Changxu Liu, Hao Zhou, Patrick Dai, Li Shang, Fan Yang\",\"doi\":\"10.1145/3678006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n Multi-Scalar Multiplication (MSM) is a computationally intensive task that operates on elliptic curves based on\\n GF\\n (\\n P\\n ). It is commonly used in Zero-knowledge proof (ZKP), where it accounts for a significant portion of the computation time required for proof generation. In this paper, we present PriorMSM, an efficient acceleration architecture for MSM. We propose a Priority-based Scheduling Mechanism (PBSM) based on a multi-FIFOs and multi-banks architecture to accelerate the implementation of MSM. By increasing the pairing success rate of internal points, PBSM reduces the number of bubbles in the pipeline of point addition (PADD), consequently improving the data throughput of the pipeline. We also introduce an advanced parallel bucket aggregation algorithm, leveraging PADD’s fully pipelined characteristics to significantly accelerate the implementation of bucket aggregation. We perform a sensitivity analysis on the crucial parameter, window size, in MSM. The results indicate that the window size of the MSM significantly impacts its latency. Area-Time Product (ATP) metric is introduced to guide the selection of the optimal window size, balancing the performance and cost for practical applications of subsequent MSM implementations. PriorMSM is evaluated using the TSMC 28nm process. It achieves a maximum speedup of 10.9 × compared to the previous custom hardware implementations and a maximum speedup of 3.9 × compared to the GPU implementations.\\n\",\"PeriodicalId\":50944,\"journal\":{\"name\":\"ACM Transactions on Design Automation of Electronic Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Design Automation of Electronic Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3678006\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3678006","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

多乘法（MSM）是一项计算密集型任务，它在基于 GF ( P ) 的椭圆曲线上运行。它常用于零知识证明（ZKP），占证明生成所需计算时间的很大一部分。本文介绍了 MSM 的高效加速架构 PriorMSM。我们提出了一种基于优先级的调度机制（PBSM），该机制基于多 FIFO 和多银行架构，可加速 MSM 的实现。通过提高内部点的配对成功率，PBSM 减少了点添加流水线（PADD）中的气泡数量，从而提高了流水线的数据吞吐量。我们还引入了一种先进的并行桶聚合算法，利用 PADD 的全流水线特性显著加快了桶聚合的实现。我们对 MSM 中的关键参数窗口大小进行了敏感性分析。结果表明，MSM 的窗口大小对其延迟有显著影响。我们引入了面积-时间乘积（ATP）指标来指导选择最佳窗口大小，从而在后续 MSM 实现的实际应用中平衡性能和成本。PriorMSM 采用台积电 28 纳米工艺进行评估。与之前的定制硬件实现相比，它的最大速度提高了 10.9 倍，与 GPU 实现相比，它的最大速度提高了 3.9 倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

PriorMSM: An Efficient Acceleration Architecture for Multi-Scalar Multiplication

Multi-Scalar Multiplication (MSM) is a computationally intensive task that operates on elliptic curves based on GF ( P ). It is commonly used in Zero-knowledge proof (ZKP), where it accounts for a significant portion of the computation time required for proof generation. In this paper, we present PriorMSM, an efficient acceleration architecture for MSM. We propose a Priority-based Scheduling Mechanism (PBSM) based on a multi-FIFOs and multi-banks architecture to accelerate the implementation of MSM. By increasing the pairing success rate of internal points, PBSM reduces the number of bubbles in the pipeline of point addition (PADD), consequently improving the data throughput of the pipeline. We also introduce an advanced parallel bucket aggregation algorithm, leveraging PADD’s fully pipelined characteristics to significantly accelerate the implementation of bucket aggregation. We perform a sensitivity analysis on the crucial parameter, window size, in MSM. The results indicate that the window size of the MSM significantly impacts its latency. Area-Time Product (ATP) metric is introduced to guide the selection of the optimal window size, balancing the performance and cost for practical applications of subsequent MSM implementations. PriorMSM is evaluated using the TSMC 28nm process. It achieves a maximum speedup of 10.9 × compared to the previous custom hardware implementations and a maximum speedup of 3.9 × compared to the GPU implementations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Design Automation of Electronic Systems 工程技术-计算机：软件工程

CiteScore

3.20

自引率

7.10%

发文量

105

审稿时长

3 months

期刊介绍： TODAES is a premier ACM journal in design and automation of electronic systems. It publishes innovative work documenting significant research and development advances on the specification, design, analysis, simulation, testing, and evaluation of electronic systems, emphasizing a computer science/engineering orientation. Both theoretical analysis and practical solutions are welcome.