Zihang Wang;Yushu Yang;Jianfei Wang;Jia Hou;Yang Su;Chen Yang
{"title":"基于组对存储器访问和快速级间重排序的可扩展高效NTT/INTT体系结构","authors":"Zihang Wang;Yushu Yang;Jianfei Wang;Jia Hou;Yang Su;Chen Yang","doi":"10.1109/TVLSI.2024.3465010","DOIUrl":null,"url":null,"abstract":"Polynomial multiplication is a significant bottleneck in mainstream postquantum cryptography (PQC) schemes. To speed it up, number theoretic transform (NTT) is widely used, which decreases the time complexity from <inline-formula> <tex-math>${O}(n^{2})$ </tex-math></inline-formula> to <inline-formula> <tex-math>$O[n\\log _{2}(n)]$ </tex-math></inline-formula>. However, it is challenging to ensure optimal hardware efficiency in conjunction with scalability. This brief proposes a novel pipelined NTT/inverse-NTT (INTT) architecture on field-programmable gate array (FPGA). A group-based pairwise memory access (GPMA) scheme is proposed, and a scratchpad and reordering unit (SRU) is designed to form an efficient dataflow that simplifies control units and achieves almost <inline-formula> <tex-math>$n/2$ </tex-math></inline-formula> processing cycles on average for n-point NTT. Moreover, our architecture can support varying parameters. Compared to the state-of-the-art works, our architecture achieves up to <inline-formula> <tex-math>$4.8\\times $ </tex-math></inline-formula> latency improvements and up to <inline-formula> <tex-math>$4.3\\times $ </tex-math></inline-formula> improvements on area time product (ATP).","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"588-592"},"PeriodicalIF":2.8000,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Scalable and Efficient NTT/INTT Architecture Using Group-Based Pairwise Memory Access and Fast Interstage Reordering\",\"authors\":\"Zihang Wang;Yushu Yang;Jianfei Wang;Jia Hou;Yang Su;Chen Yang\",\"doi\":\"10.1109/TVLSI.2024.3465010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Polynomial multiplication is a significant bottleneck in mainstream postquantum cryptography (PQC) schemes. To speed it up, number theoretic transform (NTT) is widely used, which decreases the time complexity from <inline-formula> <tex-math>${O}(n^{2})$ </tex-math></inline-formula> to <inline-formula> <tex-math>$O[n\\\\log _{2}(n)]$ </tex-math></inline-formula>. However, it is challenging to ensure optimal hardware efficiency in conjunction with scalability. This brief proposes a novel pipelined NTT/inverse-NTT (INTT) architecture on field-programmable gate array (FPGA). A group-based pairwise memory access (GPMA) scheme is proposed, and a scratchpad and reordering unit (SRU) is designed to form an efficient dataflow that simplifies control units and achieves almost <inline-formula> <tex-math>$n/2$ </tex-math></inline-formula> processing cycles on average for n-point NTT. Moreover, our architecture can support varying parameters. Compared to the state-of-the-art works, our architecture achieves up to <inline-formula> <tex-math>$4.8\\\\times $ </tex-math></inline-formula> latency improvements and up to <inline-formula> <tex-math>$4.3\\\\times $ </tex-math></inline-formula> improvements on area time product (ATP).\",\"PeriodicalId\":13425,\"journal\":{\"name\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"volume\":\"33 2\",\"pages\":\"588-592\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10710157/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10710157/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
A Scalable and Efficient NTT/INTT Architecture Using Group-Based Pairwise Memory Access and Fast Interstage Reordering
Polynomial multiplication is a significant bottleneck in mainstream postquantum cryptography (PQC) schemes. To speed it up, number theoretic transform (NTT) is widely used, which decreases the time complexity from ${O}(n^{2})$ to $O[n\log _{2}(n)]$ . However, it is challenging to ensure optimal hardware efficiency in conjunction with scalability. This brief proposes a novel pipelined NTT/inverse-NTT (INTT) architecture on field-programmable gate array (FPGA). A group-based pairwise memory access (GPMA) scheme is proposed, and a scratchpad and reordering unit (SRU) is designed to form an efficient dataflow that simplifies control units and achieves almost $n/2$ processing cycles on average for n-point NTT. Moreover, our architecture can support varying parameters. Compared to the state-of-the-art works, our architecture achieves up to $4.8\times $ latency improvements and up to $4.3\times $ improvements on area time product (ATP).
期刊介绍:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.