Winograd for NTT: A Case Study on Higher-Radix and Low-Latency Implementation of NTT for Post Quantum Cryptography on FPGA
Suraj Mandal; Debapriya Basu Roy
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 71, no. 12, pp. 6396-6409. Published 2024-10-09. DOI: 10.1109/TCSI.2024.3470335
https://ieeexplore.ieee.org/document/10711850/
The Number Theoretic Transform (NTT) plays an important role in efficiently implementing lattice-based cryptographic algorithms such as CRYSTALS-Kyber, CRYSTALS-Dilithium, and FALCON. Existing NTT implementations for these algorithms are mostly based on radix-2 or radix-4 realizations of the Cooley-Tukey and Gentleman-Sande architectures. In this work, we explore an alternative method of performing the NTT, known as Winograd’s NTT, which requires fewer modular multipliers than the conventional Cooley-Tukey/Gentleman-Sande approach for higher-radix NTT. We propose three different low-latency implementations of Winograd’s NTT, applicable to CRYSTALS-Dilithium, FALCON, and CRYSTALS-Kyber, respectively. Our first implementation focuses on a radix-16 NTT multiplication unit for polynomials of length 256 and can be used directly for CRYSTALS-Dilithium. The NTT of CRYSTALS-Dilithium also benefits from our proposed K-RED modular multiplication. Our radix-16 Winograd design outperforms existing Cooley-Tukey/Gentleman-Sande-based NTT multipliers for CRYSTALS-Dilithium. Our second implementation is based on a radix-8 Winograd structure with a novel modular multiplication method; it targets polynomials of length 512 and can be applied directly to FALCON. For CRYSTALS-Kyber, we have designed a radix-16 Winograd Butterfly Unit (BFU) that can be configured as two parallel radix-8 Winograd BFUs during mixed-radix computation. To the best of our knowledge, this is the first work to apply the Winograd technique to NTT multiplication for post-quantum secure lattice-based cryptographic algorithms.
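For orientation, the radix-2 Cooley-Tukey baseline that the abstract contrasts against can be sketched in a few lines of Python. This is a software illustration only, not the paper's hardware design or its Winograd structure. The toy parameters (q = 17, n = 8, ω = 2) and all function names are assumptions chosen for readability. The sketch also computes a plain cyclic product, whereas the lattice schemes named above work modulo x^n + 1 (a negacyclic product).

```python
# Toy parameters (assumed for illustration, not taken from the paper):
# q = 17 is prime and omega = 2 is a primitive 8th root of unity mod 17.
Q = 17
N = 8
OMEGA = 2


def ntt_radix2(a, omega, q):
    """Recursive radix-2 decimation-in-time (Cooley-Tukey) NTT over Z_q."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt_radix2(a[0::2], omega_sq, q)
    odd = ntt_radix2(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q  # one modular multiplication per butterfly
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def cyclic_multiply(a, b, omega, q):
    """Multiply two polynomials modulo (x^n - 1, q) via forward and inverse NTT."""
    n = len(a)
    fa = ntt_radix2(a, omega, q)
    fb = ntt_radix2(b, omega, q)
    fc = [x * y % q for x, y in zip(fa, fb)]
    inv_omega = pow(omega, q - 2, q)  # Fermat inverse, valid because q is prime
    inv_n = pow(n, q - 2, q)
    c = ntt_radix2(fc, inv_omega, q)
    return [x * inv_n % q for x in c]


def schoolbook_cyclic(a, b, q):
    """Reference O(n^2) cyclic convolution used to cross-check the NTT path."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c
```

Each radix-2 butterfly above costs one modular multiplication; the abstract's claim is that a Winograd formulation needs fewer modular multipliers than this style of decomposition at higher radices.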
About the journal:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems