Décio Luiz Gazzoni Filho, Tomás Recio, Julio López Hernandez
We present a solution to the open problem of designing a linear-time, unbiased and timing-attack-resistant shuffling algorithm for fixed-weight sampling. Although the algorithm can be implemented without timing leakage of secret data on any architecture, we illustrate it with ARMv7-M and ARMv8-A implementations. For the latter, we take advantage of architectural features such as NEON and conditional instructions, which are representative of features available on architectures targeting similar systems, such as Intel. Our proposed algorithm improves asymptotically upon the current approach based on constant-time sorting networks (O(n) versus O(n log² n)). An implementation of the new algorithm applied to NTRU is also faster in practice, by a factor of up to 6.91 (591%) on ARMv8-A cores and 12.89 (1189%) on the Cortex-M4, and it requires fewer uniform random bits. This translates into performance improvements for NTRU encapsulation, compared to state-of-the-art implementations, of up to 50% on ARMv8-A cores and 72% on the Cortex-M4. Key generation also improves slightly (up to 2.7% on ARMv8-A cores and 6.1% on the Cortex-M4), with negligible impact on code size and a slight improvement in RAM usage for the Cortex-M4.
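The abstract contrasts the new linear-time shuffle with the existing approach based on constant-time sorting networks. The C sketch below illustrates only that baseline, not the authors' new algorithm, which the abstract does not describe. It samples a ternary vector of fixed weight by packing a random key and a coefficient label into each word and then sorting the words with a data-oblivious network. Assumptions: the names `sample_fixed_weight_sort`, `bitonic_sort`, `ct_cswap` and `MAX_N` are hypothetical, and the caller supplies the 30-bit random values. A plain bitonic network stands in for the optimized networks used in real NTRU code. Whether the compiler keeps `ct_cswap` branch-free must be checked on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity (a power of two); the padded network size never exceeds it. */
#define MAX_N 1024

/* Branch-free compare-exchange: afterwards *a <= *b. Both values fit in
 * 32 bits, so the 64-bit difference *b - *a wraps (top bit set) exactly
 * when *b < *a. */
static void ct_cswap(uint32_t *a, uint32_t *b) {
    uint64_t diff = (uint64_t)*b - (uint64_t)*a;
    uint32_t mask = (uint32_t)0 - (uint32_t)(diff >> 63); /* all ones iff *b < *a */
    uint32_t t = (*a ^ *b) & mask;
    *a ^= t;
    *b ^= t;
}

/* Bitonic sorting network on m elements (m a power of two). The sequence of
 * compare-exchanges depends only on m, never on the data: O(m log^2 m) steps. */
static void bitonic_sort(uint32_t *x, size_t m) {
    for (size_t k = 2; k <= m; k <<= 1)
        for (size_t j = k >> 1; j > 0; j >>= 1)
            for (size_t i = 0; i < m; i++) {
                size_t l = i ^ j;
                if (l > i) {
                    if ((i & k) == 0) ct_cswap(&x[i], &x[l]); /* ascending pair */
                    else              ct_cswap(&x[l], &x[i]); /* descending pair */
                }
            }
}

/* Hypothetical sort-based fixed-weight sampler: rnd holds n uniform 30-bit
 * values; out receives n coefficients in {-1, 0, 1} with exactly w/2 ones
 * and w/2 minus ones. */
void sample_fixed_weight_sort(int8_t *out, const uint32_t *rnd, size_t n, size_t w) {
    uint32_t s[MAX_N];
    size_t m = 1;
    assert(n <= MAX_N && w <= n && w % 2 == 0);
    while (m < n) m <<= 1;
    for (size_t i = 0; i < n; i++) {
        /* label in the low 2 bits (1 -> +1, 2 -> -1, 0 -> 0), random key above */
        uint32_t label = (i < w / 2) ? 1u : (i < w) ? 2u : 0u;
        s[i] = ((rnd[i] & 0x3FFFFFFFu) << 2) | label;
    }
    /* Real entries are at most 0xFFFFFFFE (label <= 2), so padding sorts last. */
    for (size_t i = n; i < m; i++) s[i] = UINT32_MAX;
    bitonic_sort(s, m);
    for (size_t i = 0; i < n; i++) {
        uint32_t label = s[i] & 3u;
        out[i] = (int8_t)((int)(label & 1u) - (int)(label >> 1)); /* branch-free map */
    }
}
```

Because the compare-exchange schedule depends only on n, the memory-access pattern is independent of where the nonzero coefficients land. The cost is O(n log² n) comparisons. With finite-width keys, ties are broken by the label bits, so the resulting permutation is slightly non-uniform. A textbook Fisher-Yates shuffle needs only O(n) swaps, but it reads and writes memory at secret indices. That is a source of timing leakage that a linear-time, timing-attack-resistant shuffle must avoid.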
Efficient isochronous fixed-weight sampling with applications to NTRU. IACR Cryptol. ePrint Arch., vol. 119 17, p. 548, published 2024-07-08. DOI: 10.62056/a6n59qgxq (https://doi.org/10.62056/a6n59qgxq).
Distributed key generation (DKG) is a key building block in developing many efficient threshold cryptosystems. This work initiates the study of communication complexity and round complexity of DKG protocols over a point-to-point (bounded) synchronous network. Our key result is the first synchronous DKG protocol for discrete log-based cryptosystems with