Pub Date: 2023-08-31 DOI: 10.46586/tches.v2023.i4.80-109
Yusuke Naito, Yu Sasaki, T. Sugawara
We propose a new authenticated encryption (AE) mode, MMM, that achieves the minimum memory size with masking. Minimizing the secret state is the crucial challenge in designing low-memory AE suitable for masking. Here, the minimum secret state is s + b bits, composed of s bits for a secret key and b bits for a plaintext block. HOMA, which appeared at CRYPTO 2022, achieved this goal with b = 64, but choosing a smaller b was difficult because b = s/2 is bound to the block size of the underlying primitive, meaning that a block cipher with an unrealistically small block size (e.g., 8 bits) is necessary for further improvement. MMM addresses the issue by making b independent of the underlying primitive while achieving the minimum (s + b)-bit secret state. Moreover, MMM provides additional advantages over HOMA, including (i) a better rate, (ii) security under the multi-user model, and (iii) a smaller transmission cost. We instantiate two variants, MMM-8 (with b = 8) and MMM-64 (with b = 64), using the standard tweakable block cipher SKINNY-64/192. With a (d + 1)-masking scheme, MMM-8 (resp. MMM-64) is smaller by 56d + 184 (resp. 128) bits compared with HOMA. In our hardware performance evaluation, MMM-8 and MMM-64 achieved smaller circuit areas than HOMA for all the examined protection orders d ∈ [0, 5]. MMM-8's circuit area is only 81% of HOMA's with d = 5, and MMM-64 achieves more than a 3x speed-up with a smaller circuit area.
Title: MMM: Authenticated Encryption with Minimum Secret State for Masking. Journal: IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 80-109.
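The memory savings quoted in the MMM abstract (56d + 184 bits for MMM-8 and 128 bits for MMM-64) can be reproduced with simple share counting. The sketch below is only one decomposition that happens to match those figures, not the paper's own accounting. It assumes a key size of s = 128 bits, assumes that every secret bit is stored as d + 1 shares, and assumes that HOMA carries 128 bits of additional unmasked state. The names `S`, `EXTRA_PUBLIC`, `masked_bits`, and `saving_over_homa` are hypothetical.

```python
S = 128             # ASSUMPTION: key size s in bits (not stated in the abstract)
EXTRA_PUBLIC = 128  # ASSUMPTION: unmasked state HOMA holds beyond MMM


def masked_bits(secret_bits, public_bits, d):
    """Memory in bits when each secret bit is split into d + 1 shares."""
    return (d + 1) * secret_bits + public_bits


def saving_over_homa(b, d):
    """Bits saved by an MMM variant with block size b at protection order d."""
    homa = masked_bits(S + 64, EXTRA_PUBLIC, d)  # HOMA uses b = 64
    mmm = masked_bits(S + b, 0, d)
    return homa - mmm


if __name__ == "__main__":
    for d in range(6):
        print(d, saving_over_homa(8, d), saving_over_homa(64, d))
```

Under these assumptions the saving for b = 8 grows by 56 bits per protection order, because 64 - 8 = 56 fewer secret bits need to be shared, while the saving for b = 64 stays constant.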
Pub Date: 2023-08-31 DOI: 10.46586/tches.v2023.i4.393-419
Yuanxi Dai, Fangguo Zhang, Chang-An Zhao
Pairing-friendly curves with odd prime embedding degrees at the 128-bit security level, such as BW13-310 and BW19-286, have sparked interest in public-key cryptography because of the small sizes of their prime fields. However, compared to mainstream pairing-friendly curves at the same security level, i.e., BN446 and BLS12-446, pairing computation on BW13-310 and BW19-286 is usually considered inefficient. In this paper we investigate high-performance software implementations of pairing computation on BW13-310 and of the corresponding building blocks used in pairing-based protocols, including hashing, group exponentiation and membership testing. First, we propose efficient explicit formulas for pairing computation on this curve. We also exploit state-of-the-art techniques to implement hashing to G1 and G2, group exponentiation and membership testing. In particular, for exponentiations in G2 and GT, we present new optimizations that improve computational efficiency. Our implementation results on a 64-bit processor show that the gap in pairing performance between BW13-310 and BN446 (resp. BLS12-446) is only up to 4.9% (resp. 26%). More importantly, compared to BN446 and BLS12-446, BW13-310 is about 109.1%–227.3%, 100%–192.6%, 24.5%–108.5% and 68.2%–145.5% faster in terms of hashing to G1, exponentiations in G1 and GT, and membership testing for GT, respectively. These results suggest that BW13-310 is an interesting candidate for pairing-based cryptographic protocols.
Title: Don't Forget Pairing-Friendly Curves with Odd Prime Embedding Degrees. Journal: IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 393-419.
Pub Date: 2023-08-31 DOI: 10.46586/tches.v2023.i4.211-237
V. Fischer, F. Bernard, Nathalie Bochard, Quentin Dallison, M. Skorski
Field Programmable Gate Arrays (FPGAs) are used more and more frequently to implement cryptographic systems, which need random number generators (RNGs) embedded in the same device. The main challenge in implementing a generator inside an FPGA is that the physical source of randomness, such as a jittered clock generator, is implemented in the configurable logic area, i.e., in close vicinity to noisy running algorithms, which can have a significant impact on the generated numbers or even serve to attack the generator. A possible approach to prevent such influence is the use of Phase-Locked Loops (PLLs), which are separated from the reconfigurable logic area inside the FPGA chip. In this paper, we propose a new architecture of the PLL-based TRNG, including a method to avoid correlation in the output through control of timing in the sampling process, as well as new embedded tests based on the enhanced stochastic model. We also propose a workflow to help find the best parameters, such as output bitrate and entropy rate. We show that bitrates of around 400 kb/s or more can be achieved while guaranteeing min-entropy rates per bit higher than 0.98, as required by the latest security standards.
Title: Enhancing Quality and Security of the PLL-TRNG. Journal: IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 211-237.
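To give a feel for the 0.98 min-entropy-per-bit threshold mentioned in the PLL-TRNG abstract, the sketch below uses the textbook definition of min-entropy for a single biased bit. It assumes independent output bits with a fixed probability of one, which is far simpler than the stochastic model the paper relies on. The function names are hypothetical.

```python
import math


def min_entropy_per_bit(p_one):
    """Min-entropy of one bit that equals 1 with probability p_one."""
    p_max = max(p_one, 1.0 - p_one)
    return -math.log2(p_max)


def max_bias_for(h_min):
    """Largest |p - 0.5| that still yields min-entropy h_min per bit."""
    return 2.0 ** (-h_min) - 0.5


if __name__ == "__main__":
    bias = max_bias_for(0.98)
    print(f"tolerated bias: {bias:.5f}")
    print(f"check: {min_entropy_per_bit(0.5 + bias):.4f}")
```

Under this simplified model, a min-entropy of 0.98 per bit tolerates a bias of only about 0.007, which illustrates how tight the requirement is.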
Pub Date: 2023-06-16 DOI: 10.3390/cryptography7020032
Tao Feng, Yufeng Liu
In a blockchain network, communication delay between nodes is a serious threat to the consistency of the distributed ledger held by each miner. Blockchain is the core technology of Bitcoin. Existing research has proven the security of the PoW protocol when the number of delay rounds is small, but for complex asynchronous networks, where the number of delay rounds is large, the security of the PoW protocol has not been studied sufficiently. This paper improves a previously proposed model of the blockchain main chain record under the PoW protocol and proposes the TOD model, which brings the main chain record closer to the actual situation and reduces the errors that model construction introduces into the analysis. A comparison of the TOD model with the original model verifies that the improved model has a higher attack success rate when the probability of mining the delayable block increases. Then, building on the balance attack, this paper improves the long delay attack by letting the adversary control part of the computing power, which raises the adversary's attack success rate within a certain limit.
Title: Research on PoW Protocol Security under Optimized Long Delay Attack.
Pub Date: 2023-06-09 DOI: 10.46586/tches.v2023.i3.74-96
Joppe W. Bos, Olivier Bronchain, F. Custers, Joost Renes, Denise Verbakel, C. V. Vredendaal
FrodoKEM is a Key Encapsulation Mechanism (KEM) based on unstructured lattices. From a security point of view this makes it a conservative option for achieving post-quantum security, which is why it is favored by several European authorities (e.g., German BSI and French ANSSI). Relying on unstructured instead of structured lattices (e.g., CRYSTALS-Kyber) comes at the cost of additional memory usage, which is particularly critical for embedded security applications such as smart cards. For example, prior FrodoKEM-640 implementations (using AES) on Cortex-M4 require more than 80 kB of stack, making them impossible to run on some embedded systems. In this work, we explore several stack reduction strategies and the resulting time versus memory trade-offs. Concretely, we reduce the stack consumption of FrodoKEM by a factor of 2–3 compared to the smallest known implementations with almost no impact on performance. We also present various time-memory trade-offs going as low as 8 kB for all AES parameter sets, and below 4 kB for FrodoKEM-640. By introducing a minor tweak to the FrodoKEM specifications, we additionally reduce the stack consumption down to 8 kB for all the SHAKE versions. As a result, this work enables FrodoKEM on more resource-constrained embedded systems.
Title: Enabling FrodoKEM on Embedded Devices. Journal: IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 74-96.
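The FrodoKEM abstract does not spell out its stack reduction strategies, so the following is only an illustration of the general time-versus-memory idea. Instead of materializing the public matrix A (640 x 640 16-bit entries for FrodoKEM-640, about 800 kB), one row is regenerated from a seed each time it is needed. The row generator `gen_row` is a hypothetical stand-in built on SHAKE128 and is not claimed to match the specification's matrix expansion. The function names and the small test dimensions are assumptions.

```python
import hashlib

Q = 1 << 15  # modulus used by FrodoKEM-640


def gen_row(seed, i, n):
    """HYPOTHETICAL row generator: expand (row index, seed) with SHAKE128."""
    stream = hashlib.shake_128(i.to_bytes(2, "little") + seed).digest(2 * n)
    return [int.from_bytes(stream[2 * j:2 * j + 2], "little") % Q
            for j in range(n)]


def as_plus_e_streamed(seed, s, e, n, nbar):
    """Compute B = A*S + E mod Q while holding only one row of A at a time."""
    b = []
    for i in range(n):
        row = gen_row(seed, i, n)  # n words live, discarded after this row
        b.append([(sum(row[k] * s[k][j] for k in range(n)) + e[i][j]) % Q
                  for j in range(nbar)])
    return b


def as_plus_e_full(seed, s, e, n, nbar):
    """Reference version that stores all n*n entries of A first."""
    a = [gen_row(seed, i, n) for i in range(n)]
    return [[(sum(a[i][k] * s[k][j] for k in range(n)) + e[i][j]) % Q
             for j in range(nbar)] for i in range(n)]


if __name__ == "__main__":
    print("full A for n = 640:", 640 * 640 * 2, "bytes")
    print("one row for n = 640:", 640 * 2, "bytes")
```

Both functions return the same matrix B. The streamed version trades repeated hashing for a working set of one row, which is the kind of trade-off the abstract describes.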
Pub Date: 2023-06-09 DOI: 10.46586/tches.v2023.i3.504-521
G. Botrel, Youssef El Housni
The bottleneck in the proving algorithm of most elliptic-curve-based SNARK proof systems is the Multi-Scalar-Multiplication (MSM) algorithm. In this paper we give an overview of a variant of the Pippenger MSM algorithm together with a set of optimizations tailored for curves that admit a twisted Edwards form. We prove that this is the case for SNARK-friendly chains and cycles of elliptic curves, which are useful for recursive constructions. Our contribution is twofold: first, we optimize the arithmetic of finite fields by improving on the well-known Coarsely Integrated Operand Scanning (CIOS) modular multiplication. This is a contribution of independent interest that applies to many different contexts. Second, we propose a new coordinate system for twisted Edwards curves tailored for the Pippenger MSM algorithm. Accelerating the MSM over these curves is critical for the deployment of recursive proof system applications such as proof-carrying data, blockchain rollups and blockchain light clients. We implement our work in Go and benchmark it on two different CPU architectures (x86 and arm64). We show that our implementation achieves a 40-47% speedup over the state-of-the-art implementation (which was implemented in Rust). This MSM implementation won first place in the ZPrize competition in the open division “Accelerating MSM on Mobile” and will be deployed in two real-world applications: Linea zkEVM by ConsenSys and probably the Celo network.
Title: Faster Montgomery multiplication and Multi-Scalar-Multiplication for SNARKs. Journal: IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 504-521.
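For readers unfamiliar with the CIOS method named in the abstract above, here is a minimal sketch of the textbook Coarsely Integrated Operand Scanning Montgomery multiplication. It is the well-known baseline, not the improved variant the paper proposes, and it is written in Python rather than the Go of the authors' implementation. The 64-bit word size and the helper names are assumptions made for illustration.

```python
W = 64                # ASSUMPTION: machine word size in bits
MASK = (1 << W) - 1


def to_limbs(x, s):
    """Split x into s little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]


def from_limbs(limbs):
    return sum(v << (W * i) for i, v in enumerate(limbs))


def cios_mont_mul(a, b, n, s):
    """Return a*b*R^-1 mod n with R = 2^(W*s); needs a, b < n < R and n odd."""
    al, bl, nl = to_limbs(a, s), to_limbs(b, s), to_limbs(n, s)
    n0_inv = (-pow(n, -1, 1 << W)) & MASK  # -n^-1 mod 2^W
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += a * b[i]
        carry = 0
        for j in range(s):
            cs = t[j] + al[j] * bl[i] + carry
            t[j], carry = cs & MASK, cs >> W
        cs = t[s] + carry
        t[s], t[s + 1] = cs & MASK, cs >> W
        # Reduction step: add m*n so the low word becomes zero, then shift
        m = (t[0] * n0_inv) & MASK
        carry = (t[0] + m * nl[0]) >> W
        for j in range(1, s):
            cs = t[j] + m * nl[j] + carry
            t[j - 1], carry = cs & MASK, cs >> W
        cs = t[s] + carry
        t[s - 1], carry = cs & MASK, cs >> W
        t[s] = t[s + 1] + carry
    r = from_limbs(t[:s + 1])
    return r - n if r >= n else r  # final conditional subtraction


if __name__ == "__main__":
    p = 2**255 - 19
    print(hex(cios_mont_mul(3, 5, p, 4)))
```

The multiplication and reduction steps are interleaved per word of `b`, which is what "coarsely integrated" refers to. The result stays in Montgomery form, so callers convert operands into and out of that form around a chain of multiplications.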
Pub Date: 2023-06-09 DOI: 10.46586/tches.v2023.i3.164-193
J. Almeida, M. Barbosa, G. Barthe, B. Grégoire, Vincent Laporte, Jean-Christophe Léchenet, Tiago Oliveira, Hugo Pacheco, Miguel Quaresma, P. Schwabe, Antoine Séré, Pierre-Yves Strub
In this paper we present the first formally verified implementations of Kyber and, to the best of our knowledge, the first such implementations of any post-quantum cryptosystem. We give a (readable) formal specification of Kyber in the EasyCrypt proof assistant, which is syntactically very close to the pseudocode description of the scheme as given in the most recent version of the NIST submission. We present high-assurance open-source implementations of Kyber written in the Jasmin language, along with machine-checked proofs that they are functionally correct with respect to the EasyCrypt specification. We describe a number of improvements to the EasyCrypt and Jasmin frameworks that were needed for this implementation and verification effort, and we present detailed benchmarks of our implementations, showing that our code achieves performance close to existing hand-optimized implementations in C and assembly.
Title: Formally verifying Kyber Episode IV: Implementation correctness. Journal: IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 164-193.
Pub Date: 2023-06-09 DOI: 10.46586/tches.v2023.i3.294-320
Kai Schoos, Sergej Meschkov, M. Tahoori, Dennis R. E. Gnad
In safety- and security-conscious environments, isolated communication channels are often deemed necessary. Galvanically isolated communication channels are typically expected not to allow physical side-channel attacks through that channel. However, in this paper, we show that they can inadvertently leak side-channel information in the form of minuscule jitter on the communication signal. We observe worst-case signal jitter within 54 ± 45 ps using an FPGA-based receiver employing a time-to-digital converter (TDC), which offers a higher time resolution than a typical oscilloscope can measure, while in many other systems such measurements are also possible. A transmitter device runs a cryptographic accelerator, while we connect an FPGA on the receiver side and measure the signal jitter using a TDC. We can indeed show sufficient side-channel leakage in the jitter of the signal by performing a key recovery of an AES accelerator running on the transmitter. Furthermore, we compare this leakage to a power side channel also measured with a TDC and show that the timing jitter alone contains sufficient side-channel information. While an on-chip power analysis attack needs about 27k traces for key recovery, our cross-device jitter-based attack needs as few as 47k traces, depending on the setup. Galvanic isolation does not change that significantly. That is an increase of only 1.7x, showing that fine-grained jitter timing information can be a very potent attack vector even under galvanic isolation. In summary, we introduce a new side-channel attack vector that can leak information in many presumably secure systems. Communication channels can inadvertently leak information through tiny timing variations, known as signal jitter. This could affect millions of devices and needs to be considered.
Title: JitSCA: Jitter-based Side-Channel Analysis in Picoscale Resolution. Journal: IACR Trans. Cryptogr. Hardw. Embed. Syst., pp. 294-320.
Pub Date: 2023-06-09 DOI: 10.46586/tches.v2023.i3.570-596
Zehong Qiu, Fan Zhang
Algebraic Fault Analysis (AFA) is a cryptanalysis technique for block ciphers proposed by Courtois et al., which incorporates algebraic cryptanalysis to overcome the complexity of manual analysis within the context of Differential Fault Analysis (DFA). The effectiveness of AFA on lightweight block ciphers has been demonstrated. However, the complexity of the algebraic systems prevents it from attacking heavyweight block ciphers efficiently. In this paper, we propose a novel cryptanalysis technique called Redundancies-assisted Algebraic Fault Analysis (RAFA) to facilitate the solution of algebraic systems in the setting of heavyweight block ciphers. The core idea of RAFA is to expedite SAT solvers by modifying the algebraic systems, which is accomplished via two methods. The first method introduces redundant constraints, which is proposed for the first time in the context of algebraic cryptanalysis. The second is a sophisticated linearization of the nonlinear Algebraic Normal Form (ANF). RAFA takes about 9.68 hours to attack AES-128. To the best of our knowledge, this is the first work that uses a general SAT solver to attack AES with only a single injection of a byte fault. Moreover, RAFA can attack AES-128 in 50.92 and 27.54 minutes for the nibble- and bit-based fault models, respectively. In comparison, the traditional DFA algorithm implemented in pure C takes 4–5 hours under all three fault models investigated in this work. Moreover, in order to show the generality of RAFA, we also apply it to other heavyweight block ciphers. The best results show that RAFA could recover the key of Serpent-256 and SPEEDY-r-192 in 20.7 and 1.5 hours using only three faults, respectively. In comparison, AFA could not break these two ciphers even when 30 bits and 50 bits of their keys are known, respectively. Furthermore, no DFA work on Serpent or SPEEDY is known using comparable fault models.
RAFA: Redundancies-assisted Algebraic Fault Analysis and its implementation on SPN block ciphers. IACR Trans. Cryptogr. Hardw. Embed. Syst., 2023, pp. 570-596.
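The two ideas named in the RAFA abstract, linearizing nonlinear ANF monomials and adding redundant constraints, can be illustrated on a toy scale. The sketch below is only a generic illustration, not the authors' construction: the helper names (`linearize`, `and_clauses`, `is_redundant`), the example polynomial, and the brute-force check are all assumptions made for this example. Each nonlinear monomial is replaced by a fresh auxiliary variable defined by a small set of CNF clauses, and a clause counts as redundant when every solution of the system already satisfies it.

```python
from itertools import product

def linearize(anf):
    """Replace every nonlinear monomial by a fresh auxiliary variable.

    anf is a list of monomials; each monomial is a tuple of variable names
    and the empty tuple () stands for the constant 1.
    """
    linear, defs = [], {}
    for mono in anf:
        if len(mono) <= 1:
            linear.append(mono)          # already linear (or constant)
        else:
            aux = "t_" + "".join(mono)   # hypothetical naming scheme
            defs[aux] = mono
            linear.append((aux,))
    return linear, defs

def and_clauses(aux, mono):
    """CNF clauses for aux <-> AND(mono); a literal is (name, is_positive)."""
    clauses = [[(aux, False), (v, True)] for v in mono]
    clauses.append([(aux, True)] + [(v, False) for v in mono])
    return clauses

def eval_anf(anf, env):
    """Evaluate the XOR of all monomials under the assignment env."""
    total = 0
    for mono in anf:
        term = 1
        for v in mono:
            term &= env[v]
        total ^= term
    return total

def clause_holds(clause, env):
    return any(env[name] == int(positive) for name, positive in clause)

def models(clauses, names):
    """Enumerate all assignments that satisfy every clause (brute force)."""
    for bits in product((0, 1), repeat=len(names)):
        env = dict(zip(names, bits))
        if all(clause_holds(c, env) for c in clauses):
            yield env

def is_redundant(clause, clauses, names):
    """A clause is redundant if every model of the system satisfies it."""
    return all(clause_holds(clause, env) for env in models(clauses, names))

# Toy polynomial f = xy + yz + x + 1 over GF(2).
ANF = [("x", "y"), ("y", "z"), ("x",), ()]
LINEAR, DEFS = linearize(ANF)
CLAUSES = [c for aux, mono in DEFS.items() for c in and_clauses(aux, mono)]
NAMES = ["x", "y", "z"] + list(DEFS)

# Implied by the definitions: t_xy = 1 forces y = 1, so z = 1 forces t_yz = 1.
EXTRA = [("t_xy", False), ("z", False), ("t_yz", True)]
```

Here `LINEAR` is a purely XOR-based equation over the original and auxiliary variables, and `is_redundant(EXTRA, CLAUSES, NAMES)` confirms that `EXTRA` does not change the solution set. Whether such a clause actually helps a SAT solver depends on the solver and the instance; the abstract reports the effect only for the authors' own constraints.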
Pub Date : 2023-06-09 DOI: 10.46586/tches.v2023.i3.194-220
Tao Lu, Chengkun Wei, Ruijing Yu, Yi Chen, L. Wang, Chaochao Chen, Zeke Wang, Wenzhi Chen
Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, the zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARKs such as Groth16 incur high overhead in the proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM). This paper therefore presents cuZK, an efficient GPU implementation of zkSNARK that uses the following three techniques to achieve high performance. First, we propose a new parallel MSM algorithm that achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation. Together with our self-designed MSM scheme and a well-studied NTT scheme, cuZK parallelizes all operations in the proof generation step. Third, cuZK reduces the latency overhead caused by CPU-GPU data transfer by 1) reducing redundant data transfer and 2) overlapping data transfer with device computation. The evaluation results show that our MSM module provides over 2.08× (up to 2.94×) speedup versus the state-of-the-art GPU implementation. cuZK achieves over 2.65× (up to 4.86×) speedup on standard benchmarks and 2.18× speedup on a GPU-accelerated cryptocurrency application, Filecoin.
cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs. IACR Trans. Cryptogr. Hardw. Embed. Syst., 2023, pp. 194-220.
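For readers unfamiliar with the serial baseline mentioned in the cuZK abstract, the following is a minimal sketch of the Pippenger (bucket) method for multi-scalar multiplication. It is not cuZK's parallel algorithm, which the abstract does not describe. As a simplifying assumption, the elliptic-curve group is replaced by integers modulo a prime under addition so that the result can be checked against a direct sum; `P_MOD`, `add_mod`, `msm_naive`, and the window width `c` are illustrative choices, not taken from the paper.

```python
P_MOD = 2**61 - 1  # toy modulus standing in for a real curve group (assumption)

def add_mod(a, b):
    """Group operation of the toy group: addition modulo P_MOD."""
    return (a + b) % P_MOD

def msm_naive(scalars, points):
    """Reference result: sum of k_i * P_i computed directly."""
    return sum(k * p for k, p in zip(scalars, points)) % P_MOD

def msm_pippenger(scalars, points, add, zero, scalar_bits, c=4):
    """Bucket-method MSM using only the group operation `add`.

    Scalars must be non-negative and smaller than 2**scalar_bits.
    They are cut into c-bit windows, processed from the most
    significant window down to the least significant one.
    """
    windows = (scalar_bits + c - 1) // c
    mask = (1 << c) - 1
    result = zero
    for w in reversed(range(windows)):
        # Shift the accumulated result left by c bits (c doublings).
        for _ in range(c):
            result = add(result, result)
        # Sort points into buckets by their c-bit digit in this window.
        buckets = [zero] * (1 << c)  # buckets[0] stays unused
        for k, p in zip(scalars, points):
            digit = (k >> (w * c)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], p)
        # Running-sum trick: computes sum_j j * buckets[j] with additions only.
        running, window_sum = zero, zero
        for j in range(mask, 0, -1):
            running = add(running, buckets[j])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result
```

A call such as `msm_pippenger(ks, ps, add_mod, 0, 16)` should agree with `msm_naive(ks, ps)` for 16-bit scalars. The bucket accumulation of one window does not depend on any other window, which is one reason the method is a common starting point for parallel variants.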