IACR Trans. Cryptogr. Hardw. Embed. Syst.: Latest Articles, Page 2

MMM: Authenticated Encryption with Minimum Secret State for Masking

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-08-31 DOI: 10.46586/tches.v2023.i4.80-109

Yusuke Naito, Yu Sasaki, T. Sugawara

We propose a new authenticated encryption (AE) mode, MMM, that achieves the minimum memory size with masking. Minimizing the secret state is the crucial challenge in low-memory AE suitable for masking. Here, the minimum secret state is s + b bits, composed of s bits for a secret key and b bits for a plaintext block. HOMA, which appeared at CRYPTO 2022, achieved this goal with b = 64, but choosing a smaller b was difficult because b = s/2 is bound to the block size of the underlying primitive, meaning that a block cipher with an unrealistically small block size (e.g., 8 bits) would be necessary for further improvement. MMM addresses the issue by making b independent of the underlying primitive while achieving the minimum (s + b)-bit secret state. Moreover, MMM provides additional advantages over HOMA, including (i) a better rate, (ii) security in the multi-user model, and (iii) a smaller transmission cost. We instantiate two variants, MMM-8 (with b = 8) and MMM-64 (with b = 64), using the standard tweakable block cipher SKINNY-64/192. With a (d + 1)-masking scheme, MMM-8 (resp. MMM-64) is smaller by 56d + 184 (resp. 128) bits compared with HOMA. In our hardware performance evaluation, MMM-8 and MMM-64 achieved smaller circuit areas than HOMA for all the examined protection orders d ∈ [0, 5]. MMM-8's circuit area is only 81% of HOMA's with d = 5, and MMM-64 achieves more than a 3x speed-up with a smaller circuit area.
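The memory savings quoted in this abstract can be tabulated for the examined protection orders. The following is a minimal sketch based only on the two figures stated above (56d + 184 bits for MMM-8 and 128 bits for MMM-64); the function names are hypothetical. The rewriting 56d + 184 = 56(d + 1) + 128 is an algebraic identity, but reading 56 as the 64 − 8 bits saved in each of the d + 1 shares, plus a d-independent 128 bits, is an assumption and is not stated in the abstract.

```python
def mmm8_saving_bits(d: int) -> int:
    """Bits saved by MMM-8 relative to HOMA at protection order d (figure from the abstract)."""
    return 56 * d + 184


def mmm64_saving_bits(d: int) -> int:
    """Bits saved by MMM-64 relative to HOMA; the abstract gives a constant 128 bits."""
    return 128


if __name__ == "__main__":
    for d in range(6):  # the abstract examines d in [0, 5]
        # Algebraic identity only; the per-share interpretation is an assumption.
        assert mmm8_saving_bits(d) == 56 * (d + 1) + 128
        print(f"d={d}: MMM-8 saves {mmm8_saving_bits(d)} bits, "
              f"MMM-64 saves {mmm64_saving_bits(d)} bits")
```

The table makes visible that the MMM-8 saving grows linearly with the masking order, while the MMM-64 saving stays fixed.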



Citations: 0

Don't Forget Pairing-Friendly Curves with Odd Prime Embedding Degrees

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-08-31 DOI: 10.46586/tches.v2023.i4.393-419

Yuanxi Dai, Fangguo Zhang, Chang-An Zhao

Pairing-friendly curves with odd prime embedding degrees at the 128-bit security level, such as BW13-310 and BW19-286, have sparked interest in public-key cryptography because of the small sizes of their prime fields. However, compared to mainstream pairing-friendly curves at the same security level, i.e., BN446 and BLS12-446, pairing computation on BW13-310 and BW19-286 is usually considered inefficient. In this paper we investigate high-performance software implementations of pairing computation on BW13-310 and of the corresponding building blocks used in pairing-based protocols, including hashing, group exponentiations and membership testing. First, we propose efficient explicit formulas for pairing computation on this curve. Moreover, we exploit state-of-the-art techniques to implement hashing to G1 and G2, group exponentiations and membership testing. In particular, for exponentiations in G2 and GT, we present new optimizations that improve computational efficiency. Our implementation results on a 64-bit processor show that the gap in pairing performance between BW13-310 and BN446 (resp. BLS12-446) is only up to 4.9% (resp. 26%). More importantly, compared to BN446 and BLS12-446, BW13-310 is about 109.1%–227.3%, 100%–192.6%, 24.5%–108.5% and 68.2%–145.5% faster in terms of hashing to G1, exponentiations in G1 and GT, and membership testing for GT, respectively. These results reveal that BW13-310 would be an interesting candidate for pairing-based cryptographic protocols.



Citations: 0

Enhancing Quality and Security of the PLL-TRNG

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-08-31 DOI: 10.46586/tches.v2023.i4.211-237

V. Fischer, F. Bernard, Nathalie Bochard, Quentin Dallison, M. Skorski

Field Programmable Gate Arrays (FPGAs) are used more and more frequently to implement cryptographic systems, which need random number generators (RNGs) to be embedded in the same device. The main challenge in implementing a generator inside an FPGA is that the physical source of randomness, such as a jittered clock generator, is implemented in the configurable logic area, i.e., in close vicinity to noisy running algorithms, which can have a significant impact on the generated numbers or even serve to attack the generator. A possible approach to prevent such influence is the use of Phase-Locked Loops (PLLs), which are separated from the reconfigurable logic area inside the FPGA chip. In this paper, we propose a new architecture of the PLL-based TRNG, including a method to avoid correlation in the output through control of timing in the sampling process, as well as new embedded tests based on the enhanced stochastic model. We also propose a workflow to help find the best parameters, such as output bitrate and entropy rate. We show that bitrates of around 400 kb/s or more can be achieved, while guaranteeing min-entropy rates per bit higher than 0.98, as required by the latest security standards.
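To give a feel for what a min-entropy rate above 0.98 per bit means, the sketch below converts between the probability of a 1-bit and min-entropy. It assumes independent, identically distributed output bits, which is a simplification and not the stochastic model used in the paper; the function names are hypothetical.

```python
import math


def min_entropy_per_bit(p_one: float) -> float:
    """Min-entropy of a single bit that equals 1 with probability p_one (i.i.d. assumption)."""
    if not 0.0 <= p_one <= 1.0:
        raise ValueError("p_one must be a probability")
    return -math.log2(max(p_one, 1.0 - p_one))


def max_bias_for(h_min: float) -> float:
    """Largest |p_one - 0.5| for which the per-bit min-entropy is still at least h_min."""
    return 2.0 ** (-h_min) - 0.5


if __name__ == "__main__":
    print(f"bias allowed for 0.98 bits of min-entropy: {max_bias_for(0.98):.5f}")
    for p in (0.500, 0.505, 0.510):
        print(f"P(bit = 1) = {p:.3f} -> min-entropy = {min_entropy_per_bit(p):.4f}")
```

Under this simplified model, the 0.98 threshold tolerates a bias of a little under 0.7 percentage points away from 0.5, which illustrates how tight the requirement is.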



Citations: 0

Research on PoW Protocol Security under Optimized Long Delay Attack

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-06-16 DOI: 10.3390/cryptography7020032

Tao Feng, Yufeng Liu

Blockchain is the core technology of Bitcoin. In a blockchain network, the communication delay between nodes is a serious threat to the consistency of the distributed ledger across miners. Existing research has proven the security of the PoW protocol when the number of delay rounds is small, but for complex asynchronous networks, research on the security of the PoW protocol when the number of delay rounds is large is insufficient. This paper improves the previously proposed main-chain record model for blockchains under the PoW protocol and then proposes the TOD model, which brings the main-chain record in the model closer to the actual situation and reduces the errors that the model itself introduces into the analysis. A comparison of the TOD model with the original model verifies that the improved model yields a higher attack success rate when the probability of mining a delayable block increases. The paper then improves the long delay attack, building on the balance attack, by letting the adversary control part of the computing power, which raises the adversary's attack success rate within a certain limit.


Citations: 0

Enabling FrodoKEM on Embedded Devices

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-06-09 DOI: 10.46586/tches.v2023.i3.74-96

Joppe W. Bos, Olivier Bronchain, F. Custers, Joost Renes, Denise Verbakel, C. V. Vredendaal

FrodoKEM is a Key Encapsulation Mechanism (KEM) based on unstructured lattices. From a security point of view this makes it a conservative option for achieving post-quantum security, which is why it is favored by several European authorities (e.g., German BSI and French ANSSI). Relying on unstructured instead of structured lattices (e.g., CRYSTALS-Kyber) comes at the cost of additional memory usage, which is particularly critical for embedded security applications such as smart cards. For example, prior FrodoKEM-640 implementations (using AES) on Cortex-M4 require more than 80 kB of stack, making it impossible to run them on some embedded systems. In this work, we explore several stack reduction strategies and the resulting time versus memory trade-offs. Concretely, we reduce the stack consumption of FrodoKEM by a factor of 2–3 compared to the smallest known implementations, with almost no impact on performance. We also present various time-memory trade-offs going as low as 8 kB for all AES parameter sets, and below 4 kB for FrodoKEM-640. By introducing a minor tweak to the FrodoKEM specification, we additionally reduce the stack consumption to 8 kB for all the SHAKE versions. As a result, this work enables FrodoKEM on more resource-constrained embedded systems.
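A back-of-the-envelope calculation shows why memory is the pressure point for unstructured lattices. The sketch below assumes the published FrodoKEM parameters (n = 640, 976, 1344 and n̄ = 8) and 16-bit storage per matrix entry; the names and the buffer breakdown are illustrative only and do not describe the memory layout of the implementations evaluated in the paper.

```python
from dataclasses import dataclass

BYTES_PER_ENTRY = 2  # assumption: each matrix entry is stored in a 16-bit word


@dataclass(frozen=True)
class FrodoParams:
    name: str
    n: int      # dimension of the square public matrix A
    nbar: int   # number of columns of the tall secret/error matrices


PARAMS = [
    FrodoParams("FrodoKEM-640", 640, 8),
    FrodoParams("FrodoKEM-976", 976, 8),
    FrodoParams("FrodoKEM-1344", 1344, 8),
]


def full_matrix_a_bytes(p: FrodoParams) -> int:
    """Size of the whole n x n matrix A if it were held in memory at once."""
    return p.n * p.n * BYTES_PER_ENTRY


def tall_matrix_bytes(p: FrodoParams) -> int:
    """Size of one n x nbar matrix (for example a secret or error matrix)."""
    return p.n * p.nbar * BYTES_PER_ENTRY


def row_of_a_bytes(p: FrodoParams) -> int:
    """Size of a single row of A, the unit a streaming implementation could regenerate."""
    return p.n * BYTES_PER_ENTRY


if __name__ == "__main__":
    for p in PARAMS:
        print(f"{p.name}: A = {full_matrix_a_bytes(p)} B, "
              f"n x nbar matrix = {tall_matrix_bytes(p)} B, "
              f"one row of A = {row_of_a_bytes(p)} B")
```

Under these assumptions even a single n × n̄ matrix for FrodoKEM-640 occupies 10 kB, so reaching a stack of 8 kB or below implies that such matrices cannot all be held in full at the same time.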



Citations: 0

Faster Montgomery multiplication and Multi-Scalar-Multiplication for SNARKs

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-06-09 DOI: 10.46586/tches.v2023.i3.504-521

G. Botrel, Youssef El Housni

The bottleneck in the proving algorithm of most elliptic-curve-based SNARK proof systems is the Multi-Scalar-Multiplication (MSM) algorithm. In this paper we give an overview of a variant of the Pippenger MSM algorithm together with a set of optimizations tailored for curves that admit a twisted Edwards form. We prove that this is the case for SNARK-friendly chains and cycles of elliptic curves, which are useful for recursive constructions. Our contribution is twofold: first, we optimize the arithmetic of finite fields by improving on the well-known Coarsely Integrated Operand Scanning (CIOS) modular multiplication. This is a contribution of independent interest that applies to many different contexts. Second, we propose a new coordinate system for twisted Edwards curves tailored for the Pippenger MSM algorithm. Accelerating the MSM over these curves is critical for the deployment of recursive proof system applications such as proof-carrying data, blockchain rollups and blockchain light clients. We implement our work in Go and benchmark it on two different CPU architectures (x86 and arm64). We show that our implementation achieves a 40-47% speedup over the state-of-the-art implementation (which was implemented in Rust). This MSM implementation won first place in the ZPrize competition in the open division “Accelerating MSM on Mobile” and will be deployed in two real-world applications: Linea zkEVM by ConsenSys and probably the Celo network.
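For readers unfamiliar with CIOS, the sketch below shows the textbook word-by-word Montgomery multiplication that the abstract takes as its baseline. It is not the improved variant proposed in the paper, and it is written in Python for readability rather than in Go; the 64-bit word size, the helper names and the example modulus 2^255 − 19 are illustrative assumptions.

```python
W = 64                     # assumed machine word size in bits
MASK = (1 << W) - 1


def to_limbs(x: int, s: int) -> list:
    """Split x into s little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]


def from_limbs(limbs: list) -> int:
    """Recombine little-endian W-bit words into an integer."""
    return sum(limb << (W * i) for i, limb in enumerate(limbs))


def cios_mont_mul(a: int, b: int, n: int, s: int) -> int:
    """Return a * b * R^(-1) mod n with R = 2^(W*s); requires n odd and a, b < n < R."""
    al, bl, nl = to_limbs(a, s), to_limbs(b, s), to_limbs(n, s)
    n0_inv = (-pow(n, -1, 1 << W)) & MASK      # -n^(-1) mod 2^W (Python 3.8+)
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += a * b[i]
        carry = 0
        for j in range(s):
            acc = t[j] + al[j] * bl[i] + carry
            t[j], carry = acc & MASK, acc >> W
        acc = t[s] + carry
        t[s], t[s + 1] = acc & MASK, acc >> W
        # Reduction step: add m * n so the lowest word becomes zero, then shift one word
        m = (t[0] * n0_inv) & MASK
        carry = (t[0] + m * nl[0]) >> W
        for j in range(1, s):
            acc = t[j] + m * nl[j] + carry
            t[j - 1], carry = acc & MASK, acc >> W
        acc = t[s] + carry
        t[s - 1], carry = acc & MASK, acc >> W
        t[s] = t[s + 1] + carry
    r = from_limbs(t[: s + 1])
    return r - n if r >= n else r               # single conditional subtraction


if __name__ == "__main__":
    n, s = 2**255 - 19, 4                       # example modulus, 4 words of 64 bits
    R = 1 << (W * s)
    a, b = 0x1234567890ABCDEF << 100, 3**150 % n
    assert cios_mont_mul(a, b, n, s) == a * b * pow(R, -1, n) % n
    print("CIOS result matches a*b*R^-1 mod n")
```

The interleaving of one multiplication pass and one reduction pass per word of b is what "coarsely integrated" refers to; a production implementation would use fixed-width words and constant-time code rather than Python integers.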



Citations: 5

Formally verifying Kyber Episode IV: Implementation correctness

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-06-09 DOI: 10.46586/tches.v2023.i3.164-193

J. Almeida, M. Barbosa, G. Barthe, B. Grégoire, Vincent Laporte, Jean-Christophe Léchenet, Tiago Oliveira, Hugo Pacheco, Miguel Quaresma, P. Schwabe, Antoine Séré, Pierre-Yves Strub

In this paper we present the first formally verified implementations of Kyber and, to the best of our knowledge, the first such implementations of any post-quantum cryptosystem. We give a (readable) formal specification of Kyber in the EasyCrypt proof assistant, which is syntactically very close to the pseudocode description of the scheme as given in the most recent version of the NIST submission. We present high-assurance open-source implementations of Kyber written in the Jasmin language, along with machine-checked proofs that they are functionally correct with respect to the EasyCrypt specification. We describe a number of improvements to the EasyCrypt and Jasmin frameworks that were needed for this implementation and verification effort, and we present detailed benchmarks of our implementations, showing that our code achieves performance close to existing hand-optimized implementations in C and assembly.


Citations: 2

JitSCA: Jitter-based Side-Channel Analysis in Picoscale Resolution

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-06-09 DOI: 10.46586/tches.v2023.i3.294-320

Kai Schoos, Sergej Meschkov, M. Tahoori, Dennis R. E. Gnad

In safety- and security-conscious environments, isolated communication channels are often deemed necessary. Galvanically isolated communication channels are typically expected not to allow physical side-channel attacks through that channel. However, in this paper, we show that they can inadvertently leak side-channel information in the form of minuscule jitter on the communication signal. We observe worst-case signal jitter within 54 ± 45 ps using an FPGA-based receiver employing a time-to-digital converter (TDC), which offers a higher time resolution than a typical oscilloscope can measure, while such measurements are also possible in many other systems. A transmitter device runs a cryptographic accelerator, while we connect an FPGA on the receiver side and measure the signal jitter using a TDC. We show that the jitter of the signal carries sufficient side-channel leakage by performing a key recovery on an AES accelerator running on the transmitter. Furthermore, we compare this leakage to a power side channel also measured with a TDC and show that the timing jitter alone contains sufficient side-channel information. While an on-chip power analysis attack needs about 27k traces for key recovery, our cross-device jitter-based attack needs as few as 47k traces, depending on the setup. Galvanic isolation does not change that significantly. This is an increase in the number of traces of only about 1.7x, showing that fine-grained jitter timing information can be a very potent attack vector even under galvanic isolation. In summary, we introduce a new side-channel attack vector that can leak information in many presumably secure systems. Communication channels can inadvertently leak information through tiny timing variations, known as signal jitter. This could affect millions of devices and needs to be considered.



Citations: 0

RAFA: Redundancies-assisted Algebraic Fault Analysis and its implementation on SPN block ciphers

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-06-09 DOI: 10.46586/tches.v2023.i3.570-596

Zehong Qiu, Fan Zhang

Algebraic Fault Analysis (AFA) is a cryptanalysis technique for block ciphers proposed by Courtois et al., which incorporates algebraic cryptanalysis to overcome the complexity of manual analysis within the context of Differential Fault Analysis (DFA). The effectiveness of AFA on lightweight block ciphers has been demonstrated. However, the complexity of the algebraic systems prevents it from attacking heavyweight block ciphers efficiently. In this paper, we propose a novel cryptanalysis technique called Redundancies-assisted Algebraic Fault Analysis (RAFA) to facilitate the solution of algebraic systems in the setting of heavyweight block ciphers. The core idea of RAFA is to expedite SAT solvers by modifying the algebraic systems, which is accomplished via two methods. The first method introduces redundant constraints, which is proposed for the first time in the context of algebraic cryptanalysis. The second one is a sophisticated linearization of the nonlinear Algebraic Normal Form (ANF). RAFA takes about 9.68 hours to attack AES-128. To the best of our knowledge, this is the first work that uses a general SAT solver to attack AES with only a single byte-fault injection. Moreover, RAFA can attack AES-128 in 50.92 and 27.54 minutes for the nibble- and bit-based fault models, respectively. In comparison, the traditional DFA algorithm implemented in pure C takes 4–5 hours under all three fault models investigated in this work. Moreover, in order to show the generality of RAFA, we also apply it to other heavyweight block ciphers. The best results show that RAFA could recover the keys of Serpent-256 and SPEEDY-r-192 in 20.7 and 1.5 hours, respectively, using only three faults. In comparison, AFA could not break these two ciphers even when 30 bits and 50 bits of their keys are known, respectively. Furthermore, no DFA work on Serpent or SPEEDY is known using comparable fault models.
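The two ideas named in this abstract, linearizing ANF monomials and adding redundant constraints, can be illustrated on a toy scale. The sketch below uses the standard textbook encoding that replaces a degree-2 monomial x·y by a fresh variable z together with CNF clauses for z = x AND y, and then adds one clause that is already implied by the others. It is an assumption-laden illustration of the general concept, not RAFA's actual linearization or its choice of redundant constraints; all names are hypothetical.

```python
from itertools import product


def and_gate_clauses(x: int, y: int, z: int) -> list:
    """CNF clauses asserting z == (x AND y); literals are signed ints, DIMACS style."""
    return [[-z, x], [-z, y], [z, -x, -y]]


def satisfies(clauses: list, assignment: dict) -> bool:
    """True if every clause contains at least one literal that holds under the assignment."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def models(clauses: list, n_vars: int) -> list:
    """Enumerate all satisfying assignments by brute force (only viable for tiny systems)."""
    found = []
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: bits[i] for i in range(n_vars)}
        if satisfies(clauses, assignment):
            found.append(bits)
    return found


if __name__ == "__main__":
    base = and_gate_clauses(1, 2, 3)        # variable 3 stands for the monomial x1*x2
    redundant = base + [[-3, 1, 2]]         # implied by the base clauses, so logically redundant
    assert all(z == (x and y) for x, y, z in models(base, 3))
    assert models(redundant, 3) == models(base, 3)
    print("solutions unchanged by the redundant clause:", models(base, 3))
```

A redundant clause leaves the solution set untouched, as the final assertion checks; any benefit comes only from how it guides a solver's search, which is the effect the abstract attributes to RAFA's redundant constraints.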

Citations: 0

cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs

IACR Trans. Cryptogr. Hardw. Embed. Syst.

Pub Date : 2023-06-09 DOI: 10.46586/tches.v2023.i3.194-220

Tao Lu, Chengkun Wei, Ruijing Yu, Yi Chen, L. xilinx Wang, Chaochao Chen, Zeke Wang, Wenzhi Chen

Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, the zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARKs such as Groth16 incur high overhead in the proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM). Therefore, this paper presents cuZK, an efficient GPU implementation of zkSNARK that uses the following three techniques to achieve high performance. First, we propose a new parallel MSM algorithm. This MSM algorithm achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation. Together with our self-designed MSM scheme and a well-studied NTT scheme, cuZK parallelizes all operations in the proof generation step. Third, cuZK reduces the latency overhead caused by CPU-GPU data transfer by 1) reducing redundant data transfer and 2) overlapping data transfer and device computation. The evaluation results show that our MSM module provides over 2.08x (up to 2.94x) speedup versus the state-of-the-art GPU implementation. cuZK achieves over 2.65x (up to 4.86x) speedup on standard benchmarks and 2.18x speedup on a GPU-accelerated cryptocurrency application, Filecoin.
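
For readers unfamiliar with MSM, the following is a minimal sketch of the serial bucket (Pippenger-style) method that the abstract names as the baseline. It is not cuZK's parallel algorithm. To stay self-contained it assumes a toy group, the integers modulo a prime under addition, as a stand-in for elliptic-curve points; `msm_bucket`, `Q`, and the window size are names and parameters chosen for this example only.

```python
def msm_bucket(scalars, points, add, zero, scalar_bits, window=4):
    """Compute sum(k_i * P_i) using only the group operation `add`.

    Every scalar must be non-negative and smaller than 2**scalar_bits.
    """
    num_windows = (scalar_bits + window - 1) // window
    mask = (1 << window) - 1
    result = zero
    # Process the scalar digits from most significant to least significant.
    for w in reversed(range(num_windows)):
        # Multiply the running result by 2**window via repeated doubling.
        for _ in range(window):
            result = add(result, result)
        # Scatter: accumulate each point into the bucket named by its digit.
        buckets = [zero] * (mask + 1)
        for k, p in zip(scalars, points):
            digit = (k >> (w * window)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], p)
        # Reduce: sum_j j * buckets[j], computed with a running suffix sum.
        running = zero
        window_sum = zero
        for j in range(mask, 0, -1):
            running = add(running, buckets[j])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result


# Toy group (assumption): integers modulo the prime 2**61 - 1 under addition.
Q = 2**61 - 1


def add_mod(a, b):
    return (a + b) % Q


scalars = [3, 10, 255]
points = [5, 7, 11]
bucket_result = msm_bucket(scalars, points, add_mod, 0, scalar_bits=8)
naive_result = sum(k * p for k, p in zip(scalars, points)) % Q
```

In this toy group the naive sum is trivial to compute, which makes it a convenient correctness check. With real curve points each `add` is expensive, and the bucket method pays off because it needs far fewer group additions than scalar-multiplying every point separately.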

Citations: 3