Randomness Generation for Secure Hardware Masking - Unrolled Trivium to the Rescue
Gaëtan Cassiers, Loïc Masure, Charles Momin, Thorben Moos, A. Moradi, François-Xavier Standaert
IACR Cryptol. ePrint Arch. (vol. 1236, p. 1134), published 2024-07-08
DOI: 10.62056/akdkp2fgx
Citations: 8
Abstract
Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust, high-entropy randomness. Simply assuming that uniformly distributed random bits are available, without taking the cost of their generation into account, leads to a poor understanding of the efficiency vs. security tradeoff of masked implementations. This is especially relevant in the case of hardware masking schemes, which are known to consume large amounts of random bits per cycle due to parallelism. Currently, there seems to be no consensus on how to most efficiently derive many pseudo-random bits per clock cycle from an initial seed and with properties suitable for masked hardware implementations. In this work, we evaluate a number of building blocks for this purpose and find that hardware-oriented stream ciphers like Trivium and its reduced-security variant Bivium B outperform most competitors when implemented in an unrolled fashion. Unrolled implementations of these primitives enable the flexible generation of many bits per cycle, which is crucial for satisfying the large randomness demands of state-of-the-art masking schemes. According to our analysis, only Linear Feedback Shift Registers (LFSRs), when also unrolled, are capable of producing long non-repetitive sequences of random-looking bits at a higher rate per cycle for the same or lower cost than Trivium and Bivium B. Yet, these instances do not provide black-box security, since every output bit is a linear function of the seed. We experimentally demonstrate that using multiple output bits from an LFSR in the same masked implementation can violate probing security and even lead to harmful randomness cancellations.
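The linearity problem can be made concrete with a toy example. The Python sketch below is not the authors' experimental setup; it builds a 16-bit Fibonacci LFSR from the recurrence s[t+16] = s[t] ⊕ s[t+11] ⊕ s[t+13] ⊕ s[t+14] (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1, a commonly cited maximal-length choice). The tap choice and the helper names `lfsr_sequence` and `mask_with` are illustrative assumptions. Because the recurrence is linear, the five output bits at offsets t, t+11, t+13, t+14 and t+16 always XOR to zero, so a secret bit that happens to be masked with exactly those five "fresh" bits is left completely unmasked.

```python
import random
from functools import reduce
from operator import xor

DEGREE = 16
# Assumed recurrence s[t+16] = s[t] ^ s[t+11] ^ s[t+13] ^ s[t+14], i.e. the
# feedback polynomial x^16 + x^14 + x^13 + x^11 + 1 (illustrative choice).
TAPS = (0, 11, 13, 14)


def lfsr_sequence(seed, taps, n_bits):
    """Return the first n_bits outputs of a Fibonacci LFSR over GF(2)."""
    s = list(seed)
    degree = len(seed)
    while len(s) < n_bits:
        t = len(s) - degree
        # The new bit is a pure XOR (linear) combination of earlier outputs.
        s.append(reduce(xor, (s[t + j] for j in taps)))
    return s[:n_bits]


def mask_with(x, masks):
    """XOR a secret bit x with a sequence of supposedly fresh mask bits."""
    return reduce(xor, masks, x)


if __name__ == "__main__":
    rng = random.Random(7)
    seed = [rng.getrandbits(1) for _ in range(DEGREE)]
    bits = lfsr_sequence(seed, TAPS, 256)
    for t in (0, 42, 200):
        # Five output bits tied together by the recurrence ...
        related = [bits[t + j] for j in TAPS + (DEGREE,)]
        # ... XOR to zero, so using all five as masks leaves x unprotected.
        print(t, related, [mask_with(x, related) for x in (0, 1)])
```

A real masked circuit rarely XORs all five bits on one wire, but a probe or glitch that combines the corresponding intermediate values observes the same cancellation, which is the kind of dependency that motivates using a cryptographically stronger generator.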
Circumventing these problems, and enabling an independent analysis of randomness generation and masking, requires the use of cryptographically stronger primitives like stream ciphers. As a result of our studies, we provide an evidence-based estimate for the cost of securely generating $n$ fresh random bits per cycle. Depending on the desired level of black-box security and operating frequency, this cost can be as low as $20n$ to $30n$ ASIC gate equivalents (GE) or $3n$ to $4n$ FPGA look-up tables (LUTs), where $n$ is the number of random bits required. Our results demonstrate that the cost per bit is (sometimes significantly) lower than estimated in previous works, incentivizing parallelism whenever exploitable. This provides further motivation to potentially move low randomness usage from a primary to a secondary design goal in hardware masking research.
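To illustrate what "unrolled" means for producing n fresh bits per cycle, the following Python sketch follows the Trivium round function from its specification (288-bit state, 4 × 288 = 1152 warm-up rounds) and contrasts a one-bit-per-call `step` with `keystream_unrolled`, which derives n ≤ 64 keystream bits from a single state snapshot, mirroring n copies of the round logic evaluated within one clock cycle. The function names are hypothetical, key and IV are assumed to be lists of 80 bits in specification order (no byte-to-bit conversion is done), and the output has not been checked against official test vectors. The sketch models bit-level behavior only and says nothing about GE or LUT cost.

```python
import random

STATE_BITS = 288
INIT_ROUNDS = 4 * STATE_BITS  # 1152 warm-up rounds without output
MAX_UNROLL = 64  # a freshly computed state bit is not read for >= 64 rounds


def _round(bit):
    """One Trivium round. bit(p) returns state bit s_p (1-based, as in the spec).

    Returns the keystream bit z and the three feedback bits t1, t2, t3.
    """
    t1 = bit(66) ^ bit(93)
    t2 = bit(162) ^ bit(177)
    t3 = bit(243) ^ bit(288)
    z = t1 ^ t2 ^ t3
    t1 ^= (bit(91) & bit(92)) ^ bit(171)
    t2 ^= (bit(175) & bit(176)) ^ bit(264)
    t3 ^= (bit(286) & bit(287)) ^ bit(69)
    return z, t1, t2, t3


def step(s):
    """Rolled reference: one round, one keystream bit (s is a list of 288 bits)."""
    z, t1, t2, t3 = _round(lambda p: s[p - 1])
    # Shift each register by one and insert its feedback bit at the front.
    return z, [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]


def trivium_init(key_bits, iv_bits):
    """Load 80 key bits and 80 IV bits (spec order) and run the warm-up."""
    if len(key_bits) != 80 or len(iv_bits) != 80:
        raise ValueError("expected 80 key bits and 80 IV bits")
    s = list(key_bits) + [0] * 13 + list(iv_bits) + [0] * 4 + [0] * 108 + [1, 1, 1]
    for _ in range(INIT_ROUNDS):
        _, s = step(s)
    return s


def keystream_unrolled(s, n):
    """n keystream bits from one state snapshot, like n unrolled round copies.

    Every tap sits at least 66 cells into its register, so for i < 64 the bit
    at spec position p in round i is simply the snapshot bit at p - i.
    """
    if not 1 <= n <= MAX_UNROLL:
        raise ValueError("unrolling is limited to 1..64 rounds here")
    zs, t1s, t2s, t3s = [], [], [], []
    for i in range(n):  # in hardware these n copies are evaluated concurrently
        z, t1, t2, t3 = _round(lambda p, i=i: s[p - 1 - i])
        zs.append(z)
        t1s.append(t1)
        t2s.append(t2)
        t3s.append(t3)
    # Each register shifts by n; the newest feedback bit ends up in front.
    new_state = (t3s[::-1] + s[0:93 - n] + t1s[::-1] + s[93:177 - n]
                 + t2s[::-1] + s[177:288 - n])
    return zs, new_state


if __name__ == "__main__":
    rng = random.Random(2024)
    key = [rng.getrandbits(1) for _ in range(80)]
    iv = [rng.getrandbits(1) for _ in range(80)]
    state = trivium_init(key, iv)
    block, _ = keystream_unrolled(state, 64)  # 64 fresh bits in "one cycle"
    reference, s = [], state
    for _ in range(64):
        z, s = step(s)
        reference.append(z)
    print(len(block), block == reference)
```

The unrolled and rolled versions produce the same keystream; unrolling only trades area for more bits per clock cycle, which is the design knob the cost estimates above are expressed in.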