Pub Date : 2024-07-24DOI: 10.1007/s13389-024-00359-z
Antonio Guimarães, E. Borin, Diego F. Aranha
{"title":"MOSFHET: Optimized Software for FHE over the Torus","authors":"Antonio Guimarães, E. Borin, Diego F. Aranha","doi":"10.1007/s13389-024-00359-z","DOIUrl":"https://doi.org/10.1007/s13389-024-00359-z","url":null,"abstract":"","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":"33 6","pages":"515"},"PeriodicalIF":0.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141810103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18DOI: 10.46586/tches.v2024.i3.249-272
Dilip Kumar, Siemen Dhooghe, J. Balasch, Benedikt Gierlichs, Ingrid Verbauwhede
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitchextended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools without sacrificing security. We provide concrete results of several case studies. Our low-latency implementation of a complete PRINCE core shows a 32% area improvement (44% with optimization) over the state-of-the-art. Our PRINCE S-Box passes formal verification with a tool and the complete core on FPGA shows no first-order leakage in TVLA with 100 million traces. Our low-latency implementation of the AES S-Box costs roughly one third (one quarter with optimization) of the area of state-of-the-art implementations. It shows no first-order leakage in TVLA with 250 million traces.
{"title":"Time Sharing - A Novel Approach to Low-Latency Masking","authors":"Dilip Kumar, Siemen Dhooghe, J. Balasch, Benedikt Gierlichs, Ingrid Verbauwhede","doi":"10.46586/tches.v2024.i3.249-272","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.249-272","url":null,"abstract":"We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitchextended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools without sacrificing security. We provide concrete results of several case studies. Our low-latency implementation of a complete PRINCE core shows a 32% area improvement (44% with optimization) over the state-of-the-art. Our PRINCE S-Box passes formal verification with a tool and the complete core on FPGA shows no first-order leakage in TVLA with 100 million traces. Our low-latency implementation of the AES S-Box costs roughly one third (one quarter with optimization) of the area of state-of-the-art implementations. It shows no first-order leakage in TVLA with 250 million traces.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 8","pages":"925"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141825699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18DOI: 10.46586/tches.v2024.i3.1-24
Alex Charlès, A. Udovenko
This work proposes a new white-box attack technique called filtering, which can be combined with any other trace-based attack method. The idea is to filter the traces based on the value of an intermediate variable in the implementation, aiming to fix a share of a sensitive value and degrade the security of an involved masking scheme.Coupled with LDA (filtered LDA, FLDA), it leads to an attack defeating the state-ofthe-art SEL masking scheme (CHES 2021) of arbitrary degree and number of linear shares with quartic complexity in the window size. In comparison, the current best attacks have exponential complexities in the degree (higher degree decoding analysis, HDDA), in the number of linear shares (higher-order differential computation analysis, HODCA), or the window size (white-box learning parity with noise, WBLPN). The attack exploits the key idea of the SEL scheme - an efficient parallel combination of the nonlinear and linear masking schemes. We conclude that a proper composition of masking schemes is essential for security.In addition, we propose several optimizations for linear algebraic attacks: redundant node removal (RNR), optimized parity check matrix usage, and chosen-plaintext filtering (CPF), significantly improving the performance of security evaluation of white-box implementations.
这项工作提出了一种新的白盒攻击技术,称为过滤,它可以与任何其他基于轨迹的攻击方法相结合。该技术与 LDA(过滤 LDA,FLDA)相结合,可产生一种攻击方法,可击败任意程度和线性份额数的最先进 SEL 屏蔽方案(CHES 2021),其窗口大小的复杂度为四次方。相比之下,目前最好的攻击在阶数(高阶解码分析,HDDA)、线性份额数(高阶差分计算分析,HODCA)或窗口大小(带噪声的白盒学习奇偶校验,WBLPN)方面的复杂度都是指数级的。这种攻击利用了 SEL 方案的关键理念--非线性和线性掩码方案的高效并行组合。此外,我们还针对线性代数攻击提出了几种优化方案:冗余节点移除(RNR)、优化奇偶校验矩阵使用和选择平文过滤(CPF),从而显著提高了白盒实现的安全评估性能。
{"title":"White-box filtering attacks breaking SEL masking: from exponential to polynomial time","authors":"Alex Charlès, A. Udovenko","doi":"10.46586/tches.v2024.i3.1-24","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.1-24","url":null,"abstract":"This work proposes a new white-box attack technique called filtering, which can be combined with any other trace-based attack method. The idea is to filter the traces based on the value of an intermediate variable in the implementation, aiming to fix a share of a sensitive value and degrade the security of an involved masking scheme.Coupled with LDA (filtered LDA, FLDA), it leads to an attack defeating the state-ofthe-art SEL masking scheme (CHES 2021) of arbitrary degree and number of linear shares with quartic complexity in the window size. In comparison, the current best attacks have exponential complexities in the degree (higher degree decoding analysis, HDDA), in the number of linear shares (higher-order differential computation analysis, HODCA), or the window size (white-box learning parity with noise, WBLPN). The attack exploits the key idea of the SEL scheme - an efficient parallel combination of the nonlinear and linear masking schemes. We conclude that a proper composition of masking schemes is essential for security.In addition, we propose several optimizations for linear algebraic attacks: redundant node removal (RNR), optimized parity check matrix usage, and chosen-plaintext filtering (CPF), significantly improving the performance of security evaluation of white-box implementations.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 37","pages":"691"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141825507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18DOI: 10.46586/tches.v2024.i3.302-341
Nicolas Bon, David Pointcheval, Matthieu Rivain
We propose a new framework to homomorphically evaluate Boolean functions using the Torus Fully Homomorphic Encryption (TFHE) scheme. Compared to previous approaches focusing on Boolean gates, our technique can evaluate more complex Boolean functions with several inputs using a single bootstrapping. This allows us to greatly reduce the number of bootstrapping operations necessary to evaluate a Boolean circuit compared to previous works, thus achieving significant improvements in terms of performances. We define theoretically our approach which consists in adding an intermediate homomorphic layer between the plain Boolean space and the ciphertext space. This layer relies on so-called p-encodings embedding bits into Zp. We analyze the properties of these encodings to enable the evaluation of a given Boolean function and provide a deterministic algorithm (as well as an efficient heuristic) to find valid sets of encodings for a given function. We also propose a method to decompose any Boolean circuit into Boolean functions which are efficiently evaluable using our approach. We apply our framework to homomorphically evaluate various cryptographic primitives, and in particular the AES cipher. Our implementation results show significant improvements compared to the state of the art.
我们提出了一种利用环完全同态加密(TFHE)方案对布尔函数进行同态评估的新框架。与以往专注于布尔门的方法相比,我们的技术只需一次引导,就能评估具有多个输入的更复杂布尔函数。与以前的方法相比,我们可以大大减少评估布尔电路所需的引导操作次数,从而显著提高性能。我们从理论上定义了我们的方法,即在纯布尔空间和密码文本空间之间添加一个中间同态层。这一层依赖于将比特嵌入 Zp 的所谓 p 编码。我们分析了这些编码的特性,以评估给定的布尔函数,并提供了一种确定性算法(以及一种高效的启发式算法),为给定函数找到有效的编码集。我们还提出了一种将任何布尔电路分解为布尔函数的方法,使用我们的方法可以高效地评估这些布尔函数。我们将我们的框架应用于同态评估各种加密原语,特别是 AES 密码。我们的实现结果表明,与目前的技术水平相比,我们的方法有了明显的改进。
{"title":"Optimized Homomorphic Evaluation of Boolean Functions","authors":"Nicolas Bon, David Pointcheval, Matthieu Rivain","doi":"10.46586/tches.v2024.i3.302-341","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.302-341","url":null,"abstract":"We propose a new framework to homomorphically evaluate Boolean functions using the Torus Fully Homomorphic Encryption (TFHE) scheme. Compared to previous approaches focusing on Boolean gates, our technique can evaluate more complex Boolean functions with several inputs using a single bootstrapping. This allows us to greatly reduce the number of bootstrapping operations necessary to evaluate a Boolean circuit compared to previous works, thus achieving significant improvements in terms of performances. We define theoretically our approach which consists in adding an intermediate homomorphic layer between the plain Boolean space and the ciphertext space. This layer relies on so-called p-encodings embedding bits into Zp. We analyze the properties of these encodings to enable the evaluation of a given Boolean function and provide a deterministic algorithm (as well as an efficient heuristic) to find valid sets of encodings for a given function. We also propose a method to decompose any Boolean circuit into Boolean functions which are efficiently evaluable using our approach. We apply our framework to homomorphically evaluate various cryptographic primitives, and in particular the AES cipher. Our implementation results show significant improvements compared to the state of the art.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 32","pages":"1589"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141825558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Number Theoretic Transform (NTT) has been widely used in accelerating computations in lattice-based cryptography. However, attackers can potentially launch power analysis targeting the NTT because it is one of the most time-consuming parts of the implementation. This extended time frame provides a natural window of opportunity for attackers. In this paper, we investigate the first CPU frequency leakage (Hertzbleed-like) attacks against NTT in lattice-based KEMs. Our key observation is that different inputs to NTT incur different Hamming weights in its output and intermediate layers. By measuring the CPU frequency during the execution of NTT, we propose a simple yet effective attack idea to find the input to NTT that triggers NTT processing data with significantly low Hamming weight. We further apply our attack idea to real-world applications that are built upon NTT: CPAsecure Kyber without Compression and Decompression functions, and CCA-secure NTTRU. This leads us to extract information or frequency hints about the secret key. Integrating these hints into the LWE-estimator framework, we estimate a minimum of 35% security loss caused by the leakage. The frequency and timing measurements on the Reference and AVX2 implementations of NTT in both Kyber and NTTRU align well with our theoretical analysis, confirming the existence of frequency side-channel leakage in NTT. It is important to emphasize that our observation is not limited to a specific implementation but rather the algorithm on which NTT is based. Therefore, our results call for more attention to the analysis of power leakage against NTT in lattice-based cryptography.
{"title":"Hints from Hertz: Dynamic Frequency Scaling Side-Channel Analysis of Number Theoretic Transform in Lattice-Based KEMs","authors":"Tianrun Yu, Chi Cheng, Zilong Yang, Yingchen Wang, Yanbin Pan, Jian Weng","doi":"10.46586/tches.v2024.i3.200-223","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.200-223","url":null,"abstract":"Number Theoretic Transform (NTT) has been widely used in accelerating computations in lattice-based cryptography. However, attackers can potentially launch power analysis targeting the NTT because it is one of the most time-consuming parts of the implementation. This extended time frame provides a natural window of opportunity for attackers. In this paper, we investigate the first CPU frequency leakage (Hertzbleed-like) attacks against NTT in lattice-based KEMs. Our key observation is that different inputs to NTT incur different Hamming weights in its output and intermediate layers. By measuring the CPU frequency during the execution of NTT, we propose a simple yet effective attack idea to find the input to NTT that triggers NTT processing data with significantly low Hamming weight. We further apply our attack idea to real-world applications that are built upon NTT: CPAsecure Kyber without Compression and Decompression functions, and CCA-secure NTTRU. This leads us to extract information or frequency hints about the secret key. Integrating these hints into the LWE-estimator framework, we estimate a minimum of 35% security loss caused by the leakage. The frequency and timing measurements on the Reference and AVX2 implementations of NTT in both Kyber and NTTRU align well with our theoretical analysis, confirming the existence of frequency side-channel leakage in NTT. It is important to emphasize that our observation is not limited to a specific implementation but rather the algorithm on which NTT is based. Therefore, our results call for more attention to the analysis of power leakage against NTT in lattice-based cryptography.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 16","pages":"70"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141825759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18DOI: 10.46586/tches.v2024.i3.25-75
J. Cheon, Hyeong-Soon Choe, Julien Devevey, Tim Güneysu, Dongyeon Hong, Markus Krausz, Georg Land, Marc Möller, D. Stehlé, MinJune Yi
We present HAETAE (Hyperball bimodAl modulE rejecTion signAture schemE), a new lattice-based signature scheme. Like the NIST-selected Dilithium signature scheme, HAETAE is based on the Fiat-Shamir with Aborts paradigm, but our design choices target an improved complexity/compactness compromise that is highly relevant for many space-limited application scenarios. We primarily focus on reducing signature and verification key sizes so that signatures fit into one TCP or UDP datagram while preserving a high level of security against a variety of attacks. As a result, our scheme has signature and verification key sizes up to 39% and 25% smaller, respectively, compared than Dilithium. We provide a portable, constanttime reference implementation together with an optimized implementation using AVX2 instructions and an implementation with reduced stack size for the Cortex-M4. Moreover, we describe how to efficiently protect HAETAE against implementation attacks such as side-channel analysis, making it an attractive candidate for use in IoT and other embedded systems.
{"title":"HAETAE: Shorter Lattice-Based Fiat-Shamir Signatures","authors":"J. Cheon, Hyeong-Soon Choe, Julien Devevey, Tim Güneysu, Dongyeon Hong, Markus Krausz, Georg Land, Marc Möller, D. Stehlé, MinJune Yi","doi":"10.46586/tches.v2024.i3.25-75","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.25-75","url":null,"abstract":"We present HAETAE (Hyperball bimodAl modulE rejecTion signAture schemE), a new lattice-based signature scheme. Like the NIST-selected Dilithium signature scheme, HAETAE is based on the Fiat-Shamir with Aborts paradigm, but our design choices target an improved complexity/compactness compromise that is highly relevant for many space-limited application scenarios. We primarily focus on reducing signature and verification key sizes so that signatures fit into one TCP or UDP datagram while preserving a high level of security against a variety of attacks. As a result, our scheme has signature and verification key sizes up to 39% and 25% smaller, respectively, compared than Dilithium. We provide a portable, constanttime reference implementation together with an optimized implementation using AVX2 instructions and an implementation with reduced stack size for the Cortex-M4. Moreover, we describe how to efficiently protect HAETAE against implementation attacks such as side-channel analysis, making it an attractive candidate for use in IoT and other embedded systems.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 15","pages":"624"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141825986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18DOI: 10.46586/tches.v2024.i3.342-376
Jannik Zeitschner, Amir Moradi
Software solutions to address computational challenges are ubiquitous in our daily lives. One specific application area where software is often used is in embedded systems, which, like other digital electronic devices, are vulnerable to side-channel analysis attacks. Although masking is the most common countermeasure and provides a solid theoretical foundation for ensuring security, recent research has revealed a crucial gap between theoretical and real-world security. This shortcoming stems from the micro-architectural effects of the underlying micro-processor. Common security models used to formally verify masking schemes such as the d-probing model fully ignore the micro-architectural leakages that lead to a set of instructions that unintentionally recombine the shares. Manual generation of masked assembly code that remains secure in the presence of such micro-architectural recombinations often involves trial and error, and is non-trivial even for experts.Motivated by this, we present PoMMES, which enables inexperienced software developers to automatically compile masked functions written in a high-level programming language into assembly code, while preserving the theoretically proven security in practice. Compared to the state of the art, based on a general model for microarchitectural effects, our scheme allows the generation of practically secure masked software at arbitrary security orders for in-order processors. The major contribution of PoMMES is its micro-architecture aware register allocation algorithm, which is one of the crucial steps during the compilation process. In addition to simulation-based assessments that we conducted by open-source tools dedicated to evaluating masked software implementations, we confirm the effectiveness of the PoMMES-generated codes through experimental analysis. We present the result of power consumption based leakage assessments of several case studies running on a Cortex M0+ micro-controller, which is commonly deployed in industry.
{"title":"PoMMES: Prevention of Micro-architectural Leakages in Masked Embedded Software","authors":"Jannik Zeitschner, Amir Moradi","doi":"10.46586/tches.v2024.i3.342-376","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.342-376","url":null,"abstract":"Software solutions to address computational challenges are ubiquitous in our daily lives. One specific application area where software is often used is in embedded systems, which, like other digital electronic devices, are vulnerable to side-channel analysis attacks. Although masking is the most common countermeasure and provides a solid theoretical foundation for ensuring security, recent research has revealed a crucial gap between theoretical and real-world security. This shortcoming stems from the micro-architectural effects of the underlying micro-processor. Common security models used to formally verify masking schemes such as the d-probing model fully ignore the micro-architectural leakages that lead to a set of instructions that unintentionally recombine the shares. Manual generation of masked assembly code that remains secure in the presence of such micro-architectural recombinations often involves trial and error, and is non-trivial even for experts.Motivated by this, we present PoMMES, which enables inexperienced software developers to automatically compile masked functions written in a high-level programming language into assembly code, while preserving the theoretically proven security in practice. Compared to the state of the art, based on a general model for microarchitectural effects, our scheme allows the generation of practically secure masked software at arbitrary security orders for in-order processors. The major contribution of PoMMES is its micro-architecture aware register allocation algorithm, which is one of the crucial steps during the compilation process. In addition to simulation-based assessments that we conducted by open-source tools dedicated to evaluating masked software implementations, we confirm the effectiveness of the PoMMES-generated codes through experimental analysis. We present the result of power consumption based leakage assessments of several case studies running on a Cortex M0+ micro-controller, which is commonly deployed in industry.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 20","pages":"574"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141826110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18DOI: 10.46586/tches.v2024.i3.574-602
Dev M. Mehta, M. Hashemi, Domenic Forte, Shahin Tajik, F. Ganji
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may obtain additional information about the secret when executing cryptographic protocols. This paper aims to answer the question of whether UCs leak information unintentionally, which can be leveraged by the adversary to disclose the configuration bits. In this regard, we propose the first photon emission analysis against UCs relying on computer vision-based approaches. We demonstrate that the adversary can utilize a cost-effective solution to take images to be processed by off-the-shelf algorithms to extract configuration bits. We examine the efficacy of our method in two scenarios: (1) the design is small enough to be captured in a single image during the attack phase, and (2) multiple images should be captured to launch the attack by deploying a divide-and-conquer strategy. To evaluate the effectiveness of our attack, we use metrics commonly applied in side-channel analysis, namely rank and success rate. By doing so, we show that our profiled photon emission analysis achieves a success rate of 1 by employing a few templates (concretely, only 18 images were used as templates).
{"title":"1/0 Shades of UC: Photonic Side-Channel Analysis of Universal Circuits","authors":"Dev M. Mehta, M. Hashemi, Domenic Forte, Shahin Tajik, F. Ganji","doi":"10.46586/tches.v2024.i3.574-602","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.574-602","url":null,"abstract":"A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may obtain additional information about the secret when executing cryptographic protocols. This paper aims to answer the question of whether UCs leak information unintentionally, which can be leveraged by the adversary to disclose the configuration bits. In this regard, we propose the first photon emission analysis against UCs relying on computer vision-based approaches. We demonstrate that the adversary can utilize a cost-effective solution to take images to be processed by off-the-shelf algorithms to extract configuration bits. We examine the efficacy of our method in two scenarios: (1) the design is small enough to be captured in a single image during the attack phase, and (2) multiple images should be captured to launch the attack by deploying a divide-and-conquer strategy. To evaluate the effectiveness of our attack, we use metrics commonly applied in side-channel analysis, namely rank and success rate. By doing so, we show that our profiled photon emission analysis achieves a success rate of 1 by employing a few templates (concretely, only 18 images were used as templates).","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 3","pages":"72"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141824401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18DOI: 10.46586/tches.v2024.i3.377-417
Oren Ganon, Itamar Levi
The efficient execution of a Lightweight Cryptography (LWC) algorithm is essential for edge computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. We propose the CrISA-X-a Cryptography Instruction Set Architecture eXtensions designed to improve cryptographic latency on extendable processors. CrISA-X, provides enhanced speed of various algorithms simultaneously while optimizing ISA adaptability, a feat yet to be accomplished. The extension, diverse for several computation levels, is first tailored explicitly for individual algorithms and sets of LWC algorithms, depending on performance, frequency, and area trade-offs. By diligently applying the Min-Max optimization technique, we have configured these extensions to achieve a delicate balance between performance, area utilization, code size, etc. Our study presents empirical evidence of the performance enhancement achieved on a synthesis modular RISC processor. We offer a framework for creating optimized processor hardware and ISA extensions. The CrISA-X outperforms ISA extensions by delivering significant performance boosts between 3x to 17x while experiencing a relative area cost increase of +12% and +47% in LUTs. Notably, as one important example, the utilization of the ASCON algorithm yields a 10x performance boost in contrast to the base ISA instruction implementation.
高效执行轻量级密码学(LWC)算法对边缘计算平台至关重要。专用指令集扩展(ISE)通常用于此目的。我们提出了 CrISA-X- 一种加密指令集架构扩展,旨在改善可扩展处理器上的加密延迟。CrISA-X 在优化 ISA 适应性的同时,提高了各种算法的速度,这是一项尚未完成的壮举。根据性能、频率和面积的权衡,该扩展首先针对单个算法和 LWC 算法集进行了明确定制,并在多个计算级别上实现了多样化。通过孜孜不倦地应用 Min-Max 优化技术,我们配置了这些扩展,在性能、面积利用率、代码大小等方面实现了微妙的平衡。我们的研究提供了在合成模块化 RISC 处理器上实现性能提升的经验证据。我们提供了一个创建优化处理器硬件和 ISA 扩展的框架。CrISA-X 的性能优于 ISA 扩展,其性能显著提高了 3 至 17 倍,而相对面积成本增加了 12%,LUT 增加了 47%。值得注意的是,一个重要的例子是,与基本 ISA 指令实现相比,利用 ASCON 算法可实现 10 倍的性能提升。
{"title":"CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors","authors":"Oren Ganon, Itamar Levi","doi":"10.46586/tches.v2024.i3.377-417","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.377-417","url":null,"abstract":"The efficient execution of a Lightweight Cryptography (LWC) algorithm is essential for edge computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. We propose the CrISA-X-a Cryptography Instruction Set Architecture eXtensions designed to improve cryptographic latency on extendable processors. CrISA-X, provides enhanced speed of various algorithms simultaneously while optimizing ISA adaptability, a feat yet to be accomplished. The extension, diverse for several computation levels, is first tailored explicitly for individual algorithms and sets of LWC algorithms, depending on performance, frequency, and area trade-offs. By diligently applying the Min-Max optimization technique, we have configured these extensions to achieve a delicate balance between performance, area utilization, code size, etc. Our study presents empirical evidence of the performance enhancement achieved on a synthesis modular RISC processor. We offer a framework for creating optimized processor hardware and ISA extensions. The CrISA-X outperforms ISA extensions by delivering significant performance boosts between 3x to 17x while experiencing a relative area cost increase of +12% and +47% in LUTs. Notably, as one important example, the utilization of the ASCON algorithm yields a 10x performance boost in contrast to the base ISA instruction implementation.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 10","pages":"59"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141827770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18DOI: 10.46586/tches.v2024.i3.224-248
Shing Hing William Cheng, C. Chuengsatiansup, Daniel Genkin, Dallas McNeil, Toby Murray, Y. Yarom, Zhiyuan Zhang
Speculative out-of-order execution is a strategy of masking execution latency by allowing younger instructions to execute before older instructions. While originally considered to be innocuous, speculative out-of-order execution was brought into the spotlight with the 2018 publication of the Spectre and Meltdown attacks. These attacks demonstrated that microarchitectural side channels can leak sensitive data accessed by speculatively executed instructions that are not part of the normal program execution. Since then, a significant effort has been vested in investigating how microarchitectural side channels can leak data from speculatively executed instructions and how to control this leakage. However, much less is known about how speculative out-of-order execution affects microarchitectural side-channel attacks.In this paper, we investigate how speculative out-of-order execution affects the Evict+Time cache attack. Evict+Time is based on the observation that cache misses are slower than cache hits, hence by measuring the execution time of code, an attacker can determine if a cache miss occurred during the execution. We demonstrate that, due to limited resources for tracking out-of-order execution, under certain conditions an attacker can gain more fine-grained information and determine whether a cache miss occurred in part of the executed code.Based on the observation, we design the Evict+Spec+Time attack, a variant of Evict+Time that can learn not only whether a cache miss occurred, but also in which part of the victim code it occurred. We demonstrate that Evict+Spec+Time is an order of magnitude more efficient than Evict+Time when attacking a T-tables-based implementation of AES. We further show an Evict+Spec+Time attack on an S-boxbased implementation of AES, recovering the key with as little as 14 815 decryptions. To the best of our knowledge, ours is the first successful Evict+Time attack on such a victim.
{"title":"Evict+Spec+Time: Exploiting Out-of-Order Execution to Improve Cache-Timing Attacks","authors":"Shing Hing William Cheng, C. Chuengsatiansup, Daniel Genkin, Dallas McNeil, Toby Murray, Y. Yarom, Zhiyuan Zhang","doi":"10.46586/tches.v2024.i3.224-248","DOIUrl":"https://doi.org/10.46586/tches.v2024.i3.224-248","url":null,"abstract":"Speculative out-of-order execution is a strategy of masking execution latency by allowing younger instructions to execute before older instructions. While originally considered to be innocuous, speculative out-of-order execution was brought into the spotlight with the 2018 publication of the Spectre and Meltdown attacks. These attacks demonstrated that microarchitectural side channels can leak sensitive data accessed by speculatively executed instructions that are not part of the normal program execution. Since then, a significant effort has been vested in investigating how microarchitectural side channels can leak data from speculatively executed instructions and how to control this leakage. However, much less is known about how speculative out-of-order execution affects microarchitectural side-channel attacks.In this paper, we investigate how speculative out-of-order execution affects the Evict+Time cache attack. Evict+Time is based on the observation that cache misses are slower than cache hits, hence by measuring the execution time of code, an attacker can determine if a cache miss occurred during the execution. We demonstrate that, due to limited resources for tracking out-of-order execution, under certain conditions an attacker can gain more fine-grained information and determine whether a cache miss occurred in part of the executed code.Based on the observation, we design the Evict+Spec+Time attack, a variant of Evict+Time that can learn not only whether a cache miss occurred, but also in which part of the victim code it occurred. We demonstrate that Evict+Spec+Time is an order of magnitude more efficient than Evict+Time when attacking a T-tables-based implementation of AES. We further show an Evict+Spec+Time attack on an S-boxbased implementation of AES, recovering the key with as little as 14 815 decryptions. To the best of our knowledge, ours is the first successful Evict+Time attack on such a victim.","PeriodicalId":508905,"journal":{"name":"IACR Cryptol. ePrint Arch.","volume":" 38","pages":"149"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141827442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}