Pub Date: 2022-09-01 DOI: 10.1109/ARITH54963.2022.00022
M. Arnold
Quantum computers, which process qubits, offer the promise of spectacular performance improvement over ordinary computers that deal only with classical bits, but there are obstacles to this vision. First, current quantum technology only allows a small number of qubits, and these are susceptible to noise. Second, quantum algorithms must be reversible, which often requires ancillary data that consume precious qubits. Third, interesting algorithms amenable to quantum implementation, such as chemistry simulation, require representing real numbers. Although quantum integer arithmetic has been studied extensively, the few works on quantum floating point demand more ancillary qubits than input data, making floating point impractical for current quantum hardware. This paper suggests an alternative to floating point, known as the Logarithmic Number System (LNS), which has proven effective for approximate arithmetic with classical hardware. Reversible LNS multiplication and division are easy and exact with one ancillary qubit. Here we explore the quantum cost of the difficult LNS operations (addition and subtraction). LNS offers implementation tradeoffs between accuracy and qubit cost that suggest highly approximate LNS will be practical on quantum hardware well before quantum technology has improved enough for floating point to be practical.
Title: Towards Quantum Logarithm Number Systems. Venue: 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH).
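The contrast between easy and difficult LNS operations can be illustrated with a minimal classical sketch. It is not the reversible quantum circuit studied in the paper: it assumes positive values only, stores logarithms as ordinary Python floats rather than fixed-point words, and all function names are hypothetical. Multiplication and division reduce to adding or subtracting logarithms, while addition and subtraction need the nonlinear functions log2(1 + 2^d) and log2(1 - 2^d), which hardware typically has to approximate.

```python
import math


def lns_encode(x: float) -> float:
    """Represent a positive real x by its base-2 logarithm."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return math.log2(x)


def lns_decode(lx: float) -> float:
    """Recover the real value from its logarithmic representation."""
    return 2.0 ** lx


def lns_mul(lx: float, ly: float) -> float:
    """Multiplication is a plain addition of logarithms."""
    return lx + ly


def lns_div(lx: float, ly: float) -> float:
    """Division is a plain subtraction of logarithms."""
    return lx - ly


def lns_add(lx: float, ly: float) -> float:
    """Addition needs the nonlinear function log2(1 + 2**d) with d <= 0."""
    hi, lo = max(lx, ly), min(lx, ly)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


def lns_sub(lx: float, ly: float) -> float:
    """Subtraction x - y (requires x > y) needs log2(1 - 2**d) with d < 0."""
    if lx <= ly:
        raise ValueError("this sketch requires x > y")
    return lx + math.log2(1.0 - 2.0 ** (ly - lx))
```

In an approximate LNS, the two `math.log2` calls in `lns_add` and `lns_sub` are where accuracy could be traded against cost, for example by using a coarse table or a low-order interpolation.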
Pub Date: 2022-09-01 DOI: 10.1109/ARITH54963.2022.00013
C. F. Borges, C. Jeannerod, J. Muller
We analyze two fast and accurate algorithms recently presented by Borges for computing $x^{-1/2}$ in binary floating-point arithmetic (assuming that efficient and correctly-rounded FMA and square root are available). The first algorithm is based on the Newton-Raphson iteration, and the second one uses an order-3 iteration. We give attainable relative-error bounds for these two algorithms, build counterexamples showing that in very rare cases they do not provide a correctly-rounded result, and characterize precisely when such failures happen in IEEE 754 binary32 and binary64 arithmetic. We then give a generic (i.e., precision-independent) algorithm that always returns a correctly-rounded result, and show how it can be simplified and made more efficient in the important cases of binary32 and binary64.
Title: High-level algorithms for correctly-rounded reciprocal square roots. Venue: 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH).
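For readers unfamiliar with the Newton-Raphson iteration for $x^{-1/2}$, the following is a generic textbook sketch in binary64. It is not Borges's algorithm and carries none of the paper's error bounds: it uses ordinary Python float operations instead of an FMA, the function name and the `steps` parameter are hypothetical, and it makes no claim of correct rounding.

```python
import math


def rsqrt_newton(x: float, steps: int = 1) -> float:
    """Approximate x**-0.5 with Newton-Raphson refinement steps.

    The iteration is y <- y + (y / 2) * (1 - x * y * y), which converges
    quadratically to x**-0.5 from a good starting guess.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Starting guess: a rounded square root followed by a rounded division.
    y = 1.0 / math.sqrt(x)
    for _ in range(steps):
        # With a hardware FMA the residual could be formed with fewer roundings;
        # plain float arithmetic is used here (an assumption of this sketch).
        residual = 1.0 - x * y * y
        y = y + 0.5 * y * residual
    return y
```

The residual `1 - x*y*y` is the quantity an FMA would be expected to evaluate more accurately, which is one reason such algorithms assume FMA support.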
Pub Date: 2022-09-01 DOI: 10.1109/ARITH54963.2022.00026
Daichi Aoki, Kazuhiko Minematsu, T. Okamura, T. Takagi
As an efficient multiplication method for polynomial rings, the Number Theoretic Transform (NTT) is a fundamental algorithm that is both practically useful and theoretically established. Chung et al. proposed a method to perform NTT-based polynomial multiplication for NTT-unfriendly rings that do not have suitable primitive roots. They applied their proposal to lattice-based cryptography using NTT-unfriendly rings and sped up several schemes. At ARITH 2021, Plantard proposed a modular multiplication algorithm that improves the speed of NTT if moduli are not large (a few dozen bits), which is the case for typical lattice-based cryptography. It is natural to expect Plantard's method to improve Chung et al.'s NTT; however, it cannot be applied directly, because Chung et al.'s method requires signed integers while Plantard's method assumes unsigned integers. A simple fix would cause a slowdown and a non-constant-time operation. To overcome this problem, we propose an efficient method for calculating the modular multiplication for signed integers based on Plantard's method. Our proposal generally incurs no overhead over the original and works in a constant-time fashion. To show the effectiveness of our proposal, we provide experimental implementation results on the lattice-based cryptographic scheme Saber. Currently, NIST is selecting candidates for standardization of post-quantum cryptography in preparation for the compromise of current public key cryptography by quantum computers, and has completed the selection of the final candidates. Saber is one of the finalists for the NIST standardization project,
Title: Efficient Word Size Modular Multiplication over Signed Integers. Venue: 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH).
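As background, the unsigned baseline can be sketched as follows. This is an illustration of the word-size multiplication commonly attributed to Plantard, as we understand it; it is not the signed variant proposed in this paper. The word size `N`, the modulus `Q`, and the helper names are assumptions chosen for the example. The multiplication returns a*b*(-2^(-2N)) mod Q using only multiplications, masks, and shifts, provided Q is odd and smaller than 2^N divided by the golden ratio.

```python
N = 16                          # word size in bits (illustrative choice)
Q = 7681                        # odd example modulus; needs Q < 2**N / 1.618...
MASK_2N = (1 << (2 * N)) - 1    # keeps the low 2N bits
R = pow(Q, -1, 1 << (2 * N))    # Q**-1 mod 2**(2N); requires Python 3.8+


def plantard_mul(a: int, b: int) -> int:
    """Return a * b * (-2**(-2N)) mod Q for 0 <= a, b <= Q (unsigned inputs)."""
    t = (a * b * R) & MASK_2N            # a*b*R mod 2**(2N)
    return (((t >> N) + 1) * Q) >> N     # top half, plus one, times Q, top half


def to_plantard(a: int) -> int:
    """Pre-scale a by -2**(2N) mod Q so that the extra factor cancels."""
    return (a * -(1 << (2 * N))) % Q


def mulmod(a: int, b: int) -> int:
    """Ordinary a*b mod Q obtained through one Plantard-style multiplication."""
    return plantard_mul(to_plantard(a), b)
```

In an NTT, the pre-scaling would normally be folded into precomputed constants such as the twiddle factors, so only `plantard_mul` would run in the inner loop.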
Pub Date: 2022-09-01 DOI: 10.1109/ARITH54963.2022.00025
Laurent-Stéphane Didier, J. Robert, Fangan-Yssouf Dosso, Nadia El Mrabet
The Polynomial Modular Number System (PMNS) and the Residue Number System (RNS) are integer number systems which aim to speed up modular arithmetic. Their parallel properties make them suitable for the implementation of cryptographic applications on modern processors with SIMD instructions. In this work, we present the implementation choices made for modular multiplication in both systems and compare their performance for several sizes of moduli. We target the Intel 64-bit sequential instruction set and the Intel AVX-512 vector instruction set. AVX-512 allows significant speed-ups for moduli of up to 1 621 bits, and the vectorized PMNS implementation is up to 2.5 times faster than the vectorized RNS, though the vectorized RNS becomes slightly better for 3 251 bits because of the difficulty of finding a PMNS with a suitable parameter $n$. The vectorized RNS implementations reach performance levels close to those of the state-of-the-art GMP library, while the retired instruction counts are lower for sizes between 401 and 3 251 bits.
Title: A software comparison of RNS and PMNS. Venue: 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH).
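The parallelism that makes RNS attractive for SIMD can be seen in a small sketch. It shows only plain integer multiplication in RNS, not the modular multiplication implemented in the paper (which would need additional steps such as base extension), and the moduli and function names are illustrative assumptions. Each residue channel is processed independently, which is what maps naturally onto vector lanes.

```python
from math import prod

# Pairwise coprime moduli chosen for illustration: 251, 11*23, 3*5*17, 2**8.
MODULI = (251, 253, 255, 256)


def to_rns(x: int) -> tuple:
    """Represent x by its residues modulo each channel modulus."""
    return tuple(x % m for m in MODULI)


def rns_mul(xs: tuple, ys: tuple) -> tuple:
    """Multiply channel by channel; no carries cross between channels."""
    return tuple((a * b) % m for a, b, m in zip(xs, ys, MODULI))


def from_rns(residues: tuple) -> int:
    """Reconstruct the integer in [0, prod(MODULI)) with the Chinese Remainder Theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)   # modular inverse, Python 3.8+
    return total % big_m
```

The result is only meaningful while the true product stays below the product of the moduli (about 4.1e9 here), so a real implementation would size the base to the operands.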
Pub Date: 2022-09-01 DOI: 10.1109/ARITH54963.2022.00014
A. Sibidanov, P. Zimmermann, Stéphane Glondu
The CORE-MATH project aims at providing open-source mathematical functions with correct rounding that can be integrated into current mathematical libraries. This article demonstrates the CORE-MATH methodology on two functions: the binary32 power function (powf) and the binary64 cube root function (cbrt). CORE-MATH already provides a full set of correctly rounded C99 functions for single precision (binary32). These functions provide similar performance or, in some cases, up to threefold speedups with respect to the GNU libc mathematical library, which is not correctly rounded. This work opens the prospect of making correct rounding a mandatory requirement for mathematical functions in the next revision of the IEEE-754 standard.
Title: The CORE-MATH Project. Venue: 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH).
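What "correctly rounded" means for a function such as cbrt can be made concrete with an exact check. The sketch below is not part of CORE-MATH and says nothing about its methodology; it is a hypothetical helper that tests, with exact rational arithmetic, whether a candidate binary64 value is the double nearest to the true cube root. It assumes round-to-nearest, finite positive inputs, and a candidate that is not the largest finite double.

```python
import math
from fractions import Fraction


def is_correctly_rounded_cbrt(x: float, y: float) -> bool:
    """Return True if y is the binary64 value nearest to the exact cube root of x.

    Assumes x > 0 and y > 0 are finite. y is the nearest double exactly when the
    true cube root lies between the midpoints separating y from its two
    neighbouring doubles, which can be tested exactly by cubing those midpoints.
    """
    yq = Fraction(y)
    below = Fraction(math.nextafter(y, -math.inf))   # Python 3.9+
    above = Fraction(math.nextafter(y, math.inf))
    low_mid = (below + yq) / 2
    high_mid = (yq + above) / 2
    return low_mid ** 3 <= Fraction(x) <= high_mid ** 3
```

A check of this kind is far too slow for a library routine, but it could serve as a reference oracle when testing a fast implementation on selected inputs.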
Pub Date: 2022-09-01 DOI: 10.1109/ARITH54963.2022.00028
Aurélien Greuet, Simon Montoya, Clémence Vermeersch
Modular reduction is a core operation in public-key cryptography. While a standard modular reduction is often required, a partial reduction limiting the growth of the coefficients is enough for several use cases. Knowing the quotient of the Euclidean division of an integer by the modulus makes it easy to recover the remainder. We propose a way to compute efficiently, without divisions, an approximation of this quotient. From this approximation, both full and partial reductions are deduced. The resulting algorithms are modulus specific: the sequence of operations to perform in order to get a reduction depends on the modulus and the size of the input. We analyse the cost of our algorithms for a use case coming from post-quantum cryptography. We show that with this modulus, our method gives an algorithm faster than prior art algorithms.
Title: Quotient Approximation Modular Reduction. Venue: 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH).
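The idea of deriving both partial and full reductions from a division-free quotient estimate can be sketched with a classical Barrett-style approximation. This is a stand-in, not the modulus-specific operation sequences proposed in the paper; the modulus `Q`, the input bound `K`, and the function names are assumptions made for illustration, and `Q` is not necessarily the modulus the paper studies.

```python
Q = 3329                 # example modulus (illustrative choice)
K = 24                   # inputs are assumed to satisfy 0 <= x < 2**K
M = (1 << K) // Q        # precomputed floor(2**K / Q); no division at run time


def approx_quotient(x: int) -> int:
    """Estimate floor(x / Q) with one multiplication and one shift.

    For 0 <= x < 2**K the estimate is either exact or too small by one.
    """
    return (x * M) >> K


def partial_reduce(x: int) -> int:
    """Return a value congruent to x mod Q in the range [0, 2*Q)."""
    return x - approx_quotient(x) * Q


def full_reduce(x: int) -> int:
    """Return x mod Q using one conditional subtraction after the partial step."""
    r = partial_reduce(x)
    if r >= Q:
        r -= Q
    return r
```

The partial reduction is enough when only coefficient growth needs to be bounded. A constant-time implementation would replace the `if` in `full_reduce` with a branch-free masked subtraction.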
Pub Date: 2022-09-01 DOI: 10.1109/arith54963.2022.00019
David Mallasén, Raul Murillo, A. D. Del Barrio, G. Botella, L. Piñuel, Manuel Prieto-Matias
Title: PERCIVAL: Open-Source Posit RISC-V Core With Quire Capability. Venue: 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH).
Pub Date: 2022-09-01 DOI: 10.1109/arith54963.2022.00020
Efstratios Zacharelos, I. Nunziata, Gerardo Saggese, A. Strollo, E. Napoli
Title: Approximate Recursive Multipliers Using Low Power Building Blocks. Venue: 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH).