Evaluating instruction set extensions for fast arithmetic on binary finite fields

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004. Pub Date: 2004-09-27. DOI: 10.1109/ASAP.2004.10003

A. M. Fiskiran, R. Lee


Citations: 19

Abstract

Binary finite fields GF(2^n) are widely used in cryptography, particularly in public-key algorithms such as elliptic curve cryptography (ECC). On word-oriented programmable processors, field elements are generally represented as polynomials with coefficients from {0, 1}. Key arithmetic operations on these polynomials, such as squaring and multiplication, are not supported by integer-oriented processor architectures. Instead, they are implemented in software, so that a few elementary operations dominate a very large fraction of the cryptography execution time. For example, more than 90% of the execution time of 163-bit ECC may be consumed by two simple field operations: squaring and multiplication. A few recently proposed processor architectures include instructions for binary field arithmetic. However, these proposals have only considered processors with small wordsizes and in-order, single-issue execution. The first contribution of this paper is to validate these new arithmetic instructions for processors with wider wordsizes and multiple-issue (e.g. superscalar) execution. We also consider the effects of varying the number of functional units and load/store pipes. We demonstrate that the combination of microarchitecture and new instructions provides speedups of up to 22.4x for ECC point multiplication. Second, we show that if a bit-level reverse instruction is included in the instruction set, the size of the multiplier can be reduced by half without significant performance degradation. Third, we compare the benefits of superscalar execution with wordsize scaling. The latter has been used in recent processor architectures such as PLX and PAX as a new way to extract parallelism. We show that 2x wordsize scaling provides 70% better performance than 2-way superscalar execution. Finally, we suggest a low-cost method, which we call multi-word result execution, to realize some of the benefits of wordsize scaling in existing processors with fixed wordsizes.
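To make the two field operations named in the abstract concrete, the following is a minimal software sketch of polynomial (carry-less) multiplication and squaring over GF(2), followed by reduction modulo a field polynomial. It is not the authors' implementation. All function names are hypothetical, and the small field GF(2^8) with polynomial 0x11B is chosen only for illustration; the abstract itself concerns 163-bit ECC. The last function illustrates one plausible way a bit-reverse operation lets a multiplier that returns only the low half of the product also yield the high half. The abstract does not spell out its mechanism, so this part is an assumption.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2) is XOR, so no carries
        a <<= 1              # multiply a by x
        b >>= 1
    return result


def poly_mod(c: int, modulus: int) -> int:
    """Reduce polynomial c modulo the field polynomial `modulus`."""
    deg = modulus.bit_length() - 1
    while c.bit_length() > deg:
        # Cancel the current top bit of c with a shifted copy of the modulus.
        c ^= modulus << (c.bit_length() - 1 - deg)
    return c


def gf2n_mul(a: int, b: int, modulus: int) -> int:
    """Field multiplication in GF(2^n) defined by `modulus`."""
    return poly_mod(clmul(a, b), modulus)


def gf2n_square(a: int, modulus: int) -> int:
    """Field squaring; over GF(2) this only spreads the bits of a apart."""
    return poly_mod(clmul(a, a), modulus)


def bit_reverse(x: int, width: int) -> int:
    """Reverse the lowest `width` bits of x."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (x & 1)
        x >>= 1
    return out


def clmul_low(a: int, b: int, width: int) -> int:
    """Model of a half-size multiplier: low `width` bits of the product."""
    return clmul(a, b) & ((1 << width) - 1)


def clmul_high_via_reverse(a: int, b: int, width: int) -> int:
    """High `width` bits of the product, using only clmul_low and bit_reverse.

    Assumed construction: reversing both operands reverses the product,
    so the low half of the reversed product holds the original high half.
    """
    r = clmul_low(bit_reverse(a, width), bit_reverse(b, width), width)
    return bit_reverse(r, width) >> 1


# Illustrative field: GF(2^8) with x^8 + x^4 + x^3 + x + 1 (0x11B).
print(hex(gf2n_mul(0x57, 0x83, 0x11B)))  # 0xc1
```

In this sketch the full double-width product of two `width`-bit words can be reassembled as `(clmul_high_via_reverse(a, b, width) << width) | clmul_low(a, b, width)`, at the cost of three extra bit reversals per high-half result.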


