Evaluating instruction set extensions for fast arithmetic on binary finite fields

A. M. Fiskiran, R. Lee

Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.
Published: 2004-09-27
DOI: 10.1109/ASAP.2004.10003 (https://doi.org/10.1109/ASAP.2004.10003)
Citations: 19

Abstract

Binary finite fields GF(2^n) are widely used in cryptography, particularly in public-key algorithms such as elliptic curve cryptography (ECC). On word-oriented programmable processors, field elements are generally represented as polynomials with coefficients from {0, 1}. Key arithmetic operations on these polynomials, such as squaring and multiplication, are not supported by integer-oriented processor architectures. Instead, they are implemented in software, so a few elementary operations dominate a very large fraction of the cryptography execution time. For example, more than 90% of the execution time of 163-bit ECC may be consumed by two simple field operations: squaring and multiplication. A few recently proposed processor architectures include instructions for binary field arithmetic. However, these proposals have only considered processors with small wordsizes and in-order, single-issue execution. The first contribution of this paper is to validate these new arithmetic instructions for processors with wider wordsizes and multiple-issue (e.g. superscalar) execution. We also consider the effects of varying the number of functional units and load/store pipes. We demonstrate that the combination of microarchitecture and new instructions provides speedups of up to 22.4x for ECC point multiplication. Second, we show that if a bit-level reverse instruction is included in the instruction set, the size of the multiplier can be reduced by half without significant performance degradation. Third, we compare the benefits of superscalar execution with those of wordsize scaling. The latter has been used in recent processor architectures such as PLX and PAX as a new way to extract parallelism. We show that 2x wordsize scaling provides 70% better performance than 2-way superscalar execution. Finally, we suggest a low-cost method, which we call multi-word result execution, to realize some of the benefits of wordsize scaling in existing processors with fixed wordsizes.
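
To make the operations named in the abstract concrete, here is a minimal Python sketch of software arithmetic on polynomials with coefficients from {0, 1}. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 32-bit word size `W`, and the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 (the one NIST specifies for its 163-bit binary curves) are all choices made here. The abstract does not say how the bit-level reverse instruction halves the multiplier. The last function shows one well-known identity that would allow it: the high half of a carry-less product can be obtained from a multiplier that returns only the low half, applied to bit-reversed operands.

```python
# Illustrative sketch only. Names, word size, and the reduction polynomial
# are assumptions for this example, not details taken from the paper.

W = 32  # assumed word size in bits

# x^163 + x^7 + x^6 + x^3 + 1 (assumed field polynomial for 163-bit ECC)
F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product over GF(2) by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition of polynomials over GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def reduce_mod(c: int, f: int = F163) -> int:
    """Reduce polynomial c modulo f by cancelling the top bit repeatedly."""
    deg_f = f.bit_length() - 1
    while c.bit_length() - 1 >= deg_f:
        c ^= f << (c.bit_length() - 1 - deg_f)
    return c


def gf_mul(a: int, b: int, f: int = F163) -> int:
    """Field multiplication: polynomial product followed by reduction."""
    return reduce_mod(clmul(a, b), f)


def gf_square(a: int, f: int = F163) -> int:
    """Field squaring: over GF(2), bit i of a simply moves to bit 2i."""
    spread = 0
    i = 0
    while a:
        if a & 1:
            spread |= 1 << (2 * i)
        a >>= 1
        i += 1
    return reduce_mod(spread, f)


def bit_reverse(x: int, width: int = W) -> int:
    """Reverse the order of the low `width` bits of x."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def clmul_lo(a: int, b: int, width: int = W) -> int:
    """Low `width` bits of the product: models a half-size multiplier."""
    return clmul(a, b) & ((1 << width) - 1)


def clmul_hi_via_reverse(a: int, b: int, width: int = W) -> int:
    """High half of the product using only clmul_lo and bit reversal."""
    lo_of_reversed = clmul_lo(bit_reverse(a, width), bit_reverse(b, width), width)
    return bit_reverse(lo_of_reversed, width) >> 1
```

For any two `W`-bit operands, `(clmul_hi_via_reverse(a, b) << W) | clmul_lo(a, b)` equals `clmul(a, b)`, so the full double-width product needs two passes through the half-size multiplier plus a few bit reversals. The bit-serial loops in `clmul` and `gf_square` also suggest why these operations are slow on integer-oriented processors that lack dedicated instructions.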