Accelerating Big Integer Arithmetic Using Intel IFMA Extensions
S. Gueron, V. Krasnov
2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH), published 2016-07-10
DOI: 10.1109/ARITH.2016.22 (https://doi.org/10.1109/ARITH.2016.22)
Intel has recently announced a new set of processor instructions, dubbed AVX512IFMA, that carry out Integer Fused Multiply Accumulate operations. These instructions operate on 512-bit registers and compute eight independent 52-bit unsigned integer multiplications, generating eight 104-bit products and accumulating their low or high halves into 64-bit containers. Using these instructions requires that inputs be converted to a redundant-form radix 2^52 representation, and that outputs be converted back to the desired representation. This paper demonstrates several techniques for leveraging the AVX512IFMA instructions to speed up big-integer multiplication. Although processors that support AVX512IFMA are not yet available at the time of writing, we show how currently available public tools can be used to estimate their potential performance benefits. For example, based on these tools, we expect a 2x speedup for 1024-bit integer multiplication over the best currently available method.
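
To make the arithmetic in the abstract concrete, the following is a minimal Python sketch that emulates the lane-wise behavior of the two AVX512IFMA instructions (VPMADD52LUQ and VPMADD52HUQ) and uses them for a plain schoolbook multiplication in radix 2^52. It is an illustration under stated assumptions, not the technique developed in the paper: the helper names (`madd52lo`, `madd52hi`, `to_radix52`, `from_radix52`, `mul_radix52`) are hypothetical, Python lists stand in for 512-bit registers, and the final carry resolution is delegated to Python's big integers.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1

def madd52lo(acc, a, b):
    """Emulate VPMADD52LUQ per lane: acc + low 52 bits of the 104-bit product."""
    return [(c + (((x & MASK52) * (y & MASK52)) & MASK52)) & MASK64
            for c, x, y in zip(acc, a, b)]

def madd52hi(acc, a, b):
    """Emulate VPMADD52HUQ per lane: acc + high 52 bits of the 104-bit product."""
    return [(c + (((x & MASK52) * (y & MASK52)) >> 52)) & MASK64
            for c, x, y in zip(acc, a, b)]

def to_radix52(n, digits):
    """Split a non-negative integer into `digits` limbs of 52 bits each."""
    return [(n >> (52 * i)) & MASK52 for i in range(digits)]

def from_radix52(limbs):
    """Recombine limbs; limbs may exceed 52 bits (redundant form)."""
    return sum(limb << (52 * i) for i, limb in enumerate(limbs))

def mul_radix52(a, b, bits=1024):
    """Schoolbook product of two `bits`-bit integers, 8 lanes at a time."""
    digits = -(-bits // 52)        # limbs needed (20 for 1024 bits)
    lanes = -(-digits // 8) * 8    # pad to whole 8-lane "registers"
    A = to_radix52(a, lanes)       # zero-padded multiplicand
    B = to_radix52(b, digits)
    acc = [0] * (lanes + digits + 1)   # 64-bit accumulators, redundant form
    for j, bj in enumerate(B):
        bcast = [bj] * 8               # broadcast one multiplier limb
        for k in range(0, lanes, 8):
            chunk = A[k:k + 8]
            # Low halves land in column i+j, high halves in column i+j+1.
            acc[j + k:j + k + 8] = madd52lo(acc[j + k:j + k + 8], chunk, bcast)
            acc[j + k + 1:j + k + 9] = madd52hi(acc[j + k + 1:j + k + 9], chunk, bcast)
    return from_radix52(acc)

# Usage: compare against Python's built-in multiplication.
x = (1 << 1024) - 1
y = (1 << 1023) + 12345
assert mul_radix52(x, y) == x * y
```

For 1024-bit operands each accumulator column receives at most 40 addends below 2^52, so the sums stay under 2^58 and never wrap the 64-bit container. That headroom is what allows carries to be deferred until the final conversion out of the redundant form.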