Faster Number Theoretic Transform on Graphics Processors for Ring Learning with Errors Based Cryptography

2018 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI) | Pub Date: 2018-07-01 | DOI: 10.1109/SOLI.2018.8476725

Ahmad Al Badawi, B. Veeravalli, Khin Mi Mi Aung


Citations: 6

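Abstract

The Number Theoretic Transform (NTT) has recently been revived by the advent of Ring-Learning with Errors (Ring-LWE) Homomorphic Encryption (HE) schemes. In these schemes, the NTT is used to compute the product of high-degree polynomials with multi-precision coefficients in quasilinear time. This product is known to be the most time-consuming operation in Ring-based HE schemes, so accelerating the NTT is key to realizing efficient implementations. Accordingly, the current version of cuHE, a publicly available HE library written in Compute Unified Device Architecture (CUDA), includes a fast NTT implementation. We analyzed the cuHE NTT kernels and found that they suffer from two performance pitfalls: shared memory conflicts and thread divergence. We show that, by using a set of tailor-made CUDA optimizations, we can improve the speed of the cuHE NTT computation by 20%-50% for different problem sizes.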
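To make the role of the NTT in polynomial multiplication concrete, here is a minimal CPU sketch in Python. It is not the cuHE kernel and not the optimized GPU code described in the abstract. The modulus `P = 998244353`, the primitive root `G = 3`, and the function names `ntt`, `poly_mul`, and `schoolbook_mul` are assumptions chosen for illustration only. Coefficients are reduced modulo `P`, so the multi-precision arithmetic mentioned above is left out.

```python
# Illustrative parameters (assumptions, not taken from the paper or from cuHE):
P = 998244353  # NTT-friendly prime, 119 * 2**23 + 1
G = 3          # a primitive root modulo P


def ntt(a, invert=False):
    """Iterative radix-2 NTT modulo P; len(a) must be a power of two."""
    a = list(a)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Primitive length-th root of unity (its inverse for the inverse NTT).
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                # Cooley-Tukey butterfly on the pair (k, k + half).
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)  # modular inverse of n (Fermat's little theorem)
        a = [x * n_inv % P for x in a]
    return a


def poly_mul(f, g):
    """Multiply two coefficient lists modulo P in O(n log n) using the NTT."""
    out_len = len(f) + len(g) - 1
    n = 1
    while n < out_len:
        n <<= 1  # zero-pad to a power of two so the cyclic product does not wrap
    fa = ntt(list(f) + [0] * (n - len(f)))
    ga = ntt(list(g) + [0] * (n - len(g)))
    prod = [x * y % P for x, y in zip(fa, ga)]  # pointwise product
    return ntt(prod, invert=True)[:out_len]


def schoolbook_mul(f, g):
    """Quadratic-time reference multiplication, used only to check poly_mul."""
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] = (out[i + j] + x * y) % P
    return out
```

For instance, `poly_mul([1, 2, 3], [4, 5])` returns `[4, 13, 22, 15]`, matching (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3. The two transforms plus a pointwise product replace the quadratic double loop of `schoolbook_mul`, which is where the quasilinear running time comes from.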



