TurboHE: Accelerating Fully Homomorphic Encryption Using FPGA Clusters

Haohao Liao, Mahmoud A. Elmohr, Xuan Dong, Yanjun Qian, Wenzhe Yang, Zhiwei Shang, Yin Tan
{"title":"TurboHE: Accelerating Fully Homomorphic Encryption Using FPGA Clusters","authors":"Haohao Liao, Mahmoud A. Elmohr, Xuan Dong, Yanjun Qian, Wenzhe Yang, Zhiwei Shang, Yin Tan","doi":"10.1109/IPDPS54959.2023.00084","DOIUrl":null,"url":null,"abstract":"With the burgeoning demands for cloud computing in various fields followed by the rising attention to sensitive data exposure, Fully Homomorphic Encryption (FHE) is gaining popularity as a potential solution to privacy protection. By performing computations directly on the ciphertext (encrypted data) without decrypting it, FHE can guarantee the security of data throughout its lifecycle without compromising the privacy. However, the excruciatingly slow speed of FHE scheme makes adopting it impractical in real life applications. Therefore, hardware accelerators come to the rescue to mitigate the problem. Among various hardware platforms, FPGA clusters are particularly promising because of their flexibility and ready availability at many cloud providers such as FPGA-as-a-Service (FaaS). Hence, reusing the existing infrastructure can greatly facilitate the implementation of FHE on the cloud.In this paper, we present TurboHE, the first hardware accelerator for FHE operations based on an FPGA cluster. TurboHE aims to boost the performance of CKKS, one of the fastest FHE schemes which is most suitable to machine learning applications, by accelerating its computationally intensive and frequently used operation: relinearization. The proposed scalable architecture based on hardware partitioning can be easily configured to accommodate high acceleration requirements for relinearization with very large CKKS parameters. As a demonstration, an implementation, which supports 32,768 polynomial coefficients and a coefficient bitwidth of 594 decomposed into 11 Residue Number System (RNS) components, was deployed on a cluster consisting of 9 Xilinx VU13P FPGAs. The cluster operated at 200 MHz and achieved 1096 times throughput compared with a single threaded CPU implementation. Moreover, the low level hardware components implemented in this work such as the NTT module can also be applied to accelerate other lattice-based cryptography schemes.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

With the growing demand for cloud computing across many fields, and the accompanying concern about exposing sensitive data, Fully Homomorphic Encryption (FHE) is gaining popularity as a potential solution for privacy protection. By performing computations directly on the ciphertext (encrypted data) without decrypting it, FHE can guarantee the security of data throughout its lifecycle without compromising privacy. However, the excruciatingly slow speed of FHE schemes makes them impractical for real-life applications, so hardware accelerators are needed to mitigate the problem. Among the various hardware platforms, FPGA clusters are particularly promising because of their flexibility and their ready availability from many cloud providers as FPGA-as-a-Service (FaaS); reusing this existing infrastructure can greatly facilitate the deployment of FHE in the cloud. In this paper, we present TurboHE, the first hardware accelerator for FHE operations based on an FPGA cluster. TurboHE aims to boost the performance of CKKS, one of the fastest FHE schemes and the one best suited to machine learning applications, by accelerating its computationally intensive and frequently used operation: relinearization. The proposed scalable architecture, based on hardware partitioning, can be easily configured to meet high acceleration requirements for relinearization with very large CKKS parameters. As a demonstration, an implementation supporting 32,768 polynomial coefficients and a 594-bit coefficient bitwidth decomposed into 11 Residue Number System (RNS) components was deployed on a cluster of 9 Xilinx VU13P FPGAs. The cluster operated at 200 MHz and achieved 1096× the throughput of a single-threaded CPU implementation. Moreover, the low-level hardware components implemented in this work, such as the Number Theoretic Transform (NTT) module, can also be applied to accelerate other lattice-based cryptography schemes.
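
The reported parameter set (32,768 polynomial coefficients and a 594-bit coefficient bitwidth split into 11 RNS components) relies on the Residue Number System to keep every limb within machine-word-sized modular arithmetic. The Python sketch below is a minimal illustration of that idea, using small hypothetical primes rather than TurboHE's actual ~54-bit NTT-friendly moduli: a large value is represented by its residues modulo each prime, limbs are multiplied independently, and the result is recovered with the Chinese Remainder Theorem.

```python
from functools import reduce

# Hypothetical small RNS primes, for illustration only; TurboHE uses eleven
# word-sized primes whose product forms a 594-bit coefficient modulus.
PRIMES = [97, 101, 103]
Q = reduce(lambda a, b: a * b, PRIMES)  # composite coefficient modulus

def to_rns(x):
    """Decompose x (mod Q) into one residue per RNS limb."""
    return [x % p for p in PRIMES]

def from_rns(residues):
    """Recombine residues into x (mod Q) via the Chinese Remainder Theorem."""
    x = 0
    for r, p in zip(residues, PRIMES):
        m = Q // p
        x = (x + r * m * pow(m, -1, p)) % Q  # pow(m, -1, p): modular inverse (Python 3.8+)
    return x

def rns_mul(a, b):
    """Multiply two RNS-encoded values limb by limb; every limb is independent,
    which is what allows an FPGA (or FPGA cluster) to process limbs in parallel."""
    return [(ra * rb) % p for ra, rb, p in zip(a, b, PRIMES)]

if __name__ == "__main__":
    x, y = 123456, 654321
    assert from_rns(rns_mul(to_rns(x), to_rns(y))) == (x * y) % Q
    print("RNS product matches direct computation mod Q")
```

This per-limb independence is what makes RNS arithmetic attractive for partitioning work across FPGAs; the actual TurboHE design operates on full polynomials using NTT-based multiplication during relinearization, which this scalar sketch does not attempt to show.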