Hardware overhead analysis of programmability in ARX crypto processing

Proceedings of the Fourth Workshop on Hardware and Architectural Support for Security and Privacy. Pub Date: 2015-06-14. DOI: 10.1145/2768566.2768574

Mohamed El-Hadedy, K. Skadron


Citations: 2

Abstract

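This paper evaluates the area and performance overhead of a programmable cryptographic accelerator specialized to support ARX-based (Add, Rotate, and Xor) encryption standards, which are common in symmetric cryptography. The overhead is measured by comparison against a variety of custom ARX implementations optimized specifically for π-Cipher, a new authenticated-encryption algorithm that offers advantages over AES-GCM and is a candidate in the CAESAR competition. The programmable processor is designed to accommodate different word sizes, block sizes, and security levels, whereas the custom designs require a separate version for each of these variants. We find that the overhead of programmability is quite high. For example, our Programmable Processing Element (PPE) occupies 227 slices and achieves a throughput of about 1.2 Gbps/block, regardless of word size. In comparison, our best custom 64-bit implementation so far requires 445 slices and achieves 3.09 Gbps. Two PPEs running in parallel can therefore reach 75% of the throughput of the custom 64-bit solution while providing the flexibility to support diverse cryptographic standards.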
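To make the term "ARX" concrete, the sketch below shows the three primitive operations (modular addition, bit rotation, XOR) on w-bit words, with the word size passed in at run time. This mirrors, purely in software, the property the abstract attributes to the programmable PPE: one datapath serving several word sizes, where a custom core fixes w at design time. The functions `rotl` and `arx_step` and the particular way they are combined are hypothetical illustrations; this is not π-Cipher's round function and not the authors' PPE design.

```python
def rotl(x: int, r: int, w: int) -> int:
    """Rotate the w-bit word x left by r bits (illustrative helper)."""
    mask = (1 << w) - 1
    x &= mask          # keep x within w bits
    r %= w             # rotation amount modulo the word size
    return ((x << r) | (x >> (w - r))) & mask


def arx_step(a: int, b: int, r: int, w: int) -> tuple[int, int]:
    """Hypothetical ARX combination on w-bit words:
    a' = (a + b) mod 2^w, then b' = rotl(b, r) XOR a'.
    """
    mask = (1 << w) - 1
    a = (a + b) & mask          # Add: modular addition
    b = rotl(b, r, w) ^ a       # Rotate, then Xor
    return a, b


if __name__ == "__main__":
    # Same code, different word sizes chosen at run time (16/32/64 are
    # illustrative choices, not a claim about the paper's configurations).
    for w in (16, 32, 64):
        mask = (1 << w) - 1
        a, b = arx_step(0x0123456789ABCDEF & mask,
                        0xFEDCBA9876543210 & mask, 7, w)
        print(f"w={w}: a={a:#x}, b={b:#x}")
```

A fixed-function core would hard-wire `w` and `r`; keeping them as parameters is what costs extra logic in hardware, which is the overhead the paper sets out to measure.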
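The closing comparison in the abstract is simple arithmetic on the quoted figures. The sketch below reproduces it, treating "about 1.2 Gbps" as exactly 1.2 Gbps (an assumption), and also reports throughput per slice for each design. The constant names and the `summary` helper are illustrative only.

```python
# Figures quoted in the abstract; PPE_GBPS is stated only as "about 1.2".
PPE_SLICES = 227
PPE_GBPS = 1.2
CUSTOM64_SLICES = 445
CUSTOM64_GBPS = 3.09


def summary(n_ppe: int = 2) -> dict:
    """Compare n_ppe parallel PPEs against the custom 64-bit core."""
    slices = n_ppe * PPE_SLICES
    gbps = n_ppe * PPE_GBPS
    return {
        "slices": slices,
        "gbps": gbps,
        "throughput_ratio": gbps / CUSTOM64_GBPS,   # fraction of custom throughput
        "area_ratio": slices / CUSTOM64_SLICES,      # relative slice count
        "ppe_mbps_per_slice": 1000 * PPE_GBPS / PPE_SLICES,
        "custom_mbps_per_slice": 1000 * CUSTOM64_GBPS / CUSTOM64_SLICES,
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value:.4g}")
```

With these inputs, two PPEs use 454 slices (about 2% more than the custom core's 445) and deliver 2.4 Gbps, roughly 78% of 3.09 Gbps. The abstract's 75% is in the same range; because the per-PPE figure is given only approximately, a value slightly below 1.2 Gbps would account for the difference.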


