基于cam的移动加速器大规模并行SIMD矩阵核对AES算法进行并行软件加密

Pub Date : 2023-01-01 DOI:10.12720/jait.14.2.355-362
Kyosuke Kageyama, Sota Arai, Hajime Hamano, Xiangbo Kong, T. Kumaki, T. Koide
{"title":"基于cam的移动加速器大规模并行SIMD矩阵核对AES算法进行并行软件加密","authors":"Kyosuke Kageyama, Sota Arai, Hajime Hamano, Xiangbo Kong, T. Kumaki, T. Koide","doi":"10.12720/jait.14.2.355-362","DOIUrl":null,"url":null,"abstract":"—Recently, it has become possible to execute various digital multimedia applications, such as image compression, video compression, and audio processing, on mobile devices — as long as the processing core in the mobile device has the required high levels of performance, versatility, and programmability. Generally speaking, multimedia applications operate by performing repeated arithmetic and table-lookup coding operations. Therefore, to make it easier to achieve those required high levels of performance, versatility, and programmability, we propose an accelerator for mobile Central Processing Units (CPUs) known as a Content Addressable Memory-based massive-parallel Single Instruction Multiple Data (SIMD) Matrix Core (CAMX) that improves the processing speeds of both arithmetic and table-lookup coding operations. Our proposed CAMX, which is equipped with two CAM modules, has highly parallel processing capabilities that facilitate fast table-lookup coding operations. In fact, the results of Advanced Encryption Standard (AES) encryption simulations conducted in this study show that its AES encryption total clock cycles are 1,362,699. Additionally, a detailed breakdown of the number of clock cycles shows 1,312,160 for SubBytes, a combined total of 17,161 for ShiftRows and MixColumns, and 2519 for AddRoundKey. This paper also confirmed that CAMX could process AES encryptions at a rate of 83.17 clock cycles/byte. Also, the performance of CAMX, related works, and existing mobile processors are compared. The related works do not have a dedicated circuit for AES processing. From the comparison results, CAMX provides a performance improvement of approximately 4.4-and 3569.1-times over the related works. The existing mobile processors are Texas Instruments (TI) DM3730 and a TI OMAP3530. From the comparison results, CAMX provides a performance improvement of approximately 2.1 times over TI DM3730 and TI OMAP3530.","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Parallel Software Encryption of AES Algorithm by Using CAM-Based Massive-Parallel SIMD Matrix Core for Mobile Accelerator\",\"authors\":\"Kyosuke Kageyama, Sota Arai, Hajime Hamano, Xiangbo Kong, T. Kumaki, T. Koide\",\"doi\":\"10.12720/jait.14.2.355-362\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"—Recently, it has become possible to execute various digital multimedia applications, such as image compression, video compression, and audio processing, on mobile devices — as long as the processing core in the mobile device has the required high levels of performance, versatility, and programmability. Generally speaking, multimedia applications operate by performing repeated arithmetic and table-lookup coding operations. Therefore, to make it easier to achieve those required high levels of performance, versatility, and programmability, we propose an accelerator for mobile Central Processing Units (CPUs) known as a Content Addressable Memory-based massive-parallel Single Instruction Multiple Data (SIMD) Matrix Core (CAMX) that improves the processing speeds of both arithmetic and table-lookup coding operations. Our proposed CAMX, which is equipped with two CAM modules, has highly parallel processing capabilities that facilitate fast table-lookup coding operations. In fact, the results of Advanced Encryption Standard (AES) encryption simulations conducted in this study show that its AES encryption total clock cycles are 1,362,699. Additionally, a detailed breakdown of the number of clock cycles shows 1,312,160 for SubBytes, a combined total of 17,161 for ShiftRows and MixColumns, and 2519 for AddRoundKey. This paper also confirmed that CAMX could process AES encryptions at a rate of 83.17 clock cycles/byte. Also, the performance of CAMX, related works, and existing mobile processors are compared. The related works do not have a dedicated circuit for AES processing. From the comparison results, CAMX provides a performance improvement of approximately 4.4-and 3569.1-times over the related works. The existing mobile processors are Texas Instruments (TI) DM3730 and a TI OMAP3530. From the comparison results, CAMX provides a performance improvement of approximately 2.1 times over TI DM3730 and TI OMAP3530.\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.12720/jait.14.2.355-362\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.12720/jait.14.2.355-362","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

-最近,在移动设备上执行各种数字多媒体应用程序(如图像压缩、视频压缩和音频处理)已经成为可能——只要移动设备中的处理核心具有所需的高水平性能、多功能性和可编程性。一般来说,多媒体应用程序通过执行重复的算术和表查找编码操作来运行。因此,为了更容易实现所需的高水平性能、多功能性和可编程性,我们提出了一种用于移动中央处理单元(cpu)的加速器,称为基于内容可寻址内存的大规模并行单指令多数据(SIMD)矩阵核心(CAMX),它可以提高算术和表查找编码操作的处理速度。我们提出的CAMX配备了两个CAM模块,具有高度并行的处理能力,可以促进快速的表查找编码操作。实际上,本研究进行的高级加密标准(Advanced Encryption Standard, AES)加密仿真结果表明,其AES加密总时钟周期为1,362,699。此外,时钟周期数量的详细细分显示SubBytes为1,312,160,ShiftRows和MixColumns的总和为17,161,AddRoundKey为2519。本文还证实了CAMX能够以83.17时钟周期/字节的速率处理AES加密。并对CAMX的性能、相关工作以及现有的移动处理器进行了比较。相关工作没有专门的AES处理电路。从比较结果来看,CAMX提供了大约4.4倍和3569.1倍的性能提升。现有的移动处理器是德州仪器(TI) DM3730和TI OMAP3530。从比较结果来看,CAMX的性能比TI DM3730和TI OMAP3530提高了约2.1倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
Parallel Software Encryption of AES Algorithm by Using CAM-Based Massive-Parallel SIMD Matrix Core for Mobile Accelerator
—Recently, it has become possible to execute various digital multimedia applications, such as image compression, video compression, and audio processing, on mobile devices — as long as the processing core in the mobile device has the required high levels of performance, versatility, and programmability. Generally speaking, multimedia applications operate by performing repeated arithmetic and table-lookup coding operations. Therefore, to make it easier to achieve those required high levels of performance, versatility, and programmability, we propose an accelerator for mobile Central Processing Units (CPUs) known as a Content Addressable Memory-based massive-parallel Single Instruction Multiple Data (SIMD) Matrix Core (CAMX) that improves the processing speeds of both arithmetic and table-lookup coding operations. Our proposed CAMX, which is equipped with two CAM modules, has highly parallel processing capabilities that facilitate fast table-lookup coding operations. In fact, the results of Advanced Encryption Standard (AES) encryption simulations conducted in this study show that its AES encryption total clock cycles are 1,362,699. Additionally, a detailed breakdown of the number of clock cycles shows 1,312,160 for SubBytes, a combined total of 17,161 for ShiftRows and MixColumns, and 2519 for AddRoundKey. This paper also confirmed that CAMX could process AES encryptions at a rate of 83.17 clock cycles/byte. Also, the performance of CAMX, related works, and existing mobile processors are compared. The related works do not have a dedicated circuit for AES processing. From the comparison results, CAMX provides a performance improvement of approximately 4.4-and 3569.1-times over the related works. The existing mobile processors are Texas Instruments (TI) DM3730 and a TI OMAP3530. From the comparison results, CAMX provides a performance improvement of approximately 2.1 times over TI DM3730 and TI OMAP3530.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1