
Microprocessors and Microsystems: Latest Articles

A Real-time P-SFA hardware implementation of Deep Neural Networks using FPGA
IF 2.6; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2024-04-01; Epub Date: 2024-02-17; DOI: 10.1016/j.micpro.2024.105037
Nour Elshahawy, Sandy A. Wasif, Maggie Mashaly, Eman Azab

Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proved effective in solving complex problems across many applications and fields. This paper focuses on optimizing the activation function (AF) block of the NN hardware architecture. The AF block is based on a probability-based sigmoid function approximation block (P-SFA) combined with a novel real-time probability module (PRT) that calculates the probability of the input data. The proposed NN design aims to use the least amount of hardware resources and area while maintaining high recognition accuracy. The proposed AF module consists of two P-SFA blocks and the PRT component. The proposed architecture for implementing NNs is evaluated on Field Programmable Gate Arrays (FPGAs). The design achieves a recognition accuracy of 97.84% on a 6-layer Deep Neural Network (DNN) for the MNIST dataset and 88.58% on a 6-layer DNN for the FMNIST dataset. The proposed AF module has a total area of 1136 LUTs and 327 FFs and a logical critical path delay of 8.853 ns. The power consumption is 6 mW for the P-SFA block and 5 mW for the PRT block.

Volume 106, Article 105037
Citations: 0
CNC: A lightweight architecture for Binary Ring-LWE based PQC
IF 2.6; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2024-04-01; Epub Date: 2024-03-26; DOI: 10.1016/j.micpro.2024.105044
Shaik Ahmadunnisa, Sudha Ellison Mathe

In lattice-based cryptography, Ring Learning with Errors (RLWE) is a computationally hard cryptographic problem comprising three basic mechanisms, i.e., key generation, encryption, and decryption. Binary Ring Learning with Errors (BRLWE), a new variant of RLWE, has been proposed recently to reduce the key size and computational complexity compared to previous RLWE-based schemes. Based on this BRLWE scheme, efficient hardware architectures for lightweight applications have been obtained in recent works. The key operation in this scheme is AB+C, where A and C are integer polynomials and B is a binary polynomial. This paper proposes an efficient hardware architecture for a BRLWE-based scheme targeted at lightweight applications. The architecture computes the arithmetic operation AB+C, which includes polynomial multiplication and addition over the polynomial ring Z_q/(x^n+1). The proposed architecture is applied under two conditions: fixed and variable values of q. Experimental results show that the proposed architecture has 50% less Area-Delay Product (ADP) and 20% less Power-Delay Product (PDP) than the recently reported work for n=256.
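
For readers unfamiliar with the AB+C operation named in this abstract, the following is a minimal software sketch, not the authors' hardware architecture. It assumes coefficients are stored lowest degree first and that reduction modulo x^n+1 means x^n = -1 (a negacyclic wrap). The function name `brlwe_mac` and the example values are hypothetical.

```python
def brlwe_mac(a, b, c, q):
    """Return A*B + C in Z_q[x]/(x^n + 1).

    a, c: integer coefficient lists (lowest degree first), length n.
    b:    binary coefficient list (0/1), length n.
    Illustrative only; the name and layout are assumptions.
    """
    n = len(a)
    if len(b) != n or len(c) != n:
        raise ValueError("a, b and c must all have length n")
    acc = [ci % q for ci in c]           # start from C
    for j, bit in enumerate(b):
        if bit not in (0, 1):
            raise ValueError("b must be a binary polynomial")
        if bit == 0:
            continue                     # zero bits contribute nothing
        # Add A * x^j. Because B is binary, no multiplications are needed.
        for i, ai in enumerate(a):
            k = i + j
            if k < n:
                acc[k] = (acc[k] + ai) % q
            else:
                # x^n = -1, so wrapped terms are subtracted
                acc[k - n] = (acc[k - n] - ai) % q
    return acc


if __name__ == "__main__":
    print(brlwe_mac([1, 2, 3, 4], [0, 1, 0, 0], [5, 5, 5, 5], 256))
```

Because B has only 0/1 coefficients, the product reduces to conditional additions and subtractions of shifted copies of A, which is one reason the binary variant suits lightweight designs.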

Volume 106, Article 105044
Citations: 0
MOSAIC: Maximizing ResOurce Sharing in Behavioral Application SpecIfic ProCessors
IF 2.6; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2024-04-01; Epub Date: 2024-02-20; DOI: 10.1016/j.micpro.2024.105039
Qilin Si, Benjamin Carrion Schafer

This work presents a method that can quickly determine which hardware accelerators (HWaccs) should be mapped together onto an Application-Specific Instruction Set Processor (ASIP) such that the resources shared among them are maximized. In particular, this work targets HWaccs generated from untimed behavioral descriptions for High-Level Synthesis (HLS). Although HLS is a single-process synthesis method, our approach is able to force resource sharing among the HWaccs by combining their behavioral descriptions into a single description based on their potential to share resources. These shared resources include functional units (FUs) such as multipliers, adders, and dividers, as well as registers. Our proposed flow yields area savings of up to 48%, and 30% on average. Because an exhaustive enumeration of all possible combinations can lead to long runtimes, we propose a fast heuristic that leads to comparable results (only 6% worse on average) while being much faster (on average 500×).

Volume 106, Article 105039
Citations: 0
IoT-Edge technology based cloud optimization using artificial neural networks
IF 2.6; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2024-04-01; Epub Date: 2024-04-03; DOI: 10.1016/j.micpro.2024.105049
Amjad Rehman, Tanzila Saba, Khalid Haseeb, Teg Alam, Gwanggil Jeon

In recent decades, artificial intelligence techniques have been adopted for many real-time applications. The Internet of Things (IoT) network comprises many sensing devices and physical objects for information gathering and further transmission. In addition to being sent to the receiving nodes, the collected data also needs to be received promptly. Many solutions have been proposed for IoT-based embedded systems using edge computing, but they are not fully protected against unidentified communication threats. In such circumstances, the trust ratio of these systems decreases and communication performance is compromised. In this research, we describe an optimization model based on IoT-edge technology that incorporates cloud computational intelligence. Furthermore, edge nodes employ artificial intelligence algorithms to provide the optimal outcome for selecting trustworthy forwarded data and to lengthen the connected time for smart devices. Firstly, the edge devices extract useful information from the IoT nodes and accordingly provide a decision module based on optimization computing. Secondly, utilizing cryptographic approaches, edge technology secures the multiple layers of the IoT system and ensures data privacy with integrity. Finally, the proposed model is tested and its performance is verified against other related studies in terms of energy consumption, packet delivery ratio, and data delay.

Volume 106, Article 105049
Citations: 0
Hand-held GPU accelerated device for multiclass classification of X-ray images using CNN model
IF 2.6; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2024-04-01; Epub Date: 2024-04-04; DOI: 10.1016/j.micpro.2024.105046
K.G. Satheeshkumar, V. Arunachalam, S. Deepika

Chest X-ray (CXR) images are the primary investigation aid for many lung diseases and their follow-ups. For diagnosis of SARS-CoV-2, the RT-PCR test and chest Computed Tomography (CT) are commonly used, but both can produce false negatives when ruling out the infection. There is therefore a pressing need for a system that combines Artificial Intelligence (AI) and CXR imaging to detect COVID-19 patients and avoid its spread. Here, a robust and efficient handheld device is proposed. It uses the computational power of the Graphics Processing Unit (GPU) and pre-trained deep learning models to analyze CXR images. A ResNet-50 CNN model is deployed on an NVIDIA Jetson Nano GPU module for the real-time classification of CXR images as COVID-19, Tuberculosis, or Normal. The device can perform real-time classification of CXR images from a portable X-ray machine and assign each image to one of these categories. For extensive training, a database of 680 COVID-19, 1230 Tuberculosis, and 1050 normal CXR images is assembled by combining several global databases such as Kaggle, SIRM, RSNA, and Radiopaedia. The classification accuracy, precision, and loss rate were 0.9879, 0.9758, and 0.0196, respectively, and our model would improve with larger data sets. The highly accurate, high-performance GPU device can play a far-reaching role in COVID-19 diagnosis using chest X-rays, which could help with triage in the health system and with combating the outbreak of COVID-19.

Volume 106, Article 105046
Citations: 0
A light-weight neuromorphic controlling clock gating based multi-core cryptography platform
IF 2.6; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2024-04-01; Epub Date: 2024-02-26; DOI: 10.1016/j.micpro.2024.105040
Pham-Khoi Dong, Khanh N. Dang, Duy-Anh Nguyen, Xuan-Tu Tran

While cryptography tasks can be sped up by using a multi-core architecture to parallelize computation, one of the major challenges is optimizing power consumption. In principle, depending on the computation workload, individual cores can be turned off to save power during operation. However, too few active cores may lead to computational bottlenecks. In this work, we propose a novel platform named Spike-MCryptCores: a low-power multi-core AES platform with a neuromorphic controller. The proposed Spike-MCryptCores platform is composed of multiple AES cores, each equipped with a clock-gating scheme that reduces its power consumption while idle. To optimize the power consumption of the whole platform, we use a neuromorphic controller. Therefore, this paper also presents a comprehensive framework to generate a data set, train the neural network, and produce the hardware configuration for the Spiking Neural Network (SNN), a brain-inspired computing paradigm. Moreover, Spike-MCryptCores integrates the hardware SNN inside its architecture to support low-cost and low-latency adaptations. The results show that the implemented SNN controller occupies only 2.3% of the overall area cost while providing the ability to reduce power consumption significantly. The lightweight SNN controller model is trained and tested with up to 95% accuracy. The maximum difference between the predicted number of cores and the ideal one from the label is only one unit. Under 24 test scenarios, an SNN controller with clock-gating helps Spike-MCryptCores reduce power consumption by 48.6% on average, by 67% in the best-case scenario, and by 39% in the worst-case scenario.

Volume 106, Article 105040
Citations: 0
Design and evaluation of low power and area efficient approximate Booth multipliers for error tolerant applications
IF 2.6; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2024-04-01; Epub Date: 2024-02-17; DOI: 10.1016/j.micpro.2024.105036
Vishal Gundavarapu, P. Gowtham, A. Anita Angeline, P. Sasipriya

Approximate computing is an innovative design methodology that reduces design complexity and improves power efficiency, performance, and area by compromising on the accuracy requirement. In this paper, 8-bit approximate Booth multipliers are proposed based on the approximate Radix-4 modified Booth encoding algorithm, together with approximate compressors that accumulate the partial products to produce the final product. Two approximate Probability Based Booth Encoders (PBBE-1 and PBBE-2) are proposed and used in the Booth multipliers. Error parameters are measured and compared with those of existing approximate Booth multipliers. An exact Booth multiplier of a novel design from the literature is also implemented for comparison. The proposed approximate multipliers are then used in applications such as image multiplication and IIR bi-quad filtering to demonstrate their performance. Simulation results show that the proposed Booth multipliers outperform the existing approximate Booth multipliers in terms of power and area, with better accuracy. Synthesis results show that the proposed Multiplier 6 is the most efficient, with a 56% improvement in power consumption and a 47% improvement in area compared to the exact multiplier. All simulations are carried out using Cadence® Genus with 180 nm CMOS process technology.
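
As background for the Radix-4 modified Booth encoding mentioned in this abstract, here is a minimal sketch of the exact (non-approximate) textbook recoding in software. It does not reproduce the paper's PBBE-1/PBBE-2 encoders or its approximate compressors; the function names and the 8-bit default are assumptions chosen to match the operand width stated above.

```python
def booth_radix4_digits(y, bits=8):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digits are returned least
    significant first. Textbook recoding: d_k = y[2k] + y[2k-1] - 2*y[2k+1].
    """
    if bits % 2:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")
    u = y & ((1 << bits) - 1)        # two's complement bit pattern
    digits = []
    prev = 0                         # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=8):
    """Exact product formed from the bits/2 Booth partial products."""
    return sum(d * x * 4 ** k
               for k, d in enumerate(booth_radix4_digits(y, bits)))


if __name__ == "__main__":
    print(booth_radix4_digits(127))   # digit list for the multiplier 127
    print(booth_multiply(-37, 91))
```

The recoding halves the number of partial products (four instead of eight for 8-bit operands), which is the step that approximate encoders and compressors then simplify further at some cost in accuracy.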

Volume 106, Article 105036
Citations: 0
Retraction notice to the articles published in the Special Issue Signal Processing from “Microprocessors and Microsystems” 关于 "微处理器与微系统 "信号处理特刊中发表文章的撤稿通知
IF 2.6 4区 计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2024-04-01 Epub Date: 2024-03-14 DOI: 10.1016/j.micpro.2024.105043
{"title":"Retraction notice to the articles published in the Special Issue Signal Processing from “Microprocessors and Microsystems”","authors":"","doi":"10.1016/j.micpro.2024.105043","DOIUrl":"https://doi.org/10.1016/j.micpro.2024.105043","url":null,"abstract":"","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105043"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933124000383/pdfft?md5=b791a7c7e5a9bb52a68a4f6dceabab14&pid=1-s2.0-S0141933124000383-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140134565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An improved algorithm for the estimation of the root mean square value as an optimal solution for commercial measurement equipment
IF 2.6 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-04-01 Epub Date: 2024-03-13 DOI: 10.1016/j.micpro.2024.105042
Marina Bulat, Stefan Mirković, Nemanja Gazivoda, Dragan Pejić, Marjan Urekar, Boris Antić

This paper demonstrates that directly changing the algorithm used to estimate the root mean square (RMS) value of a voltage signal of arbitrary waveform can improve the performance and lower the measurement uncertainty of commercially available instruments without any upgrade of their existing hardware. The research presented here is an original contribution to the development of estimation techniques and mathematical models for measurement-oriented purposes: it applies regardless of the number of samples in the given period and relies on calculations of the same complexity as the methods already in use. The theoretical approach examines the problem of numerical integration, focusing on the modified Simpson's 1/3 rule and the modified Simpson's 3/8 rule as used to estimate the RMS value when only a small number of samples per period is available. It highlights the limitations of Simpson's 1/3 rule and Simpson's 3/8 rule, and shows that the newly proposed algorithm is optimal with respect to measurement accuracy and precision even when the ratio of the sampling frequency to the signal's fundamental frequency is low. All theoretical results have been validated experimentally.
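
To make the setting concrete, the following is a minimal sketch of RMS estimation with the standard (unmodified) composite Simpson's 1/3 rule. It is not the authors' modified algorithm, which the abstract does not specify. The function names `rms_simpson13` and `sine_samples`, the unit-amplitude sine test signal, and the unit period are all assumptions chosen for illustration.

```python
import math

def rms_simpson13(samples, dt):
    """Estimate RMS over a window with the standard composite Simpson's 1/3 rule.

    samples: n + 1 uniformly spaced values spanning the window (n must be even).
    dt: spacing between consecutive samples.
    """
    n = len(samples) - 1  # number of intervals
    if n < 2 or n % 2 != 0:
        raise ValueError("Simpson's 1/3 rule needs an even number of intervals")
    sq = [s * s for s in samples]
    integral = sq[0] + sq[-1]
    integral += 4 * sum(sq[1:-1:2])  # odd-indexed interior points
    integral += 2 * sum(sq[2:-1:2])  # even-indexed interior points
    integral *= dt / 3
    # RMS is the square root of the mean of the squared signal over the window.
    return math.sqrt(integral / (n * dt))

def sine_samples(n):
    """Hypothetical test signal: one period (T = 1) of a unit-amplitude sine, n intervals."""
    return [math.sin(2 * math.pi * k / n) for k in range(n + 1)]

if __name__ == "__main__":
    for n in (4, 8):
        print(n, rms_simpson13(sine_samples(n), 1 / n))
```

For this particular test signal, 8 intervals per period reproduce the true RMS of 1/√2 (about 0.707), while 4 intervals per period give about 0.816. This illustrates how sensitive the unmodified rule can be when few samples per period are available.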

Citations: 0
Retraction notice to the articles published in the Special issue Smart Agri from “Microprocessors and Microsystems”
IF 2.6 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-04-01 Epub Date: 2024-02-28 DOI: 10.1016/j.micpro.2024.105038
Citations: 0