Microprocessors and Microsystems: Latest Articles

A review on hardware accelerators for convolutional neural network-based inference engines: Strategies for performance and energy-efficiency enhancement
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-03-01 DOI: 10.1016/j.micpro.2025.105146
Deepika S., Arunachalam V., Alex Noel Joseph Raj
In time-critical and safety-critical image classification applications, Convolutional Neural Network (CNN)-based Inference Engines (IEs) are preferred, and they must be fast, accurate, and cost-effective to meet market demands. Their self-feature-extraction capability relies on millions of parameters and neurons across a stack of layers, all within a restricted processing time. This paper reviews the strategies applied in hardware-based CNN inference engines for image classification. Three classes of acceleration strategy are considered: (1) Arithmetic Logic Unit (ALU)-based, (2) data flow-based, and (3) sparsity-based. Considering benchmark accuracy, a 16-bit mixed fixed/floating-point format could provide 99 % accuracy and 3.75 times more performance than half-precision floating point in an application-specific CNN model. Feeding 2-dimensional or 3-dimensional data frames to the CNN layers enables data reuse, which optimizes memory usage and improves the efficiency of the processor array. Pruning zero or near-zero valued Input Feature Maps (IFMs) and weights leads to sparsity in the data fed to the different layers. Therefore, data compression strategies and skipping trivial computations (the zero-skipping approach) would reduce the complexity of the controller. Compared to a dense architecture, the benchmark improvement is 1.17 times in performance and 6.2 times in power efficiency. Minimizing the complexity of the indexing and load-balancing controller would improve performance further.
Microprocessors and Microsystems, Volume 113, Article 105146
Citations: 0
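The zero-skipping idea mentioned in the abstract above can be illustrated with a small software sketch. This is not the hardware design reviewed in the paper; the function name `zero_skipping_dot` and the sample vectors are hypothetical, and the sketch only shows that a multiply-accumulate (MAC) with a zero operand can be skipped without changing the result.

```python
from typing import Sequence, Tuple


def zero_skipping_dot(ifm: Sequence[float], weights: Sequence[float]) -> Tuple[float, int]:
    """Dot product that skips every MAC with a zero operand.

    Returns the accumulated result and the number of MACs actually performed.
    """
    if len(ifm) != len(weights):
        raise ValueError("ifm and weights must have the same length")
    acc = 0.0
    macs = 0
    for x, w in zip(ifm, weights):
        if x == 0 or w == 0:
            continue  # trivial product: contributes nothing to the sum
        acc += x * w
        macs += 1
    return acc, macs


# Hypothetical sparse input feature map and pruned weights.
ifm = [0.0, 1.5, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0]
weights = [0.5, 0.0, 0.25, 1.0, 0.0, 2.0, -1.0, 0.0]

result, macs = zero_skipping_dot(ifm, weights)
dense = sum(x * w for x, w in zip(ifm, weights))  # dense reference result
print(result, dense, macs, len(ifm))
```

In this toy input only two of the eight products are non-trivial, so the skipped version does a quarter of the dense work. In hardware the saving is offset by the indexing and load-balancing logic that the abstract identifies as the remaining bottleneck.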
A cost-effective design for a mid-range microcontroller-based lock-in amplifier
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-02-19 DOI: 10.1016/j.micpro.2025.105145
Ignacio Horcas, David Moreno-Salinas, José Sánchez-Moreno
Lock-in amplifiers are instruments widely used in physics and engineering laboratories, whose invention goes back to the 1940s. Owing to recent developments in electronics, the earlier analog implementations have been replaced with digital versions, mainly based on FPGAs (field-programmable gate arrays). The present work, exploiting the latest advances in the microcontroller field, develops a functional prototype of a low-cost, microcontroller-based lock-in amplifier with specifications similar to those of mid-range commercial amplifiers. The prototype has been tested and compared with commercial devices, showing similar performance in common use cases at a much reduced cost.
Microprocessors and Microsystems, Volume 113, Article 105145
Citations: 0
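For readers unfamiliar with digital lock-in detection, the following is a minimal sketch of the generic dual-phase technique, not the authors' firmware. The function name `lock_in`, the sampling rate, and the test signal are all assumptions chosen for illustration. The signal is multiplied by in-phase and quadrature references, and averaging over an integer number of periods serves as the low-pass filter.

```python
import math


def lock_in(samples, f_ref, f_s):
    """Dual-phase digital lock-in: return (amplitude, phase) of the f_ref component."""
    n = len(samples)
    i_sum = 0.0
    q_sum = 0.0
    for k, s in enumerate(samples):
        angle = 2.0 * math.pi * f_ref * k / f_s
        i_sum += s * math.cos(angle)  # in-phase mixer
        q_sum += s * math.sin(angle)  # quadrature mixer
    # Averaging is the low-pass filter; the factor 2 restores the amplitude.
    i_avg = 2.0 * i_sum / n
    q_avg = 2.0 * q_sum / n
    return math.hypot(i_avg, q_avg), math.atan2(-q_avg, i_avg)


# Hypothetical test signal: a 10 mV tone at 100 Hz buried under a
# 500 mV interferer at 370 Hz and a 200 mV DC offset.
F_S, F_REF, N = 10_000, 100, 10_000
signal = [
    0.01 * math.cos(2 * math.pi * F_REF * k / F_S + 0.5)
    + 0.5 * math.sin(2 * math.pi * 370 * k / F_S)
    + 0.2
    for k in range(N)
]

amplitude, phase = lock_in(signal, F_REF, F_S)
print(amplitude, phase)
```

Because the record spans a whole number of periods of every component, the interferer and the offset average out and the small tone is recovered. A real instrument would add noise, ADC quantization, and a proper filter with a selectable time constant.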
A real-time interception system for compromised frequency-hopping signal eavesdropping
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-02-13 DOI: 10.1016/j.micpro.2025.105144
Corentin Lavaud, Robin Gerzaguet, Matthieu Gautier, Olivier Berder, Erwan Nogues, Stephane Molton
In modern computing architectures, sensitive data (red data) is handled in the same processing units as encrypted data (black data). Due to leaks (internal mixing, coupling …), this red data can be emitted in a legitimate radio transmission through a so-called telecom side-channel. This new type of side-channel poses a significant threat, as it can be passively and remotely processed by a dedicated interception system. The threat becomes even more concerning in the context of the Internet of Things, as the use of low-cost components leads to increased leaks. This paper addresses telecom side-channels on frequency-hopping signals, which are hard to eavesdrop on due to their sporadic nature in both the time and frequency domains. To that end, a wideband interception system is proposed that is able to intercept frequency-hopping signals in real time and to extract sensitive red data from them. The system relies on software-defined radios and leverages both hardware and software resources to process a 200 MHz bandwidth in real time. The proposed architecture is capable of detecting jumps on the order of 20 μs and can therefore track 50,000 jumps per second across 1,024 channels. Finally, the criticality of telecom side-channels in Bluetooth communications is demonstrated through real interception on several microcontroller chips.
Microprocessors and Microsystems, Volume 113, Article 105144
Citations: 0
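The rate quoted in the abstract above follows from simple arithmetic, which the short sketch below reproduces. The per-channel width is an additional illustration that assumes the 200 MHz band is split into 1,024 equal-width channels; the abstract does not state this.

```python
# Figures taken from the abstract.
HOP_DETECTION_US = 20        # shortest detectable jump, in microseconds
BANDWIDTH_HZ = 200_000_000   # processed bandwidth
CHANNELS = 1024              # number of tracked channels

# One jump every 20 microseconds gives the number of jumps per second.
hops_per_second = 1_000_000 // HOP_DETECTION_US

# Assumption (not stated in the abstract): equal-width channels.
channel_width_hz = BANDWIDTH_HZ / CHANNELS

print(hops_per_second)   # 50000
print(channel_width_hz)  # 195312.5
```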
Open-source ROS-based simulation for verification of FPGA robotics applications
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-02-10 DOI: 10.1016/j.micpro.2025.105143
Rubén Nieto, Felipe Machado, Jesús Fernández-Conde, David Lobato, José M. Cañas
FPGAs are increasingly incorporated in many high-end robotics applications, often involving computer vision and motor control. However, functional verification of FPGA designs is labor-intensive, time-consuming, and consequently expensive. Moreover, validating complex systems such as robots poses further challenges, because the external interactions cannot be easily modeled with traditional testbenches, nor can the robot’s response be adequately observed and ascertained. This work presents a new methodology that validates the robot’s behavior in a realistic simulated environment before transferring the design to the physical robot and the onboard FPGA. This methodology allows integral, fast, and flexible debugging cycles of robotics applications by integrating the functional simulation of the processing unit (FPGA) with the simulation of the robot, its environment, and their mutual interconnections. The Verilator simulation tool is used for fast Verilog/SystemVerilog verification and simulation. ROS, the standard robotics middleware, and the Gazebo 3D robotics simulator are used for realistic robot simulation, including a robust physics engine. We have implemented several open-source software extensions to interconnect the Verilog circuit with the simulated ROS sensors and actuators. The utility and correctness of this methodology have been assessed by developing a complete proof-of-concept FPGA-based robotics application in which a commercial robot follows a colored object using its onboard camera and differential drive motors. This work establishes the foundations for developing and testing complex FPGA-based robot modules more efficiently and flexibly.
Microprocessors and Microsystems, Volume 113, Article 105143
Citations: 0
Hardware-assisted virtualization extensions for LEON processors in mixed-criticality systems
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-02-01 DOI: 10.1016/j.micpro.2024.105130
Borja Losa, Pablo Parra, Antonio Da Silva, Óscar R. Polo, J. Ignacio G. Tejedor, Agustín Martínez, Sebastián Sánchez, David Guzmán
The increasing complexity of real-time embedded critical systems has driven the adoption of new methodologies to mitigate high development costs. One of the most common approaches is the implementation of mixed-criticality systems, characterized by integrating applications with different levels of criticality on the same processing unit. In these systems, applications run on a separation kernel hypervisor, a software element that controls the execution of the different operating systems, providing a virtualized environment and ensuring the necessary spatial and temporal isolation. This paper presents the design and implementation of hardware virtualization extensions for LEON processors, which are widely used in the field of space systems. These extensions enable the execution of virtualized applications with minimal transitions to the hypervisor, enhancing system performance. Our proposed solution defines a specific execution mode and duplicates control and status registers for the exclusive use of virtualized applications. In addition, the functionality of the hardware and software interrupt signals has been extended, allowing developers to select which ones are handled by the hypervisor and which ones are handled directly by the guest operating systems. We have implemented the proposed extension using the LEON version 3 processor’s original VHDL code, and validated it using exhaustive tests to evaluate performance and resource consumption. The results show that the proposed modifications allow virtualized applications to execute without hypervisor intervention, matching non-virtualized performance while significantly outperforming existing para-virtualization solutions. Resource consumption increases by 6% to 14%, depending on the FPGA, which is low compared to the available resources. Power consumption increases by only a few milliwatts, which can be considered negligible.
Microprocessors and Microsystems, Volume 112, Article 105130
Citations: 0
Hardware security against IP piracy using secure fingerprint encrypted fused amino-acid biometric with facial anthropometric signature
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-02-01 DOI: 10.1016/j.micpro.2024.105131
Anirban Sengupta, Aditya Anshul, Ayush Kumar Singh
In the era of the modern global design supply chain, hardware threats are on the rise. Conventional hardware security techniques may fall short, offering inferior tamper tolerance, unpersuasive proof of digital ownership, and weaker entropy, when robust intellectual property (IP) piracy detection and seamless resolution of IP ownership conflicts are required. This paper presents a novel hardware security methodology based on the IP seller's amino acid biometric and facial anthropometric features to generate an encrypted fused signature using a multi-key driven non-invertible fingerprint, providing a robust detective countermeasure against IP piracy. The proposed approach exploits the AES framework, where the generated key-translated fingerprint minutiae points of the IP seller are used as an encryption key. The proposed methodology is highly robust against hardware threats, as it is capable of generating large covert security constraints for embedding, as digital evidence, in the IP design during high-level synthesis (HLS). Compared with existing approaches, the proposed approach shows enhanced tamper tolerance (against brute-force attack) of up to 1.15E+77, a lower probability of coincidence or false positive (against ghost signature search attack) of up to 6.72E-06, and stronger entropy of up to 2.06E-138.
Microprocessors and Microsystems, Volume 112, Article 105131
Citations: 0
Design and implementation of a synchronous Hardware Performance Monitor for a RISC-V space-oriented processor
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-02-01 DOI: 10.1016/j.micpro.2024.105132
Miguel Jiménez Arribas, Agustín Martínez Hellín, Manuel Prieto Mateo, Iván Gamino del Río, Andrea Fernández Gallego, Óscar Rodríguez Polo, Antonio da Silva, Pablo Parra, Sebastián Sánchez
The ability to collect statistics about the execution of a program within a CPU is of the utmost importance across all fields of computing since it allows characterizing the timing performance of a program. This capability is even more relevant in safety-critical software systems, where it is mandatory to analyze the software timing requirements to ensure the correct operation of the programs. Moreover, in order to properly evaluate and verify the extra-functional properties of these systems, besides timing performance, there are many other statistics available on a CPU, such as those associated with its resource utilization. In this paper, we showcase a Performance Measurement Unit (PMU), also known as a Hardware Performance Monitor (HPM), integrated into a RISC-V On-Board Computer (OBC) designed for space applications by our research group. The monitoring technique features a novel approach whereby the events triggered are not counted immediately but instead are propagated through the pipeline so that their annotation is synchronized with the executed instruction. Additionally, we also demonstrate the use of this PMU in a process to characterize the execution model of the processor. Finally, as an example of the statistics provided by the PMU, the results obtained running the CoreMark and Dhrystone benchmarks on the RISC-V OBC are shown.
Microprocessors and Microsystems, Volume 112, Article 105132
Citations: 0
Efficient Coarse-Grained Reconfigurable Array architecture for machine learning applications in space using DARE65T library platform
IF 1.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-01-14 DOI: 10.1016/j.micpro.2025.105142
Luca Zulberti, Matteo Monopoli, Pietro Nannipieri, Silvia Moranti, Geert Thys, Luca Fanucci
With the increasing use of satellites, rovers, and other space exploration devices, Artificial Intelligence (AI) is also becoming an important tool for space exploration, allowing autonomous decision-making and operations in harsh environments. As a result, there is an increasing demand for reliable and energy-efficient processing platforms in the space industry. Among all processing architectures, Coarse-Grained Reconfigurable Arrays (CGRAs) are becoming popular, particularly in data-intensive applications like machine learning, demonstrating a substantial improvement in the energy efficiency of inference operations while preserving a good degree of versatility. In high-level class space missions, the hardware platforms incorporate radiation-hardened Field Programmable Gate Arrays (FPGAs) and microcontrollers, which do not meet the performance requirements for the aforementioned AI applications. The use of CGRA architectures in space missions is still not widely studied. The main contribution of this work is a comprehensive Design Space Exploration (DSE) activity with our highly parameterized CGRA architecture, exploring the costs associated with various design parameters when targeting AI in the space domain. We evaluated performance, power consumption, and area occupation after synthesis on the radiation-hardened DARE65T standard cell library developed by imec, based on a commercial 65 nm technology process. We characterize different CGRA configurations, comparing them with state-of-the-art solutions used for the acceleration of the AI algorithms.
This work highlights Performance, Power, and Area (PPA) results that range from 100 MHz (up to 600 MOps), 2.43 × 10⁴ μm² cell area occupation and 0.699 mW power consumption, to 625 MHz (up to 3.75 GOps), 2.43 × 10⁵ μm², 46.5 mW. During DSE activity, we highlight the optimal solutions in terms of area efficiency (up to 313.1 GOps/mm²) and energy efficiency (up to 289 GOps/W) for each CGRA configuration.
Citations: 0
Hardware implementation of a high-resolution auto-tuned time-frequency signal analyzer over TMS320C6713 DSK using a compact support polynomial kernel
IF 1.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-01-09 DOI: 10.1016/j.micpro.2025.105141
Ibrahim Lantri, Mansour Abed, Adel Belouchrani
This paper explores the hardware implementation of an embedded time-frequency signal analyzer using the Polynomial Cheriet-Belouchrani Distribution (PCBD) with a compact kernel. We implemented this distribution on a Texas Instruments TMS320C6713 Digital Signal Processing Starter Kit (DSK). Compared to other quadratic time-frequency distributions (TFDs), the PCBD requires a low computational cost due to its compact support nature, which reduces the number of points needing calculation. The sole smoothing parameter γ that controls its kernel's bandwidth is an integer, simplifying the unsupervised approach. To ensure that the realized TF analyzer is automatically tuned, an accurate low-complexity performance measure must be employed to achieve optimal concentration, resolution, and cross-term suppression. Failure to do so may result in missing or degraded essential signal characteristics. The Stankovic measure has been identified as the preferred measure among many others for finding the optimal value of the integer γ. We have also been exploring methods to optimize the execution of various algorithms by taking advantage of specific mathematical properties inherent in the compact polynomial kernel and the PCBD. Additionally, we propose a recursive method to minimize the computation cost associated with the discrete PCB kernel. These strategies are designed to enhance efficiency and reduce the required machine cycles. To compare the performances provided, we thoroughly evaluate the numerical complexity of our implemented distribution, both with and without mathematical optimization. The findings obtained demonstrate the effectiveness of using the TMS320C6713 DSK board to design a high-resolution auto-tuned time-frequency signal analyzer. 
We not only achieved a perfect match with the results obtained from MATLAB, but the optimized approach also reduced runtime by approximately 19 % to 47 % compared to the direct method, depending on the input signal length and the number of loops required to optimize the Stankovic measure. A comparative analysis was also conducted to assess the effectiveness of our approach in relation to other linear and quadratic TF analyzers, including those implemented on field-programmable gate arrays (FPGAs).
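The auto-tuning idea in this abstract (pick the integer γ that optimizes a concentration measure) can be illustrated with a minimal sketch. It assumes the concentration measure in the form commonly attributed to Stanković, M = (Σ|ρ(n,k)|^(1/p))^p with p = 2, applied to a distribution normalized by the sum of its magnitudes, where a lower value means a more concentrated distribution. The normalization choice, the function names, and `toy_tfd` (a hypothetical stand-in that does not compute the PCBD) are assumptions for illustration, not the authors' implementation.

```python
def stankovic_measure(tfd, p=2.0):
    """Concentration measure of a 2-D time-frequency grid (lower = more concentrated).

    Assumption: the grid is normalized by the sum of its magnitudes.
    """
    total = sum(abs(v) for row in tfd for v in row)
    if total == 0:
        raise ValueError("time-frequency grid has no energy")
    s = sum((abs(v) / total) ** (1.0 / p) for row in tfd for v in row)
    return s ** p


def best_gamma(tfd_for_gamma, gammas, p=2.0):
    """Return the candidate gamma whose distribution minimizes the measure."""
    return min(gammas, key=lambda g: stankovic_measure(tfd_for_gamma(g), p))


def toy_tfd(gamma):
    """Hypothetical stand-in for a real distribution: energy is spread over
    more cells the further gamma is from 3."""
    width = abs(gamma - 3) + 1
    return [[1.0] * width + [0.0] * (8 - width)]


chosen = best_gamma(toy_tfd, range(1, 6))
```

With p = 2, a grid whose energy is spread evenly over k cells yields a measure of k, so the search favors the candidate that packs the energy into the fewest cells. Because γ is an integer, the search is a short loop over a handful of candidates.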
Citations: 0
An adaptive binary classifier for highly imbalanced datasets on the Edge
IF 1.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-11-01 DOI: 10.1016/j.micpro.2024.105120
V. Hurbungs, T.P. Fowdur, V. Bassoo
Edge machine learning brings intelligence to low-power devices at the periphery of a network. By running machine learning algorithms on the Edge, classification can be performed faster without the need to transmit large data volumes across a network. However, on-device training is often not feasible since Edge devices have limited computing and storage resources. Improved, Scalable, Efficient, and Fast classifieR (iSEFR) is a classifier that performs both training and testing on low-power devices using linearly separable balanced datasets. The novelty of this work is the improvement of the iSEFR accuracy by fine-tuning the algorithm with datasets having an uneven class distribution. Three adaptive linear function transformation techniques were proposed to improve the decision threshold which is in the form of a linear function. Experiments using stratified sampling with 5-fold cross-validation demonstrate that one of the proposed techniques significantly improved F1-score, Recall and Matthews Correlation Coefficient (MCC) by an average of 23 %, 35 % and 21 % compared to iSEFR. Further evaluation of this technique in a Fog environment using highly imbalanced datasets such as credit card fraud, network intrusion and diabetic retinopathy also showed a significant increase of 38 %, 44 % and 30 % in F1-score, Recall and MCC with a Precision of 97 %. The adaptive binary classifier maintained the time complexity of iSEFR without altering the class imbalance.
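The abstract reports F1-score, Recall, and Matthews Correlation Coefficient (MCC) rather than plain accuracy. A minimal sketch of why that matters on imbalanced data is given here, using the standard definitions of these metrics. The confusion-matrix counts are a hypothetical example and are not taken from the paper, and `binary_metrics` is an illustrative helper name.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1 and MCC from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical imbalanced test set: 990 negatives and 10 positives.
# The classifier finds only 2 of the 10 positives and raises no false alarms.
m = binary_metrics(tp=2, fp=0, fn=8, tn=990)
```

In this made-up case accuracy is above 99 % even though recall is only 0.2, which is why F1, Recall, and MCC are the more informative measures when one class is rare.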
Citations: 0