
Microprocessors and Microsystems: Latest Articles

A reconfigurable PUF and TRNG design based on multiplexers for securing IoT applications
IF 1.9 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-01 | Epub Date: 2025-05-30 | DOI: 10.1016/j.micpro.2025.105170
Zhiyuan Pan, Jiafeng Cheng, Nengyuan Sun, Jinghe Wang, Kai Shi, Jianghong Li, Zhaoyi Niu, Jiaqi Wang, Jiawei Zhang, Linhan Wang, Weize Yu
Physical Unclonable Functions (PUFs) and True Random Number Generators (TRNGs) are two important hardware security primitives in modern cryptography. A regular arbiter PUF can be broken by machine learning (ML) attacks with little effort because the relationship between its input data and its output response is highly linear. In this paper, an ML-resistant reconfigurable PUF and TRNG (RePT) architecture is proposed for the first time. Within the RePT design, a non-linearization technique that masks the linear relationship between the input data and the output response greatly reinforces the robustness of the arbiter PUF against ML attacks without significantly increasing its area or power overhead. To reuse the existing hardware resources of the arbiter PUF to build a second hardware security primitive, a TRNG, a novel algorithm is proposed that efficiently determines the selection signal value of each multiplexer within the arbiter PUF. The results show that the proposed RePT design achieves a PUF throughput of 38 Mbps (260 Mbps as a TRNG) with an area of 32,621 μm² when synthesized with the SMIC 55 nm process design kit (PDK). Additionally, ML attacks on the proposed RePT circuit fail to crack it even with 100,000 training samples.
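To illustrate why a plain arbiter PUF is vulnerable, the sketch below uses the additive delay model that is common in the PUF literature. This model is an assumption here: the abstract gives neither the authors' circuit model nor their masking technique, and nothing below reproduces the RePT design. In the model, the response is the sign of a dot product between a fixed weight vector and a parity transform of the challenge, which is exactly the kind of function a linear classifier can learn. The names `parity_features`, `ArbiterPuf`, `train_perceptron`, `prediction_accuracy`, and `random_crps` are illustrative.

```python
import random


def parity_features(challenge):
    """Map challenge bits (0/1) to the parity features of the additive delay model."""
    n = len(challenge)
    features = [1.0] * (n + 1)  # the last entry stays 1.0 and acts as a bias term
    running = 1.0
    for i in range(n - 1, -1, -1):
        running *= 1 - 2 * challenge[i]  # bit 0 -> +1, bit 1 -> -1
        features[i] = running
    return features


class ArbiterPuf:
    """Toy arbiter PUF (assumed model): response = sign of the total delay difference."""

    def __init__(self, n_stages, rng):
        # Random per-stage delay differences stand in for manufacturing variation.
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]

    def response(self, challenge):
        feats = parity_features(challenge)
        delay_diff = sum(w * f for w, f in zip(self.weights, feats))
        return 1 if delay_diff > 0 else 0


def train_perceptron(samples, n_stages, epochs=50):
    """Fit a linear model to (challenge, response) pairs with the perceptron rule."""
    weights = [0.0] * (n_stages + 1)
    for _ in range(epochs):
        for challenge, resp in samples:
            feats = parity_features(challenge)
            guess = 1 if sum(w * f for w, f in zip(weights, feats)) > 0 else 0
            if guess != resp:
                step = 1.0 if resp == 1 else -1.0
                weights = [w + step * f for w, f in zip(weights, feats)]
    return weights


def prediction_accuracy(weights, samples):
    """Fraction of samples whose response the linear model predicts correctly."""
    hits = 0
    for challenge, resp in samples:
        feats = parity_features(challenge)
        guess = 1 if sum(w * f for w, f in zip(weights, feats)) > 0 else 0
        hits += int(guess == resp)
    return hits / len(samples)


def random_crps(puf, n_stages, count, rng):
    """Collect `count` random challenge-response pairs from the toy PUF."""
    pairs = []
    for _ in range(count):
        challenge = [rng.randint(0, 1) for _ in range(n_stages)]
        pairs.append((challenge, puf.response(challenge)))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    puf = ArbiterPuf(16, rng)
    train = random_crps(puf, 16, 2000, rng)
    held_out = random_crps(puf, 16, 500, rng)
    model = train_perceptron(train, 16)
    print(f"held-out prediction accuracy: {prediction_accuracy(model, held_out):.3f}")
```

Because the toy responses are linearly separable in the parity features, even this simple learner predicts most held-out responses. A design that masks this linear relationship, as the abstract describes, is meant to deny an attacker that shortcut.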
Microprocessors and Microsystems, vol. 116, Article 105170.
Citations: 0
STRATUM project: AI-based point of care computing for neurosurgical 3D decision support tools
IF 1.9 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-01 | Epub Date: 2025-05-14 | DOI: 10.1016/j.micpro.2025.105157
Himar Fabelo, Raquel Leon, Emanuele Torti, Santiago Marco, Asaf Badouh, Max Verbers, Carlos Vega, Javier Santana-Nunez, Yann Falevoz, Yolanda Ramallo-Fariña, Christian Weis, Ana M Wägner, Eduardo Juarez, Claudio Rial, Alfonso Lagares, Gustav Burström, Francesco Leporati, Luis Jimenez-Roldan, Elisa Marenzi, Teresa Cervero, Gustavo M. Callico
Integrated digital diagnostics are transforming complex surgical procedures, with brain tumour surgery being among the most challenging. STRATUM, a five-year Horizon Europe-funded project, aims to develop an advanced 3D decision support system leveraging real-time multimodal data processing powered by artificial intelligence. A key innovation of STRATUM is its design as an energy-efficient Point-of-Care computing system, seamlessly integrated into neurosurgical workflows. This system will provide surgeons with real-time, AI-driven insights, enhancing decision-making accuracy and efficiency. By optimizing surgical precision and reducing procedure duration, STRATUM is expected to improve patient outcomes while streamlining resource utilization within European healthcare systems.
Microprocessors and Microsystems, vol. 116, Article 105157.
Citations: 0
A novel machine learning-driven optimization methodology for faster and more efficient design space exploration in high-level synthesis
IF 1.9 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-01 | Epub Date: 2025-04-16 | DOI: 10.1016/j.micpro.2025.105154
Esra Celik, Deniz Dal
The optimization of digital circuits is a critical factor in determining the competitiveness of modern electronic systems, particularly in terms of area, performance, and power consumption. High-Level Synthesis (HLS) plays a pivotal role in this optimization process, enabling designers to define system requirements at a higher level of abstraction and providing opportunities to analyze and optimize digital circuits against various metrics prior to production. However, the design constraints inherent in the HLS process often lead to multi-objective optimization problems, which significantly complicate the exploration process. This complexity necessitates novel synthesis methodologies that enable faster and more efficient design space exploration. In response to this need, we introduce an innovative hybrid HLS methodology that combines metaheuristic and machine learning approaches, and we developed two distinct synthesis tools. The first tool, implemented in C++, uses the Simulated Annealing (SA) metaheuristic with a novel three-part solution representation. This representation, a key contribution of our study, aims to minimize the weighted sum of latency and area constraints for Data Flow Graph (DFG) designs. While effective, this approach resulted in extended execution times due to computationally intensive design variables. To address the performance bottleneck identified in the standard cost function evaluation, we developed a second tool that integrates machine learning with the traditional SA. This hybrid approach combines C++ and Python, incorporating a Support Vector Regression (SVR) model to estimate solution costs more efficiently, significantly reducing execution times. Our study also presents detailed analyses of experiments conducted on seven benchmarks with varying node counts. The three-part solution representation in the traditional SA approach demonstrated up to a 53.38% improvement in performance compared to the single-part representation across all benchmarks. For benchmarks with fewer nodes (DiffEq, Lattice, Ellip, and FEWF), the model-based estimation implementation achieved results identical to the traditional approach but required longer execution times. For benchmarks with higher node counts (MatMul, IntAux, and MCM), our novel approach produced results equivalent to the traditional SA implementation with a time savings of up to 129 seconds. We leveraged these time savings to enhance the exploration process, achieving up to 5.4% improvement in solution quality without exceeding the execution time of the traditional approach.
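As a rough illustration of the general idea, the sketch below runs a generic simulated-annealing loop that minimizes a weighted sum of latency and area. Everything specific in it is an assumption made for illustration: the two-entry unit library `UNITS`, the toy problem (each operation is bound to a fast/large or a slow/small unit, and latency is a plain sum), the weights, and the geometric cooling schedule. It does not reproduce the authors' three-part solution representation or their SVR-based cost estimator.

```python
import math
import random

# Hypothetical unit library: name -> (latency in cycles, area in arbitrary units).
UNITS = {"fast": (1, 4), "slow": (3, 1)}


def cost(assignment, w_latency=0.5, w_area=0.5):
    """Weighted sum of total latency and total area for a list of unit choices."""
    latency = sum(UNITS[unit][0] for unit in assignment)  # operations run back to back
    area = sum(UNITS[unit][1] for unit in assignment)
    return w_latency * latency + w_area * area


def simulated_annealing(n_ops, rng, w_latency=0.5, w_area=0.5,
                        t_start=10.0, t_end=0.01, alpha=0.95, moves_per_temp=20):
    """Return (best assignment, best cost) found by a basic SA search."""
    current = [rng.choice(sorted(UNITS)) for _ in range(n_ops)]
    current_cost = cost(current, w_latency, w_area)
    best, best_cost = list(current), current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temp):
            # Neighbor move: flip the unit choice of one randomly picked operation.
            candidate = list(current)
            i = rng.randrange(n_ops)
            candidate[i] = "slow" if candidate[i] == "fast" else "fast"
            candidate_cost = cost(candidate, w_latency, w_area)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        temperature *= alpha  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    solution, value = simulated_annealing(6, random.Random(0))
    print(solution, value)
```

In this toy setting the cost function is cheap to evaluate. The abstract's second tool addresses the opposite situation, where each evaluation is expensive enough that replacing it with a learned estimate pays off on larger benchmarks.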
Microprocessors and Microsystems, vol. 114, Article 105154.
Citations: 0
AAL-based smart cane system with security and privacy features for blind and visually impaired individuals
IF 1.9 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-01 | Epub Date: 2025-04-24 | DOI: 10.1016/j.micpro.2025.105155
Kyriaki Tsantikidou, Grigorios Delimpaltadakis, Damianos Diasakos, Nicolas Sklavos
Ambient Assisted Living (AAL) technologies aim to increase the quality of life of people with impairments. Practicality, reliability, autonomy, ease of use, safety, and low cost are of the utmost importance, yet in some cases they are omitted or overlooked by the research community. This paper proposes an AAL-based smart cane system with security and privacy features for blind and visually impaired individuals that aims to satisfy these requirements. Multiple services that facilitate everyday indoor and outdoor activities are implemented: obstacle detection at ground and head level via ultrasonic (US) sensors and vibrations, ascending and descending stair detection/recognition via computer vision, image processing through various sensors, an emergency button for additional safety, and a LoRa antenna with a security and privacy mechanism for safely communicating with the Health 4.0-based environment. The proposed system is implemented with an Arduino and Raspberry Pi Zero combination and provides more practical and economical services than other published related works, including head-level detection, an indoor-outdoor adjustment switch, and security mechanisms that are in most cases dismissed. Compared to published works, it achieves a 7.4% accuracy increase for general obstacle detection and a consistent 100% drop or wall detection accuracy. Its stair detection feature shows a 37.82% increase in speed-adjusted recall and a 24.4% performance increase compared to published works. The system focuses on hardware efficiency, safety, and real-world autonomy with cost-efficient alternatives. The proposed architecture of the security mechanism achieves a small area consumption, at least 35.6% lower than published designs, and an efficient throughput that is appropriate for the antenna used.
Microprocessors and Microsystems, vol. 114, Article 105155.
Citations: 0
Efficient Coarse-Grained Reconfigurable Array architecture for machine learning applications in space using DARE65T library platform
IF 1.9 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2025-01-14 | DOI: 10.1016/j.micpro.2025.105142
Luca Zulberti, Matteo Monopoli, Pietro Nannipieri, Silvia Moranti, Geert Thys, Luca Fanucci
With the increasing use of satellites, rovers, and other space exploration devices, Artificial Intelligence (AI) is becoming an important tool for space exploration, allowing autonomous decision-making and operations in harsh environments. As a result, there is an increasing demand for reliable and energy-efficient processing platforms in the space industry. Among all processing architectures, Coarse-Grained Reconfigurable Arrays (CGRAs) are becoming popular, particularly in data-intensive applications like machine learning, demonstrating a substantial improvement in the energy efficiency of inference operations while preserving a good degree of versatility. In high-level class space missions, the hardware platforms incorporate radiation-hardened Field Programmable Gate Arrays (FPGAs) and microcontrollers, which do not meet the performance requirements of the aforementioned AI applications. The use of CGRA architectures in space missions is still not widely studied. The main contribution of this work is a comprehensive Design Space Exploration (DSE) activity with our highly parameterized CGRA architecture, exploring the costs associated with various design parameters when targeting AI in the space domain. We evaluated performance, power consumption, and area occupation after synthesis on the radiation-hardened DARE65T standard cell library developed by imec, based on a commercial 65 nm technology process. We characterize different CGRA configurations, comparing them with state-of-the-art solutions used for the acceleration of AI algorithms. This work highlights Performance, Power, and Area (PPA) results that range from 100 MHz (up to 600 MOps), 2.43 × 10⁴ μm² cell area occupation, and 0.699 mW power consumption to 625 MHz (up to 3.75 GOps), 2.43 × 10⁵ μm², and 46.5 mW. During the DSE activity, we highlight the optimal solutions in terms of area efficiency (up to 313.1 GOps/mm²) and energy efficiency (up to 289 GOps/W) for each CGRA configuration.
Microprocessors and Microsystems, vol. 113, Article 105142.
Citations: 0
A review on hardware accelerators for convolutional neural network-based inference engines: Strategies for performance and energy-efficiency enhancement
IF 1.9 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-03-01 | Epub Date: 2025-02-18 | DOI: 10.1016/j.micpro.2025.105146
Deepika S., Arunachalam V., Alex Noel Joseph Raj
In time-critical and safety-critical image classification applications, Inference Engines (IEs) based on Convolutional Neural Networks (CNNs) are preferred, and they must be fast, accurate, and cost-effective to meet market demands. Their self-feature extraction capability relies on millions of parameters and neurons in a stack of layers, all of which must be processed within a restricted time. This paper reviews strategies applied in hardware-based CNN inference engines for image classification. Three classes of acceleration strategy are considered: (1) Arithmetic Logic Unit (ALU)-based, (2) data flow-based, and (3) sparsity-based. In terms of benchmark accuracy, a 16-bit mixed fixed/floating-point format could provide 99% accuracy and 3.75 times more performance than half-precision floating point in an application-specific CNN model. Feeding 2-dimensional or 3-dimensional data frames to the CNN layers allows data to be reused, which optimizes memory usage and improves the efficiency of the processor array. Pruning zero or near-zero valued Input Feature Maps (IFMs) and weights leads to sparsity in the data fed to the different layers. Therefore, data compression strategies and skipping trivial computations (the zero-skipping approach) would reduce the complexity of the controller. Compared with a dense architecture, there is a benchmark improvement of 1.17 times in performance and 6.2 times in power efficiency. Minimizing the complexity of the indexing and load-balancing controller would improve performance further.
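The zero-skipping idea can be sketched in software as a dot product that skips any multiply whose activation or weight is zero and counts the multiply-accumulate (MAC) operations it actually issues. This is an illustrative model only, not a description of any accelerator surveyed in the review; the function name `zero_skip_dot` and the sample vectors are assumptions.

```python
def zero_skip_dot(activations, weights):
    """Return (dot product, number of MACs actually performed)."""
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    total = 0
    macs = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            continue  # trivial product: contributes nothing, so skip the multiply
        total += a * w
        macs += 1
    return total, macs


if __name__ == "__main__":
    # Pruned inputs: only one of the four products is non-trivial.
    result, macs_used = zero_skip_dot([0, 2, 0, 3], [5, 0, 7, 4])
    print(result, macs_used)
```

The result matches a dense dot product, while the MAC count drops with the number of zero operands. In hardware, part of that saving is spent on the indexing and load-balancing logic the abstract mentions.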
Microprocessors and Microsystems, vol. 113, Article 105146
Citations: 0
Open-source ROS-based simulation for verification of FPGA robotics applications
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-03-01 Epub Date: 2025-02-10 DOI: 10.1016/j.micpro.2025.105143
Rubén Nieto, Felipe Machado, Jesús Fernández-Conde, David Lobato, José M. Cañas
FPGAs are increasingly incorporated in many high-end robotics applications, often involving computer vision and motor control. However, functional verification of FPGA designs is labor-intensive, time-consuming, and consequently expensive. Moreover, validating complex systems such as robots poses further challenges, because the external interactions cannot be easily modeled with traditional testbenches, nor can the robot's response be adequately observed and ascertained. This work presents a new methodology that validates the robot's behavior in a realistic simulated environment before transferring the design to the physical robot and the onboard FPGA. The methodology allows integral, fast, and flexible debugging cycles for robotics applications by integrating the functional simulation of the processing unit (FPGA) with the simulation of the robot, its environment, and their mutual interconnections. The Verilator simulation tool is used for fast Verilog/SystemVerilog verification and simulation. ROS, the standard robotics middleware, and the Gazebo 3D robotics simulator are used for realistic robot simulation, including a robust physics engine. We have implemented several open-source software extensions to interconnect the Verilog circuit with the simulated ROS sensors and actuators. The utility and correctness of the methodology have been assessed by developing a complete proof-of-concept FPGA-based robotics application in which a commercial robot follows a colored object using its onboard camera and differential drive motors. This work establishes the foundations for developing and testing complex FPGA-based robot modules more efficiently and flexibly.
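As a rough illustration of the proof-of-concept behavior described above (following a colored object with a differential drive), the sketch below maps the object's horizontal position in one image row to left and right wheel speeds. It is a hypothetical software stand-in, not the authors' Verilog design, and it uses no ROS, Gazebo, or Verilator APIs; the function names, `base_speed`, and `gain` are assumptions chosen for the example.

```python
def centroid_x(mask_row):
    """Mean column index of the pixels flagged as target colour, or None."""
    hits = [i for i, flagged in enumerate(mask_row) if flagged]
    return sum(hits) / len(hits) if hits else None


def wheel_speeds(cx, image_width, base_speed=0.2, gain=0.3):
    """Proportional steering: return (left, right) wheel speeds.

    The error is in [-1, 1]: negative when the object is left of the image
    centre, positive when it is to the right. All constants are hypothetical.
    """
    if cx is None:
        return 0.0, 0.0  # target not visible: stop
    centre = (image_width - 1) / 2
    error = (cx - centre) / centre
    # Object to the right -> left wheel faster than right, so the robot turns right.
    return base_speed + gain * error, base_speed - gain * error


if __name__ == "__main__":
    row = [False] * 6 + [True, True]   # object near the right edge
    print(wheel_speeds(centroid_x(row), len(row)))
```

In a co-simulation of the kind the abstract describes, logic like this would live in the simulated circuit, with the camera row and wheel commands exchanged with the robot simulator at each step.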
Microprocessors and Microsystems, vol. 113, Article 105143
Citations: 0
A cost-effective design for a mid-range microcontroller-based lock-in amplifier
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-03-01 Epub Date: 2025-02-19 DOI: 10.1016/j.micpro.2025.105145
Ignacio Horcas, David Moreno-Salinas, José Sánchez-Moreno
Lock-in amplifiers are instruments widely used in physics and engineering laboratories, and their invention goes back to the 1940s. Owing to recent developments in electronics, the earlier analog implementations have been replaced with digital versions, mainly based on FPGAs (field-programmable gate arrays). The present work exploits the latest advances in the microcontroller field to develop a functional prototype of a low-cost, microcontroller-based lock-in amplifier with specifications similar to those of mid-range commercial amplifiers. The performance of the prototype has been tested and compared with commercial devices, showing similar performance in common use cases at a much reduced cost.
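For readers unfamiliar with the technique, a digital lock-in measurement can be sketched as follows: the input is multiplied by in-phase and quadrature references at the reference frequency, and the products are low-pass filtered. This is a generic textbook sketch, not the firmware of the prototype described above; the function name `lock_in`, the plain average used as the low-pass filter, and the test signal are all assumptions.

```python
import math


def lock_in(samples, f_ref, f_s):
    """Estimate amplitude and phase of the f_ref component of `samples`.

    Assumes the record spans a whole number of reference periods, so a
    plain average can stand in for the low-pass filter.
    """
    n = len(samples)
    x = 0.0  # in-phase accumulator
    y = 0.0  # quadrature accumulator
    for k, s in enumerate(samples):
        angle = 2.0 * math.pi * f_ref * k / f_s
        x += s * math.cos(angle)
        y += s * math.sin(angle)
    x *= 2.0 / n
    y *= 2.0 / n
    amplitude = math.hypot(x, y)
    phase = math.atan2(-y, x)  # for a signal A*cos(2*pi*f_ref*t + phase)
    return amplitude, phase


if __name__ == "__main__":
    f_s, f_ref, n = 10_000.0, 100.0, 1_000   # 10 whole reference periods
    signal = [
        0.5 * math.cos(2 * math.pi * f_ref * k / f_s + 0.3)   # wanted tone
        + 2.0 * math.cos(2 * math.pi * 370.0 * k / f_s)       # interferer
        + 1.0                                                  # DC offset
        for k in range(n)
    ]
    amp, ph = lock_in(signal, f_ref, f_s)
    print(f"amplitude ~ {amp:.3f}, phase ~ {ph:.3f} rad")
```

The example buries a 0.5-amplitude tone under a larger interferer and a DC offset; the demodulation still isolates the component at the reference frequency, which is the property that makes lock-in detection useful for weak signals.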
Microprocessors and Microsystems, vol. 113, Article 105145
Citations: 0
A real-time interception system for compromised frequency-hopping signal eavesdropping
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-03-01 Epub Date: 2025-02-13 DOI: 10.1016/j.micpro.2025.105144
Corentin Lavaud, Robin Gerzaguet, Matthieu Gautier, Olivier Berder, Erwan Nogues, Stephane Molton
In modern computing architectures, sensitive data (red data) is processed in the same processing units as encrypted data (black data). Due to leaks (internal mixing, coupling, …), this red data can be emitted in a legitimate radio transmission through a so-called telecom side-channel. This new type of side-channel creates an important threat, as it can be passively and remotely processed by a dedicated interception system. The threat becomes even more concerning in the context of the Internet of Things, as the use of low-cost components leads to increased leaks. This paper addresses telecom side-channels on frequency-hopping signals, which are hard to eavesdrop on because of their sporadic nature in both the time and frequency domains. To that end, a wideband interception system is proposed that is able to intercept frequency-hopping signals in real time and to extract sensitive red data from them. The system relies on software-defined radios and leverages both hardware and software resources to process a 200 MHz bandwidth in real time. The proposed architecture is capable of detecting jumps on the order of 20 μs and can therefore track 50,000 jumps per second across 1,024 channels. Finally, the criticality of telecom side-channels in Bluetooth communications is demonstrated through real interception on several microcontroller chips.
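The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The hop-rate bound follows directly from the 20 μs detection time; the per-channel width is only an illustration that assumes the 200 MHz band is split evenly across the 1,024 channels, which the abstract does not state. The function names are hypothetical.

```python
from fractions import Fraction


def max_hops_per_second(min_dwell_us):
    """Upper bound on hops per second if each hop lasts at least min_dwell_us."""
    return Fraction(1_000_000, min_dwell_us)


def channel_width_hz(bandwidth_hz, n_channels):
    """Width of one channel if the band is split evenly (an assumption)."""
    return Fraction(bandwidth_hz, n_channels)


if __name__ == "__main__":
    print(int(max_hops_per_second(20)))                 # 50000
    print(float(channel_width_hz(200_000_000, 1024)))   # 195312.5
```

Exact fractions are used instead of floating point so that the 50,000 hops per second figure comes out without rounding noise.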
Microprocessors and Microsystems, vol. 113, Article 105144
Citations: 0
Hardware implementation of a high-resolution auto-tuned time-frequency signal analyzer over TMS320C6713 DSK using a compact support polynomial kernel
IF 1.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-03-01 Epub Date: 2025-01-09 DOI: 10.1016/j.micpro.2025.105141
Ibrahim Lantri, Mansour Abed, Adel Belouchrani
This paper explores the hardware implementation of an embedded time-frequency signal analyzer using the Polynomial Cheriet-Belouchrani Distribution (PCBD) with a compact kernel. We implemented this distribution on a Texas Instruments TMS320C6713 Digital Signal Processing Starter Kit (DSK). Compared to other quadratic time-frequency distributions (TFDs), the PCBD has a low computational cost owing to its compact support, which reduces the number of points that need to be calculated. The sole smoothing parameter γ that controls its kernel's bandwidth is an integer, which simplifies the unsupervised approach. To ensure that the realized TF analyzer is automatically tuned, an accurate low-complexity performance measure must be employed to achieve optimal concentration, resolution, and cross-term suppression. Failure to do so may result in missing or degraded essential signal characteristics. The Stankovic measure has been identified as the preferred measure, among many others, for finding the optimal value of the integer γ. We have also explored methods to optimize the execution of various algorithms by taking advantage of specific mathematical properties inherent in the compact polynomial kernel and the PCBD. Additionally, we propose a recursive method to minimize the computation cost associated with the discrete PCB kernel. These strategies are designed to enhance efficiency and reduce the required machine cycles. To compare performance, we thoroughly evaluate the numerical complexity of our implemented distribution, both with and without mathematical optimization. The findings demonstrate the effectiveness of using the TMS320C6713 DSK board to design a high-resolution auto-tuned time-frequency signal analyzer. Not only did we achieve a perfect match with the results obtained from MATLAB, but the optimized approach also reduced runtime by approximately 19 % to 47 % compared to the direct method, depending on the input signal length and the number of loops required to optimize the Stankovic measure. A comparative analysis was also conducted to assess the effectiveness of our approach in relation to other linear and quadratic TF analyzers, including those implemented on field-programmable gate arrays (FPGAs).
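The auto-tuning step described above (choosing the integer γ that optimizes a concentration measure) can be sketched in a few lines. The sketch assumes the commonly cited form of the Stankovic measure, (Σ|ρ|^(1/p))^p with p = 2 on a normalized distribution, where a smaller value means better concentration. It does not implement the PCBD kernel: `tfd_for` is a hypothetical callback returning the distribution for a given γ, and the toy distributions are invented for the example.

```python
def stankovic_measure(tfd, p=2.0):
    """Concentration measure (sum |rho|^(1/p))^p of a normalised distribution.

    `tfd` is a list of rows. Values are normalised by the sum of their
    magnitudes (a simplification); a smaller result means the energy is
    packed into fewer time-frequency cells.
    """
    cells = [abs(v) for row in tfd for v in row]
    total = sum(cells)
    if total == 0:
        raise ValueError("distribution is identically zero")
    return sum((v / total) ** (1.0 / p) for v in cells) ** p


def pick_gamma(candidates, tfd_for):
    """Return the integer gamma whose distribution is most concentrated."""
    return min(candidates, key=lambda g: stankovic_measure(tfd_for(g)))


if __name__ == "__main__":
    # Hypothetical toy distributions standing in for the TFD at each gamma.
    toy = {
        1: [[1, 1], [1, 1]],   # energy smeared over all four cells
        2: [[0, 3], [1, 0]],   # partly concentrated
        3: [[0, 1], [0, 0]],   # all energy in one cell
    }
    print(pick_gamma(toy.keys(), toy.__getitem__))
```

With p = 2, a distribution spread evenly over K cells scores exactly K, so the measure can be read as an effective count of occupied cells. Because γ is an integer, the search is a short loop over candidates, which fits the abstract's remark that runtime depends on the number of loops needed to optimize the measure.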
Microprocessors and Microsystems, vol. 113, Article 105141
Citations: 0