
Latest Publications from IEEE Embedded Systems Letters

Formal Modeling and Verification of Generic Credential Management Processes for Industrial Cyber–Physical Systems
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3598202
Julian Göppert;Axel Sikora
Industrial cyber-physical systems (ICPS) face rising cyberattacks, requiring secure credential management, including in resource-constrained embedded systems. Standards specifying field-level communication of ICPS (e.g., PROFINET or OPC UA) define protocol-specific credential management processes, yet lack formal security verification. We propose a generic model capturing initial security onboarding and automated credential provisioning. Using ProVerif, an automatic symbolic protocol verifier, we formalize certificate-based authentication under a Dolev-Yao adversary, verifying private key secrecy, component authentication, and mutual authentication with the operator domain. Robustness checks confirm resilience against key leakage and highlight the vulnerabilities of the trust-on-first-use concept proposed by the standards. Our model offers the first formal guarantees for secure credential management in ICPS.
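To make the onboarding flow concrete, the following Python sketch models certificate-based credential issuance and verification with purely symbolic keys. It is only an illustration of the property being verified (a credential is accepted only if it chains to the pinned operator CA); the names (OperatorCA, issue_certificate, device-42) and the symbolic signature encoding are assumptions for this example, not the authors' ProVerif model.

```python
# Illustrative sketch only: a toy symbolic model of certificate-based onboarding,
# loosely inspired by the flow described in the abstract.
from dataclasses import dataclass

@dataclass(frozen=True)
class KeyPair:
    private: str   # symbolic private key; must never appear in attacker knowledge
    public: str

@dataclass(frozen=True)
class Certificate:
    subject: str
    subject_public: str
    issuer: str
    signature: str  # symbolic signature by the issuer's private key

def sign(private: str, message: str) -> str:
    return f"sig({private},{message})"            # symbolic signature term

def verify(public: str, message: str, signature: str, pair_of: dict) -> bool:
    # Verification succeeds only if the signature was made with the private
    # key paired to the given public key (symbolic key-pair relation).
    return signature == sign(pair_of[public], message)

def issue_certificate(ca: KeyPair, subject: str, subject_pub: str) -> Certificate:
    body = f"{subject}|{subject_pub}"
    return Certificate(subject, subject_pub, "OperatorCA", sign(ca.private, body))

# Operator-domain CA and a freshly onboarded device (all names are assumptions).
ca = KeyPair("sk_ca", "pk_ca")
device = KeyPair("sk_dev", "pk_dev")
pair_of = {"pk_ca": "sk_ca", "pk_dev": "sk_dev"}

cert = issue_certificate(ca, "device-42", device.public)

# Device-side check: only accept credentials whose signature verifies under the
# pinned operator CA public key. Trust-on-first-use would instead pin whatever
# CA key is seen first, so an attacker-supplied key would pass -- the weakness
# the robustness analysis in the letter highlights.
assert verify(ca.public, f"{cert.subject}|{cert.subject_public}", cert.signature, pair_of)
print("credential accepted under pinned operator CA")
```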
{"title":"Formal Modeling and Verification of Generic Credential Management Processes for Industrial Cyber–Physical Systems","authors":"Julian Göppert;Axel Sikora","doi":"10.1109/LES.2025.3598202","DOIUrl":"https://doi.org/10.1109/LES.2025.3598202","url":null,"abstract":"Industrial cyber-physical systems (ICPS) face rising cyberattacks, requiring secure credential management also in resource-constrained embedded systems. Standards specifying field level communication of ICPS (e.g., PROFINET or OPC UA) define protocol-specific credential management processes, yet lack formal security verification. We propose a generic model capturing initial security onboarding and automated credential provisioning. Using ProVerif, an automatic symbolic protocol verifier, we formalize certificate-based authentication under a Dolev-Yao adversary, verifying private key secrecy, component authentication, and mutual authentication with the operator domain. Robustness checks confirm resilience against key leakage and highlight the vulnerabilities of the trust on first use concept proposed by the standards. Our model offers the first formal guarantees for secure credential management in ICPS.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"349-352"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Instruction-Level Support for Deterministic Dataflow in Real-Time Systems
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3600618
Bo Zhang;Yinkang Gao;Caixu Zhao;Xi Li
Ensuring predictable and repeatable behavior in concurrent real-time systems requires dataflow determinism—that is, each consumer task instance must always read data from the same producer instance. While the logical execution time (LET) model enforces this property, its software implementations typically rely on timed I/O or multibuffering protocols. These approaches introduce software complexity, execution overhead, and priority inversion, resulting in increased and unstable task response times, thereby degrading overall schedulability. We propose the time-semantic memory instruction (TSMI), a new instruction set extension that embeds logical timing into memory access operations. Unlike existing LET implementations, TSMI enforces dataflow determinism at the instruction level, eliminating the need for memory protocols or access ordering constraints. We develop a TSMI microarchitectural implementation that translates TSMI instructions into standard memory accesses and a programming model that not only captures LET semantics but also enables more expressive, per-access dataflow control. A cycle-accurate RISC-V simulator with TSMI achieves up to a 95.36% reduction in worst-case response time (WCRT) and a 98.88% reduction in response-time variability (RTV) compared to existing methods.
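The dataflow-determinism property that LET (and, at the instruction level, TSMI) enforces can be illustrated with a few lines of plain Python: the consumer's read depends only on logical time, never on when either task actually finished. The periods and the producer function below are made-up values for illustration; this is not the TSMI instruction semantics.

```python
# Minimal sketch of dataflow determinism under LET semantics: a consumer job
# always reads the value its producer published at a fixed logical time,
# independent of actual completion jitter.

PRODUCER_PERIOD = 10   # logical time units (assumed values)
CONSUMER_PERIOD = 20

def producer_output(job_index: int) -> int:
    # Whatever producer job k computes becomes visible only at the logical end
    # of its period, i.e., at time (k + 1) * PRODUCER_PERIOD.
    return job_index * job_index

def let_read(logical_time: int) -> int:
    # Under LET, the consumer reads the most recent value whose logical
    # publication time is <= its own release time.
    last_published_job = logical_time // PRODUCER_PERIOD - 1
    return producer_output(max(last_published_job, 0))

# Every consumer instance released at time t reads the same producer instance,
# no matter how early or late either task actually finished inside its period.
for release in range(CONSUMER_PERIOD, 101, CONSUMER_PERIOD):
    print(f"consumer released at t={release:3d} reads producer value {let_read(release)}")
```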
{"title":"Instruction-Level Support for Deterministic Dataflow in Real-Time Systems","authors":"Bo Zhang;Yinkang Gao;Caixu Zhao;Xi Li","doi":"10.1109/LES.2025.3600618","DOIUrl":"https://doi.org/10.1109/LES.2025.3600618","url":null,"abstract":"Ensuring predictable and repeatable behavior in concurrent real-time systems requires dataflow determinism—that is, each consumer task instance must always read data from the same producer instance. While the logical execution time (LET) model enforces this property, its software implementations typically rely on timed I/O or multibuffering protocols. These approaches introduce software complexity, execution overhead, and priority inversion, resulting in increased and unstable task response times, thereby degrading overall schedulability. We propose time-semantic memory instruction (TSMI), a new instruction set extension that embeds logical timing into memory access operations. Unlike existing LET implementations, TSMI enforces dataflow determinism at the instruction level, eliminating the need for memory protocols or access ordering constraints. We develop a TSMI microarchitectural implementation that translates TSMI instructions into standard memory accesses and a programming model that not only captures LET semantics but also enables more expressive, per-access dataflow control. A cycle-accurate RISC-V simulator with TSMI achieves up to 95.36% worst-case response time (WCRT) and 98.88% response time variability (RTV) reduction compared to existing methods.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"341-344"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RISC-V Integrated Nested Loop Analyzer for Runtime DRAM Test Pattern Generation
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3600611
Saeyeon Kim;Sunyoung Park;Nahyeon Kim;Jiyoung Lee;Ji-Hoon Kim
Recent advancements in DRAM technology have increased the complexity and variety of memory faults, necessitating efficient and programmable fault diagnosis, especially in AI and automotive systems where reliability is critical. This letter proposes a Nested Loop Analyzer (NLA) integrated into a RISC-V-based memory test platform to enhance both efficiency and programmability in run-time memory testing. By leveraging Loop Control Flow Analysis and Basic Block Identification, the NLA eliminates complex loop control in pattern generation and reduces pattern buffer overhead between the Pattern Generator (PG) and the DRAM physical layer (PHY). Additionally, integrating memory testing within the RISC-V system-on-chip (SoC) environment enables seamless development and integration of memory testing with general application tasks. The proposed approach provides a high-programmability, run-time DRAM test pattern generation platform with efficient hardware usage, reduced buffer requirements, and seamless RISC-V integration.
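As a rough illustration of the kind of nested-loop test pattern such an analyzer works on, the sketch below expresses one MARCH-like read-then-write element as two nested Python loops; flattening these loops into a per-access command stream is the bookkeeping the NLA moves out of the pattern buffer. Array dimensions and the pattern itself are illustrative assumptions.

```python
# Sketch of a nested-loop DRAM test pattern (a simple MARCH-like element) that
# a loop analyzer could flatten into a stream of per-access commands.

NUM_ROWS, NUM_COLS = 4, 8   # tiny array for illustration

def march_element(write_value: int, read_expected: int, ascending: bool = True):
    """Yield (op, row, col, value) tuples for one read-then-write march element."""
    rows = range(NUM_ROWS) if ascending else reversed(range(NUM_ROWS))
    for row in rows:                                     # outer loop: row sweep
        for col in range(NUM_COLS):                      # inner loop: column sweep
            yield ("read",  row, col, read_expected)     # verify previous background
            yield ("write", row, col, write_value)       # write new background

# A pattern generator driven by such loop structure needs no run-time loop
# bookkeeping in the command stream itself -- the overhead the proposed
# Nested Loop Analyzer removes in hardware.
commands = list(march_element(write_value=1, read_expected=0))
print(len(commands), "commands, first three:", commands[:3])
```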
{"title":"RISC-V Integrated Nested Loop Analyzer for Runtime DRAM Test Pattern Generation","authors":"Saeyeon Kim;Sunyoung Park;Nahyeon Kim;Jiyoung Lee;Ji-Hoon Kim","doi":"10.1109/LES.2025.3600611","DOIUrl":"https://doi.org/10.1109/LES.2025.3600611","url":null,"abstract":"Recent advancements in DRAM technology have increased the complexity and variety of memory faults, necessitating efficient and programmable fault diagnosis, especially in AI and automotive systems where reliability is critical. This letter proposes a Nested Loop Analyzer (NLA) integrated into a RISC-V-based memory test platform to enhance both efficiency and programmability in run-time memory testing. By leveraging Loop Control Flow Analysis and Basic Block Identification, the NLA eliminates complex loop control in pattern generation and reduces pattern buffer overhead between the Pattern Generator (PG) and the DRAM physical layer (PHY). Additionally, integrating memory testing within the RISC-V system-on-chip (SoC) environment enables seamless development and integration of memory testing with general application tasks. The proposed approach provides a high-programmability, run-time DRAM test pattern generation platform with efficient hardware usage, reduced buffer requirements, and seamless RISC-V integration.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"333-336"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Efficient Iterative Beam Search for Human–Robot Collaborative Assembly Line Balancing
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3600560
Suraj Meshram;Sanket Jaipuriar;Arnab Sarkar;Arijit Mondal
Modern cyber–physical systems (CPSs) and IoT-enabled smart factories rely on human–robot collaboration (HRC) to combine human intuition and robotic precision in real time. Balancing such HRC assembly lines, where each task may execute in human-only, robot-only, or collaborative modes, poses a combinatorial challenge that defies scalable mixed integer linear programming (MILP) and oversimplified heuristics. In this letter, we present IBSHRC, a proof-of-concept Iterative Beam Search framework designed for single-product, straight-line CPS assembly systems. IBSHRC leverages mode-aware initialization, binary-search cycle-time refinement, and efficient pruning to navigate vast scheduling spaces at the network edge. On benchmark instances up to 100 tasks, our method delivers near-optimal cycle times with up to $300\times$ speed-ups over MILP (subsecond runtimes), demonstrating its promise for real-time, IoT-driven industrial scheduling.
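A minimal sketch of the two ingredients named above, beam search over task-to-station assignments wrapped in a binary search on the cycle time, is given below. Task durations, the beam width, and the omission of precedence constraints are simplifying assumptions; this is not the IBSHRC implementation.

```python
# Illustrative sketch: a tiny beam search that checks whether tasks (each with
# human / robot / collaborative durations) fit into stations for a candidate
# cycle time, wrapped in a binary search that tightens the cycle time.
# Precedence constraints are omitted for brevity.

TASKS = {                      # task -> duration per execution mode (assumed data)
    "t1": {"human": 5, "robot": 3, "collab": 2},
    "t2": {"human": 4, "robot": 6, "collab": 3},
    "t3": {"human": 6, "robot": 4, "collab": 5},
    "t4": {"human": 3, "robot": 2, "collab": 2},
}
NUM_STATIONS = 2
BEAM_WIDTH = 4

def feasible(cycle_time: int) -> bool:
    """Beam search over task-to-(station, mode) assignments; True if some
    complete assignment keeps every station within cycle_time."""
    beam = [tuple([0] * NUM_STATIONS)]           # a state is the tuple of station loads
    for task, modes in TASKS.items():
        candidates = set()
        for loads in beam:
            for station in range(NUM_STATIONS):
                for duration in modes.values():
                    new_load = loads[station] + duration
                    if new_load <= cycle_time:
                        new = list(loads)
                        new[station] = new_load
                        candidates.add(tuple(new))
        if not candidates:
            return False
        beam = sorted(candidates, key=max)[:BEAM_WIDTH]   # keep the most balanced states
    return True

# Binary search on the cycle time, mirroring the refinement step in the letter.
lo, hi = 1, sum(max(m.values()) for m in TASKS.values())
while lo < hi:
    mid = (lo + hi) // 2
    lo, hi = (lo, mid) if feasible(mid) else (mid + 1, hi)
print("smallest feasible cycle time found:", lo)
```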
{"title":"An Efficient Iterative Beam Search for Human–Robot Collaborative Assembly Line Balancing","authors":"Suraj Meshram;Sanket Jaipuriar;Arnab Sarkar;Arijit Mondal","doi":"10.1109/LES.2025.3600560","DOIUrl":"https://doi.org/10.1109/LES.2025.3600560","url":null,"abstract":"Modern cyber–physical systems (CPSs) and IoT-enabled smart factories rely on human–robot collaboration (HRC) to combine human intuition and robotic precision in real time. Balancing such HRC assembly lines, where each task may execute in human-only, robot-only, or collaborative modes, poses a combinatorial challenge that defies scalable mixed integer linear programming (MILP) and oversimplified heuristics. In this letter, we present IBSHRC, a proof-of-concept Iterative Beam Search framework designed for single-product, straight-line CPS assembly systems. IBSHRC leverages mode-aware initialization, binary-search cycle-time refinement, and efficient pruning to navigate vast scheduling spaces at the network edge. On benchmark instances up to 100 tasks, our method delivers near-optimal cycle times with up to <inline-formula> <tex-math>$300times $ </tex-math></inline-formula> speed-ups over MILP (subsecond runtimes), demonstrating its promise for real-time, IoT-driven industrial scheduling.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"313-316"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145351945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Investigation of the Adversarial Robustness of End-to-End Deep Sensor Fusion Models
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3599829
Mohamed Marwen Moslah;Ramzi Zouari;Ahmad Shahnejat Bushehri;Felipe Gohring de Magalhaes;Gabriela Nicolescu
Autonomous driving systems increasingly depend on multimodal sensor fusion (deep sensor fusion (DSF)), integrating data from cameras, radar, and LiDAR to improve environmental perception and decision-making. The integration of deep learning models into sensor fusion has significantly enhanced perception capabilities, but it also raises concerns about the robustness of these models when exposed to adversarial attacks. As prior research on the adversarial robustness of TransFuser — one of the most advanced end-to-end transformer-based DSF models for autonomous driving — has been limited to single-modality attacks targeting the camera sensor, this work extends the investigation to assess the robustness of TransFuser under various attack scenarios, including those involving the LiDAR modality. We employed the fast gradient sign method (FGSM) and projected gradient descent (PGD) to perform single-channel adversarial attacks on camera and LiDAR modalities separately, as well as the joint-channel attack. The experiments were conducted in the CARLA simulator using the Town05 Short urban environment, including 32 routes featuring diverse driving scenarios. The results clearly demonstrate the vulnerability of TransFuser to adversarial attacks where transformer-based sensor fusion is utilized, particularly under joint-channel attacks. Our experiments demonstrate that LiDAR-targeted single-channel attacks significantly degrade driving performance, reducing the driving score by 49.87% under FGSM attacks, and by 50.15% and 42.12% under joint FGSM and PGD attacks, respectively. This study informs the design of more robust and secure DSF architectures for end-to-end autonomous driving.
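For reference, the single-step FGSM perturbation used in the evaluation has a very compact form; the PyTorch sketch below shows it on a toy classifier. The stand-in model, loss, and epsilon are placeholder assumptions, and in the letter the perturbation is applied to the camera and/or LiDAR inputs of TransFuser inside CARLA rather than to a generic image tensor.

```python
# Minimal FGSM sketch in PyTorch: perturb the input by epsilon * sign(grad_x loss).
import torch

def fgsm_perturb(model, x, y, loss_fn, epsilon=0.03):
    """Return an adversarial copy of x using the one-step gradient-sign attack."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()   # move along the loss gradient sign
        x_adv = x_adv.clamp(0.0, 1.0)                 # keep pixels in a valid range
    return x_adv.detach()

# Toy usage with a stand-in linear "model" on random image-shaped data.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 2))
x = torch.rand(1, 3, 8, 8)
y = torch.tensor([1])
x_adv = fgsm_perturb(model, x, y, torch.nn.functional.cross_entropy)
print("max perturbation:", (x_adv - x).abs().max().item())
```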
{"title":"Investigation of the Adversarial Robustness of End-to-End Deep Sensor Fusion Models","authors":"Mohamed Marwen Moslah;Ramzi Zouari;Ahmad Shahnejat Bushehri;Felipe Gohring de Magalhaes;Gabriela Nicolescu","doi":"10.1109/LES.2025.3599829","DOIUrl":"https://doi.org/10.1109/LES.2025.3599829","url":null,"abstract":"Autonomous driving systems increasingly depend on multimodal sensor fusion (deep sensor fusion (DSF)), integrating data from cameras, radar, and LiDAR to improve environmental perception and decision-making. The integration of deep learning models into sensor fusion has significantly enhanced perception capabilities, but it also raises concerns about the robustness of these models when exposed to adversarial attacks. As prior research on the adversarial robustness of TransFuser — one of the most advanced end-to-end transformer-based DSF models for autonomous driving — has been limited to single-modality attacks targeting the camera sensor, this work extends the investigation to assess the robustness of TransFuser under various attack scenarios, including those involving the LiDAR modality. We employed the fast gradient sign method (FGSM) and projected gradient descent (PGD) to perform single-channel adversarial attacks on camera and LiDAR modalities separately, as well as the joint-channel attack. The experiments were conducted in the CARLA simulator using the Town05 Short urban environment, including 32 routes featuring diverse driving scenarios. The results clearly demonstrate the vulnerability of TransFuser to adversarial attacks where transformer-based sensor fusion is utilized, particularly under joint-channel attacks. Our experiments demonstrate that LiDAR-targeted single-channel attacks significantly degrade driving performance, reducing the driving score by 49.87% under FGSM attacks, and by 50.15% and 42.12% under joint FGSM and PGD attacks, respectively. This study informs the design of more robust and secure DSF architectures for end-to-end autonomous driving.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"325-328"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145351966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Configurable ReRAM Engine for Energy-Efficient Sparse Neural Network Acceleration
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3600592
I-Yang Chen;Kai-Wei Hou;Ya-Shu Chen
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures offer high computational parallelism for accelerating neural networks. However, they suffer from high power consumption, primarily due to the extensive use of analog-to-digital converters (ADCs). In this work, we propose ReCEN, a configurable engine designed to reduce energy consumption by dynamically adjusting the operating frequency of ADCs through column sparsity exploration in neural networks. To further enhance energy efficiency, we exploit sparsity by introducing effective bias and discarding least significant bits during ReRAM weight programming. Experimental results demonstrate that, under equivalent computational resources, our proposed engine significantly reduces ADC power consumption, thereby improving overall energy efficiency.
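The idea of tying ADC effort to column sparsity can be sketched in a few lines: crossbar columns whose weights are mostly zero are assigned a lower conversion rate. The thresholds and frequency mapping below are illustrative assumptions, not ReCEN's actual policy.

```python
# Toy sketch of scaling ADC effort with column sparsity in a crossbar:
# very sparse columns accumulate fewer partial sums and are sampled less often.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((64, 32)) * (rng.random((64, 32)) > 0.7)   # roughly 70% zeros

col_sparsity = (weights == 0).mean(axis=0)   # fraction of zero cells per column

def adc_frequency(sparsity: float, f_max: float = 1.0) -> float:
    # Assumed policy for illustration: the sparser the column, the lower the
    # normalized ADC operating frequency.
    if sparsity > 0.9:
        return 0.25 * f_max
    if sparsity > 0.7:
        return 0.5 * f_max
    return f_max

freqs = np.array([adc_frequency(s) for s in col_sparsity])
print(f"mean normalized ADC frequency: {freqs.mean():.2f} (1.0 = always full rate)")
```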
{"title":"A Configurable ReRAM Engine for Energy-Efficient Sparse Neural Network Acceleration","authors":"I-Yang Chen;Kai-Wei Hou;Ya-Shu Chen","doi":"10.1109/LES.2025.3600592","DOIUrl":"https://doi.org/10.1109/LES.2025.3600592","url":null,"abstract":"Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures offer high computational parallelism for accelerating neural networks. However, they suffer from high power consumption, primarily due to the extensive use of analog-to-digital converters (ADCs). In this work, we propose ReCEN, a configurable engine designed to reduce energy consumption by dynamically adjusting the operating frequency of ADCs through column sparsity exploration in neural networks. To further enhance energy efficiency, we exploit sparsity by introducing effective bias and discarding least significant bits during ReRAM weight programming. Experimental results demonstrate that, under equivalent computational resources, our proposed engine significantly reduces ADC power consumption, thereby improving overall energy efficiency.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"305-308"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient Register-Balancing for Masked Hardware
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3600586
Nilotpola Sarma;Sujeet Narayan Kamble;Chandan Karfa
Power side channel attacks (PSCAs) are a significant threat to secure cryptographic processors. Masking is an algorithmic countermeasure against PSCAs. During masking, the insertion of registers at designated locations according to the masking scheme is pivotal to ensure glitches in hardware do not affect the PSCA security. Such an insertion should be followed by proper insertion of balancing registers to ensure an equal number of registers in all parallel paths of the design. This is called register balancing (RB). An RB procedure should also ensure minimum latency to meet the time constraints of cryptographic workloads. Previously, RB has been carried out using a retiming-based approach which involved finding a correct set of retiming labels via solving a set of constraints on these labels, with a time complexity of $O(V(V+E))$. This work presents a faster RB approach with a time complexity of $O(V+E)$. The new RB approach, used to generate masked AES-256 Canright and PRESENT S-boxes, demonstrated up to $28\times$ faster automated synthesis of masked hardware over existing approaches.
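A linear-time balancing pass of the kind described can be sketched with one topological traversal: compute each node's register depth, then pad every edge that arrives shallower than its sink. The tiny example graph and the padding rule below are illustrative assumptions rather than the letter's exact algorithm.

```python
# O(V+E) register-balancing sketch on a small DAG: one Kahn topological pass to
# compute register depths, then one pass over edges to add balancing registers.
from collections import defaultdict

# edge (u, v) -> number of registers the masking scheme already placed on it
edges = {("in", "a"): 1, ("a", "out"): 0, ("in", "out"): 0}

succ = defaultdict(list)
indeg = defaultdict(int)
nodes = set()
for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1
    nodes.update((u, v))

# Kahn's algorithm: a single O(V+E) topological ordering.
order, frontier = [], [n for n in nodes if indeg[n] == 0]
while frontier:
    u = frontier.pop()
    order.append(u)
    for v in succ[u]:
        indeg[v] -= 1
        if indeg[v] == 0:
            frontier.append(v)

# Register depth of each node = registers along its deepest incoming path.
depth = defaultdict(int)
for u in order:
    for v in succ[u]:
        depth[v] = max(depth[v], depth[u] + edges[(u, v)])

# Any edge arriving shallower than its sink needs balancing registers.
balancing = {(u, v): depth[v] - (depth[u] + r) for (u, v), r in edges.items()}
print(balancing)   # the unregistered direct in->out edge gets one balancing register
```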
{"title":"Efficient Register-Balancing for Masked Hardware","authors":"Nilotpola Sarma;Sujeet Narayan Kamble;Chandan Karfa","doi":"10.1109/LES.2025.3600586","DOIUrl":"https://doi.org/10.1109/LES.2025.3600586","url":null,"abstract":"Power side channel attacks (PSCAs) are a significant threat to secure cryptographic processors. Masking is an algorithmic countermeasure against PSCAs. During masking, the insertion of registers at designated locations according to the masking scheme is pivotal to ensure glitches in hardware do not effect the PSCA security. Such an insertion should be followed by proper insertion of balancing registers to ensure equal number of registers in all parallel paths of the design. This is called register balancing (RB). An RB procedure should also ensure minimum latency to meet the time constraints of cryptographic workloads. Previously, RB has been carried using a retiming-based approach which involved finding a correct set of retiming labels via solving a set of constraints on these labels, with a time complexity of <inline-formula> <tex-math>$O(V(V+E))$ </tex-math></inline-formula>. This work presents a faster RB approach with a time complexity of <inline-formula> <tex-math>$O(V+E)$ </tex-math></inline-formula>. The new RB approach used to generate masked AES-256 Canright’s and PRESENT S-boxes demonstrated upto <inline-formula> <tex-math>$28times $ </tex-math></inline-formula> faster automated synthesis of masked hardware over existing approaches.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"301-304"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Minimizing Backbone Ethernet Traffic for Enabling Interzonal Messages in Software-Defined Vehicles
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3600612
Ashiqur Rahaman Molla;Ram Mohan Kota;Jaishree Mayank;Arnab Sarkar;Arijit Mondal;Soumyajit Dey
Software-defined vehicles (SDVs) employ zonal architectures whose zones exchange periodic real-time traffic over a high-speed Time-Sensitive Ethernet (IEEE 802.1Q) backbone. Existing research has produced many deterministic routing schemes for such backbone Ethernet networks, yet has largely ignored the complementary problem of payload-level frame minimization: multiplexing dozens of small zonal messages—whether CAN, LIN, or any other in-zone protocol—into the fewest possible Ethernet frames so that bandwidth, switch buffers, and latency budgets are not squandered. This letter concentrates on that neglected frame-minimization facet and presents an optimization flow dedicated to reducing the number of Ethernet frames. We first formulate an exact Satisfiability Modulo Theories (SMT) model that finds the minimal set of multiplexed frames required across an entire hyper-period. Because SMT becomes intractable on large vehicle topologies, we then introduce a matrix-based heuristic aggregation (MaHA) algorithm that reproduces the SMT's frame-minimization decisions to within a few percent while executing in milliseconds. Experiments on synthetic SDV workloads show that naive one-zonal-message-per-Ethernet-frame policies waste significant available payload capacity; our SMT eliminates this waste completely, and the heuristic achieves closely similar savings with up to three orders of magnitude less run-time, making it a practical drop-in solution for next-generation SDV networks.
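The payload-level frame-minimization problem itself is easy to state as a packing problem; the sketch below packs zonal messages into frames with a simple first-fit-decreasing pass. The grouping key, message set, and 1500-byte payload bound are assumptions for illustration, and this is a generic baseline, not the MaHA heuristic or the SMT model.

```python
# First-fit-decreasing sketch of multiplexing small zonal messages into as few
# Ethernet frames as possible, grouped here by (src zone, dst zone, period).
from collections import defaultdict

MAX_PAYLOAD = 1500   # standard Ethernet payload bytes

# (src zone, dst zone, period in ms, payload bytes) -- made-up workload
messages = [
    ("z1", "z2", 10, 8), ("z1", "z2", 10, 64), ("z1", "z2", 10, 200),
    ("z3", "z2", 10, 1400), ("z3", "z2", 10, 300), ("z1", "z2", 20, 1000),
]

frames = defaultdict(list)          # (src, dst, period) -> list of frame payload loads
for src, dst, period, size in sorted(messages, key=lambda m: -m[3]):
    loads = frames[(src, dst, period)]
    for i, used in enumerate(loads):            # first frame with room wins
        if used + size <= MAX_PAYLOAD:
            loads[i] = used + size
            break
    else:                                       # otherwise open a new frame
        loads.append(size)

total = sum(len(v) for v in frames.values())
print(f"{len(messages)} messages multiplexed into {total} Ethernet frames")
```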
{"title":"Minimizing Backbone Ethernet Traffic for Enabling Interzonal Messages in Software-Defined Vehicles","authors":"Ashiqur Rahaman Molla;Ram Mohan Kota;Jaishree Mayank;Arnab Sarkar;Arijit Mondal;Soumyajit Dey","doi":"10.1109/LES.2025.3600612","DOIUrl":"https://doi.org/10.1109/LES.2025.3600612","url":null,"abstract":"software-defined vehicles (SDVs) employ zonal architectures whose zones exchange periodic real-time traffic over a high-speed Time-Sensitive Ethernet (IEEE 802.1Q) backbone. Existing research has produced many deterministic routing schemes for such backbone Ethernet networks, yet has largely ignored the complementary problem of payload-level frame minimization: multiplexing dozens of small zonal messages—whether CAN, LIN, or any other in-zone protocol—into the fewest possible Ethernet frames so that bandwidth, switch buffers and latency budgets are not squandered. This letter concentrates on that neglected frame-minimization facet and presents an optimization flow dedicated to reducing the number of Ethernet frames. We first formulate an exact Satisfiability Modulo Theories (SMTs) model that finds the minimal set of multiplexed frames required across an entire hyper-period. Because SMT becomes intractable on large vehicle topologies, we then introduce a matrix-based heuristic aggregation (MaHA) algorithm that reproduces the SMT’s frame-minimization decisions to within a few percent while executing in milliseconds. Experiments on synthetic SDV workloads show that naive one zonal message per Ethernet frame policies waste significant available payload capacity; our SMT eliminates this waste completely, and the heuristic achieves closely similar savings with up to three orders of magnitude less run-time, making it a practical drop-in solution for next-generation SDV networks.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"353-356"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
EMGAxO: Extending Machine Learning Hardware Generators With Approximate Operators
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3600043
Ali Asghar;Shahzad Bangash;Suleman Shah;Laiq Hasan;Salim Ullah;Siva Satyendra Sahoo;Akash Kumar
FPGAs provide customizable, low-power, and real-time acceleration of ML models for embedded systems, making them ideal for edge applications like robotics and IoT. However, ML models are computationally intensive and rely heavily on multiplication operations, which dominate the overall resource and power consumption, especially in deep neural networks. Currently available open-source frameworks, such as hls4ml, FINN, and Tensil AI, facilitate FPGA-based implementation of ML algorithms but exclusively use accurate arithmetic operators, failing to exploit the inherent error resilience of ML models. Meanwhile, a large body of research in approximate computing has produced Approximate Multipliers that offer substantial reductions in area, power, and latency by sacrificing a small amount of accuracy. However, these Approximate Multipliers are not integrated into widely used hardware generation workflows, and no automated mechanism exists for incorporating them into ML model implementations at both software and hardware levels. In this work, we extend the hls4ml framework to support the use of Approximate Multipliers. Our approach enables seamless evaluation of multiple approximate designs, allowing tradeoffs between resource usage and inference accuracy to be explored efficiently. Experimental results demonstrate up to 3.94% LUT savings and a 7.33% reduction in on-chip power, with accuracy degradation of 1% compared to accurate designs.
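One common family of approximate multipliers simply discards least-significant bits before multiplying, matching the error-resilience argument in the abstract. The toy model below shows the accuracy cost of such truncation; the specific scheme and operand values are illustrative assumptions, not a particular design from the extended hls4ml flow.

```python
# Toy software model of an LSB-truncating approximate integer multiplier:
# drop k low bits of each operand, multiply, then rescale.

def approx_mul(a: int, b: int, k: int = 2) -> int:
    """Multiply after discarding the k least-significant bits of each operand."""
    return ((a >> k) * (b >> k)) << (2 * k)

pairs = [(117, 89), (64, 64), (200, 201), (31, 45)]
for a, b in pairs:
    exact, approx = a * b, approx_mul(a, b)
    rel_err = abs(exact - approx) / exact
    # Errors shrink for large operands and grow for small ones -- the kind of
    # accuracy/resource tradeoff an ML model can often tolerate.
    print(f"{a}*{b}: exact={exact}, approx={approx}, rel. error={rel_err:.2%}")
```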
{"title":"EMGAxO: Extending Machine Learning Hardware Generators With Approximate Operators","authors":"Ali Asghar;Shahzad Bangash;Suleman Shah;Laiq Hasan;Salim Ullah;Siva Satyendra Sahoo;Akash Kumar","doi":"10.1109/LES.2025.3600043","DOIUrl":"https://doi.org/10.1109/LES.2025.3600043","url":null,"abstract":"FPGAs provide customizable, low-power, and real-time ML Models acceleration for embedded systems, making them ideal for edge applications like robotics and IoT. However, ML models are computationally intensive and rely heavily on multiplication operations, which dominate the overall resource and power consumption, especially in deep neural networks. Currently available open-source frameworks, such as hls4ml, FINN and Tensil artificial intelligence (AI), facilitate FPGA-based implementation of ML algorithms but exclusively use accurate arithmetic operators, failing to exploit the inherent error resilience of ML models. Meanwhile, a large body of research in approximate computing has produced Approximate Multipliers that offer substantial reductions in area, power, and latency by sacrificing a small amount of accuracy. However, these Approximate Multipliers are not integrated into widely used hardware generation workflows, and no automated mechanism exists for incorporating them into ML model implementations at both software and hardware levels. In this work, we extend the hls4ml framework to support the use of Approximate Multipliers. Our approach enables seamless evaluation of multiple approximate designs, allowing tradeoffs between resource usage and inference accuracy to be explored efficiently. Experimental results demonstrate up to 3.94% LUTs savings and 7.33% reduction in On-Chip Power, with accuracy degradation of 1% compared to accurate designs.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"345-348"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Fused Tensor Core: A Hardware–Software Co-Design for Efficient Execution of Attentions on GPUs
IF 2.0 | CAS Tier 4, Computer Science | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-10-16 | DOI: 10.1109/LES.2025.3601057
Reza Jahadi;Phil Munz;Ehsan Atoofian
The attention mechanism has become the backbone of machine learning applications, expanding beyond natural language processing into domains such as computer vision and recommendation systems. We observe that implementing attention layers on GPUs with tensor cores (TCs) using matrix-multiply and accumulate (MMA) operations is suboptimal, as the attention layer incurs an excessively large memory footprint and significant computational complexity, especially with a higher number of input elements. In this work, we propose a hardware-software co-design approach to accelerate the execution of attention layers and reduce energy consumption on GPUs with TCs. Our proposed mechanism efficiently processes costly attention operations through a unique fusion mechanism, reducing the memory requirements of attention layers. Additionally, we revisit the design of TCs and offload certain non-MMA operations within the attention layer from standard CUDA cores to TCs. We extend the instruction set of GPUs to support new operations performed on TCs. Our evaluations reveal that optimizing both hardware and software for attention layers results in a 13.4% performance improvement and an 18.3% reduction in energy-delay on average, compared to a software-only optimization approach.
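The memory-footprint argument can be seen in a generic, GPU-agnostic form: naive attention materializes the full N x N score matrix, while a blockwise loop with an online softmax keeps only one tile plus running statistics. The NumPy sketch below illustrates that contrast; it is not the fused tensor-core mechanism proposed in the letter.

```python
# Naive attention vs. a blockwise "fused" loop with an online softmax.
import numpy as np

def naive_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])            # full N x N matrix lives in memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def fused_attention(q, k, v, block=32):
    n, d = q.shape
    out = np.zeros_like(q)
    row_max = np.full((n, 1), -np.inf)
    row_sum = np.zeros((n, 1))
    for start in range(0, n, block):                   # stream K/V tiles
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)                      # only an N x block tile
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        scale = np.exp(row_max - new_max)              # rescale running statistics
        p = np.exp(s - new_max)
        out = out * scale + p @ vb
        row_sum = row_sum * scale + p.sum(axis=-1, keepdims=True)
        row_max = new_max
    return out / row_sum

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 16)) for _ in range(3))
print("max abs diff:", np.abs(naive_attention(q, k, v) - fused_attention(q, k, v)).max())
```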
{"title":"Fused Tensor Core: A Hardware–Software Co-Design for Efficient Execution of Attentions on GPUs","authors":"Reza Jahadi;Phil Munz;Ehsan Atoofian","doi":"10.1109/LES.2025.3601057","DOIUrl":"https://doi.org/10.1109/LES.2025.3601057","url":null,"abstract":"Attention mechanism has become the backbone of machine learning applications, expanding beyond natural language processing into domains, such as computer vision and recommendation systems. We observe that implementing attention layers on GPUs with tensor cores (TCs) using matrix-multiply and accumulate (MMA) operations is suboptimal as the attention layer incurs an excessively large-memory footprint and significant computational complexity, especially with a higher number of input elements. In this work, we propose a hardware-software co-design approach to accelerate the execution of attention layers and reduce energy consumption on GPUs with TCs. Our proposed mechanism efficiently processes costly attention operations through a unique fusion mechanism, reducing the memory requirements of attention layers. Additionally, we revisit the design of TCs and offload certain non-MMA operations within the attention layer from standard CUDA cores to TCs. We extend the instruction set of GPUs to support new operations performed on TCs. Our evaluations reveal that optimizing both hardware and software for attention layers results in a 13.4% performance improvement and an 18.3% reduction in energy-delay on average, compared to a software-only optimization approach.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"317-320"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0