MdCSR: A Memory-Efficient Sparse Matrix Compression Format
Pub Date : 2025-10-17 | DOI: 10.1109/LES.2025.3598189 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 289-292
G. Noble;S. Nalesh;S. Kala;Salim Ullah;Akash Kumar
Efficient representation of sparse matrices is critical for reducing memory usage and improving performance in hardware-accelerated computing systems. This letter presents memory-efficient delta-compressed storage row (MdCSR), a novel sparse matrix format designed to improve both storage efficiency and execution speed. MdCSR replaces absolute column indices with compact relative offsets and selectively applies delta encoding, resulting in a more compact index structure. Compared to traditional formats, it achieves average memory savings of 15.45% over compressed sparse row (CSR) and 52.77% over dCSR, and reduces execution time by around 20%. A dedicated architecture for CSR-to-MdCSR compression is also presented, optimized for real-time, low-overhead FPGA deployment.
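As a rough illustration of the idea behind relative offsets, the following Python sketch delta-encodes the column indices of a CSR matrix row by row. The function name and the example arrays are ours; the letter's actual MdCSR bit layout and selective-encoding rule are not reproduced here.

def delta_encode_columns(col_idx, row_ptr):
    # Per row, keep the first column index absolute and store the rest as
    # deltas. Within a CSR row, columns are strictly increasing, so the
    # deltas are small positive integers that fit in fewer bits.
    deltas = []
    for r in range(len(row_ptr) - 1):
        prev = None
        for k in range(row_ptr[r], row_ptr[r + 1]):
            deltas.append(col_idx[k] if prev is None else col_idx[k] - prev)
            prev = col_idx[k]
    return deltas

cols = [2, 100, 101, 7, 250]   # absolute column indices of two CSR rows
rptr = [0, 3, 5]
print(delta_encode_columns(cols, rptr))   # [2, 98, 1, 7, 243]

The absolute indices need enough bits for the full column range (here up to 250), while most deltas fit in a byte or less, which is where the memory savings come from.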
{"title":"MdCSR: A Memory-Efficient Sparse Matrix Compression Format","authors":"G. Noble;S. Nalesh;S. Kala;Salim Ullah;Akash Kumar","doi":"10.1109/LES.2025.3598189","DOIUrl":"https://doi.org/10.1109/LES.2025.3598189","url":null,"abstract":"Efficient representation of sparse matrices is critical for reducing memory usage and improving performance in hardware-accelerated computing systems. This letter presents memory-efficient delta-compressed storage row (MdCSR), a novel sparse matrix format designed to improve both storage efficiency and execution speed. MdCSR replaces absolute column indices with compact relative offsets and selectively applies delta encoding, resulting in a more compact index structure. Compared to traditional formats, it achieves an average of 15.45% memory savings over compressed sparse row (CSR), 52.77% over dCSR, and around 20% reduction in execution time. A dedicated architecture for CSR to MdCSR compression is also presented, optimized for real-time and low-overhead FPGA deployment.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"289-292"},"PeriodicalIF":2.0,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Container-Based Fail-Operational System Architecture for Software-Defined Vehicles
Pub Date : 2025-10-17 | DOI: 10.1109/LES.2025.3600581 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 293-296
Changjo Cho;Hamin An;Jangho Shin;Jong-Chan Kim
Future software-defined vehicles (SDVs) are expected to employ a zonal architecture with container-based microservices running on distributed computing nodes. Such containers must be carefully orchestrated, with proper failover mechanisms, to ensure operational continuity. To that end, we first adopt K3s, a lightweight Kubernetes implementation, as our baseline. However, we found that K3s incurs significant failover delays (over 5 min with default settings), making it unsuitable for safety-critical applications. We therefore propose an enhanced health-check and failover mechanism based on minimized, sensor-triggered heartbeat intervals and warm-standby container redundancy. Our experiments with realistic containerized applications (i.e., in-cabin pose estimation and on-road lane detection) show that failover delays are reduced to under 1 s, achieving real-time performance for safety-critical applications in future SDVs.
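The general pattern behind the proposed mechanism (a tight, sensor-triggered heartbeat deadline plus a warm standby promoted on a miss) can be sketched as follows. This is a hypothetical, simplified Python model; the names, the 200 ms deadline, and the promotion callback are our assumptions, not the authors' K3s implementation.

import time

HEARTBEAT_DEADLINE_S = 0.2   # far below K3s' default multi-minute detection

class FailoverMonitor:
    def __init__(self, promote_standby):
        self.last_beat = time.monotonic()
        self.promote_standby = promote_standby
        self.failed_over = False

    def heartbeat(self):
        # Called by the primary container on every sensor-triggered cycle.
        self.last_beat = time.monotonic()

    def poll(self):
        # Called periodically; promotes the warm standby on a missed deadline.
        if not self.failed_over and time.monotonic() - self.last_beat > HEARTBEAT_DEADLINE_S:
            self.failed_over = True
            self.promote_standby()

monitor = FailoverMonitor(promote_standby=lambda: print("standby promoted"))
monitor.heartbeat()
time.sleep(0.3)      # simulate a crashed primary missing its deadline
monitor.poll()       # -> "standby promoted"

Because the standby is warm (already running, just not active), takeover cost is the detection delay plus a switchover, rather than a full container restart.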
{"title":"Container-Based Fail-Operational System Architecture for Software-Defined Vehicles","authors":"Changjo Cho;Hamin An;Jangho Shin;Jong-Chan Kim","doi":"10.1109/LES.2025.3600581","DOIUrl":"https://doi.org/10.1109/LES.2025.3600581","url":null,"abstract":"Future software-defined vehicles (SDVs) are expected to employ the zonal architecture with container-based microservices on distributed computing nodes. Such containers should be carefully orchestrated to ensure operational continuity by proper failover mechanisms. For that, we first try to use K3s, a lightweight Kubernetes implementation, as our baseline. However, we found that K3s has significant failover delays (over 5 min by default settings), which makes it unsuitable for safety-critical applications. We thus propose an enhanced health check and failover mechanism by minimized sensor-triggered heartbeat intervals and warm standby container redundancy. Our experiments with realistic containerized applications (i.e., in-cabin pose estimation and on-road lane detection) show that the failover delays are reduced to under 1 s, achieving the real-time performance for safety-critical applications in future SDVs.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"293-296"},"PeriodicalIF":2.0,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Upcoming Era of Specialized Models
Pub Date : 2025-10-17 | DOI: 10.1109/LES.2025.3614406 | IEEE Embedded Systems Letters, vol. 17, no. 5, p. 288
Aviral Shrivastava
{"title":"The Upcoming Era of Specialized Models","authors":"Aviral Shrivastava","doi":"10.1109/LES.2025.3614406","DOIUrl":"https://doi.org/10.1109/LES.2025.3614406","url":null,"abstract":"","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"288-288"},"PeriodicalIF":2.0,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11206580","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting Nonequivalence in Neural Networks Through In-Distribution Counterexample Generation
Pub Date : 2025-10-17 | DOI: 10.1109/LES.2025.3600585 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 297-300
Dina A. Moussa;Michael Hefenbrock;Mehdi Tahoori
Neural networks (NNs) have achieved profound success in various safety-critical applications such as healthcare, medical devices, and automotive systems. These NN models are usually trained on cloud systems; however, due to latency, privacy, and bandwidth concerns, inference is performed on edge devices. Consequently, the model size is often reduced through pruning and quantization to map cloud-trained models onto edge artificial intelligence hardware. To ensure that the reduced models maintain the integrity of the original, larger models, detecting inequivalences is crucial. In this letter, we focus on inequivalence detection by identifying cases where the behavior of the reduced model diverges from that of the original model. This is achieved by formulating an optimization problem that maximizes the difference between the two models. In contrast to related work, our proposed approach is agnostic to the choice of activation function and can be applied to networks using a wide variety of nonlinearities. Furthermore, it considers only counterexamples that lie within the range of the original data (so-called in-distribution inputs), since only in these regions can the model be considered properly specified. The experimental results show that the found counterexamples were able to differentiate models across various NN architectures and datasets.
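A minimal sketch of the search idea, under our own simplifications: maximize the output gap between an original model f and its reduced version g while clamping candidates to the observed data range, so every counterexample stays in distribution. The letter formulates this as an optimization problem; simple hill climbing stands in for it here, and the toy models are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def find_counterexample(f, g, lo, hi, steps=2000, sigma=0.05):
    # Hill-climb toward the largest |f(x) - g(x)| while staying in [lo, hi].
    x = rng.uniform(lo, hi)
    best_gap = np.abs(f(x) - g(x)).max()
    for _ in range(steps):
        cand = np.clip(x + rng.normal(0, sigma, x.shape), lo, hi)  # stay in range
        gap = np.abs(f(cand) - g(cand)).max()
        if gap > best_gap:
            x, best_gap = cand, gap
    return x, best_gap

# Toy "original" vs. "quantized" model: weight rounding creates the divergence.
W = rng.normal(size=(4, 8))
f = lambda x: np.tanh(W @ x)
g = lambda x: np.tanh((np.round(W * 4) / 4) @ x)   # 2-fractional-bit quantization
x_star, gap = find_counterexample(f, g, lo=-np.ones(8), hi=np.ones(8))
print(f"max output gap {gap:.4f} at an in-range input")

The clamp is the in-distribution constraint in miniature: without it, the search drifts to extreme inputs where divergence is large but meaningless, since the model was never specified there.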
{"title":"Detecting Nonequivalence in Neural Networks Through In-Distribution Counterexample Generation","authors":"Dina A. Moussa;Michael Hefenbrock;Mehdi Tahoori","doi":"10.1109/LES.2025.3600585","DOIUrl":"https://doi.org/10.1109/LES.2025.3600585","url":null,"abstract":"neural networks (NNs) have made profound achievements in various safety-critical applications such as healthcare, medical devices, and automotive. These NN models are usually trained using cloud systems; however, due to latency, privacy, and bandwidth concerns, inference is performed on edge devices. Consequently, the model size is often reduced through pruning and quantization to map the cloud-trained models to edge artificial intelligence hardware. To ensure that the reduced models maintain the integrity of the original, larger models, detecting inequivalences is crucial. In this letter, we focus on inequivalence detection by identifying cases where the behavior of the reduced model diverges from the original model. This is achieved by formulating an optimization problem to maximize the difference between the two models. In contrast to the related work, our proposed approach is agnostic to the choice of activation function and can be applied to networks utilizing a wide variety of nonlinearities. Furthermore, it considers only counterexamples that are in range of the original data, the so-called In Distribution, as only in these regions, the model can be considered properly specified. The experimental results showed that the found counterexamples were able to differentiate models for various NN architectures and datasets.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"297-300"},"PeriodicalIF":2.0,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A 340-μW TinyML Using LUT-Based Reservoir Computing on Low-Cost FPGAs
Pub Date : 2025-10-16 | DOI: 10.1109/LES.2025.3598209 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 357-360
Kanta Yoshioka;Hakaru Tamukoh
We propose a reservoir computing (RC) system that operates under extremely strict power constraints of less than 340 μW, enabling continuous operation for one year on an LR6 battery. By combining a look-up table network-based RC (LUTNet-RC) with iCE40-series field-programmable gate arrays (FPGAs), the proposed system achieves high computational accuracy while significantly reducing power consumption compared with conventional TinyML devices. The proposed method achieves 93.1%, 98.6%, and 92.7% accuracy on real TinyML applications, namely human activity recognition, epilepsy detection, and electrocardiogram signal analysis, respectively, while consuming about 254-335 μW. This work shows that the proposed LUTNet-RC on iCE40 FPGAs is a promising solution for long-term machine learning deployments on battery-powered edge devices.
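For readers unfamiliar with RC, the underlying recurrence is compact: a fixed random reservoir maps an input stream to a state trajectory, and only a linear readout is trained. The floating-point Python sketch below shows that recurrence only; the letter's contribution is mapping a binarized LUT-network variant of it onto iCE40 LUTs, which this sketch does not attempt.

import numpy as np

rng = np.random.default_rng(1)
N = 50                                       # reservoir size
W_in = rng.normal(0, 0.5, (N, 1))            # fixed input weights
W = rng.normal(0, 1.0, (N, N))               # fixed recurrent weights
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # spectral radius < 1 for echo state

def run_reservoir(u_seq):
    x = np.zeros(N)
    states = []
    for u in u_seq:
        x = np.tanh(W_in[:, 0] * u + W @ x)  # the only recurrent computation
        states.append(x.copy())
    return np.array(states)

states = run_reservoir(np.sin(np.linspace(0, 8 * np.pi, 200)))
print(states.shape)   # (200, 50): features for a cheap trained linear readout

Because the reservoir weights never change, the heavy recurrent part can be frozen into hardware (here, FPGA LUTs), which is what makes sub-milliwatt operation plausible.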
{"title":"A 340- μ W TinyML Using LUT-Based Reservoir Computing on Low-Cost FPGAs","authors":"Kanta Yoshioka;Hakaru Tamukoh","doi":"10.1109/LES.2025.3598209","DOIUrl":"https://doi.org/10.1109/LES.2025.3598209","url":null,"abstract":"We propose a reservoir computing (RC) system that operates under extremely strict power consumption constraints of less than <inline-formula> <tex-math>$340~{mu }$ </tex-math></inline-formula>W, which enables continuous operation for one year on a LR6 battery. By combining a look-up tables networks based RC (LUTNet-RC) with iCE40 series field-programmable gate arrays (FPGAs), the proposed system achieves high computational accuracy while significantly reducing power consumption compared with conventional TinyML devices. The proposed method achieves 93.1%, 98.6%, and 92.7% accuracy in real TinyML applications such as human activity recognition, epilepsy detection, and electrocardiogram signal analysis, respectively, while operating at about 254 to <inline-formula> <tex-math>$335~{mu }$ </tex-math></inline-formula>W power consumption. This work shows that the proposed LUTNet-RC on iCE40 FPGAs are promising solutions for long-term operational machine learning application implementations in battery-powered edge devices.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"357-360"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11205905","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward Efficient FPGA Accelerator DSE via Hierarchical and RM-Guided Methods
Pub Date : 2025-10-16 | DOI: 10.1109/LES.2025.3600555 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 361-364
Chao Shi;Qianyu Cheng;Teng Wang;Chao Wang;Xuehai Zhou
Field-programmable gate array (FPGA) accelerator design has gradually become a mainstream acceleration solution, widely applied in fields such as large language models, deep learning inference, autonomous driving, real-time 3-D scene reconstruction, and embedded intelligent terminals. High-level synthesis (HLS) technology provides significant support for FPGA accelerator design, greatly improving design efficiency and flexibility. However, manual parameter tuning by designers is still required to achieve optimal performance. Existing research has proposed automated design space exploration (DSE) methods to assist in parameter tuning, but these methods often exhibit low efficiency on complex HLS designs and, in some cases, fail to function at all. To address this, we present an efficient DSE method guided by hierarchical analysis and rule mining (RM), aimed at tackling more complex design challenges. This approach performs hierarchical analysis of design solutions and integrates RM techniques to optimize the design-space search process, enabling efficient exploration of superior design solutions. Experimental results show that our method achieves performance comparable to state-of-the-art (SOTA) techniques while delivering a speedup of 3.6× to 30.4×. Moreover, it enables effective exploration of complex design spaces that existing methods struggle to handle.
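As a loose illustration of rule-mining-guided search (not the authors' algorithm), the sketch below samples HLS-style parameter configurations, counts which parameter values co-occur in the best designs seen so far, and biases later sampling toward them. The parameter space and the toy cost model are invented for the example.

import random
from collections import Counter

SPACE = {"unroll": [1, 2, 4, 8], "pipeline": [0, 1], "partition": [1, 2, 4]}

def cost(cfg):
    # Toy latency model standing in for an HLS synthesis report.
    return 1000 / (cfg["unroll"] * cfg["partition"]) + (0 if cfg["pipeline"] else 300)

def mine_rules(top_cfgs):
    # "Rules" here are just frequency counts of parameter values in top designs.
    return {k: Counter(c[k] for c in top_cfgs) for k in SPACE}

random.seed(0)
history = [{k: random.choice(v) for k, v in SPACE.items()} for _ in range(20)]
for _ in range(80):
    rules = mine_rules(sorted(history, key=cost)[:5])
    cfg = {}
    for k, options in SPACE.items():
        if random.random() < 0.2:          # keep exploring the full space
            cfg[k] = random.choice(options)
        else:                              # bias toward values mined from top designs
            cfg[k] = random.choices(list(rules[k]), weights=list(rules[k].values()))[0]
    history.append(cfg)
best = min(history, key=cost)
print(best, cost(best))

The payoff of mining over brute force is that each expensive evaluation (in reality, an HLS run) sharpens the sampling distribution instead of being discarded.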
{"title":"Toward Efficient FPGA Accelerator DSE via Hierarchical and RM-Guided Methods","authors":"Chao Shi;Qianyu Cheng;Teng Wang;Chao Wang;Xuehai Zhou","doi":"10.1109/LES.2025.3600555","DOIUrl":"https://doi.org/10.1109/LES.2025.3600555","url":null,"abstract":"field-programmable gate array (FPGA) accelerator design has gradually become a mainstream acceleration solution, widely applied in fields, such as large language models, deep learning inference, autonomous driving, real-time 3-D scene reconstruction, and embedded intelligent terminals. high-level synthesis (HLS) technology has provided significant support for FPGA accelerator design, greatly improving design efficiency and flexibility. However, manual parameter tuning by designers is still required to achieve optimal performance. Existing research has proposed automated design space exploration (DSE) methods to assist in parameter tuning, but these methods often exhibit low efficiency when dealing with complex HLS designs and, in some cases, fail to function properly. To address this, we present an efficient DSE method guided by hierarchical analysis and rule mining (RM), aimed at tackling more complex design challenges. This approach performs hierarchical analysis of design solutions and integrates RM techniques to optimize the design space search process, enabling efficient exploration of superior design solutions. Experimental results show that our method achieves performance comparable to state-of-the-art (SOTA) techniques, while delivering a speed-up of <inline-formula> <tex-math>$3.6{times }$ </tex-math></inline-formula> to <inline-formula> <tex-math>$30.4{times }$ </tex-math></inline-formula>. Moreover, it enables the effective exploration of complex design spaces that existing methods struggle to handle.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"361-364"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WCET-Aware Partitioning and Allocation of Disaggregated Networks for Multicore Systems
Pub Date : 2025-10-16 | DOI: 10.1109/LES.2025.3600584 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 309-312
Junjie Shi;Christian Hakert;Kay Heider;Mario Günzel;Nils Hölscher;Daniel Kuhse;Jian-Jia Chen;Logan Kenwright;Sobhan Chatterjee;Nathan Allen;Partha Roop
The integration of machine learning into safety-critical cyber-physical systems has significantly increased computational demands, which are often met by modern multicore platforms. While complex memory subsystems, including local caches, make it challenging to maintain timing predictability, they also provide opportunities for worst-case execution time (WCET) optimization through improved data locality. To address this, we propose a multicore partitioning and allocation strategy that leverages sparse structures through neural network disaggregation to optimize the WCET. Our evaluation shows that disaggregated neural networks achieve a significantly reduced WCET compared to fully connected monolithic neural networks of similar size.
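One simple way to picture the allocation side of such a strategy: treat each disaggregated sub-network as a block with a known WCET and assign blocks to cores so that the worst-loaded core is minimized. The greedy longest-processing-time heuristic below is our stand-in, with made-up WCET figures; the letter's actual partitioning strategy may differ.

import heapq

def allocate(block_wcets, n_cores):
    # Longest-processing-time-first: largest block goes to the least-loaded core.
    cores = [(0.0, i, []) for i in range(n_cores)]
    heapq.heapify(cores)
    for name, wcet in sorted(block_wcets.items(), key=lambda kv: -kv[1]):
        load, i, blocks = heapq.heappop(cores)
        heapq.heappush(cores, (load + wcet, i, blocks + [name]))
    return cores

# Four sub-networks from a disaggregated model (hypothetical WCETs in us).
blocks = {"branch_a": 120.0, "branch_b": 95.0, "shared_trunk": 80.0, "head": 40.0}
for load, i, assigned in sorted(allocate(blocks, n_cores=2), key=lambda c: c[1]):
    print(f"core {i}: {assigned} (WCET {load} us)")

Disaggregation is what makes this tractable: a monolithic fully connected network is one indivisible block, whereas sparse sub-networks can be placed so each one's working set fits a core's local cache.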
{"title":"WCET-Aware Partitioning and Allocation of Disaggregated Networks for Multicore Systems","authors":"Junjie Shi;Christian Hakert;Kay Heider;Mario Günzel;Nils Hölscher;Daniel Kuhse;Jian-Jia Chen;Logan Kenwright;Sobhan Chatterjee;Nathan Allen;Partha Roop","doi":"10.1109/LES.2025.3600584","DOIUrl":"https://doi.org/10.1109/LES.2025.3600584","url":null,"abstract":"The integration of machine learning into safety-critical cyber-physical systems has significantly increased computational demands, which are often met by modern multicore platforms. While complex memory subsystems, including local caches, make it challenging to maintain timing predictability, they also provide opportunities for worst-case execution time (WCET) optimization through improved data locality. To address this, we propose a multicore partitioning and allocation strategy that leverages sparse structures through neural network disaggregation to optimize the WCET. Our evaluation shows that disaggregated neural networks achieve a significantly reduced WCET, compared to fully connected monolithic neural networks of similar size.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"309-312"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11205907","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145351942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beyond BNNs: Design and Acceleration of Sub-Bit Neural Networks Using RISC-V Custom Functional Units
Pub Date : 2025-10-16 | DOI: 10.1109/LES.2025.3600565 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 329-332
Muhammad Sabih;Mohamed Abdo;Frank Hannig;Jürgen Teich
Binary neural networks (BNNs) are known for their minimal memory requirements, making them an attractive choice for resource-constrained environments. Sub-bit neural networks (SBNNs) are a more recent advancement that extends the benefits of BNNs by compressing them even further, achieving sub-bit-level representations to maximize efficiency. However, effectively compressing and accelerating BNNs presents challenges. In this letter, we propose a novel approach to compress BNNs using a fixed-length compression scheme that can be efficiently decoded at runtime. We then propose RISC-V extensions, implemented as a custom function unit (CFU), that decode compressed weights via a codebook stored in FPGA on-board memory, followed by XOR and population-count operations. This approach achieves a speedup of up to 2× compared to conventional BNNs deployed on a RISC-V softcore, with significantly less accuracy degradation, and provides a foundation for exploring even higher compression configurations to further improve performance.
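The two operations the CFU accelerates are easy to model in software: a fixed-length codebook lookup that expands short indices into binary weight words, followed by an XNOR-style XOR + popcount dot product. The codebook contents and word sizes below are illustrative, not the letter's configuration.

CODEBOOK = [0b10110100, 0b01001011, 0b11110000, 0b00001111]  # 8-bit weight words

def decode_weights(indices):
    # Fixed-length decode: each 2-bit index selects one 8-bit weight word,
    # so storage drops from 8 bits to 2 bits per word (sub-bit per weight).
    return [CODEBOOK[i] for i in indices]

def binary_dot(w_word, a_word, bits=8):
    # For +/-1 weights packed as bits: matches = bits - popcount(w XOR a),
    # and the signed dot product is 2*matches - bits.
    matches = bits - bin((w_word ^ a_word) & ((1 << bits) - 1)).count("1")
    return 2 * matches - bits

activations = 0b10110110
for w in decode_weights([0, 2, 1]):
    print(f"{w:08b} . {activations:08b} = {binary_dot(w, activations)}")

Fixed-length codes are the key design choice here: unlike variable-length entropy codes, every index can be located and decoded in constant time, which is what makes a simple hardware decoder feasible.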
{"title":"Beyond BNNs: Design and Acceleration of Sub-Bit Neural Networks Using RISC-V Custom Functional Units","authors":"Muhammad Sabih;Mohamed Abdo;Frank Hannig;Jürgen Teich","doi":"10.1109/LES.2025.3600565","DOIUrl":"https://doi.org/10.1109/LES.2025.3600565","url":null,"abstract":"Binary neural networks (BNNs) are known for their minimal memory requirements, making them an attractive choice for resource-constrained environments. SBNN-nps are a more recent advancement that extend the benefits of BNNs by compressing them even further, achieving sub-bit level representations to maximize efficiency. However, effectively compressing and accelerating BNNs presents challenges. In this letter, we propose a novel approach to compress BNNs using a fixed-length compression scheme that can be efficiently decoded at runtime. We then propose RISC-V extensions, implemented as a custom function unit (CFU), to decode compressed weights via a codebook stored on an FPGA on-board memory, followed by XOR and population count operations. This approach achieves a speedup of up to 2× compared to conventional BNNs deployed on the RISC-V softcore, with Significantly less accuracy degradation, and provides a foundation for exploring even higher compression configurations to improve performance further.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"329-332"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Formal Modeling and Verification of Generic Credential Management Processes for Industrial Cyber–Physical Systems
Pub Date : 2025-10-16 | DOI: 10.1109/LES.2025.3598202 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 349-352
Julian Göppert;Axel Sikora
Industrial cyber-physical systems (ICPS) face rising cyberattacks, requiring secure credential management even in resource-constrained embedded systems. Standards specifying field-level communication in ICPS (e.g., PROFINET or OPC UA) define protocol-specific credential management processes, yet lack formal security verification. We propose a generic model capturing initial security onboarding and automated credential provisioning. Using ProVerif, an automatic symbolic protocol verifier, we formalize certificate-based authentication under a Dolev-Yao adversary, verifying private-key secrecy, component authentication, and mutual authentication with the operator domain. Robustness checks confirm resilience against key leakage and highlight the vulnerabilities of the trust-on-first-use concept proposed by the standards. Our model offers the first formal guarantees for secure credential management in ICPS.
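The trust-on-first-use weakness the analysis highlights is easy to see in code: on first contact, whatever certificate the peer presents becomes the pinned trust anchor, so an attacker present during onboarding is trusted from then on. The Python sketch below is a generic illustration with a hypothetical pin store, not the standards' mechanism.

import hashlib, json, os, ssl

PIN_FILE = "pins.json"   # hypothetical local pin store

def tofu_check(host, port=443):
    # Fetch whatever certificate the peer presents and fingerprint it.
    pem = ssl.get_server_certificate((host, port))
    fingerprint = hashlib.sha256(pem.encode()).hexdigest()
    pins = {}
    if os.path.exists(PIN_FILE):
        with open(PIN_FILE) as f:
            pins = json.load(f)
    if host not in pins:
        pins[host] = fingerprint          # first use: trusted blindly (!)
        with open(PIN_FILE, "w") as f:
            json.dump(pins, f)
        return "pinned on first use: trust anchor was never verified"
    return "ok" if pins[host] == fingerprint else "MISMATCH: possible MITM"

Later mismatches are detected, but the initial pinning step has no authentication at all, which is exactly the window a symbolic Dolev-Yao adversary exploits in the formal model.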
{"title":"Formal Modeling and Verification of Generic Credential Management Processes for Industrial Cyber–Physical Systems","authors":"Julian Göppert;Axel Sikora","doi":"10.1109/LES.2025.3598202","DOIUrl":"https://doi.org/10.1109/LES.2025.3598202","url":null,"abstract":"Industrial cyber-physical systems (ICPS) face rising cyberattacks, requiring secure credential management also in resource-constrained embedded systems. Standards specifying field level communication of ICPS (e.g., PROFINET or OPC UA) define protocol-specific credential management processes, yet lack formal security verification. We propose a generic model capturing initial security onboarding and automated credential provisioning. Using ProVerif, an automatic symbolic protocol verifier, we formalize certificate-based authentication under a Dolev-Yao adversary, verifying private key secrecy, component authentication, and mutual authentication with the operator domain. Robustness checks confirm resilience against key leakage and highlight the vulnerabilities of the trust on first use concept proposed by the standards. Our model offers the first formal guarantees for secure credential management in ICPS.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"349-352"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instruction-Level Support for Deterministic Dataflow in Real-Time Systems
Pub Date : 2025-10-16 | DOI: 10.1109/LES.2025.3600618 | IEEE Embedded Systems Letters, vol. 17, no. 5, pp. 341-344
Bo Zhang;Yinkang Gao;Caixu Zhao;Xi Li
Ensuring predictable and repeatable behavior in concurrent real-time systems requires dataflow determinism—that is, each consumer task instance must always read data from the same producer instance. While the logical execution time (LET) model enforces this property, its software implementations typically rely on timed I/O or multibuffering protocols. These approaches introduce software complexity, execution overhead, and priority inversion, resulting in increased and unstable task response times, thereby degrading overall schedulability. We propose the time-semantic memory instruction (TSMI), a new instruction set extension that embeds logical timing into memory access operations. Unlike existing LET implementations, TSMI enforces dataflow determinism at the instruction level, eliminating the need for memory protocols or access-ordering constraints. We develop a TSMI microarchitectural implementation that translates TSMI instructions into standard memory accesses, together with a programming model that not only captures LET semantics but also enables more expressive, per-access dataflow control. On a cycle-accurate RISC-V simulator, TSMI achieves up to a 95.36% reduction in worst-case response time (WCRT) and a 98.88% reduction in response-time variability (RTV) compared to existing methods.
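The LET read/write semantics that TSMI pushes down to individual memory accesses can be modeled in a few lines: a write becomes visible at its logical publish time, and a read returns the value that was logically current at the reader's release time, independent of physical execution jitter. The class below is our simplified Python model, not the proposed microarchitecture.

import bisect

class LetVariable:
    def __init__(self, initial):
        self.publish_times = [0]
        self.values = [initial]

    def write(self, value, logical_publish_time):
        # A producer's output becomes visible only at the end of its LET
        # interval, regardless of when the write physically executed.
        self.publish_times.append(logical_publish_time)
        self.values.append(value)

    def read(self, logical_release_time):
        # Return the newest value published at or before the reader's release.
        i = bisect.bisect_right(self.publish_times, logical_release_time) - 1
        return self.values[i]

v = LetVariable(initial=0)
v.write(42, logical_publish_time=10)     # produced in [0, 10), visible at t=10
print(v.read(logical_release_time=9))    # 0  -> reads the previous instance
print(v.read(logical_release_time=10))   # 42 -> same answer on every run

Because reads and writes are resolved by logical time rather than arrival order, the same producer instance always feeds the same consumer instance, which is the dataflow determinism the letter targets.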
{"title":"Instruction-Level Support for Deterministic Dataflow in Real-Time Systems","authors":"Bo Zhang;Yinkang Gao;Caixu Zhao;Xi Li","doi":"10.1109/LES.2025.3600618","DOIUrl":"https://doi.org/10.1109/LES.2025.3600618","url":null,"abstract":"Ensuring predictable and repeatable behavior in concurrent real-time systems requires dataflow determinism—that is, each consumer task instance must always read data from the same producer instance. While the logical execution time (LET) model enforces this property, its software implementations typically rely on timed I/O or multibuffering protocols. These approaches introduce software complexity, execution overhead, and priority inversion, resulting in increased and unstable task response times, thereby degrading overall schedulability. We propose time-semantic memory instruction (TSMI), a new instruction set extension that embeds logical timing into memory access operations. Unlike existing LET implementations, TSMI enforces dataflow determinism at the instruction level, eliminating the need for memory protocols or access ordering constraints. We develop a TSMI microarchitectural implementation that translates TSMI instructions into standard memory accesses and a programming model that not only captures LET semantics but also enables more expressive, per-access dataflow control. A cycle-accurate RISC-V simulator with TSMI achieves up to 95.36% worst-case response time (WCRT) and 98.88% response time variability (RTV) reduction compared to existing methods.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"17 5","pages":"341-344"},"PeriodicalIF":2.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145352239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}