Multi-core interference over-estimation reduction by static scheduling of multi-phase tasks
Pub Date : 2024-09-05  DOI: 10.1007/s11241-024-09427-3
Rémi Meunier, Thomas Carle, Thierry Monteil
Interference between tasks running on separate cores of a multi-core processor is a major challenge to predictability for real-time systems, and a source of over-estimation in worst-case execution duration bounds. This paper investigates how the multi-phase task model can be used together with static scheduling algorithms to improve the precision of interference analysis. The paper focuses on single-period task systems (or multi-periodic systems that can be expanded over a hyperperiod). In particular, we propose an Integer Linear Programming (ILP) formulation of a generic scheduling problem as well as three heuristics, which we evaluate on synthetic benchmarks and on two realistic applications. We observe that, compared to the classical 1-phase model, the multi-phase model reduces the effect of interference on the worst-case makespan of the system by around 9% on average when using the ILP on small systems, and by up to 24% on our larger case studies. These results pave the way for future heuristics and for the adoption of the multi-phase model in a multi-core context.
{"title":"Multi-core interference over-estimation reduction by static scheduling of multi-phase tasks","authors":"Rémi Meunier, Thomas Carle, Thierry Monteil","doi":"10.1007/s11241-024-09427-3","DOIUrl":"https://doi.org/10.1007/s11241-024-09427-3","url":null,"abstract":"<p>Interference between tasks running on separate cores in multi-core processors is a major challenge to predictability for real-time systems, and a source of over-estimation of worst-case execution duration bounds. This paper investigates how the multi-phase task model can be used together with static scheduling algorithms to improve the precision of the interference analysis. The paper focuses on single-period task systems (or multi-periodic systems that can be expanded over an hyperperiod). In particular, we propose an Integer Linear Programming (ILP) formulation of a generic scheduling problem as well as 3 heuristics that we evaluate on synthetic benchmarks and on 2 realistic applications. We observe that, compared to the classical 1-phase model, the multi-phase model allows to reduce the effect of interference on the worst-case makespan of the system by around 9% on average using the ILP on small systems, and up to 24% on our larger case studies. These results pave the way for future heuristics and for the adoption of the multi-phase model in multi-core context.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"32 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142198504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Connecting the physical space and cyber space of autonomous systems more closely
Pub Date : 2024-08-23  DOI: 10.1007/s11241-024-09426-4
Xisheng Li, Jinghao Sun, Jiarui Wang, Kailu Duan, Mingsong Chen, Nan Guan, Zhishan Guo, Qingxu Deng, Yong Xie
Autonomous machines are typically subject to real-time constraints, but traditional real-time scheduling theories are not adequate to guarantee their real-time properties. One fundamental reason is that autonomous machines in the physical space and those in the cyber space perceive timing correctness inconsistently. This paper proposes a novel framework to tie autonomous machines in the physical space and those in the cyber space more closely together. Under this framework, the timing correctness of autonomous machines in the physical space can be correctly perceived by autonomous machines in the cyber space by properly configuring the timing parameters. Specifically, we develop an integer linear program (ILP) to derive such a feasible configuration of an autonomous machine's timing parameters and propose an incremental algorithm to accelerate the process of solving the ILP. Experimental results show that our method is capable of designing timing-correct autonomous machines that can cope with physical environments with more urgent events.
MCTI: mixed-criticality task-based isolation
Pub Date : 2024-07-10  DOI: 10.1007/s11241-024-09425-5
Denis Hoornaert, Golsana Ghaemi, Andrea Bastoni, Renato Mancuso, Marco Caccamo, Giulio Corradi
The ever-increasing demand for high performance in the time-critical, low-power embedded domain drives the adoption of powerful but unpredictable heterogeneous Systems-on-Chip. On these platforms, the main source of unpredictability, the shared memory subsystem, has been widely studied, and several approaches to mitigate its undesired effects have been proposed over the years. Among them, performance-counter-based regulation methods have proved particularly successful. Unfortunately, such regulation methods require precise knowledge of each task's memory consumption and cannot be extended to isolate mixed-criticality tasks running on the same core, because the regulation budget is shared. Moreover, the desirable combination of these methodologies with well-known time-isolation techniques, such as server-based reservations, is still uncharted territory and lacks a precise characterization of its possible benefits and limitations. Recognizing the importance of such consolidation for designing predictable real-time systems, we introduce MCTI (Mixed-Criticality Task-based Isolation) as a first step in this direction. MCTI is a hardware/software co-design architecture that aims to improve both CPU and memory isolation among tasks with different criticalities, even when they share the same CPU. To ascertain the correct behavior and distill the benefits of MCTI, we implemented and tested the proposed prototype architecture on a widely available off-the-shelf platform. The evaluation of our prototype shows that (1) MCTI helps shield critical tasks from concurrent non-critical tasks sharing the same memory budget, with only a limited increase in response time, and (2) critical tasks running under memory stress exhibit an average response time close to that achieved when running without memory stress.
{"title":"Mcti: mixed-criticality task-based isolation","authors":"Denis Hoornaert, Golsana Ghaemi, Andrea Bastoni, Renato Mancuso, Marco Caccamo, Giulio Corradi","doi":"10.1007/s11241-024-09425-5","DOIUrl":"https://doi.org/10.1007/s11241-024-09425-5","url":null,"abstract":"<p>The ever-increasing demand for high performance in the time-critical, low-power embedded domain drives the adoption of powerful but unpredictable, heterogeneous Systems-on-Chip. On these platforms, the main source of unpredictability—the shared memory subsystem—has been widely studied, and several approaches to mitigate undesired effects have been proposed over the years. Among them, performance-counter-based regulation methods have proved particularly successful. Unfortunately, such regulation methods require precise knowledge of each task’s memory consumption and cannot be extended to isolate mixed-criticality tasks running on the same core as the regulation budget is shared. Moreover, the desirable combination of these methodologies with well-known time-isolation techniques—such as server-based reservations—is still an uncharted territory and lacks a precise characterization of possible benefits and limitations. Recognizing the importance of such consolidation for designing predictable real-time systems, we introduce MCTI (Mixed-Criticality Task-based Isolation) as a first initial step in this direction. MCTI is a hardware/software co-design architecture that aims to improve both CPU and memory isolations among tasks with different criticalities even when they share the same CPU. In order to ascertain the correct behavior and distill the benefits of MCTI, we implemented and tested the proposed prototype architecture on a widely available off-the-shelf platform. The evaluation of our prototype shows that (1) MCTI helps shield critical tasks from concurrent non-critical tasks sharing the same memory budget, with only a limited increase in response time being observed, and (2) critical tasks running under memory stress exhibit an average response time close to that achieved when running without memory stress.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"34 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141587129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minimizing cache usage with fixed-priority and earliest deadline first scheduling
Pub Date : 2024-06-28  DOI: 10.1007/s11241-024-09423-7
Binqi Sun, Tomasz Kloda, Sergio Arribas Garcia, Giovani Gracioli, Marco Caccamo
Cache partitioning is a technique to reduce interference among tasks running on processors with shared caches. To make this technique effective, cache segments should be allocated to the tasks that benefit the most from having their data and instructions stored in the cache. Data and instructions that reside in the cache can be retrieved faster than from main memory, thereby reducing overall execution time. Existing partitioning schemes for real-time systems divide the available cache among the tasks with schedulability as the sole optimization criterion. However, it is also preferable, particularly in systems with power constraints or mixed criticalities where low- and high-criticality workloads execute alongside each other, to reduce the total cache usage of real-time tasks. Cache minimization as part of design space exploration can also help achieve optimal system performance and resource utilization in embedded systems. In this paper, we develop optimization algorithms for cache partitioning that, besides ensuring schedulability, also minimize cache usage. We consider both preemptive and non-preemptive scheduling policies on single-processor systems with fixed- and dynamic-priority scheduling algorithms (Rate Monotonic (RM) and Earliest Deadline First (EDF), respectively). For preemptive scheduling, we formulate the problem as an integer quadratically constrained program and propose an efficient heuristic achieving near-optimal solutions. For non-preemptive scheduling, we combine linear and binary search techniques with different fixed-priority schedulability tests and Quick Processor-demand Analysis (QPA) for EDF. Our experiments, based on synthetic task sets with parameters from real-world embedded applications, show that the proposed heuristic (i) achieves an average optimality gap of 0.79% within 0.1× the run time of a mathematical programming solver and (ii) reduces average cache usage by 39.15% compared to existing cache partitioning approaches. Besides, we find that for large task sets with high utilization, non-preemptive scheduling can use less cache than preemptive scheduling to guarantee schedulability.
{"title":"Minimizing cache usage with fixed-priority and earliest deadline first scheduling","authors":"Binqi Sun, Tomasz Kloda, Sergio Arribas Garcia, Giovani Gracioli, Marco Caccamo","doi":"10.1007/s11241-024-09423-7","DOIUrl":"https://doi.org/10.1007/s11241-024-09423-7","url":null,"abstract":"<p>Cache partitioning is a technique to reduce interference among tasks running on the processors with shared caches. To make this technique effective, cache segments should be allocated to tasks that will benefit the most from having their data and instructions stored in the cache. The requests for cached data and instructions can be retrieved faster from the cache memory instead of fetching them from the main memory, thereby reducing overall execution time. The existing partitioning schemes for real-time systems divide the available cache among the tasks to guarantee their schedulability as the sole and primary optimization criterion. However, it is also preferable, particularly in systems with power constraints or mixed criticalities where low- and high-criticality workloads are executing alongside, to reduce the total cache usage for real-time tasks. Cache minimization as part of design space exploration can also help in achieving optimal system performance and resource utilization in embedded systems. In this paper, we develop optimization algorithms for cache partitioning that, besides ensuring schedulability, also minimize cache usage. We consider both preemptive and non-preemptive scheduling policies on single-processor systems with fixed- and dynamic-priority scheduling algorithms (<i>Rate Monotonic</i> (<i>RM</i>) and <i>Earliest Deadline First</i> (<i>EDF</i>), respectively). For preemptive scheduling, we formulate the problem as an integer quadratically constrained program and propose an efficient heuristic achieving near-optimal solutions. For non-preemptive scheduling, we combine linear and binary search techniques with different fixed-priority schedulability tests and Quick Processor-demand Analysis (QPA) for EDF. Our experiments based on synthetic task sets with parameters from real-world embedded applications show that the proposed heuristic: (i) achieves an average optimality gap of 0.79% within 0.1× run time of a mathematical programming solver and (ii) reduces average cache usage by 39.15% compared to existing cache partitioning approaches. Besides, we find that for large task sets with high utilization, non-preemptive scheduling can use less cache than preemptive to guarantee schedulability.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"71 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141502995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MemPol: polling-based microsecond-scale per-core memory bandwidth regulation
Pub Date : 2024-06-17  DOI: 10.1007/s11241-024-09422-8
Alexander Zuepke, Andrea Bastoni, Weifan Chen, Marco Caccamo, Renato Mancuso
In today’s multiprocessor systems-on-a-chip, the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time analysis. Memory regulation via throttling is one of the most practical techniques to mitigate interference. Traditional regulation schemes rely on a combination of timer and performance counter interrupts that are delivered and processed on the same cores running the real-time workload. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism that operates from outside the cores and monitors performance counters for the application cores’ activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows more complex compositions of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and conduct an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102, NXP i.MX8M and NXP S32G2 platforms using the San Diego Vision Benchmark Suite.
{"title":"MemPol: polling-based microsecond-scale per-core memory bandwidth regulation","authors":"Alexander Zuepke, Andrea Bastoni, Weifan Chen, Marco Caccamo, Renato Mancuso","doi":"10.1007/s11241-024-09422-8","DOIUrl":"https://doi.org/10.1007/s11241-024-09422-8","url":null,"abstract":"<p>In today’s multiprocessor systems-on-a-chip, the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time analysis. Memory regulation via throttling is one of the most practical techniques to mitigate interference. Traditional regulation schemes rely on a combination of timer and performance counter interrupts to be delivered and processed on the same cores running real-time workload. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism from <i>outside the cores</i> that monitors performance counters for the application core’s activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows more complex composition of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and conduct an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102, NXP i.MX8M and NXP S32G2 platforms using the San Diego Vision Benchmark Suite.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"48 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141502994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Priority-based concurrency and shared resource access mechanisms for nested intercomponent requests in CAmkES
Pub Date : 2024-04-15  DOI: 10.1007/s11241-024-09419-3
Marion Sudvarg, Zhuoran Sun, Ao Li, Chris Gill, Ning Zhang
Component-based design encapsulates and isolates state and the operations on it, but timing semantics cross-cut these boundaries when a real-time task’s control flow spans multiple components. Under priority-based scheduling, inter-component control flow should be coupled with priority information, so that task execution can be prioritized appropriately end-to-end. However, the CAmkES component architecture for the seL4 microkernel does not adequately support priority propagation across intercomponent requests: component interfaces are bound to threads that execute at fixed priorities provided at compile-time in the component specification. In this paper, we present a new library for CAmkES with a thread model that supports (1) multiple concurrent requests to the same component endpoint; (2) propagation and enforcement of priority metadata, such that those requests are appropriately prioritized; (3) implementations of Non-Preemptive Critical Sections, the Immediate Priority Ceiling Protocol, and the Priority Inheritance Protocol for components encapsulating critical sections of exclusive access to a shared resource; and (4) extensions of these mechanisms to support nested lock acquisition. We measure overheads and blocking times for these new features, use existing theory to discuss schedulability analysis, and present a new hyperbolic bound for rate-monotonic scheduling of tasks with blocking times that allows tasks to be assigned non-unique priorities. Evaluations on both Intel x86 and ARM platforms demonstrate that our library allows CAmkES to provide suitable end-to-end timing for real-time systems.
{"title":"Priority-based concurrency and shared resource access mechanisms for nested intercomponent requests in CAmkES","authors":"Marion Sudvarg, Zhuoran Sun, Ao Li, Chris Gill, Ning Zhang","doi":"10.1007/s11241-024-09419-3","DOIUrl":"https://doi.org/10.1007/s11241-024-09419-3","url":null,"abstract":"<p>Component-based design encapsulates and isolates state and the operations on it, but timing semantics cross-cut these boundaries when a real-time task’s control flow spans multiple components. Under priority-based scheduling, inter-component control flow should be coupled with priority information, so that task execution can be prioritized appropriately end-to-end. However, the CAmkES component architecture for the seL4 microkernel does not adequately support priority propagation across intercomponent requests: component interfaces are bound to threads that execute at fixed priorities provided at compile-time in the component specification. In this paper, we present a new library for CAmkES with a thread model that supports (1) multiple concurrent requests to the same component endpoint; (2) propagation and enforcement of priority metadata, such that those requests are appropriately prioritized; (3) implementations of Non-Preemptive Critical Sections, the Immediate Priority Ceiling Protocol, and the Priority Inheritance Protocol for components encapsulating critical sections of exclusive access to a shared resource; and (4) extensions of these mechanisms to support nested lock acquisition. We measure overheads and blocking times for these new features, use existing theory to discuss schedulability analysis, and present a new hyperbolic bound for rate-monotonic scheduling of tasks with blocking times that allows tasks to be assigned non-unique priorities. Evaluations on both Intel x86 and ARM platforms demonstrate that our library allows CAmkES to provide suitable end-to-end timing for real-time systems.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"35 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inference serving with end-to-end latency SLOs over dynamic edge networks
Pub Date : 2024-02-06  DOI: 10.1007/s11241-024-09418-4
Vinod Nigade, Pablo Bauszat, Henri Bal, Lin Wang
While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when requests have to be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by utilizing both data and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, where the decisions for data and DNN adaptations are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that we fulfill latency SLOs while maximizing overall inference accuracy. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments based on a prototype implementation and real-world WiFi and LTE network traces show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
{"title":"Inference serving with end-to-end latency SLOs over dynamic edge networks","authors":"Vinod Nigade, Pablo Bauszat, Henri Bal, Lin Wang","doi":"10.1007/s11241-024-09418-4","DOIUrl":"https://doi.org/10.1007/s11241-024-09418-4","url":null,"abstract":"<p>While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied especially when the request has to be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish—a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLO). Jellyfish handles the network variability by utilizing both data and deep neural network (DNN) adaptation to conduct tradeoffs between accuracy and latency. Jellyfish features a new design that enables collective adaptation policies where the decisions for data and DNN adaptations are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that we fulfill latency SLOs while maximizing the overall inference accuracy. We further investigate <i>dynamic</i> DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments based on a prototype implementation and real-world WiFi and LTE network traces show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.\u0000</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"26 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139754341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical verification of autonomous system controllers under timing uncertainties
Pub Date : 2024-01-29  DOI: 10.1007/s11241-023-09417-x
Software in autonomous systems like autonomous cars, robots or drones is often implemented on resource-constrained embedded systems with heterogeneous architectures. At the heart of such software are multiple feedback control loops, whose dynamics depend not only on the control strategy being used, but also on the timing behavior the control software experiences. But performing timing analysis for safety-critical control software tasks, particularly on heterogeneous computing platforms, is challenging. Consequently, a number of recent papers have addressed the problem of stability analysis of feedback control loops in the presence of timing uncertainties (cf. deadline misses). In this paper, we address a different class of safety properties, viz., whether the system trajectory with timing uncertainties deviates too much from the nominal trajectory. Verifying such quantitative safety properties involves performing a reachability analysis that is either computationally intractable or too conservative. To alleviate these problems, we propose providing statistical guarantees on the behavior of control systems with timing uncertainties. More specifically, we present a Bayesian hypothesis testing method that estimates deviations from a nominal or ideal behavior. We show that our analysis can provide, with high confidence, tighter estimates of the deviation from nominal behavior than known reachability analysis methods. We also illustrate the scalability of our techniques by obtaining bounds in cases where reachability analysis fails, thereby establishing the practicality of our proposed method.
{"title":"Statistical verification of autonomous system controllers under timing uncertainties","authors":"","doi":"10.1007/s11241-023-09417-x","DOIUrl":"https://doi.org/10.1007/s11241-023-09417-x","url":null,"abstract":"<h3>Abstract</h3> <p>Software in autonomous systems like autonomous cars, robots or drones is often implemented on resource-constrained embedded systems with heterogeneous architectures. At the heart of such software are multiple feedback control loops, whose dynamics not only depend on the control strategy being used, but also on the timing behavior the control software experiences. But performing timing analysis for safety critical control software tasks, particularly on heterogeneous computing platforms, is challenging. Consequently, a number of recent papers have addressed the problem of <em>stability analysis</em> of feedback control loops in the presence of timing uncertainties (<em>cf.</em>, deadline misses). In this paper, we address a different class of safety properties, <em>viz.</em>, whether the system trajectory with timing uncertainties deviates too much from the nominal trajectory. Verifying such <em>quantitative</em> safety properties involves performing a reachability analysis that is computationally intractable, or is too conservative. To alleviate these problems we propose to provide statistical guarantees over the behavior of control systems with timing uncertainties. More specifically, we present a Bayesian hypothesis testing method that estimates deviations from a nominal or ideal behavior. We show that our analysis can provide, with high confidence, tighter estimates of the deviation from nominal behavior than using known reachability analysis methods. We also illustrate the scalability of our techniques by obtaining bounds in cases where reachability analysis fails, thereby establishing the practicality of our proposed method.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"45 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139584464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ChamelIoT: a tightly- and loosely-coupled hardware-assisted OS framework for low-end IoT devices
Pub Date : 2023-12-20  DOI: 10.1007/s11241-023-09412-2
Miguel Silva, Tiago Gomes, Mongkol Ekpanyapong, Adriano Tavares, Sandro Pinto
The ever-growing Internet of Things (IoT) ecosystem continues to impose new requirements and constraints on every device. At the edge, low-end devices are pressured by increasing workloads and stricter timing deadlines, while at the same time being expected to minimize their power consumption, form factor, and memory footprint. Field-Programmable Gate Arrays (FPGAs) emerge as a possible solution to the increasing demands of the IoT. Reconfigurable IoT platforms enable the offloading of software tasks to hardware, enhancing their performance and determinism. This paper presents ChamelIoT, an OS-agnostic hardware framework for reconfigurable IoT devices. The framework provides hardware acceleration for kernel services of different IoT operating systems (OSes) by leveraging the RISC-V open-source instruction set architecture (ISA). The ChamelIoT hardware accelerator can be deployed in a tightly- or loosely-coupled approach and implements the following kernel services: thread management, scheduling, synchronization mechanisms, and inter-process communication (IPC). ChamelIoT allows developers to run unmodified applications of three well-established OSes: RIOT, Zephyr, and FreeRTOS. The experiments conducted on both coupling approaches consist of microbenchmarks to measure API latency, the Thread Metric benchmark suite to evaluate system performance, and measurements of FPGA resource consumption. The results show that latency can be reduced by up to 92.65% and 89.14% for the tightly- and loosely-coupled approaches, respectively, jitter is eliminated, and execution performance increases by 199.49% and 184.85% for the two approaches.
{"title":"ChamelIoT: a tightly- and loosely-coupled hardware-assisted OS framework for low-end IoT devices","authors":"Miguel Silva, Tiago Gomes, Mongkol Ekpanyapong, Adriano Tavares, Sandro Pinto","doi":"10.1007/s11241-023-09412-2","DOIUrl":"https://doi.org/10.1007/s11241-023-09412-2","url":null,"abstract":"<p>The evergrowing Internet of Things (IoT) ecosystem continues to impose new requirements and constraints on every device. At the edge, low-end devices are getting pressured by increasing workloads and stricter timing deadlines while simultaneously are desired to minimize their power consumption, form factor, and memory footprint. Field-Programmable Gate Arrays (FPGAs) emerge as a possible solution for the increasing demands of the IoT. Reconfigurable IoT platforms enable the offloading of software tasks to hardware, enhancing their performance and determinism. This paper presents ChamelIoT, an agnostic hardware operating systems (OSes) framework for reconfigurable IoT devices. The framework provides hardware acceleration for kernel services of different IoT OSes by leveraging the RISC-V open-source instruction set architecture (ISA). The ChamelIoT hardware accelerator can be deployed in a tightly- or loosely-coupled approach and implements the following kernel services: thread management, scheduling, synchronization mechanisms, and inter-process communication (IPC). ChamelIoT allows developers to run unmodified applications of three well-established OSes, RIOT, Zephyr, and FreeRTOS. The experiments conducted on both coupling approaches consisted of microbenchmarks to measure the API latency, the Thread Metric benchmark suite to evaluated the system performance, and tests to the FPGA resource consumption. The results show that the latency can be reduced up to 92.65% and 89.14% for the tightly- and loosely-coupled approaches, respectively, the jitter removed, and the execution performance increased by 199.49% and 184.85% for both approaches.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"486 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138820344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Configuration optimization for heterogeneous time-sensitive networks
Pub Date : 2023-11-23  DOI: 10.1007/s11241-023-09414-0
Niklas Reusch, Mohammadreza Barzegaran, Luxi Zhao, Silviu S. Craciunas, Paul Pop
Time-Sensitive Networking (TSN) collectively defines a set of protocols and standard amendments that enhance IEEE 802.1Q Ethernet nodes with time-aware and fault-tolerant capabilities. Specifically, the IEEE 802.1Qbv amendment defines a timed-gate mechanism that governs the real-time transmission of critical traffic via a so-called Gate Control List (GCL) schedule encoded in each TSN-capable network device. Most TSN scheduling mechanisms are designed for homogeneous TSN networks in which all network devices must have at least the TSN capabilities related to scheduled gates and time synchronization. However, this assumption is often unrealistic, since many distributed applications use heterogeneous TSN networks with legacy or off-the-shelf end systems that are unscheduled and/or unsynchronized. We propose a new scheduling paradigm for heterogeneous TSN networks that integrates a network-calculus worst-case interference analysis into the scheduling step. Through this, we trade off the solution's optimality to be able to support heterogeneous TSN networks featuring unscheduled and/or unsynchronized end systems while guaranteeing the real-time properties of critical communication. Within this new paradigm, we propose two solutions to the problem, one based on a Constraint Programming formulation and one based on a Simulated Annealing metaheuristic, which provide different trade-offs and scalability properties. We compare and evaluate our flexible window-based scheduling methods using both synthetic and real-world test cases, validating the correctness and scalability of our implementation. Furthermore, we use OMNET++ to validate the generated GCL schedules.
{"title":"Configuration optimization for heterogeneous time-sensitive networks","authors":"Niklas Reusch, Mohammadreza Barzegaran, Luxi Zhao, Silviu S. Craciunas, Paul Pop","doi":"10.1007/s11241-023-09414-0","DOIUrl":"https://doi.org/10.1007/s11241-023-09414-0","url":null,"abstract":"<p>Time-Sensitive Networking (TSN) collectively defines a set of protocols and standard amendments that enhance IEEE 802.1Q Ethernet nodes with time-aware and fault-tolerant capabilities. Specifically, the IEEE 802.1Qbv amendment defines a timed-gate mechanism that governs the real-time transmission of critical traffic via a so-called Gate Control List (GCL) schedule encoded in each TSN-capable network device. Most TSN scheduling mechanisms are designed for homogeneous TSN networks in which all network devices must have at least the TSN capabilities related to scheduled gates and time synchronization. However, this assumption is often unrealistic since many distributed applications use heterogeneous TSN networks with legacy or off-the-shelf end systems that are unscheduled and/or unsynchronized. We propose a new scheduling paradigm for heterogeneous TSN networks that intertwines a network calculus worst-case interference analysis within the scheduling step. Through this, we compromise on the solution’s optimality to be able to support heterogeneous TSN networks featuring unscheduled and/or unsynchronized end-systems while guaranteeing the real-time properties of critical communication. Within this new paradigm, we propose two solutions to solve the problem, one based on a Constraint Programming formulation and one based on a Simulated Annealing metaheuristic, that provide different trade-offs and scalability properties. We compare and evaluate our flexible window-based scheduling methods using both synthetic and real-world test cases, validating the correctness and scalability of our implementation. Furthermore, we use OMNET++ to validate the generated GCL schedules.</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"205 2","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138496265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}