Communication Centric Design in Complex Automotive Embedded Systems
ECRTS 2017, DOI: 10.4230/LIPIcs.ECRTS.2017.10
A. Hamann, D. Dasari, S. Kramer, M. Pressler, Falk Wurst
Automotive embedded applications, such as the engine management system, are composed of multiple functional components that are tightly coupled via numerous communication dependencies and intensive data sharing, while also having real-time requirements. To cope with this complexity, especially in multi-core settings, various communication mechanisms are used to ensure data consistency and temporal determinism along functional cause-effect chains. However, existing timing analysis methods generally support only very basic communication models and need to be extended to handle industry-grade problems involving more complex communication semantics. In this work, we give an overview of the communication semantics used in the automotive industry and the different constraints to be considered in the design process. We also propose a model transformation method that increases the expressiveness of current timing analysis methods, enabling them to work with more complex communication semantics. We demonstrate this transformation approach for concrete implementations of two communication semantics, namely implicit and LET communication, and discuss the impact on end-to-end latencies and communication overheads based on a full-blown engine management system.
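As a rough illustration of the two semantics the abstract names, the C sketch below contrasts implicit communication (jobs work on private copies taken at activation) with LET communication (outputs become visible only at logical period boundaries). This is a minimal sketch under assumed names (signal_t, the task and hook functions), not the paper's implementation.

```c
/* Minimal sketch (not from the paper) contrasting two communication semantics
 * for a shared label written by one task and read by another.
 * All identifiers here are illustrative assumptions. */
#include <stdint.h>
#include <string.h>

typedef struct { int32_t rpm; int32_t load; } signal_t;

volatile signal_t g_signal;     /* globally shared label                 */
static signal_t   let_buffer;   /* LET: staged output, published later   */

/* Implicit communication: the runnable operates on a private copy taken at
 * activation, so the value stays consistent for the whole job even if the
 * producer updates g_signal in the meantime. */
void task_reader_implicit(void) {
    signal_t local;
    memcpy(&local, (const void *)&g_signal, sizeof local);   /* copy-in  */
    /* ... computation uses only 'local'; outputs are copied out at job end */
}

/* Logical Execution Time (LET): outputs become visible only at the end of
 * the logical interval, independent of when the job actually finishes. */
void let_terminate_hook(const signal_t *job_output) {
    memcpy(&let_buffer, job_output, sizeof let_buffer);      /* stage     */
}
void let_period_boundary(void) {                             /* at t = kT */
    memcpy((void *)&g_signal, &let_buffer, sizeof g_signal); /* publish   */
}
```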
{"title":"Communication Centric Design in Complex Automotive Embedded Systems","authors":"A. Hamann, D. Dasari, S. Kramer, M. Pressler, Falk Wurst","doi":"10.4230/LIPIcs.ECRTS.2017.10","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2017.10","url":null,"abstract":"Automotive embedded applications like the engine management system are composed of multiple functional components that are tightly coupled via numerous communication dependencies and intensive data sharing, while also having real-time requirements. In order to cope with complexity, especially in multi-core settings, various communication mechanisms are used to ensure data consistency and temporal determinism along functional cause-effect chains. However, existing timing analysis methods generally only support very basic communication models that need to be extended to handle the analysis of industry grade problems which involve more complex communication semantics. In this work, we give an overview of communication semantics used in the automotive industry and the different constraints to be considered in the design process. We also propose a method for model transformation to increase the expressiveness of current timing analysis methods enabling them to work with more complex communication semantics. We demonstrate this transformation approach for concrete implementations of two communication semantics, namely, implicit and LET communication. We discuss the impact on end-to-end latencies and communication overheads based on a full blown engine management system.","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134022356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and Implementation of a Time Predictable Processor: Evaluation With a Space Case Study
ECRTS 2017, DOI: 10.4230/LIPIcs.ECRTS.2017.16
Carles Hernández, J. Abella, F. Cazorla, Alen Bardizbanyan, J. Andersson, F. Cros, Franck Wartel
Embedded real-time systems, like those found in automotive, rail, and aerospace, steadily require higher levels of guaranteed computing performance (and hence time predictability), motivated by the increasing number of functionalities provided by software. However, high-performance processor design is driven by the average-performance needs of the mainstream market. To make things worse, changing those designs is hard since the embedded real-time market is comparatively small. A path to address this mismatch is designing low-complexity hardware features that favor time predictability and can be enabled or disabled so as not to affect average performance when performance guarantees are not required. Along this line, we present the lessons learned designing and implementing LEOPARD, a four-core processor facilitating measurement-based timing analysis (widely used in most domains). LEOPARD adds low-overhead hardware mechanisms to a LEON3 processor baseline that allow capturing the impact of jittery resources (i.e., those with variable latency) in the measurements performed at analysis time. In particular, at core level we handle the jitter of caches, TLBs, and variable-latency floating-point units; at chip level, we deal with contention so that time-composable timing guarantees can be obtained. The results of our applied study with a space application show how per-resource jitter is controlled, facilitating the computation of high-quality WCET estimates.
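For readers unfamiliar with the measurement-based flow LEOPARD supports, the sketch below shows the high-water-mark measurement loop such analyses typically rely on, with jittery resources forced to their worst latency at analysis time. The hooks read_cycle_counter() and force_worst_case_latency() are assumed for illustration; they are not LEOPARD APIs.

```c
/* Hedged sketch of a measurement-based timing analysis loop: record the
 * maximum observed execution time while per-resource jitter is made visible. */
#include <stdint.h>

extern uint64_t read_cycle_counter(void);          /* assumed platform hook */
extern void     force_worst_case_latency(int on);  /* assumed analysis knob */
extern void     function_under_analysis(void);

uint64_t measure_hwm(unsigned runs) {
    uint64_t hwm = 0;
    force_worst_case_latency(1);        /* expose cache/TLB/FPU jitter */
    for (unsigned i = 0; i < runs; i++) {
        uint64_t t0 = read_cycle_counter();
        function_under_analysis();
        uint64_t dt = read_cycle_counter() - t0;
        if (dt > hwm) hwm = dt;         /* maximum observed execution time */
    }
    force_worst_case_latency(0);
    return hwm;                         /* an engineering margin is added on top */
}
```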
{"title":"Design and Implementation of a Time Predictable Processor: Evaluation With a Space Case Study","authors":"Carles Hernández, J. Abella, F. Cazorla, Alen Bardizbanyan, J. Andersson, F. Cros, Franck Wartel","doi":"10.4230/LIPIcs.ECRTS.2017.16","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2017.16","url":null,"abstract":"Embedded real-time systems like those found in automotive, rail and aerospace, steadily require higher levels of guaranteed computing performance (and hence time predictability) motivated by the increasing number of functionalities provided by software. However, high-performance processor design is driven by the average-performance needs of mainstream market. To make things worse, changing those designs is hard since the embedded real-time market is comparatively a small market. A path to address this mismatch is designing low-complexity hardware features that favor time predictability and can be enabled/disabled not to affect average performance when performance guarantees are not required. In this line, we present the lessons learned designing and implementing LEOPARD, a four-core processor facilitating measurement-based timing analysis (widely used in most domains). LEOPARD has been designed adding low-overhead hardware mechanisms to a LEON3 processor baseline that allow capturing the impact of jittery resources (i.e. with variable latency) in the measurements performed at analysis time. In particular, at core level we handle the jitter of caches, TLBs and variable-latency floating point units; and at the chip level, we deal with contention so that time-composable timing guarantees can be obtained. The result of our applied study with a Space application shows how per-resource jitter is controlled facilitating the computation of high-quality WCET estimates.","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124749310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Impact of Transient Faults on Timing Behavior and Mitigation with Near-Zero WCET Overhead
ECRTS 2023, DOI: 10.4230/LIPIcs.ECRTS.2023.15
Pegdwende Romaric Nikiema, A. Kritikakou, Marcello Traiola, O. Sentieys
As time-critical systems require timing guarantees, Worst-Case Execution Time (WCET) estimates have to be employed. However, WCET estimation methods usually assume fault-free hardware. If proper actions are not taken, such fault-free WCET approaches become unsafe when faults impact the hardware during execution. The majority of approaches dealing with hardware faults address the impact of faults on the functional behavior of an application, i.e., denial of service and binary correctness. Few approaches address the impact of faults on the application's timing behavior, i.e., the time to finish the application, and those target faults occurring in memories. However, as the transistor size in modern technologies is significantly reduced, faults in cores can no longer be considered negligible. This work shows that faults not only affect the functional behavior, but can also have a significant impact on the timing behavior of applications. To expose the overall impact of faults, we enhance vulnerability analysis to include not only functional but also timing correctness, and show that faults impact WCET estimations. As common techniques to deal with faults, such as watchdog timers and re-execution, have large timing overheads for error detection and correction, we propose a mechanism with near-zero and bounded timing overhead. A RISC-V core is used as a case study. The obtained results show that faults can lead to an increase of almost 700% in the maximum observed execution time between fault-free and faulty execution without protection, affecting the WCET estimations. In contrast, the proposed mechanism is able to restore fault-free WCET estimations with a bounded overhead of 2 execution cycles.
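To make concrete why the conventional techniques named above are costly, the sketch below shows classic software re-execution: run the job twice, compare, and retry on mismatch. This is an illustration of the baseline the paper argues against (temporal redundancy roughly doubles execution time), not the paper's near-zero-overhead mechanism; the types and job() function are assumptions.

```c
/* Illustrative sketch of re-execution-based fault detection/correction. */
#include <stdint.h>
#include <string.h>

typedef struct { int32_t out[4]; } result_t;
extern void job(result_t *r);                     /* assumed application job */

void job_with_reexecution(result_t *final) {
    result_t a, b;
    do {
        job(&a);
        job(&b);                                  /* temporal redundancy: ~2x cost */
    } while (memcmp(&a, &b, sizeof a) != 0);      /* mismatch => transient fault, retry */
    *final = a;
}
```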
{"title":"Impact of Transient Faults on Timing Behavior and Mitigation with Near-Zero WCET Overhead","authors":"Pegdwende Romaric Nikiema, A. Kritikakou, Marcello Traiola, O. Sentieys","doi":"10.4230/LIPIcs.ECRTS.2023.15","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2023.15","url":null,"abstract":"As time-critical systems require timing guarantees, Worst-Case Execution Times (WCET) have to be employed. However, WCET estimation methods usually assume fault-free hardware. If proper actions are not taken, such fault-free WCET approaches become unsafe, when faults impact the hardware during execution. The majority of approaches, dealing with hardware faults, address the impact of faults on the functional behavior of an application, i.e., denial of service and binary correctness. Few approaches address the impact of faults on the application timing behavior, i.e., time to finish the application, and target faults occurring in memories. However, as the transistor size in modern technologies is significantly reduced, faults in cores cannot be considered negligible anymore. This work shows that faults not only affect the functional behavior, but they can have a significant impact on the timing behavior of applications. To expose the overall impact of faults, we enhance vulnerability analysis to include not only functional, but also timing correctness, and show that faults impact WCET estimations. As common techniques to deal with faults, such as watchdog timers and re-execution, have large timing overhead for error detection and correction, we propose a mechanism with near-zero and bounded timing overhead. A RISC-V core is used as a case study. The obtained results show that faults can lead up to almost 700% increase in the maximum observed execution time between fault-free and faulty execution without protection, affecting the WCET estimations. On the contrary, the proposed mechanism is able to restore fault-free WCET estimations with a bounded overhead of 2 execution cycles.","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"1 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120999855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Schedulability Analysis for Multi-Core Systems Accounting for Resource Stress and Sensitivity
ECRTS 2021, DOI: 10.4230/LIPIcs.ECRTS.2021.7
Robert I. Davis, D. Griffin, I. Bate
Timing verification of multi-core systems is complicated by contention for shared hardware resources between co-running tasks on different cores. This paper introduces the Multi-core Resource Stress and Sensitivity (MRSS) task model, which characterizes how much stress each task places on resources and how sensitive it is to such resource stress. This model facilitates a separation of concerns, thus retaining the advantages of the traditional two-step approach to timing verification (i.e., timing analysis followed by schedulability analysis). Response time analysis is derived for the MRSS task model, providing efficient context-dependent and context-independent schedulability tests for both fixed-priority preemptive and fixed-priority non-preemptive scheduling. Dominance relations are derived between the tests, and proofs of optimal priority assignment are provided. The MRSS task model is underpinned by a proof-of-concept industrial case study.
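For orientation, the sketch below shows the classical fixed-point response-time recurrence that analyses of this kind extend: a task's response time is its own cost plus preemption from higher-priority tasks, plus an extra cross-core interference term. The mrss_delay() term is a placeholder for a stress/sensitivity-based bound; it is an assumption for illustration, not the paper's formula.

```c
/* Rough sketch of fixed-priority response-time analysis with an added
 * cross-core interference term (placeholder for a resource-stress bound). */
#include <math.h>

typedef struct { double C, T; } task_t;              /* WCET and period (deadline = T) */

extern double mrss_delay(int i, double window);      /* assumed stress/sensitivity bound */

double response_time(const task_t *tau, int i) {     /* tasks 0..i-1 have higher priority */
    double R = tau[i].C, prev;
    do {
        prev = R;
        double I = 0.0;
        for (int j = 0; j < i; j++)
            I += ceil(prev / tau[j].T) * tau[j].C;   /* higher-priority preemptions */
        R = tau[i].C + I + mrss_delay(i, prev);      /* plus cross-core stress delay */
    } while (R > prev && R <= tau[i].T);             /* stop at convergence or deadline miss */
    return R;                                        /* schedulable iff R <= tau[i].T */
}
```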
{"title":"Schedulability Analysis for Multi-Core Systems Accounting for Resource Stress and Sensitivity","authors":"Robert I. Davis, D. Griffin, I. Bate","doi":"10.4230/LIPIcs.ECRTS.2021.7","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2021.7","url":null,"abstract":"Timing verification of multi-core systems is complicated by contention for shared hardware resources between co-running tasks on different cores. This paper introduces the Multi-core Resource Stress and Sensitivity (MRSS) task model that characterizes how much stress each task places on resources and how much it is sensitive to such resource stress. This model facilitates a separation of concerns, thus retaining the advantages of the traditional two-step approach to timing verification (i.e. timing analysis followed by schedulability analysis). Response time analysis is derived for the MRSS task model, providing efficient context-dependent and context independent schedulability tests for both fixed priority preemptive and fixed priority non-preemptive scheduling. Dominance relations are derived between the tests, and proofs of optimal priority assignment provided. The MRSS task model is underpinned by a proof-of-concept industrial case study. 2012 ACM Subject Classification Computer systems organization → Real-time systems; Software and its engineering → Real-time schedulability","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121540899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
nDimNoC: Real-Time D-dimensional NoC
ECRTS 2021, DOI: 10.4230/LIPIcs.ECRTS.2021.5
Yilian Ribot González, Geoffrey Nelissen, E. Tovar
The growing demand for powerful embedded systems to perform advanced functionalities has led to a large increase in the number of computation nodes integrated in systems-on-chip (SoCs). In this context, networks-on-chip (NoCs) have emerged as the standard communication infrastructure for multi-processor SoCs (MPSoCs). In this work, we present nDimNoC, a new D-dimensional NoC that provides real-time guarantees for systems implemented upon MPSoCs. Specifically, (1) we propose a new router architecture and a new deflection-based routing policy that use the properties of circulant topologies to ensure bounded worst-case communication delays, and (2) we develop a generic worst-case communication time (WCCT) analysis for packets transmitted over nDimNoC. In our experiments, we show that the WCCT of packets decreases when we increase the dimensionality of the NoC using nDimNoC's topology and routing policy. By implementing nDimNoC in Verilog and synthesizing it for an FPGA platform, we show that a 3D-nDimNoC requires approximately 5 times less silicon than routers that use virtual channels (VCs). We computed the maximum operating frequency of a 3D-nDimNoC with Xilinx Vivado: increasing the number of dimensions in the NoC improves WCCT at the cost of more complex routing logic that may result in a reduced operating clock frequency.
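The general idea behind deflection routing, which the router above builds on, is that a packet that cannot obtain its preferred output port is misrouted to any free port instead of being buffered, so routers need no packet queues. The C sketch below captures only that generic decision; the port model is an assumption and it is not nDimNoC's actual policy over circulant topologies.

```c
/* Hedged sketch of a generic deflection-routing output-port decision. */
#include <stdbool.h>

#define PORTS 4                                   /* assumed number of output ports */

int route_packet(int preferred_port, const bool port_free[PORTS]) {
    if (port_free[preferred_port])
        return preferred_port;                    /* productive hop toward destination */
    for (int p = 0; p < PORTS; p++)
        if (port_free[p])
            return p;                             /* deflection: possibly away from it */
    return -1;                                    /* unreachable if ports >= in-flight packets */
}
```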
{"title":"nDimNoC: Real-Time D-dimensional NoC","authors":"Yilian Ribot González, Geoffrey Nelissen, E. Tovar","doi":"10.4230/LIPIcs.ECRTS.2021.5","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2021.5","url":null,"abstract":"The growing demand of powerful embedded systems to perform advanced functionalities led to a large increase in the number of computation nodes integrated in Systems-on-chip (SoC). In this context, network-on-chips (NoCs) emerged as a new standard communication infrastructure for multi-processor SoCs (MPSoCs). In this work, we present nDimNoC, a new D-dimensional NoC that provides real-time guarantees for systems implemented upon MPSoCs. Specifically, (1) we propose a new router architecture and a new deflection-based routing policy that use the properties of circulant topologies to ensure bounded worst-case communication delays, and (2) we develop a generic worst-case communication time (WCCT) analysis for packets transmitted over nDimNoC. In our experiments, we show that the WCCT of packets decreases when we increase the dimensionality of the NoC using nDimNoC’s topolgy and routing policy. By implementing nDimNoC in Verilog and synthesizing it for an FPGA platform, we show that a 3D-nDimNoC requires ≈5-times less silicon than routers that use virtual channels (VC). We computed the maximum operating frequency of a 3D-nDimNoC with Xilinx Vivado. Increasing the number dimensions in the NoC improves WCCT at the cost of a more complex routing logic that may result in a reduced operating clock frequency.","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"2 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120809534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-Overhead Online Assessment of Timely Progress as a System Commodity
ECRTS 2023, DOI: 10.4230/LIPIcs.ECRTS.2023.13
Weifan Chen, Ivan Izhbirdeev, Denis Hoornaert, Shahin Roozkhosh, Patrick Carpanedo, Sanskriti Sharma, R. Mancuso
The correctness of safety-critical systems depends on both their logical and temporal behavior. Control-flow integrity (CFI) is a well-established and well-understood technique to safeguard the logical flow of safety-critical applications. Unfortunately, no established methodologies exist for the complementary problem of detecting violations of control-flow timeliness. Worse yet, the latter dimension, which we term Timely Progress Integrity (TPI), is increasingly jeopardized as the complexity of our embedded systems continues to soar. As key resources of the memory hierarchy become shared by several CPUs and accelerators, they become hard-to-analyze performance bottlenecks, and the precise interplay between software and hardware components becomes hard to predict and reason about. How can control over timely progress integrity be restored? We postulate that the first stepping stone toward TPI is to develop methodologies for Timely Progress Assessment (TPA). TPA refers to the ability of a system to live-monitor the positive/negative slack – with respect to a known reference – at key milestones throughout an application's lifespan. In this paper, we propose one such methodology, which goes under the name of Milestone-Based Timely Progress Assessment, or MB-TPA for short. Among the key design principles of MB-TPA are the ability to operate on black-box binary executables with near-zero time overhead and the ability to be implemented on commercial platforms. To prove its feasibility and effectiveness, we propose and evaluate a full-stack implementation called Timely Progress Assessment with 0 Overhead (TPAw0v). We demonstrate its capability to provide live TPA for complex vision applications while introducing less than 0.6% time overhead for applications under test. Finally, we demonstrate one use case where TPA information is used to restore TPI in the presence of temporal interference over shared memory resources.
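The essence of milestone-based slack monitoring can be illustrated with a few lines of C: at each milestone the monitor compares the elapsed time against a reference timetable and reports positive or negative slack. The now_us() hook and the reference table are assumptions for illustration; TPAw0v itself is a hardware/software full-stack implementation, not this snippet.

```c
/* Minimal sketch of milestone-based progress assessment. */
#include <stdint.h>
#include <stdio.h>

extern uint64_t now_us(void);                               /* assumed time source */

static const uint64_t reference_us[] = { 1000, 2500, 4000, 6000 };  /* expected milestones */
static uint64_t start_us;

void progress_start(void) { start_us = now_us(); }

int64_t progress_milestone(unsigned id) {
    int64_t elapsed = (int64_t)(now_us() - start_us);
    int64_t slack   = (int64_t)reference_us[id] - elapsed;  /* >0 ahead, <0 behind */
    if (slack < 0)
        printf("milestone %u behind reference by %lld us\n", id, (long long)-slack);
    return slack;                                            /* basis for corrective action */
}
```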
{"title":"Low-Overhead Online Assessment of Timely Progress as a System Commodity","authors":"Weifan Chen, Ivan Izhbirdeev, Denis Hoornaert, Shahin Roozkhosh, Patrick Carpanedo, Sanskriti Sharma, R. Mancuso","doi":"10.4230/LIPIcs.ECRTS.2023.13","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2023.13","url":null,"abstract":"The correctness of safety-critical systems depends on both their logical and temporal behavior. Control-flow integrity (CFI) is a well-established and understood technique to safeguard the logical flow of safety-critical applications. But unfortunately, no established methodologies exist for the complementary problem of detecting violations of control flow timeliness. Worse yet, the latter dimension, which we term Timely Progress Integrity (TPI), is increasingly more jeopardized as the complexity of our embedded systems continues to soar. As key resources of the memory hierarchy become shared by several CPUs and accelerators, they become hard-to-analyze performance bottlenecks. And the precise interplay between software and hardware components becomes hard to predict and reason about. How to restore control over timely progress integrity? We postulate that the first stepping stone toward TPI is to develop methodologies for Timely Progress Assessment (TPA). TPA refers to the ability of a system to live-monitor the positive/negative slack – with respect to a known reference – at key milestones throughout an application’s lifespan. In this paper, we propose one such methodology that goes under the name of Milestone-Based Timely Progress Assessment or MB-TPA, for short. Among the key design principles of MB-TPA is the ability to operate on black-box binary executables with near-zero time overhead and implementable on commercial platforms. To prove its feasibility and effectiveness, we propose and evaluate a full-stack implementation called Timely Progress Assessment with 0 Overhead (TPAw0v). We demonstrate its capability in providing live TPA for complex vision applications while introducing less than 0.6% time overhead for applications under test. Finally, we demonstrate one use case where TPA information is used to restore TPI in the presence of temporal interference over shared memory resources.","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114903264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simultaneous Multithreading Applied to Real Time
ECRTS 2019, DOI: 10.4230/LIPIcs.ECRTS.2019.3
S. Osborne, Joshua Bakita, James H. Anderson
Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading (SMT), which has been shown to improve performance in many areas of computing but has seen little application to real-time systems. We introduce the SMART task model, which allows SMT and real time to be combined by accounting for the variable task execution costs caused by SMT, along with methods and conditions for scheduling SMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated through a large-scale schedulability study in which we show that task systems with utilizations 30% larger than what would be schedulable without SMT can be correctly scheduled.
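The key modeling point is that each task carries different execution costs depending on whether it runs alone on a core or co-scheduled with another SMT thread. The sketch below only illustrates that bookkeeping with a simple utilization sum; it is an assumption-laden toy, not the paper's SMART model or its GEDF schedulability conditions.

```c
/* Hedged sketch: per-task solo vs. SMT-inflated costs and a utilization sum. */
typedef struct { double C_solo, C_smt, T; } smt_task_t;   /* assumed task parameters */

double total_utilization(const smt_task_t *ts, int n, int use_smt) {
    double U = 0.0;
    for (int i = 0; i < n; i++)
        U += (use_smt ? ts[i].C_smt : ts[i].C_solo) / ts[i].T;
    return U;   /* compare against available capacity: hardware threads vs. cores */
}
```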
{"title":"Simultaneous Multithreading Applied to Real Time","authors":"S. Osborne, Joshua Bakita, James H. Anderson","doi":"10.4230/LIPIcs.ECRTS.2019.3","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2019.3","url":null,"abstract":"Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading (SMT), which has been shown to improve performance in many areas of computing, but has seen little application to real-time systems. The SMART task model, which allows for combining SMT and real time by accounting for the variable task execution costs caused by SMT, is introduced, along with methods and conditions for scheduling SMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated through a large-scale schedulability study in which we show that task systems with utilizations 30% larger than what would be schedulable without SMT can be correctly scheduled.","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121619638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quasi Isolation QoS Setups to Control MPSoC Contention in Integrated Software Architectures
ECRTS 2023, DOI: 10.4230/LIPIcs.ECRTS.2023.5
Sergio Garcia-Esteban, A. Serrano-Cases, J. Abella, E. Mezzetti, F. Cazorla
The use of integrated architectures, such as integrated modular avionics (IMA) in avionics, IMA-SP in space, and AUTOSAR in automotive, running on Multi-Processor Systems-on-Chip (MPSoCs) is on the rise. Timing isolation among the different software partitions or applications of an integrated architecture is key to simplifying software integration and its timing validation, by ensuring that the performance of each partition has no or very limited impact on others even though they share the MPSoC's hardware resources. In this work, we contend that the increasing hardware support for Quality of Service (QoS) guarantees in modern MPSoCs can be leveraged via specific setups to provide strong, albeit not full, isolation among different software partitions. We introduce the concept of Quasi Isolation QoS (QIQoS) setups and instantiate it on the Xilinx Zynq UltraScale+. To that end, out of the millions of setups offered by the different QoS mechanisms, we identify specific QoS configurations that isolate the traffic of time-critical software partitions executing in the core cluster from the traffic generated by contender partitions in the programmable logic. Our results show that the selected isolation setup keeps the performance variations of the partitions running on the computing cores below 6 percentage points, even under scenarios with extremely high traffic coming from the programmable logic.
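In practice, a QoS setup of this kind is applied by writing priority or throttling values into memory-mapped QoS registers of the interconnect ports used by the contending masters. The sketch below shows only that generic pattern; the base address and register offsets are placeholders, not the actual Zynq UltraScale+ register map, and the chosen values are not the configurations identified in the paper.

```c
/* Purely illustrative sketch of applying an interconnect QoS setup. */
#include <stdint.h>

#define QOS_PORT_BASE   0xA0000000u   /* placeholder base address          */
#define QOS_RD_PRIORITY 0x08u         /* placeholder read-priority offset  */
#define QOS_WR_PRIORITY 0x0Cu         /* placeholder write-priority offset */

static inline void reg_write(uintptr_t addr, uint32_t v) {
    *(volatile uint32_t *)addr = v;
}

void apply_quasi_isolation_setup(uint32_t read_prio, uint32_t write_prio) {
    /* favour the core cluster's traffic over programmable-logic masters */
    reg_write(QOS_PORT_BASE + QOS_RD_PRIORITY, read_prio);
    reg_write(QOS_PORT_BASE + QOS_WR_PRIORITY, write_prio);
}
```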
{"title":"Quasi Isolation QoS Setups to Control MPSoC Contention in Integrated Software Architectures","authors":"Sergio Garcia-Esteban, A. Serrano-Cases, J. Abella, E. Mezzetti, F. Cazorla","doi":"10.4230/LIPIcs.ECRTS.2023.5","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2023.5","url":null,"abstract":"The use of integrated architectures, such as integrated modular avionics (IMA) in avionics, IMA-SP in space, and AUTOSAR in automotive, running on Multi-Processor System-on-Chip (MPSoC) is on the rise. Timing isolation among the different software partitions or applications thereof in an integrated architecture is key to simplifying software integration and its timing validation by ensuring the performance of each partition has no or very limited impact on others despite they share MPSoC’s hardware resources. In this work, we contend that the increasing hardware support for Quality of Service (QoS) guarantees in modern MPSoCs can be leveraged via specific setups to provide strong, albeit not full, isolation among different software partitions. We introduce the concept of Quasi Isolation QoS (QIQoS) setups and instantiate it in the Xilinx Zynq UltraScale+. To that end, out of the millions of setups offered by the different QoS mechanisms, we identify specific QoS configurations that isolate the traffic of time-critical software partitions executing in the core cluster from that generated by contender partitions in the programmable logic. Our results show that the selected isolation setup results in performance variations of the partitions run in the computing cores that are below 6 percentage points, even under scenarios with extremely high traffic coming from the programmable logic.","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124881269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Memory Latency Distribution-Driven Regulation for Temporal Isolation in MPSoCs
ECRTS 2023, DOI: 10.4230/LIPIcs.ECRTS.2023.4
Ahsan Saeed, Denis Hoornaert, D. Dasari, D. Ziegenbein, Daniel Mueller-Gritschneder, Ulf Schlichtmann, A. Gerstlauer, R. Mancuso
Temporal isolation is one of the most significant challenges that must be addressed before Multi-Processor Systems-on-Chip (MPSoCs) can be widely adopted in mixed-criticality systems with both time-sensitive real-time (RT) applications and performance-oriented non-real-time (NRT) applications. Specifically, the main memory subsystem is one of the most prevalent causes of interference, performance degradation, and loss of isolation. Existing memory bandwidth regulation mechanisms use static, dynamic, or predictive DRAM bandwidth management techniques to restore the execution time of an application under contention as close as possible to its execution time in isolation. In this paper, we propose a novel distribution-driven regulation whose goal is to achieve a timeliness objective formulated as a constraint on the probability of meeting a certain target execution time for the RT applications. Using existing interconnect-level Performance Monitoring Units (PMUs), we can observe the Cumulative Distribution Function (CDF) of the per-request memory latency. Regulation is then triggered to enforce first-order stochastic dominance with respect to a desired reference. Consequently, it is possible to enforce that the overall observed execution time, seen as a random variable, is dominated by the reference execution time. The mechanism requires no prior information about the contending applications and treats the DRAM subsystem as a black box. We provide a full-stack implementation of our mechanism on a Commercial Off-The-Shelf (COTS) platform (Xilinx UltraScale+ MPSoC), evaluate it using real and synthetic benchmarks, experimentally validate that the timeliness objectives are met for the RT applications, and demonstrate that it provides 2.2x more overall throughput for NRT applications compared to DRAM bandwidth management-based regulation approaches.
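The first-order stochastic dominance check at the heart of this approach can be stated in a few lines: one latency distribution dominates another if its CDF is at least as high at every point (i.e., it never makes long latencies more likely). The sketch below shows that check on binned empirical CDFs and a trivial regulation trigger; the bin count and the throttle hook are assumptions, and the actual mechanism in the paper is hardware/full-stack rather than this loop.

```c
/* Hedged sketch of CDF-based regulation: throttle contenders whenever the
 * observed per-request latency CDF drops below the reference CDF anywhere. */
#include <stdbool.h>

#define BINS 64                                       /* assumed latency histogram bins */

extern void throttle_best_effort_cores(bool on);      /* assumed regulation hook */

/* Observed dominates reference iff P(latency <= x) is >= the reference's
 * probability for every latency bin x. */
bool dominates(const double observed_cdf[BINS], const double reference_cdf[BINS]) {
    for (int b = 0; b < BINS; b++)
        if (observed_cdf[b] < reference_cdf[b])
            return false;
    return true;
}

void regulate(const double observed_cdf[BINS], const double reference_cdf[BINS]) {
    throttle_best_effort_cores(!dominates(observed_cdf, reference_cdf));
}
```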
{"title":"Memory Latency Distribution-Driven Regulation for Temporal Isolation in MPSoCs","authors":"Ahsan Saeed, Denis Hoornaert, D. Dasari, D. Ziegenbein, Daniel Mueller-Gritschneder, Ulf Schlichtmann, A. Gerstlauer, R. Mancuso","doi":"10.4230/LIPIcs.ECRTS.2023.4","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2023.4","url":null,"abstract":"Temporal isolation is one of the most significant challenges that must be addressed before Multi-Processor Systems-on-Chip (MPSoCs) can be widely adopted in mixed-criticality systems with both time-sensitive real-time (RT) applications and performance-oriented non-real-time (NRT) applications. Specifically, the main memory subsystem is one of the most prevalent causes of interference, performance degradation and loss of isolation. Existing memory bandwidth regulation mechanisms use static, dynamic, or predictive DRAM bandwidth management techniques to restore the execution time of an application under contention as close as possible to the execution time in isolation. In this paper, we propose a novel distribution-driven regulation whose goal is to achieve a timeliness objective formulated as a constraint on the probability of meeting a certain target execution time for the RT applications. Using existing interconnect-level Performance Monitoring Units (PMU), we can observe the Cumulative Distribution Function (CDF) of the per-request memory latency. Regulation is then triggered to enforce first-order stochastical dominance with respect to a desired reference. Consequently, it is possible to enforce that the overall observed execution time random variable is dominated by the reference execution time. The mechanism requires no prior information of the contending application and treats the DRAM subsystem as a black box. We provide a full-stack implementation of our mechanism on a Commercial Off-The-Shelf (COTS) platform (Xilinx Ultrascale+ MPSoC), evaluate it using real and synthetic benchmarks, experimentally validate that the timeliness objectives are met for the RT applications, and demonstrate that it is able to provide 2.2x more overall throughput for NRT applications compared to DRAM bandwidth management-based regulation approaches.","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116362652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Memory Scheduling Infrastructure for Multi-Core Systems with Re-Programmable Logic
ECRTS 2021, DOI: 10.4230/LIPIcs.ECRTS.2021.2
Denis Hoornaert, Shahin Roozkhosh, R. Mancuso
The sharp increase in demand for performance has prompted an explosion in the complexity of modern multi-core embedded systems. This has led to unprecedented temporal unpredictability concerns in Cyber-Physical Systems (CPS). On-chip integration of programmable logic (PL) alongside a conventional Processing System (PS) in modern Systems-on-Chip (SoCs) establishes a genuine compromise between specialization, performance, and reconfigurability. In addition to typical use-cases, it has been shown that the PL can be used to observe, manipulate, and ultimately manage the memory traffic generated by a traditional multi-core processor. This paper explores the possibility of PL-aided memory scheduling by proposing a Scheduler In-the-Middle (SchIM). We demonstrate that the SchIM enables transaction-level control over the main memory traffic generated by a set of embedded cores. Focusing on extensibility and reconfigurability, we put forward a SchIM design covering two main objectives: first, to provide a safe playground to test innovative memory scheduling mechanisms; and second, to establish a transition path from software-based memory regulation to provably correct hardware-enforced memory scheduling. We evaluate our design through a full-system implementation on a commercial PS-PL platform using synthetic and real-world benchmarks.
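To convey what "transaction-level control" means, the following behavioral C model (illustration only, not the SchIM hardware, which lives in the PL) enqueues every intercepted memory transaction per core and lets a scheduling policy pick which queue is served next; a simple fixed-priority pick is shown, and the whole point of the infrastructure is that this policy is swappable.

```c
/* Behavioral model of per-core transaction queues with a pluggable policy. */
#include <stdbool.h>
#include <stdint.h>

#define CORES 4
#define QLEN  16

typedef struct { uint64_t addr; bool is_write; } mem_txn_t;

typedef struct {
    mem_txn_t q[QLEN];
    unsigned  head, count;
} txn_queue_t;

static txn_queue_t per_core[CORES];

/* Intercepted transaction from a core is queued; full queue back-pressures it. */
bool enqueue_txn(int core, mem_txn_t t) {
    txn_queue_t *tq = &per_core[core];
    if (tq->count == QLEN) return false;
    tq->q[(tq->head + tq->count) % QLEN] = t;
    tq->count++;
    return true;
}

/* Scheduling policy: forward the next transaction to DRAM, lowest core index first. */
bool schedule_next(mem_txn_t *out) {
    for (int c = 0; c < CORES; c++) {
        txn_queue_t *tq = &per_core[c];
        if (tq->count > 0) {
            *out = tq->q[tq->head];
            tq->head = (tq->head + 1) % QLEN;
            tq->count--;
            return true;
        }
    }
    return false;   /* no pending transactions */
}
```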
{"title":"A Memory Scheduling Infrastructure for Multi-Core Systems with Re-Programmable Logic","authors":"Denis Hoornaert, Shahin Roozkhosh, R. Mancuso","doi":"10.4230/LIPIcs.ECRTS.2021.2","DOIUrl":"https://doi.org/10.4230/LIPIcs.ECRTS.2021.2","url":null,"abstract":"The sharp increase in demand for performance has prompted an explosion in the complexity of modern multi-core embedded systems. This has lead to unprecedented temporal unpredictability concerns in Cyber-Physical Systems (CPS). On-chip integration of programmable logic (PL) alongside a conventional Processing System (PS) in modern Systems-on-Chip (SoC) establishes a genuine compromise between specialization, performance, and reconfigurability. In addition to typical use-cases, it has been shown that the PL can be used to observe, manipulate, and ultimately manage memory traffic generated by a traditional multi-core processor. This paper explores the possibility of PL-aided memory scheduling by proposing a Scheduler Inthe-Middle (SchIM). We demonstrate that the SchIM enables transaction-level control over the main memory traffic generated by a set of embedded cores. Focusing on extensibility and reconfigurability, we put forward a SchIM design covering two main objectives. First, to provide a safe playground to test innovative memory scheduling mechanisms; and second, to establish a transition path from software-based memory regulation to provably correct hardware-enforced memory scheduling. We evaluate our design through a full-system implementation on a commercial PS-PL platform using synthetic and real-world benchmarks. 2012 ACM Subject Classification Computer systems organization → Real-time system architecture","PeriodicalId":191379,"journal":{"name":"Euromicro Conference on Real-Time Systems","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122785182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}