21st IEEE Real-Time and Embedded Technology and Applications Symposium: Latest Publications

SPeCK: a kernel for scalable predictability

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108434

Qi Wang, Yuxin Ren, Matt Scaperoth, Gabriel Parmer

Multi- and many-core systems are increasingly prevalent in embedded systems. Additionally, isolation requirements between different partitions and criticalities are gaining in importance. This difficult combination is not well addressed by current software systems. Parallel systems require consistency guarantees on shared data structures, often provided by locks that use predictable resource-sharing protocols. However, as the number of cores increases, even a single shared cache line (e.g., for the lock) can cause significant interference. In this paper, we present a clean-slate design of the SPeCK kernel, the next generation of our COMPOSITE OS, that attempts to provide a strong version of scalable predictability, in which predictability bounds established on a single core remain constant as the number of cores increases. Results show that, despite using a non-preemptive kernel, it has strong scalable predictability and low average-case overheads, and it demonstrates better response times than a state-of-the-art preemptive system.

Citations: 24

The Packing Server for real-time scheduling of MapReduce workflows

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108416

Shen Li, Shaohan Hu, T. Abdelzaher

This paper develops new schedulability bounds for a simplified MapReduce workflow model. MapReduce is a distributed computing paradigm, deployed in industry for over a decade. Different from conventional multiprocessor platforms, MapReduce deployments usually span thousands of machines, and a MapReduce job may contain as many as tens of thousands of parallel segments. State-of-the-art MapReduce workflow schedulers operate in a best-effort fashion, but the need for real-time operation has grown with the emergence of real-time analytic applications. MapReduce workflow details can be captured by the generalized parallel task model from recent real-time literature. Under this model, the best-known result guarantees schedulability if the task set utilization stays below 50% of total capacity, and the deadline to critical path length ratio, which we call the stretch φ, surpasses 2. This paper improves this bound further by introducing a hierarchical scheduling scheme based on the novel notion of a Packing Server, inspired by servers for aperiodic tasks. The Packing Server consists of multiple periodically replenished budgets that can execute in parallel and that appear as independent tasks to the underlying scheduler. Hence, the original problem of scheduling MapReduce workflows reduces to that of scheduling independent tasks. We prove that the utilization bound for schedulability of MapReduce workflows is UB · (φ − β)/φ, where UB is the utilization bound of the underlying independent task scheduling policy, and β is a tunable parameter that controls the maximum individual budget utilization. By leveraging past schedulability results for independent tasks on multiprocessors, we improve schedulable utilization of DAG workflows above 50% of total capacity, when the number of processors is large and the largest server budget is (sufficiently) smaller than its deadline. This surpasses the best-known bounds for the generalized parallel task model. Our evaluation using a Yahoo! MapReduce trace as well as a physical cluster of 46 machines confirms the validity of the new utilization bound for MapReduce workflows.
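As a rough illustration of how the stated bound behaves, the sketch below evaluates it for a few inputs. It assumes the bound reads as UB · (φ − β)/φ, that UB is expressed as a fraction of total capacity, and that β lies between 0 and φ; the function name and the sample values are hypothetical and are not taken from the paper.

```python
def packing_server_bound(ub: float, stretch: float, beta: float) -> float:
    """Return UB * (phi - beta) / phi, the bound as read from the abstract."""
    if stretch <= 0:
        raise ValueError("stretch (phi) must be positive")
    if not 0 <= beta <= stretch:
        raise ValueError("beta is assumed to lie in [0, phi]")
    return ub * (stretch - beta) / stretch


if __name__ == "__main__":
    # Hypothetical inputs: a smaller beta moves the bound closer to UB.
    for beta in (1.0, 0.5, 0.1):
        print(beta, packing_server_bound(ub=1.0, stretch=4.0, beta=beta))
```

Under this reading, the bound approaches UB as β shrinks relative to φ, which matches the abstract's remark that the improvement appears when the largest server budget is sufficiently smaller than its deadline.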

Citations: 13

Memory efficient global scheduling of real-time tasks

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108452

A. Alhammad, Saud Wasly, R. Pellizzoni

Current computing architectures are commonly built with multiple cores and a single shared main memory. Even though this architecture increases the overall computation power, main memory can easily become a bottleneck. Simultaneous access to main memory from multiple cores can cause both (1) severe degradation in performance and (2) unpredictable execution time for real-time applications. We propose in this paper to mitigate these two problems by co-scheduling cores as well as the main memory for predictable execution. In particular, we use a DMA component to overlap memory with computation for hiding the memory latency and therefore increasing the system performance. The main contribution of this paper is a novel global co-scheduling algorithm along with its associated schedulability analysis for sporadic hard real-time tasks. We evaluated our system by generating synthetic tasksets based on real benchmark parameters. The results show a significant improvement in system utilization while retaining a predictable system behavior.
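The benefit of overlapping memory transfers with computation can be sketched with a simple timing model. This is not the paper's co-scheduling algorithm or its analysis; it is a generic double-buffering model in which the load for the next segment proceeds by DMA while the current segment computes. The function names and the segment values are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (memory load time, compute time), assumed units


def serial_time(segments: List[Segment]) -> float:
    """Every load is followed by its computation, with no overlap."""
    return sum(load + compute for load, compute in segments)


def overlapped_time(segments: List[Segment]) -> float:
    """The load of segment i+1 runs in parallel with the compute of segment i."""
    if not segments:
        return 0.0
    total = segments[0][0]  # the first load cannot be hidden
    for i, (_, compute) in enumerate(segments):
        next_load = segments[i + 1][0] if i + 1 < len(segments) else 0.0
        total += max(compute, next_load)
    return total


if __name__ == "__main__":
    demo = [(2.0, 5.0), (3.0, 4.0), (6.0, 1.0)]
    print(serial_time(demo), overlapped_time(demo))
```

In this model a load is fully hidden only when it is no longer than the computation it overlaps with, which is one reason the memory and the cores need to be scheduled together.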

Citations: 50

Mixed-criticality runtime mechanisms and evaluation on multicores

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108442

L. Sigrist, G. Giannopoulou, Pengcheng Huang, Andres Gomez, L. Thiele

Multicore systems are being increasingly used for embedded system deployments, even in safety-critical domains. Co-hosting applications of different criticality levels in the same platform requires sufficient isolation among them, which has given rise to the mixed-criticality scheduling problem and several recently proposed policies. Such policies typically employ runtime mechanisms to monitor task execution, detect exceptional events like task overruns, and react by switching scheduling mode. Implementing such mechanisms efficiently is crucial for any scheduler to detect runtime events and react in a timely manner, without compromising the system's safety. This paper investigates implementation alternatives for these mechanisms and empirically evaluates the effect of their runtime overhead on the schedulability of mixed-criticality applications. Specifically, we implement in user-space two state-of-the-art scheduling policies: the flexible time-triggered FTTS [1] and the partitioned EDF-VD [2], and measure their runtime overheads on a 60-core Intel® Xeon Phi and a 4-core Intel® Core i5 for the first time. Based on extensive executions of synthetic task sets and an industrial avionic application, we show that these overheads cannot be neglected, especially on massively multicore architectures, where they can incur a schedulability loss of up to 97%. Evaluating runtime mechanisms early in the design phase and integrating their overheads into schedulability analysis therefore seem inevitable steps in the design of mixed-criticality systems. The need for verifiably bounded overheads motivates the development of novel timing-predictable architectures and runtime environments specifically targeted for mixed-criticality applications.
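The monitor-detect-switch pattern described above can be sketched as follows. This is a generic two-level (LO/HI) model and not the FTTS or EDF-VD implementation evaluated in the paper; the `Task` fields, the mode names, and the rule of dropping low-criticality tasks after a switch are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str
    criticality: str  # "LO" or "HI" (assumed two-level model)
    wcet_lo: float    # optimistic execution budget used in LO mode


def next_mode(mode: str, task: Task, executed: float) -> str:
    """Switch to HI mode when a HI task overruns its optimistic budget."""
    if mode == "LO" and task.criticality == "HI" and executed > task.wcet_lo:
        return "HI"
    return mode


def admitted(tasks: List[Task], mode: str) -> List[Task]:
    """In HI mode only HI-criticality tasks keep running (assumed policy)."""
    return [t for t in tasks if mode == "LO" or t.criticality == "HI"]


if __name__ == "__main__":
    tasks = [Task("ctrl", "HI", 2.0), Task("log", "LO", 1.0)]
    mode = next_mode("LO", tasks[0], executed=2.5)  # overrun detected
    print(mode, [t.name for t in admitted(tasks, mode)])
```

The cost the paper measures is the overhead of performing this kind of check and reaction at runtime, which the sketch does not model.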

Citations: 26

Providing task isolation via TLB coloring

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108391

Shrinivas Anand Panchamukhi, F. Mueller

The translation lookaside buffer (TLB) improves the performance of systems by caching the virtual page to physical frame mapping. But TLBs present a source of unpredictability for real-time systems. Standard heap-allocated regions do not provide guarantees on the TLB set that will hold a particular page translation. This unpredictability can lead to TLB misses with a penalty of up to thousands of cycles and consequently to intra- and inter-task interference, resulting in loose bounds on the worst-case execution time (WCET) and TLB-related preemption delay. In this work, we design and implement a new heap allocator that guarantees the TLB set that will hold a particular page translation on a uniprocessor of a contemporary architecture. The allocator is based on the concept of page coloring, a software TLB partitioning method. Virtual pages are colored such that two pages of different color cannot map to the same TLB set. Our experimental evaluations confirm the unpredictability associated with the standard heap allocation. Using a set of synthetic and standard benchmarks, we show that our allocator provides task isolation for real-time tasks. To the best of our knowledge, such TLB isolation without special hardware support is unprecedented, increases TLB predictability and can facilitate WCET analysis.
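The coloring idea can be sketched in a few lines. The sketch assumes 4 KiB pages, a set-associative TLB indexed by the low bits of the virtual page number, and colors defined as contiguous groups of TLB sets; real TLBs may index differently, and the function names and parameters here are hypothetical rather than the paper's allocator interface.

```python
from typing import List

PAGE_SHIFT = 12  # assumed 4 KiB pages


def tlb_set(vaddr: int, num_sets: int) -> int:
    """TLB set index under the assumed modulo indexing of the page number."""
    return (vaddr >> PAGE_SHIFT) % num_sets


def page_color(vaddr: int, num_sets: int, num_colors: int) -> int:
    """Each color owns a contiguous group of TLB sets."""
    if num_sets % num_colors != 0:
        raise ValueError("num_colors must divide num_sets")
    return tlb_set(vaddr, num_sets) // (num_sets // num_colors)


def pages_of_color(base: int, count: int, color: int,
                   num_sets: int, num_colors: int) -> List[int]:
    """Return the first `count` page addresses at or after base with the color."""
    if not 0 <= color < num_colors:
        raise ValueError("color out of range")
    page_size = 1 << PAGE_SHIFT
    vaddr = (base + page_size - 1) & ~(page_size - 1)  # round up to a page
    pages: List[int] = []
    while len(pages) < count:
        if page_color(vaddr, num_sets, num_colors) == color:
            pages.append(vaddr)
        vaddr += page_size
    return pages


if __name__ == "__main__":
    print([hex(p) for p in pages_of_color(0, 3, 1, num_sets=64, num_colors=4)])
```

Because the color is computed from the set index, two pages of different color can never share a TLB set, so giving each task its own color keeps one task from evicting another task's translations.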

Citations: 26

Ultrasonic time synchronization and ranging on smartphones

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108422

Patrick Lazik, N. Rajagopal, B. Sinopoli, Anthony G. Rowe

In this paper, we present the design and evaluation of a platform that can be used for time synchronization and indoor positioning of mobile devices. The platform uses the Time-Difference-Of-Arrival (TDOA) of multiple ultrasonic chirps broadcast from a network of beacons placed throughout the environment to find an initial location as well as synchronize a receiver's clock with the infrastructure. These chirps encode identification data and ranging information that can be used to compute the receiver's location. Once the clocks have been synchronized, the system can continue performing localization directly using Time-of-Flight (TOF) ranging as opposed to TDOA. This provides similar position accuracy with fewer beacons (for tens of minutes) until the mobile device clock drifts enough that a TDOA signal is once again required. Our hardware platform uses RF-based time synchronization to distribute clock synchronization from a subset of infrastructure beacons connected to a GPS source. Mobile devices use a novel time synchronization technique that leverages the continuously free-running audio sampling subsystem of a smartphone to synchronize with global time. Once synchronized, each device can determine an accurate proximity from as little as one beacon using TOF measurements. This significantly decreases the number of beacons required to cover an indoor space and improves performance in the face of obstructions. We show through experiments that this approach outperforms the Network Time Protocol (NTP) on smartphones by an order of magnitude, providing an average synchronization accuracy of 720 μs with clock drift rates as low as 2 ppm.
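The relationship between clock error and TOF ranging error can be made concrete with a small calculation. The sketch assumes a speed of sound of 343 m/s (air at roughly 20 °C) and a constant drift rate; the function names are hypothetical, and the numbers fed in are the figures quoted in the abstract rather than reproduced measurements.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees Celsius


def tof_distance(t_emit_s: float, t_arrive_s: float,
                 speed: float = SPEED_OF_SOUND_M_S) -> float:
    """Distance to a beacon from emission and arrival times on a shared clock."""
    if t_arrive_s < t_emit_s:
        raise ValueError("arrival precedes emission; clocks are not synchronized")
    return speed * (t_arrive_s - t_emit_s)


def drift_range_error(drift_ppm: float, elapsed_s: float,
                      speed: float = SPEED_OF_SOUND_M_S) -> float:
    """Ranging error accumulated after elapsed_s at a constant drift rate."""
    clock_error_s = drift_ppm * 1e-6 * elapsed_s
    return speed * clock_error_s


if __name__ == "__main__":
    print(tof_distance(0.0, 0.010))            # a chirp that took 10 ms to arrive
    print(drift_range_error(2.0, 10 * 60))     # 2 ppm drift over ten minutes
    print(SPEED_OF_SOUND_M_S * 720e-6)         # range error from 720 us offset
```

Since sound covers about a third of a metre per millisecond, sub-millisecond synchronization is what makes single-beacon TOF ranging usable, and steady drift is why a fresh TDOA fix is eventually needed.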

Citations: 36

AUTOBEST: a united AUTOSAR-OS and ARINC 653 kernel

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108435

Alexander Zuepke, Marc Bommert, D. Lohmann

This paper presents AUTOBEST, a united AUTOSAR-OS and ARINC 653 RTOS kernel that addresses the requirements of both automotive and avionics domains. We show that their domain-specific requirements have a common basis and can be implemented with a small partitioning microkernel-based design on embedded microcontrollers with memory protection (MPU) support. While both AUTOSAR and ARINC 653 use a unified task model in the kernel, we address their differences in dedicated user space libraries. Based on the kernel abstractions of futexes and lazy priority switching, these libraries provide domain-specific synchronization mechanisms. Our results show that it is thereby possible to get the best of both worlds: AUTOBEST combines avionics safety with the resource efficiency known from automotive systems.

Citations: 19

Top-down and bottom-up multi-level cache analysis for WCET estimation

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108413

Zhenkai Zhang, X. Koutsoukos

In many multi-core architectures, inclusive shared caches are used to reduce cache coherence complexity. However, the enforcement of the inclusion property can cause invalidation of memory blocks at higher cache levels. In order to ensure safety, analysis of cache hierarchies with inclusive caches for worst-case execution time (WCET) estimation is typically based on conservative decisions. Thus, the estimation may not be tight. In order to tighten the estimation, this paper proposes an approach that can more precisely analyze the behavior of a cache hierarchy maintaining the inclusion property. We illustrate the approach in the context of multi-level instruction caches. The approach first analyzes all the inclusive caches in the hierarchy in a bottom-up direction, and then analyzes the remaining non-inclusive caches in a top-down direction. In order to capture the inclusion victims and their effects, we also propose a concept of aging barrier and integrate it with the traditional must and persistence analyses to safely slow down their aging process so as to derive more precise analyses. We evaluate the proposed approach on a set of benchmarks and the evaluation reveals that the estimations are tightened.
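For readers unfamiliar with the "must" analysis the abstract builds on, the sketch below shows the classic abstract domain for a single LRU cache set: each block known to be cached is mapped to an upper bound on its age. It does not implement the paper's aging barrier or its multi-level treatment of inclusion victims; the function names and the example associativity are hypothetical.

```python
from typing import Dict

MustState = Dict[str, int]  # block -> upper bound on its LRU age


def must_update(state: MustState, block: str, assoc: int) -> MustState:
    """Abstract effect of accessing `block` in an LRU set with `assoc` ways."""
    old_age = state.get(block, assoc)  # an unknown block is treated as oldest
    new_state: MustState = {block: 0}
    for other, age in state.items():
        if other == block:
            continue
        aged = age + 1 if age < old_age else age
        if aged < assoc:  # otherwise the block may have been evicted
            new_state[other] = aged
    return new_state


def must_join(left: MustState, right: MustState) -> MustState:
    """At a control-flow merge keep only common blocks, with the larger age."""
    return {b: max(left[b], right[b]) for b in left.keys() & right.keys()}


if __name__ == "__main__":
    state: MustState = {}
    for access in ("a", "b", "c"):
        state = must_update(state, access, assoc=2)
    print(state)
```

A block that is still present in the must state at an access is a guaranteed hit; an invalidation forced by inclusion would break that guarantee unless it is accounted for, which is the imprecision the paper targets.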

Citations: 4

Unifying fixed- and dynamic-priority scheduling based on priority promotion and an improved ready queue management technique

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108444

R. Pathan

This paper proposes a new preemptive scheduling algorithm, called Fixed-Priority with Priority Promotion (FPP), for scheduling sporadic tasks on uni- and multiprocessor platforms. In FPP scheduling, tasks are executed as in traditional fixed-priority (FP) scheduling, but the priority of some tasks may be promoted at a fixed time interval (called the promotion point) relative to the release time of each job. A policy called Increase Priority at Deadline Difference (IPDD) is proposed to compute the promotion points and promoted priorities for each task. It is shown that when all tasks' priorities are governed by the IPDD policy, FPP scheduling essentially prioritizes jobs according to Earliest-Deadline-First (EDF) priority. It is known that inserting jobs into and removing them from the ready queue of a traditional EDF scheduler is more complex and has higher overhead than in an FP scheduler. To avoid this problem in FPP scheduling, a simple data structure and efficient operations for inserting jobs into and removing them from the ready queue are proposed. Finally, an effective scheme to reduce the overhead due to priority promotion is proposed: if a task set is not schedulable using traditional FP scheduling, then promotion points are assigned only to those tasks that need them to meet their deadlines; otherwise, tasks are assigned fixed priorities without any priority promotion and are executed as in traditional FP scheduling. Empirical investigation shows the effectiveness of the proposed scheme in reducing overhead on uniprocessors and in accepting a larger number of task sets than state-of-the-art global schedulability tests for multiprocessors.
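The promotion mechanism itself can be sketched without reference to how IPDD chooses its values. In the sketch below each job carries a base priority plus an optional promotion offset and promoted priority, and the dispatcher always picks the job with the best current priority. The `Job` fields, the convention that a lower number means higher priority, and the sample values are assumptions for illustration; the sketch implements neither the IPDD policy nor the paper's ready queue data structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Job:
    name: str
    release: float
    base_priority: int                        # lower number = higher priority
    promotion_offset: Optional[float] = None  # time after release, if any
    promoted_priority: Optional[int] = None

    def priority_at(self, now: float) -> int:
        """Base priority until the promotion point, promoted priority after."""
        promoted = (
            self.promotion_offset is not None
            and self.promoted_priority is not None
            and now - self.release >= self.promotion_offset
        )
        return self.promoted_priority if promoted else self.base_priority


def pick_next(ready: List[Job], now: float) -> Job:
    """Dispatch the ready job with the best priority at time `now`."""
    return min(ready, key=lambda job: job.priority_at(now))


if __name__ == "__main__":
    a = Job("a", release=0.0, base_priority=1)
    b = Job("b", release=0.0, base_priority=2,
            promotion_offset=3.0, promoted_priority=0)
    print(pick_next([a, b], now=1.0).name, pick_next([a, b], now=3.0).name)
```

Jobs without a promotion offset behave exactly as under plain fixed-priority scheduling, which mirrors the abstract's point that promotion is assigned only to the tasks that need it.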

Citations: 11

GPES: a preemptive execution system for GPGPU computing

21st IEEE Real-Time and Embedded Technology and Applications Symposium

Pub Date : 2015-04-13 DOI: 10.1109/RTAS.2015.7108420

Husheng Zhou, G. Tong, Cong Liu

Graphics processing units (GPUs) are widely used as co-processors in many application domains to accelerate computationally intensive general-purpose workloads, a practice known as GPGPU computing. Real-time multi-tasking support is a critical requirement for many emerging GPGPU computing domains. However, because GPU processing is asynchronous and non-preemptive, higher-priority tasks in a multi-tasking environment may be blocked by lower-priority tasks for a lengthy duration. This severely harms the system's timing predictability and seriously limits the applicability of GPGPU computing in many real-time and embedded systems. In this paper, we present an efficient GPGPU preemptive execution system (GPES), which combines user-level and driver-level runtime engines to reduce the pending time of high-priority GPGPU tasks that may be blocked by long-running, low-priority competing workloads. GPES automatically slices a long-running kernel execution into multiple subkernel launches and splits data transfers into multiple chunks at user level, then inserts preemption points between subkernel launches and memory-copy operations at driver level. We implement a prototype of GPES and evaluate it using real-world benchmarks and case studies. Experimental results demonstrate that GPES reduces the pending time of high-priority tasks in a multi-tasking environment by up to 90% compared with existing GPU driver solutions, while introducing small overheads.
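The slicing idea in this abstract can be sketched as a small pure-Python simulation. It does not touch a real GPU or driver and is not the GPES implementation: `slice_kernel`, `run_with_preemption_points`, the `blocks_per_slice` parameter, and the `high_arrivals` map are all hypothetical names introduced for illustration. The assumption is that a kernel's thread blocks can be split into consecutive ranges, and that a waiting high-priority task can be served at the boundary between two sub-launches.

```python
from typing import Dict, Iterator, List, Tuple


def slice_kernel(total_blocks: int, blocks_per_slice: int) -> Iterator[Tuple[int, int]]:
    """Yield (first_block, block_count) for each sub-kernel launch."""
    if blocks_per_slice <= 0:
        raise ValueError("blocks_per_slice must be positive")
    for start in range(0, total_blocks, blocks_per_slice):
        yield start, min(blocks_per_slice, total_blocks - start)


def run_with_preemption_points(
    total_blocks: int,
    blocks_per_slice: int,
    high_arrivals: Dict[int, str],
) -> List[tuple]:
    """Simulate a sliced low-priority kernel and return the execution trace.

    `high_arrivals` maps a slice index to the name of a high-priority task
    that arrives while that slice is running (a stand-in for a real queue).
    """
    trace: List[tuple] = []
    for index, (start, count) in enumerate(slice_kernel(total_blocks, blocks_per_slice)):
        trace.append(("low", start, count))
        # Preemption point: the slice has finished, the next is not yet launched.
        if index in high_arrivals:
            trace.append(("high", high_arrivals[index]))
    return trace


if __name__ == "__main__":
    for step in run_with_preemption_points(10, 4, {0: "H"}):
        print(step)
```

In this model a high-priority task waits for at most one slice instead of the whole kernel, so a smaller `blocks_per_slice` shortens the worst-case wait at the cost of more launches.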



Citations: 60