Title: Poster Abstract: Scheduling Multi-Threaded Tasks to Reduce Intra-Task Cache Contention
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461352
Corey Tessler, N. Fisher
Summary form only given. Research on hard real-time systems and their models has predominantly focused on single-threaded tasks. When multithreaded tasks are introduced into the classical real-time model, the individual threads are treated as distinct tasks, one per thread. These artificial tasks share the deadline, period, and worst-case execution time of their parent task. In the presence of instruction and data caches this model is overly pessimistic, failing to account for the execution-time benefit of cache hits when multiple threads of execution share a memory address space. This work takes a new perspective on instruction caches, treating the cache as a benefit to schedulability for a single task with m threads. To realize this “inter-thread cache benefit”, a new scheduling algorithm and an accompanying worst-case execution time (WCET) calculation method are proposed. The scheduling algorithm permits threads to execute across conflict-free regions and blocks those threads that would create an unnecessary cache conflict. The WCET bound is determined for the entire set of m threads, rather than treating each thread as a distinct task. Both the scheduler and the WCET method rely on the calculation of conflict-free regions, which are found by a static analysis method that requires no external information from the system designer. By virtue of this perspective, the system's total execution time is reduced, which is reflected in a tighter WCET bound compared to the techniques applied to the classical model. Obtaining this tighter bound requires the integration of three typically independent areas: WCET analysis, schedulability analysis, and cache-related preemption delay analysis.
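To make the gating idea concrete, the following is a minimal C sketch of how a scheduler could block a thread whose next conflict-free region overlaps the cache footprint of a region already in execution. It is not the algorithm of the paper: the region descriptor, the bit-mask footprint encoding, and the function names are all assumptions made for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_THREADS 4

    /* Hypothetical per-region descriptor: the cache sets the region touches,
     * encoded as a 64-bit footprint (one bit per cache set), as produced by
     * an offline static analysis. */
    struct region {
        uint64_t cache_sets;
    };

    /* Footprints of the regions currently executing on each thread
     * (0 means the thread is idle or blocked). */
    static uint64_t active_footprint[NUM_THREADS];

    /* A thread may enter its next region only if that region shares no
     * cache set with any region already in execution. */
    static bool may_dispatch(int thread, const struct region *next)
    {
        for (int t = 0; t < NUM_THREADS; t++) {
            if (t == thread)
                continue;
            if (active_footprint[t] & next->cache_sets)
                return false;   /* would create an avoidable conflict: block */
        }
        active_footprint[thread] = next->cache_sets;
        return true;            /* conflict free: let the thread proceed */
    }

When a thread finishes its region, the corresponding footprint entry would be cleared so that blocked threads can be re-evaluated.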
{"title":"Poster Abstract: Scheduling Multi-Threaded Tasks to Reduce Intra-Task Cache Contention","authors":"Corey Tessler, N. Fisher","doi":"10.1109/RTAS.2016.7461352","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461352","url":null,"abstract":"Summary form only given. Research on hard real-time systems and their models has predominately focused upon single-threaded tasks. When multithreaded tasks are introduced to the classical real-time model the individual threads are treated as distinct tasks, one for each thread. These artificial tasks share the deadline, period, and worst case execution time of their parent task. In the presence of instruction and data caches this model is overly pessimistic, failing to account for the execution time benefit of cache hits when multiple threads of execution share a memory address space. This work takes a new perspective on instruction caches. Treating the cache as a benefit to schedulability for a single task with m threads. To realize the “inter-thread cache benefit” a new scheduling algorithm and accompanying worst-case execution time (WCET) calculation method are proposed. The scheduling algorithm permits threads to execute across conflict free regions, and blocks those threads that would create an unnecessary cache conflict. The WCET bound is determined for the entire set of m threads, rather than treating each thread as a distinct task. Both the scheduler and WCET method rely on the calculation of conflict free regions which are found by a static analysis method that relies on no external information from the system designer. By virtue of this perspective the system's total execution execution time is reduced and is reflected in a tighter WCET bound compared to the techniques applied to the classical model. Obtaining this tighter bound requires the integration of three typically independent areas: WCET, schedulability, and cache-related preemption delay analysis.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"350 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115231188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Trading Cores for Memory Bandwidth in Real-Time Systems
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461364
A. Alhammad, R. Pellizzoni
Federated scheduling has been proposed for parallel tasks. In this scheduling scheme, each parallel task is assigned a private set of cores, whereas sequential tasks share the remaining cores. Since parallel tasks are assigned dedicated cores, they receive no processor interference from other tasks. However, multicore processors are commonly built with shared main memory, whose bandwidth is limited and therefore subject to contention. Consequently, parallel tasks can still interfere with each other through the shared main memory. In this paper, we propose a novel method that is memory-aware when assigning cores to tasks. Our experimental results show a significant advantage of our method over memory-oblivious methods.
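As a rough illustration of what a memory-aware assignment step might look like, here is a C sketch; the task parameters, the reuse of the standard federated core-count formula, the bandwidth model, and all names are assumptions for illustration, not the method proposed in the paper.

    #include <math.h>
    #include <stdbool.h>

    /* Hypothetical parallel-task parameters. */
    struct ptask {
        double work;      /* total execution time C_i on one core */
        double span;      /* critical-path length L_i */
        double deadline;  /* relative deadline D_i */
        double mem_bw;    /* memory bandwidth the task may demand (GB/s) */
    };

    /* Standard federated core count: n_i = ceil((C_i - L_i) / (D_i - L_i)). */
    static int federated_cores(const struct ptask *t)
    {
        return (int)ceil((t->work - t->span) / (t->deadline - t->span));
    }

    /* Memory-aware assignment step: admit the task only if, besides enough
     * free cores, enough of the shared memory bandwidth budget remains;
     * otherwise the task must be rejected or the allocation re-planned. */
    static bool assign_memory_aware(const struct ptask *t,
                                    int *free_cores, double *free_bw)
    {
        int n = federated_cores(t);
        if (n > *free_cores || t->mem_bw > *free_bw)
            return false;
        *free_cores -= n;
        *free_bw   -= t->mem_bw;
        return true;
    }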
{"title":"Trading Cores for Memory Bandwidth in Real-Time Systems","authors":"A. Alhammad, R. Pellizzoni","doi":"10.1109/RTAS.2016.7461364","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461364","url":null,"abstract":"Federated scheduling has been proposed for parallel tasks. In this scheduling scheme, each parallel task is a assigned a private set of cores whereas sequential tasks share the remaining set of cores. Since parallel tasks are assigned dedicated cores, they receive no interference from other tasks. However, multicore processors are commonly built with shared main memory. The memory bandwidth is limited and therefore subject to contention. Consequently, parallel tasks can interfere with each other through the shared main memory. In this paper, we propose a novel method that is memory-aware when assigning cores to tasks. Our experimental results show a significant advantage of our method with respect to memory- oblivious methods.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114425012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Modeling and Verification of Dynamic Command Scheduling for Real-Time Memory Controllers
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461341
Yonghui Li, B. Akesson, Kai Lampka, K. Goossens
In modern multi-core systems with multiple real-time (RT) applications, memory traffic accessing the shared SDRAM is increasingly diverse, e.g., transactions have variable sizes. RT memory controllers with dynamic command scheduling can efficiently address this diversity by issuing appropriate commands subject to the SDRAM timing constraints. However, the scheduling dependencies between commands make it challenging to derive tight bounds on the worst-case response time (WCRT) and the worst-case bandwidth (WCBW) of a memory controller. Existing modeling and analysis techniques either do not provide tight WCRT and WCBW bounds for diverse memory traffic with variable transaction sizes or are difficult to adapt to different RT memory controllers. This paper models a memory controller using Timed Automata (TA), to which model checking is applied for analysis. Our TA model is modular and accurately captures the behavior of an RT memory controller with dynamic command scheduling. We obtain WCRT and WCBW bounds, which are validated by simulating the worst-case transaction traces obtained by model checking with a cycle-accurate model of the memory controller. Our method outperforms three state-of-the-art analysis techniques: we reduce the WCRT bound by up to 20%, with an average improvement of 7.7%, and increase the WCBW bound by up to 25%, with an average improvement of 13.6%. In addition, our modeling is generic enough to extend to memory controllers with different mechanisms.
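For intuition on why dynamic command scheduling is hard to bound, the C sketch below shows the kind of per-bank timing-constraint check a scheduler performs before issuing a command; the timing values, structure, and names are illustrative assumptions, not the controller or the TA model analyzed in the paper.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative SDRAM timing constraints in clock cycles (values assumed). */
    #define T_RCD 5   /* ACT -> RD/WR to the same bank */
    #define T_RP  5   /* PRE -> ACT to the same bank */
    #define T_RC  20  /* ACT -> ACT to the same bank */

    struct bank_state {
        uint64_t last_act;  /* cycle of the last ACT command */
        uint64_t last_pre;  /* cycle of the last PRE command */
        bool     open;      /* a row is currently open */
    };

    /* A read may only be issued once the row has been open for at least
     * tRCD cycles; otherwise the scheduler must wait or serve another bank,
     * and it is this interleaving that makes the timing hard to bound. */
    static bool can_issue_read(const struct bank_state *b, uint64_t now)
    {
        return b->open && now >= b->last_act + T_RCD;
    }

    static bool can_issue_activate(const struct bank_state *b, uint64_t now)
    {
        return !b->open &&
               now >= b->last_pre + T_RP &&
               now >= b->last_act + T_RC;
    }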
{"title":"Modeling and Verification of Dynamic Command Scheduling for Real-Time Memory Controllers","authors":"Yonghui Li, B. Akesson, Kai Lampka, K. Goossens","doi":"10.1109/RTAS.2016.7461341","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461341","url":null,"abstract":"In modern multi-core systems with multiple real-time (RT) applications, memory traffic accessing the shared SDRAM is increasingly diverse, e.g., transactions have variable sizes. RT memory controllers with dynamic command scheduling can efficiently address the diversity by issuing appropriate commands subject to the SDRAM timing constraints. However, the scheduling dependencies between commands make it challenging to derive tight bounds for the worst-case response time (WCRT) and the worst-case bandwidth (WCBW) of a memory controller. Existing modeling and analysis techniques either do not provide tight WCRT and WCBW bounds for diverse memory traffic with variable transaction sizes or are difficult to adapt to different RT memory controllers. This paper models a memory controller using Timed Automata (TA), where model checking is applied for analysis. Our TA model is modular and accurately captures the behavior of a RT memory controller with dynamic command scheduling. We obtain WCRT and WCBW bounds, which are validated by simulating the worst- case transaction traces obtained by model checking with a cycle-accurate model of the memory controller. Our method outperforms three state-of-the-art analysis techniques. We reduce WCRT bound by up to 20%, while the average improvement is 7.7%, and increase the WCBW bound by up to 25% with an average improvement of 13.6%. In addition, our modeling is generic enough to extend to memory controllers with different mechanisms.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128037768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Poster Abstract: Online Semi-Partitioned Multiprocessor Scheduling of Soft Real-Time Periodic Tasks for QoS Optimization
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461350
Behnaz Sanati, A. Cheng
Summary form only given. Multiprocessor real-time scheduling algorithms may follow a partitioned or global approach, or some hybrid of the two called semi-partitioning. Semi-partitioned real-time scheduling algorithms extend partitioned ones by allowing a subset of tasks to migrate. Given the goal of “less overhead”, it is desirable for such a strategy to be boundary-limited, allowing a migrating task to migrate only between successive invocations (job boundaries). Non-boundary-limited schedulers allow jobs themselves to migrate, which can be expensive in practice if jobs maintain much cached state. Previously proposed semi-partitioned algorithms for soft real-time (SRT) tasks, such as EDF-fm and EDF-os, have two phases: an offline assignment phase, where tasks are assigned to processors and fixed tasks (which do not migrate) are distinguished from migrating ones, and an online execution phase. In the execution phase, rules that extend EDF scheduling are used. These strategies aim to minimize tardiness. In this paper, we propose a new online reward-based semi-partitioning approach to schedule periodic soft real-time tasks on homogeneous multiprocessor systems. We use an online choice between two approximation algorithms, Greedy and Load-Balancing, for partitioning, which provides an optimized usage of processing time. In this method, no prior information is needed; hence, there is no offline phase. Our objective is to enhance the QoS by minimizing tardiness and maximizing the total reward obtained by completed tasks within a minimum makespan. Therefore, we allow different jobs of a task to be assigned to different processors (migration at job boundaries) based on their reward-based priorities and the workload of the processors. This method can also be extended to SRT systems with a mixed set of tasks (aperiodic, sporadic, and periodic) by defining their deadlines accordingly. Many real-time applications can benefit from this solution, including but not limited to video streaming servers, multi-player video games, mobile online banking, and medical monitoring systems.
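A minimal C sketch of such a boundary-limited, online job-to-processor assignment follows; the two heuristics shown (a deadline-driven first fit and a least-loaded fallback) only illustrate the idea of choosing between a Greedy and a Load-Balancing rule at each job release and are not the authors' actual rules.

    #define NUM_CPUS 4

    /* Current accumulated workload (in time units) on each processor. */
    static double load[NUM_CPUS];

    /* Load-Balancing rule: pick the processor with the smallest current load. */
    static int least_loaded(void)
    {
        int best = 0;
        for (int p = 1; p < NUM_CPUS; p++)
            if (load[p] < load[best])
                best = p;
        return best;
    }

    /* Greedy rule (illustrative): the first processor on which a job of
     * execution time c, released at time `now`, can still finish by its
     * absolute deadline d given the work already queued there. */
    static int first_fit_by_deadline(double c, double now, double d)
    {
        for (int p = 0; p < NUM_CPUS; p++)
            if (now + load[p] + c <= d)
                return p;
        return -1;
    }

    /* Assignment at a job boundary: different jobs of the same task may land
     * on different processors, i.e. the task migrates only between releases. */
    static int assign_job(double c, double now, double d)
    {
        int p = first_fit_by_deadline(c, now, d);
        if (p < 0)
            p = least_loaded();  /* no processor meets the deadline: balance load */
        load[p] += c;
        return p;
    }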
{"title":"Poster Abstract: Online Semi-Partitioned Multiprocessor Scheduling of Soft Real-Time Periodic Tasks for QoS Optimization","authors":"Behnaz Sanati, A. Cheng","doi":"10.1109/RTAS.2016.7461350","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461350","url":null,"abstract":"Summary form only given. Multiprocessor real-time scheduling algorithms may follow a partitioned or global approach or some hybrid of the two, called semi-partitioning. Semi-partitioned real-time scheduling algorithms extend partitioned ones by allowing a subset of tasks to migrate. Given the goal of “less overhead”, it is desirable for such strategy to be boundary-limited, and allow a migrating task to migrate only between successive invocations (job boundaries). Non-boundary-limited schedulers allow jobs to migrate, which can be expensive in practice, if jobs maintain much cached state. Previously proposed semi-partitioned algorithms for soft real-time (SRT) tasks such as EDF-fm and EDF-os, have two phases: an offline assignment phase, where tasks are assigned to processors and fixed tasks (which do not migrate) are distinguished from migrating ones; and an online execution phase. In their execution phase, rules that extend EDF scheduling are used. These strategies aim to minimize tardiness. In this paper, we propose a new online reward-based semi-partitioning approach to schedule periodic soft real-time tasks in homogeneous multiprocessor systems. We use an online choice of two approximation algorithms, Greedy and Load-Balancing, for partitioning, which provides an optimized usage of processing time. In this method, no prior information is needed. Hence, there is no offline phase. Our objective is to enhance the QoS by minimizing tardiness and maximizing the total reward obtained by completed tasks in minimum makespan. Therefore, we allow different jobs of any task get assigned to different processors (migration at job boundaries) based on their reward-based priorities and workload of the processors. This method can also extend to direct SRT systems with mixed set of tasks (aperiodic, sporadic and periodic) by defining their deadline accordingly. Many real-time applications can benefit from this solution including but not limited to video streaming servers, multi-player video games, mobile online banking and medical monitoring systems.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134371286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Multi-Objective Co-Optimization of FlexRay-Based Distributed Control Systems
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461344
Debayan Roy, Licong Zhang, Wanli Chang, Dip Goswami, S. Chakraborty
Recently, research on control and architecture co-design has been drawing increasing attention. These techniques integrate the design of the controllers and the architecture and exploit the characteristics of both sides to achieve more efficient designs of embedded control systems. However, several challenges remain, such as the large design space and inadequate trade-off opportunities between objectives like control performance and resource utilization. In this paper, we propose a co-optimization approach for FlexRay-based distributed control systems that synthesizes both the controllers and the task and communication schedules. This approach exploits FlexRay-specific protocol characteristics to reduce the complexity of the whole optimization problem, by employing a customized control design and a nested two-layered optimization technique. Compared to existing methods, the proposed approach is therefore more scalable. It also allows multi-objective optimization taking into account both the overall control performance and the bus resource utilization, and it generates a Pareto front representing the trade-off between the two, which allows engineers to make suitable design choices.
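To illustrate the final step of such a multi-objective flow, the following C sketch keeps only the non-dominated (control cost, bus utilization) design points out of a set of evaluated candidates, i.e., it computes the Pareto front; the data layout and names are assumptions, not the paper's tool.

    #include <stdbool.h>
    #include <stddef.h>

    /* One candidate design: a controller/schedule pair evaluated for overall
     * control cost and FlexRay bus utilization (lower is better for both). */
    struct design {
        double control_cost;
        double bus_util;
    };

    static bool dominates(const struct design *a, const struct design *b)
    {
        return a->control_cost <= b->control_cost && a->bus_util <= b->bus_util
            && (a->control_cost < b->control_cost || a->bus_util < b->bus_util);
    }

    /* Copy the non-dominated candidates into `out` and return how many were
     * kept; `out` must have room for at least n entries. */
    static size_t pareto_front(const struct design *cand, size_t n,
                               struct design *out)
    {
        size_t kept = 0;
        for (size_t i = 0; i < n; i++) {
            bool dominated = false;
            for (size_t j = 0; j < n && !dominated; j++)
                if (j != i && dominates(&cand[j], &cand[i]))
                    dominated = true;
            if (!dominated)
                out[kept++] = cand[i];
        }
        return kept;
    }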
{"title":"Multi-Objective Co-Optimization of FlexRay-Based Distributed Control Systems","authors":"Debayan Roy, Licong Zhang, Wanli Chang, Dip Goswami, S. Chakraborty","doi":"10.1109/RTAS.2016.7461344","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461344","url":null,"abstract":"Recently, research on control and architecture co- design has been drawing increasingly more attention. This is because these techniques integrate the design of the controllers and the architecture and explore the characteristics on both sides to achieve more efficient design of embedded control systems. However, there still exist several challenges like the large design space and inadequate trade-off opportunities for different objectives like control performance and resource utilization. In this paper, we propose a co-optimization approach for FlexRay-based distributed control systems, that synthesizes both the controllers and the task and communication schedules. This approach exploits some FlexRay protocol specific characteristics to reduce the complexity of the whole optimization problem. This is done by employing a customized control design and a nested two-layered optimization technique. Therefore, compared to existing methods, the proposed approach is more scalable. It also allows multi-objective optimization taking into account both the overall control performance and the bus resource utilization. This approach generates a Pareto front representing the trade-offs between these two, which allows the engineers to make suitable design choices.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131715467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Demo Abstract: Run-Time Monitoring Environments for Real-Time and Safety Critical Systems
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461333
Geoffrey Nelissen, H. Carvalho, David Pereira, E. Tovar
With the increasing complexity of embedded systems, it becomes unrealistic to formally verify that all the system requirements will be respected under any possible execution scenario. Moreover, the worst-case analyses that are usually performed before system deployment are based on a set of assumptions (e.g., minimum activation period, worst-case execution time, maximum release jitter) that may not always be respected at run-time. For those reasons, run-time monitoring and run-time verification become an interesting alternative to traditional offline verification. Run-time verification is based on the instrumentation of the target applications. Monitors are added to the system to verify at run-time that the system requirements are respected during execution. If a misbehaviour is detected, an alarm can be raised so as to trigger appropriate counter-measures (e.g., an execution mode change, or the reset or deactivation of some functionalities). In this work, we present four different implementations of a run-time monitoring framework suited to real-time and safety-critical systems. Two implementations are written in Ada and follow the Ravenscar profile, which makes them particularly suited to the development of high-integrity systems. The first version is available as a standalone library for Ada programs, while the second has been integrated in the GNAT run-time environment and instruments the ORK+ micro-kernel. Information on task scheduling events, originating directly from the kernel, can thus be used by the monitors to check whether the system follows all its requirements. The third implementation is a standalone library written in C++ that can be used in any POSIX-compliant run-time environment; it is therefore compatible with the vast majority of operating systems used in embedded systems. The last implementation is a loadable kernel module for Linux. Its main advantage is that it enforces complete spatial partitioning between the monitors and the monitored applications, making it impossible for memory faults to propagate and corrupt the state of the monitors.
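As a flavour of what such instrumentation-based monitoring looks like in a POSIX environment, here is a minimal C sketch that checks a minimum inter-arrival time at every job release; it is not the framework's actual API, and the property, constant, and function names are assumptions.

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    /* Minimum inter-arrival time the monitored task is assumed to respect. */
    #define MIN_PERIOD_NS 10000000L  /* 10 ms */

    static struct timespec last_release;
    static bool have_last;

    /* Called from the instrumented application at every job release.
     * Returns false (and raises an alarm) if two releases are closer
     * together than the assumed minimum inter-arrival time. */
    static bool monitor_release(void)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (have_last) {
            long long delta = (now.tv_sec - last_release.tv_sec) * 1000000000LL
                            + (now.tv_nsec - last_release.tv_nsec);
            if (delta < MIN_PERIOD_NS) {
                fprintf(stderr, "monitor: inter-arrival %lld ns < %ld ns\n",
                        delta, MIN_PERIOD_NS);
                return false;  /* trigger a counter-measure here */
            }
        }
        last_release = now;
        have_last = true;
        return true;
    }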
{"title":"Demo Abstract: Run-Time Monitoring Environments for Real-Time and Safety Critical Systems","authors":"Geoffrey Nelissen, H. Carvalho, David Pereira, E. Tovar","doi":"10.1109/RTAS.2016.7461333","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461333","url":null,"abstract":"With the increasing complexity of embedded systems, it becomes unrealistic to formally verify that all the system requirements will be respected under any possible execution scenario. Moreover, the worst-case analyses that are usually performed before the system deployment are also based on a set of assumptions (e.g., minimum activation period, worst-case execution time, maximum release jitter) that may not always be respected at run-time. For those reasons, run-time monitoring and run-time verification become an interesting alternative to the traditional offline verification. Run-time verification is based on the instrumentation of the target applications. Monitors are then added to the system to verify at run-time that the system requirements are respected during the execution. If a misbehaviour is detected, an alarm can be raised so as to trigger appropriate counter-measures (e.g., execution mode change, reset or deactivation of some of the functionalities). In this work, we present four different implementations of a run-time monitoring framework suited to real-time and safety critical systems. Two implementations are written in Ada and follow the Ravenscar profile, which make them particularly suited to the development of high integrity systems. The first version is available as a standalone library for Ada programs while the second has been integrated in the GNAT run-time environment and instruments the ORK+ micro-kernel. Information on the task scheduling events, directly originating from the kernel, can thus be used by the monitors to check if the system follows all its requirements. The third implementation is a standalone library written in C++ that can be used in any POSIX compliant run-time environment. It is therefore compatible with the vast majority of operating systems used in embedded systems. The last implementation is a loadable kernel module for Linux. It has for main advantage to be able to enforce complete space partitioning between the monitors and the monitored applications. It is therefore impossible for memory faults to propagate and corrupt the state of the monitors.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133194654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Poster Abstract: Cache Persistence Aware Response Time Analysis for Fixed Priority Preemptive Systems
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461347
Syed Aftab Rashid, Geoffrey Nelissen, E. Tovar
Summary form only given. The gap between processor and main memory operating speeds necessitates the use of intermediate cache memories to accelerate the average-case access time to the instructions and data that the processor must execute or operate on. However, the introduction of cache memories in modern computing platforms causes large variations in the execution time of each instruction, depending on whether the instruction and the data it uses are already loaded in the cache or not. Existing worst-case response time (WCRT) analyses assume that each job released by a preempting task incurs its worst-case memory demand. This is pessimistic, since there is a high chance that a large portion of the instructions and data associated with a preempting task τj are still available in the cache when τj releases its next jobs. We call this content persistent cache blocks (PCBs). In this work, we propose a method to accurately bound the memory overhead incurred by a low-priority task due to high-priority tasks executing during its response time. For this purpose, we first identify the persistent and non-persistent cache blocks (i.e., PCBs and nPCBs) associated with each task. We then show with an example that, due to the existence of PCBs, the memory demand of a task can vary significantly over time. Accounting for PCBs in the memory demand of the preempting task therefore reduces the pessimism in the total memory demand considered by the WCRT analysis. Finally, we propose a refined WCRT analysis for fixed-priority preemptive systems that considers (i) the effect of PCBs on the memory demand of the preempting task, and (ii) the number of PCBs that can be evicted by the preempted tasks between two successive job releases of the preempting task.
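For context, a persistence-aware analysis of this kind can be thought of as refining the classical memory-demand-augmented response-time recurrence. The formulation below is only a sketch of that shape, with assumed notation (C_i: WCET, T_j: period, hp(i): higher-priority tasks, MD_j: worst-case memory demand, MD_j^{res}: its non-persistent residual); it is not the paper's exact analysis. Classical form, with every preempting job charged its full memory demand:

    R_i^{(k+1)} = C_i + MD_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i^{(k)}}{T_j} \right\rceil \left( C_j + MD_j \right)

Persistence-aware refinement, where only the first job of each preempting task pays the full demand and later jobs pay only the residual for the blocks that did not persist:

    R_i^{(k+1)} = C_i + MD_i + \sum_{j \in hp(i)} \left[ \left( C_j + MD_j \right) + \left( \left\lceil \frac{R_i^{(k)}}{T_j} \right\rceil - 1 \right) \left( C_j + MD_j^{res} \right) \right]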
{"title":"Poster Abstract: Cache Persistence Aware Response Time Analysis for Fixed Priority Preemptive Systems","authors":"Syed Aftab Rashid, Geoffrey Nelissen, E. Tovar","doi":"10.1109/RTAS.2016.7461347","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461347","url":null,"abstract":"Summary form only given. The existing gap between the processor and main memory operating speeds necessitates the use of intermediate cache memories to accelerate the average case access time to instructions and data that must be executed or treated on the processor. However, the introduction of cache memories in modern computing platforms is the cause of big variations in the execution time of each instruction depending on whether the instruction and the data it treats are already loaded in the cache or not. During the worst-case response time (WCRT) analysis, the existing works assume that each job released by the preempting tasks will ask for their worst-case memory demand. This is however pessimistic since there is a high chance that a big portion of the instructions and data associated with the preempting task τj , are still available in the cache when τj releases its next jobs. We call this content persistent cache blocks (PCBs). In this work, we propose a method to accurately bound the memory overhead incurred by a low priority task due to high priority tasks executing during its response time. For this purpose, we first identify the existence of persistent and nonpersistent cache blocks (i.e., PCBs and nPCBs) associated with each task. We then show with an example that due to the existence of PCBs, the memory demand of a task can significantly vary over time. Therefore, accounting for PCBs in the memory demand of the preempting task allows to reduce the pessimism on the total memory demand considered by the WCRT analysis. Finally, we propose a refined WCRT analysis for fixed priority preemptive systems considering (i) the effect of PCBs on the memory demand of the preempting task, and (ii) accounting for the number of PCBs that can be evicted by the preempted tasks between two successive job releases of the preempting tasks.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125413542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Taming Non-Blocking Caches to Improve Isolation in Multicore Real-Time Systems
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461361
P. K. Valsan, H. Yun, F. Farshchi
In this paper, we show that cache partitioning does not necessarily ensure predictable cache performance in modern COTS multicore platforms that use non-blocking caches to exploit memory-level parallelism (MLP). Through carefully designed experiments using three real COTS multicore platforms (four distinct CPU architectures) and a cycle-accurate full-system simulator, we show that special hardware registers in non-blocking caches, known as Miss Status Holding Registers (MSHRs), which track the status of outstanding cache misses, can be a significant source of contention; we observe up to a 21x WCET increase on a real COTS multicore platform due to MSHR contention. We propose a collaborative hardware and system software (OS) approach to efficiently eliminate MSHR contention for multicore real-time systems. Our approach includes a low-cost hardware extension that enables dynamic control of per-core MLP by the OS. Using the hardware extension, the OS scheduler globally controls each core's MLP in such a way that MSHR contention is eliminated and the overall throughput of the system is maximized. We implement the hardware extension in a cycle-accurate full-system simulator and the scheduler modification in the Linux 3.14 kernel. We evaluate the effectiveness of our approach using a set of synthetic and macro benchmarks. In a case study, we achieve up to a 19% WCET reduction (average: 13%) for a set of EEMBC benchmarks compared to a baseline cache partitioning setup.
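A sketch of the kind of OS-level policy such a hardware extension enables is shown below; the register interface, MSHR count, and partitioning rule are hypothetical and only illustrate the idea of keeping the per-core MLP limits within the shared MSHR budget, not the mechanism implemented in the paper.

    #include <stdbool.h>

    #define NUM_CORES   4
    #define TOTAL_MSHRS 16   /* MSHRs in the shared last-level cache (assumed) */

    /* Placeholder for the hypothetical per-core MLP-limit register exposed by
     * the proposed hardware extension; a real kernel would write an MSR or a
     * memory-mapped register here. */
    static void set_core_mlp_limit(int core, int limit)
    {
        (void)core;
        (void)limit;
    }

    static int mlp_limit[NUM_CORES];

    /* OS-level policy sketch: give the real-time core the MLP it asks for and
     * split whatever remains among the other cores, so the per-core limits
     * never sum to more than the shared MSHRs and no core can be stalled by
     * another core exhausting them. */
    static bool partition_mlp(int rt_core, int rt_mlp)
    {
        if (rt_mlp > TOTAL_MSHRS)
            return false;
        int rest = (TOTAL_MSHRS - rt_mlp) / (NUM_CORES - 1);
        for (int c = 0; c < NUM_CORES; c++) {
            mlp_limit[c] = (c == rt_core) ? rt_mlp : rest;
            set_core_mlp_limit(c, mlp_limit[c]);
        }
        return true;
    }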
{"title":"Taming Non-Blocking Caches to Improve Isolation in Multicore Real-Time Systems","authors":"P. K. Valsan, H. Yun, F. Farshchi","doi":"10.1109/RTAS.2016.7461361","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461361","url":null,"abstract":"In this paper, we show that cache partitioning does not necessarily ensure predictable cache performance in modern COTS multicore platforms that use non-blocking caches to exploit memory- level-parallelism (MLP). Through carefully designed experiments using three real COTS multicore platforms (four distinct CPU architectures) and a cycle- accurate full system simulator, we show that special hardware registers in non-blocking caches, known as Miss Status Holding Registers (MSHRs), which track the status of outstanding cache-misses, can be a significant source of contention; we observe up to 21X WCET increase in a real COTS multicore platform due to MSHR contention. We propose a hardware and system software (OS) collaborative approach to efficiently eliminate MSHR contention for multicore real-time systems. Our approach includes a low-cost hardware extension that enables dynamic control of per-core MLP by the OS. Using the hardware extension, the OS scheduler then globally controls each core's MLP in such a way that eliminates MSHR contention and maximizes overall throughput of the system. We implement the hardware extension in a cycle- accurate fullsystem simulator and the scheduler modification in Linux 3.14 kernel. We evaluate the effectiveness of our approach using a set of synthetic and macro benchmarks. In a case study, we achieve up to 19% WCET reduction (average: 13%) for a set of EEMBC benchmarks compared to a baseline cache partitioning setup.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132551215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A Kernel for Energy-Neutral Real-Time Systems with Mixed Criticalities
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461320
Peter Wägemann, T. Distler, Heiko Janker, Phillip Raffeck, V. Sieh
Energy-neutral real-time systems harvest the entire energy they use from their environment, making it essential to treat energy as a resource as important as time. As a result, such systems need to solve a number of problems that so far have not been addressed by traditional real-time systems. In particular, this includes the scheduling of tasks with both time and energy constraints, the monitoring of energy budgets, and surviving blackout periods during which not enough energy is available to keep the system fully operational. In this paper, we address these issues by presenting ENOS, an operating-system kernel for energy-neutral real-time systems. ENOS considers mixed time-criticality levels for different energy-criticality modes, which enables a decoupling of time and energy constraints during phases when one is considered less critical than the other. When switching the energy-criticality mode, the system also changes the set of tasks to be executed and is therefore able to dynamically adapt its energy consumption to external conditions. By keeping track of the available energy budget, ENOS ensures that in case of a blackout the system state is safely stored to persistent memory, allowing operation to resume at a later point when enough energy has been harvested again.
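A minimal C sketch of budget-driven energy-criticality mode switching is shown below; the thresholds, mode names, and checkpoint hook are assumptions for illustration, not the ENOS interface.

    /* Illustrative energy thresholds in millijoules (values assumed). */
    #define BUDGET_NORMAL_MJ   500
    #define BUDGET_CRITICAL_MJ 100

    enum energy_mode { ENERGY_NORMAL, ENERGY_CRITICAL, ENERGY_BLACKOUT };

    static enum energy_mode mode = ENERGY_NORMAL;

    /* Placeholder: a real kernel would checkpoint the run-time state to
     * persistent memory here so execution can resume later. */
    static void checkpoint_to_persistent_memory(void) { }

    /* Called periodically with the currently available energy budget; selects
     * the energy-criticality mode, which in turn selects the task set that is
     * allowed to run. */
    static void update_energy_mode(unsigned budget_mj)
    {
        if (budget_mj >= BUDGET_NORMAL_MJ) {
            mode = ENERGY_NORMAL;          /* run all tasks */
        } else if (budget_mj >= BUDGET_CRITICAL_MJ) {
            mode = ENERGY_CRITICAL;        /* run only energy-critical tasks */
        } else {
            mode = ENERGY_BLACKOUT;        /* about to lose power */
            checkpoint_to_persistent_memory();
        }
    }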
{"title":"A Kernel for Energy-Neutral Real-Time Systems with Mixed Criticalities","authors":"Peter Wägemann, T. Distler, Heiko Janker, Phillip Raffeck, V. Sieh","doi":"10.1109/RTAS.2016.7461320","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461320","url":null,"abstract":"Energy-neutral real-time systems harvest the entire energy they use from their environment, making it essential to treat energy as an equally important resource as time. As a result, such systems need to solve a number of problems that so far have not been addressed by traditional real-time systems. In particular, this includes the scheduling of tasks with both time and energy constraints, the monitoring of energy budgets, as well as the survival of blackout periods during which not enough energy is available to keep the system fully operational. In this paper, we address these issues presenting ENOS, an operating-system kernel for energy-neutral real-time systems. ENOS considers mixed time criticality levels for different energy criticality modes, which enables a decoupling of time and energy constraints during phases when one is considered less critical than the other. When switching the energy criticality mode, the system also changes the set of tasks to be executed and is therefore able to dynamically adapt its energy consumption depending on external conditions. By keeping track of the energy budget available, ENOS ensures that in case of a blackout the system state is safely stored to persistent memory, allowing operations to resume at a later point when enough energy is harvested again.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114526668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Complete, High-Assurance Determination of Loop Bounds and Infeasible Paths for WCET Analysis
Pub Date: 2016-04-11
DOI: 10.1109/RTAS.2016.7461326
Thomas Sewell, Felix Kam, G. Heiser
Worst-case execution time (WCET) analysis of real-time code needs to be performed on the executable binary code for soundness. The determination of loop bounds and the elimination of infeasible paths, both essential for obtaining tight bounds, frequently depend on program state that is difficult to extract from a static analysis of the binary. Obtaining this information generally requires manual intervention, or compiler modifications to preserve more semantic information from the source program. We propose an alternative approach, which leverages an existing translation-validation framework to enable high-assurance, automatic determination of loop bounds and infeasible paths. We show that this approach automatically determines all loop bounds and many (possibly all) infeasible paths in the seL4 microkernel, as well as in standard WCET benchmarks that fall within the language subset of our C parser.
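The small, made-up C example below illustrates the kind of loop whose bound is hard to recover from the binary alone, because it depends on a value established by earlier program state; at the source level the bound is explicit, which is exactly the sort of information a source-to-binary translation-validation link helps recover.

    /* The loop bound depends on `len`, which binary-level analysis only sees
     * as a register value; at the source level the preceding check clearly
     * caps it at 64, so 64 iterations is a sound and tight bound. */
    #define MAX_LEN 64

    static unsigned checksum(const unsigned char *buf, unsigned len)
    {
        unsigned sum = 0;

        if (len > MAX_LEN)      /* establishes the loop bound */
            len = MAX_LEN;

        for (unsigned i = 0; i < len; i++)
            sum += buf[i];

        return sum;
    }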
{"title":"Complete, High-Assurance Determination of Loop Bounds and Infeasible Paths for WCET Analysis","authors":"Thomas Sewell, Felix Kam, G. Heiser","doi":"10.1109/RTAS.2016.7461326","DOIUrl":"https://doi.org/10.1109/RTAS.2016.7461326","url":null,"abstract":"Worst-case execution time (WCET) analysis of real-time code needs to be performed on the executable binary code for soundness. Determination of loop bounds and elimination of infeasible paths, essential for obtaining tight bounds, frequently depends on program state that is difficult to extract from static analysis of the binary. Obtaining this information generally requires manual intervention, or compiler modifications to preserve more semantic information from the source program. We propose an alternative approach, which leverages an existing translation-validation framework, to enable high-assurance, automatic determination of loop bounds and infeasible paths. We show that this approach automatically determines all loop bounds and many (possibly all) infeasible paths in the seL4 microkernel, as well as in standard WCET benchmarks which are in the language subset of our C parser.","PeriodicalId":338179,"journal":{"name":"2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121601113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}