Making OpenVX Really "Real Time"
Ming Yang, Tanya Amert, Kecheng Yang, Nathan Otterness, James H. Anderson, F. D. Smith, Shige Wang
DOI: 10.1109/RTSS.2018.00018
OpenVX is a recently ratified standard that was expressly proposed to facilitate the design of computer-vision (CV) applications used in real-time embedded systems. Despite its real-time focus, OpenVX presents several challenges when validating real-time constraints. Many of these challenges are rooted in the fact that OpenVX only implicitly defines any notion of a schedulable entity. Under OpenVX, CV applications are specified in the form of processing graphs that are inherently considered to execute monolithically end-to-end. This monolithic execution hinders parallelism and can lead to significant processing-capacity loss. Prior work partially addressed this problem by treating graph nodes as schedulable entities, but under OpenVX, these nodes represent rather coarse-grained CV functions, so the available parallelism that can be obtained in this way is quite limited. In this paper, a much more fine-grained approach for scheduling OpenVX graphs is proposed. This approach was designed to enable additional parallelism and to eliminate schedulability-related processing-capacity loss that arises when programs execute on both CPUs and graphics processing units (GPUs). Response-time analysis for this new approach is presented and its efficacy is evaluated via a case study involving an actual CV application.
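As an illustration of the fine-grained idea described above, the sketch below (not the authors' implementation; the graph, segment names, and costs are hypothetical) splits each coarse OpenVX-style node into CPU launch/post-processing segments and a GPU kernel segment, and schedules the segments individually so that CPU work of one node can overlap GPU work of another.

```python
from collections import namedtuple

# Each coarse node is split into finer schedulable segments; "resource" is the
# processor type a segment needs and "preds" are the segments it must wait for.
Segment = namedtuple("Segment", "name resource cost preds")

segments = [  # listed in topological order (hypothetical costs, in ms)
    Segment("capture",      "CPU", 2, []),
    Segment("convA_launch", "CPU", 1, ["capture"]),
    Segment("convA_kernel", "GPU", 4, ["convA_launch"]),
    Segment("convB_launch", "CPU", 1, ["capture"]),
    Segment("convB_kernel", "GPU", 3, ["convB_launch"]),
    Segment("merge_post",   "CPU", 2, ["convA_kernel", "convB_kernel"]),
]

def simulate(segments):
    """Greedy simulation with one CPU and one GPU: a segment starts once its
    predecessors are done and its resource is free.  Because segments (not
    whole graphs) are the schedulable entities, convB's CPU launch overlaps
    convA's GPU kernel."""
    finish, free_at = {}, {"CPU": 0, "GPU": 0}
    for seg in segments:
        ready = max((finish[p] for p in seg.preds), default=0)
        start = max(ready, free_at[seg.resource])
        finish[seg.name] = start + seg.cost
        free_at[seg.resource] = finish[seg.name]
    return finish

print(simulate(segments))  # end-to-end response time = finish["merge_post"] = 12
```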
{"title":"Making OpenVX Really \"Real Time\"","authors":"Ming Yang, Tanya Amert, Kecheng Yang, Nathan Otterness, James H. Anderson, F. D. Smith, Shige Wang","doi":"10.1109/RTSS.2018.00018","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00018","url":null,"abstract":"OpenVX is a recently ratified standard that was expressly proposed to facilitate the design of computer-vision (CV) applications used in real-time embedded systems. Despite its real-time focus, OpenVX presents several challenges when validating real-time constraints. Many of these challenges are rooted in the fact that OpenVX only implicitly defines any notion of a schedulable entity. Under OpenVX, CV applications are specified in the form of processing graphs that are inherently considered to execute monolithically end-to-end. This monolithic execution hinders parallelism and can lead to significant processing-capacity loss. Prior work partially addressed this problem by treating graph nodes as schedulable entities, but under OpenVX, these nodes represent rather coarse-grained CV functions, so the available parallelism that can be obtained in this way is quite limited. In this paper, a much more fine-grained approach for scheduling OpenVX graphs is proposed. This approach was designed to enable additional parallelism and to eliminate schedulability-related processing-capacity loss that arises when programs execute on both CPUs and graphics processing units (GPUs). Response-time analysis for this new approach is presented and its efficacy is evaluated via a case study involving an actual CV application.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133647602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Work in Progress: Combining Real Time and Multithreading
S. Osborne, James H. Anderson
DOI: 10.1109/RTSS.2018.00024
The existing sporadic task model is inadequate for allowing real-time systems to take advantage of Simultaneous Multithreading (SMT), which has been shown to improve performance in many areas of computing but has seen little application to real-time systems. A new family of task models, collectively referred to as SMART, is introduced. SMART models allow SMT and real time to be combined by accounting for the variable task execution costs caused by SMT.
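A minimal sketch of the kind of cost model such an analysis must account for, under the assumption (invented here for illustration) that each task has a solo execution cost and a larger, inflated cost when an SMT sibling thread is occupied.

```python
# Hypothetical SMT-aware cost model: each task has a cost when its core's
# sibling hardware thread is idle ("solo") and a larger, inflated cost when
# another task occupies the sibling thread ("smt").  All numbers are invented.
tasks = {
    "sensor_fusion": {"period": 10.0, "cost_solo": 4.0, "cost_smt": 6.5},
    "logging":       {"period": 10.0, "cost_solo": 3.0, "cost_smt": 5.0},
}

def pair_feasible(a, b):
    """With implicit deadlines, pairing two tasks on the two hardware threads
    of one core is only acceptable if both still finish within their periods
    using their inflated SMT costs."""
    return (tasks[a]["cost_smt"] <= tasks[a]["period"] and
            tasks[b]["cost_smt"] <= tasks[b]["period"])

# Pairing halves the number of physical cores needed, at the price of inflated
# (but still schedulable) execution costs.
cores_needed = 1 if pair_feasible("sensor_fusion", "logging") else 2
print("physical cores needed:", cores_needed)
```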
{"title":"Work in Progress: Combining Real Time and Multithreading","authors":"S. Osborne, James H. Anderson","doi":"10.1109/RTSS.2018.00024","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00024","url":null,"abstract":"The existing sporadic task model is inadequate for real-time systems to take advantage of Simultaneous Multithreading (SMT), which has been shown to improve performance in many areas of computing, but has seen little application to real-time systems. A new family of task models, collectively referred to as SMART, is introduced. SMART models allow for combining SMT and real time by accounting for the variable task execution costs caused by SMT.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134010143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CycleTandem: Energy-Saving Scheduling for Real-Time Systems with Hardware Accelerators
Sandeep M. D'Souza, R. Rajkumar
DOI: 10.1109/RTSS.2018.00019
Cyber-physical systems such as autonomous vehicles need to process and analyze multiple simultaneous streams of sensor data in real time, and therefore require powerful multi-core platforms with hardware accelerators such as GP-GPUs. These accelerators generally consume significant amounts of power, so power management is required to ensure that task deadlines are met while staying within the energy and thermal constraints of the system. In these systems, most tasks execute using a combination of CPU and accelerator resources; hence, the power of the CPU and the accelerator needs to be managed in tandem. To reduce energy consumption, commercially available accelerators such as GP-GPUs and DSPs expose interfaces to scale their operating voltage and frequency. We therefore propose the CycleTandem static frequency-scaling technique to co-optimize the operating frequencies of both the CPU and the hardware accelerator. Based on practical considerations of real-world platforms, we consider various energy-management scenarios in which the accelerator or CPU frequencies may or may not be adjustable, and propose the CycleSolo family of algorithms for such contexts. Furthermore, we study partitioning techniques to reduce the operating frequency when multi-core processors are used in conjunction with hardware accelerators. Experimental evaluations indicate that our proposed techniques can yield significant energy savings. We also present a case study on the NVIDIA TX2 embedded platform to illustrate the energy savings delivered by our proposed techniques.
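To make the tandem-scaling idea concrete, here is a toy calculation (a simplification, not the CycleTandem algorithm itself) that assumes execution time scales inversely with frequency and that a task's CPU and accelerator segments run back to back within its deadline.

```python
# Toy model (an assumption, not the paper's exact analysis): execution time
# scales inversely with the frequency factor s in (0, 1], and a task's CPU and
# accelerator segments run back to back within its deadline.
def min_scaling_factor(cpu_cost, gpu_cost, deadline):
    """Smallest common factor s such that cpu_cost/s + gpu_cost/s <= deadline,
    i.e. both devices slow down "in tandem" just enough to meet the deadline."""
    s = (cpu_cost + gpu_cost) / deadline
    return s if s <= 1.0 else None  # None: infeasible even at full frequency

# Example: 3 ms of CPU work and 4 ms of GPU work at full speed, 20 ms deadline.
print(min_scaling_factor(3.0, 4.0, 20.0))  # 0.35 -> run both at 35% frequency
```

Lowering both frequencies in lockstep like this trades slack for energy; the feasibility check is simply that the slowed-down execution still fits within the deadline.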
{"title":"CycleTandem: Energy-Saving Scheduling for Real-Time Systems with Hardware Accelerators","authors":"Sandeep M. D'Souza, R. Rajkumar","doi":"10.1109/RTSS.2018.00019","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00019","url":null,"abstract":"Cyber-physical systems such as autonomous vehicles need to process and analyze multiple simultaneous streams of sensor data in real-time. Therefore, these systems require powerful multi-core platforms with hardware accelerators such as GP-GPUs. These accelerators generally consume significant amounts of power. Therefore, power management is required to ensure that task deadlines are met while staying within the energy and thermal constraints of the system. In these systems, most tasks execute using a combination of CPU and accelerator resources. Hence, the power of the CPU and the accelerator needs to be managed in tandem. To reduce energy consumption, commercially-available accelerators such as GP-GPUs and DSPs expose interfaces to scale their operating voltage and frequency. Hence, we propose the CycleTandem static frequency-scaling technique to co-optimize the operating frequencies of both the CPU and the hardware accelerator. Based on practical considerations of real-world platforms, we consider various energy-management scenarios where the accelerator or CPU frequencies may or may not be adjustable, and propose the CycleSolo family of algorithms for such contexts. Furthermore, we also study partitioning techniques to reduce the operating frequency when multi-core processors are used in conjunction with hardware accelerators. Experimental evaluations indicate that our proposed techniques can yield significant energy savings. We also present a case-study on the NVIDIA TX2 embedded platform to illustrate the energy savings delivered by our proposed techniques.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"472 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134322576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Work-in-Progress: Lock-Based Software Transactional Memory for Real-Time Systems
Catherine E. Nemitz, James H. Anderson
DOI: 10.1109/RTSS.2018.00026
We propose a method for designing software transactional memory that relies on the use of locking protocols to ensure that transactions will never be forced to retry. We discuss our approaches to implementing this method and tunable parameters that may be able to improve schedulability on an application-specific basis.
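A minimal sketch of the retry-free idea, under the assumption that a transaction declares up front which shared objects it touches and acquires their locks in a fixed global order; this is an illustration only, not the authors' protocol, and all names are hypothetical.

```python
import threading

# Hypothetical retry-free "transaction": all locks for the objects a transaction
# touches are acquired up front in a fixed global order, so once it starts it
# never aborts or retries (the ordering also rules out deadlock).
_locks = {}          # object name -> lock
_registry_guard = threading.Lock()

def _lock_for(name):
    with _registry_guard:
        return _locks.setdefault(name, threading.Lock())

def run_transaction(object_names, body):
    ordered = sorted(set(object_names))          # global lock order by name
    acquired = [_lock_for(n) for n in ordered]
    for lock in acquired:
        lock.acquire()
    try:
        return body()                            # runs exactly once; no retry path
    finally:
        for lock in reversed(acquired):
            lock.release()

# Usage: atomically move a value between two shared buffers.
shared = {"a": [1, 2, 3], "b": []}
run_transaction(["a", "b"], lambda: shared["b"].append(shared["a"].pop()))
print(shared)
```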
{"title":"Work-in-Progress: Lock-Based Software Transactional Memory for Real-Time Systems","authors":"Catherine E. Nemitz, James H. Anderson","doi":"10.1109/RTSS.2018.00026","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00026","url":null,"abstract":"We propose a method for designing software transactional memory that relies on the use of locking protocols to ensure that transactions will never be forced to retry. We discuss our approaches to implementing this method and tunable parameters that may be able to improve schedulability on an application-specific basis.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"2 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133825392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Schedulability Analysis of Adaptive Variable-Rate Tasks with Dynamic Switching Speeds
Chao Peng, Yecheng Zhao, Haibo Zeng
DOI: 10.1109/RTSS.2018.00054
In real-time embedded systems, certain tasks are activated according to a rotation source; for example, angular tasks in an engine control unit are triggered whenever the engine crankshaft reaches a specific angular position. To reduce the workload at high speeds, these tasks also adopt different implementations in different rotation-speed intervals. However, existing studies are limited to the case in which the switching speeds at which task implementations change are configured at design time. In this paper, we study a task model in which switching speeds are dynamically adjusted. We develop schedulability analysis techniques for such systems, including a new digraph-based task model that safely approximates the workload of software tasks triggered at predefined rotation angles. Experiments on synthetic task systems demonstrate that the proposed approach provides substantial benefits in system schedulability.
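The following toy sketch illustrates the adaptive variable-rate pattern: the implementation executed depends on the current rotation speed, and the switching speeds themselves can be adjusted at run time, which is the case studied in the paper. The speed table and implementation names are hypothetical.

```python
# Hypothetical adaptive variable-rate task: a lighter implementation is used at
# higher rotation speeds, and the switching speeds may be adjusted at run time
# rather than fixed at design time.
switching_speeds = [(2000, "full_injection_model"),     # up to 2000 rpm
                    (4500, "reduced_injection_model"),  # up to 4500 rpm
                    (8000, "minimal_injection_model")]  # up to 8000 rpm

def implementation_for(rpm):
    for limit, impl in switching_speeds:
        if rpm <= limit:
            return impl
    raise ValueError("rotation speed outside the modeled range")

def adjust_switching_speed(index, new_limit):
    below = index == 0 or switching_speeds[index - 1][0] < new_limit
    above = index == len(switching_speeds) - 1 or new_limit < switching_speeds[index + 1][0]
    if below and above:  # keep the table sorted so lookup stays correct
        switching_speeds[index] = (new_limit, switching_speeds[index][1])

print(implementation_for(3000))   # reduced_injection_model
adjust_switching_speed(0, 2500)   # dynamically move the first switching speed
print(implementation_for(2300))   # now handled by full_injection_model
```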
{"title":"Schedulability Analysis of Adaptive Variable-Rate Tasks with Dynamic Switching Speeds","authors":"Chao Peng, Yecheng Zhao, Haibo Zeng","doi":"10.1109/RTSS.2018.00054","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00054","url":null,"abstract":"In real-time embedded systems certain tasks are activated according to a rotation source, such as angular tasks in engine control unit triggered whenever the engine crankshaft reaches a specific angular position. To reduce the workload at high speeds, these tasks also adopt different implementations at different rotation speed intervals. However, the current studies limit to the case that the switching speeds at which task implementations should change are configured at design time. In this paper, we propose to study the task model where switching speeds are dynamically adjusted. We develop schedulability analysis techniques for such systems, including a new digraph-based task model to safely approximate the workload from software tasks triggered at predefined rotation angles. Experiments on synthetic task systems demonstrate that the proposed approach provides substantial benefits on system schedulability.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115591480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Partitioned Real-Time NAND Flash Storage
Katherine Missimer, R. West
DOI: 10.1109/RTSS.2018.00036
This paper addresses the problem of guaranteeing the performance and predictability of NAND flash memory in a real-time storage system. Our approach implements a new flash translation layer scheme that exploits internal parallelism within solid-state storage devices. We describe the Partitioned Real-Time Flash Translation Layer (PaRT-FTL), which splits a set of flash chips into separate read and write sets, ensuring that reads and writes to separate chips proceed in parallel. PaRT-FTL is also able to rebuild the data for a read request from a flash chip that is busy servicing a write request or performing garbage collection; consequently, reads are never blocked by writes or storage-space reclamation. PaRT-FTL is compared to previous real-time approaches, including scheduling, over-provisioning, and partial garbage collection. We show that by isolating read and write requests using encoding techniques, PaRT-FTL provides better latency guarantees for real-time applications.
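One simple encoding that enables such rebuilds is RAID-style XOR parity across chips; the sketch below illustrates the idea only and is not necessarily PaRT-FTL's actual data layout.

```python
# Illustrative RAID-style encoding (an assumption, not necessarily PaRT-FTL's
# exact scheme): a parity page is the XOR of the data pages striped across
# chips, so a read aimed at a busy chip can be rebuilt from the other chips.
def xor_pages(pages):
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, byte in enumerate(page):
            out[i] ^= byte
    return bytes(out)

data_pages = [b"page-on-chip-0!!", b"page-on-chip-1!!", b"page-on-chip-2!!"]
parity     = xor_pages(data_pages)

# Chip 1 is busy servicing a write or garbage collection: rebuild its page
# instead of blocking the read behind the write.
busy_chip = 1
survivors = [p for i, p in enumerate(data_pages) if i != busy_chip] + [parity]
rebuilt   = xor_pages(survivors)
assert rebuilt == data_pages[busy_chip]
print("rebuilt page:", rebuilt)
```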
{"title":"Partitioned Real-Time NAND Flash Storage","authors":"Katherine Missimer, R. West","doi":"10.1109/RTSS.2018.00036","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00036","url":null,"abstract":"This paper addresses the problem of guaranteeing performance and predictability of NAND flash memory in a real-time storage system. Our approach implements a new flash translation layer scheme that exploits internal parallelism within solid state storage devices. We describe the Partitioned Real-Time Flash Translation Layer (PaRT-FTL), which splits a set of flash chips into separate read and write sets. This ensures reads and writes to separate chips proceed in parallel. However, PaRT-FTL is also able to rebuild the data for a read request from a flash chip that is busy servicing a write request or performing garbage collection. Consequently, reads are never blocked by writes or storage space reclamation. PaRT-FTL is compared to previous real-time approaches including scheduling, over-provisioning and partial garbage collection. We show that by isolating read and write requests using encoding techniques, PaRT-FTL provides better latency guarantees for real-time applications.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131490020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Optimal Semi-Partitioned Scheduler Assuming Arbitrary Affinity Masks
S. Voronov, James H. Anderson
DOI: 10.1109/RTSS.2018.00055
Modern operating systems allow task migrations to be restricted by specifying per-task processor affinity masks. Such a mask specifies the set of processor cores upon which a task can be scheduled. In this paper, a semi-partitioned scheduler, AM-Red (affinity mask reduction), is presented for scheduling implicit-deadline sporadic tasks with arbitrary affinity masks on an identical multiprocessor. AM-Red is obtained by applying an affinity-mask-reduction method that produces affinities in accordance with those specified, without compromising feasibility, but with only a linear number of migrating tasks. It functions by employing a tunable frame of size |F|. For any choice of |F|, AM-Red is soft-real-time optimal, with tardiness bounded by |F|, but the frequency of task migrations is proportional to |F|. If |F| divides all task periods, then AM-Red is also hard-real-time optimal (tardiness is zero). AM-Red is the first optimal scheduler proposed for arbitrary affinity masks without future knowledge of all job releases. Experiments are presented that show that AM-Red is implementable with low overhead and yields reasonable tardiness and task-migration frequency.
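A simplified illustration of the frame idea (not the AM-Red reduction itself): within each frame of length |F|, a task receives a budget proportional to its utilization, and these budgets are what get packed onto the cores allowed by each task's affinity mask. Task names, utilizations, and the frame size below are hypothetical.

```python
from fractions import Fraction

# Simplified frame-based allocation: within each frame of length F, a task with
# utilization u is given a budget of u * F time units.  Smaller F means lower
# tardiness bounds but more frequent migrations at frame boundaries.
F = Fraction(12)

tasks = {  # implicit-deadline tasks: utilization = cost / period
    "camera":  Fraction(1, 3),
    "planner": Fraction(1, 2),
    "logger":  Fraction(1, 4),
}

budgets = {name: u * F for name, u in tasks.items()}
print({name: str(b) for name, b in budgets.items()})   # camera: 4, planner: 6, logger: 3
print("demand per frame:", sum(budgets.values()), "of", 2 * F, "available on 2 cores")
```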
{"title":"An Optimal Semi-Partitioned Scheduler Assuming Arbitrary Affinity Masks","authors":"S. Voronov, James H. Anderson","doi":"10.1109/RTSS.2018.00055","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00055","url":null,"abstract":"Modern operating systems allow task migrations to be restricted by specifying per-task processor affinity masks. Such a mask specifies the set of processor cores upon which a task can be scheduled. In this paper, a semi-partitioned scheduler, AM-Red (affinity mask reduction), is presented for scheduling implicit-deadline sporadic tasks with arbitrary affinity masks on an identical multiprocessor. AM-Red is obtained by applying an affinity-mask-reduction method that produces affinities in accordance with those specified, without compromising feasibility, but with only a linear number of migrating tasks. It functions by employing a tunable frame of size |F|. For any choice of |F|, AM-Red is soft-real-time optimal, with tardiness bounded by |F|, but the frequency of task migrations is proportional to |F|. If |F| divides all task periods, then AM-Red is also hard-real-time-optimal (tardiness is zero). AM-Red is the first optimal scheduler proposed for arbitrary affinity masks without future knowledge of all job releases. Experiments are presented that show that AM-Red is implementable with low overhead and yields reasonable tardiness and task-migration frequency.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121652591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Trace Generation for Signal Temporal Logic
P. Prabhakar, Ratan Lal, J. Kapinski
DOI: 10.1109/RTSS.2018.00038
In this work, we present a novel technique to automatically generate satisfying and violating traces for a Signal Temporal Logic (STL) formula. STL is a logic whose formulas are interpreted over real-valued signals that evolve over dense time, which is a natural setting for Cyber-Physical Systems (CPS) applications. However, the process of developing appropriate STL requirements can be difficult and error prone. In this work, we provide a method to assist designers in the development of STL requirements for CPS applications. Our technique automatically encodes a given STL formula into a satisfiability modulo theory (SMT) formula in an appropriate theory. Satisfying and violating traces for the STL specification can be obtained by solving satisfiability problems on the encoded SMT formulas. In particular, models returned by the SMT solver correspond to traces that satisfy/violate the STL formula, thus offering a window into the types of behaviors specified by the formula. We demonstrate how the method can be used to debug problems with STL requirements, and we evaluate the performance of the method on a collection of requirements developed for CPS applications.
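A minimal sketch of the encoding idea using the z3 Python bindings, under the simplifying assumption of a bounded, discretized horizon (the paper's encoding targets dense-time STL). The property and signal below are invented for illustration.

```python
from z3 import Real, Solver, Or, sat

# Property over signal x, discretized at t = 0..4 (a simplifying assumption;
# the paper's encoding handles dense-time semantics):
#   G_[0,4] (x >= 0)   and   F_[0,4] (x > 2)
N = 5
x = [Real(f"x_{k}") for k in range(N)]

solver = Solver()
solver.add(*[x[k] >= 0 for k in range(N)])     # bounded "globally"
solver.add(Or(*[x[k] > 2 for k in range(N)]))  # bounded "eventually"

if solver.check() == sat:
    model = solver.model()
    print("satisfying trace:", [model[x[k]] for k in range(N)])
```

A violating trace can be obtained in the same way by asserting the negation of the property and asking the solver for a model.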
{"title":"Automatic Trace Generation for Signal Temporal Logic","authors":"P. Prabhakar, Ratan Lal, J. Kapinski","doi":"10.1109/RTSS.2018.00038","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00038","url":null,"abstract":"In this work, we present a novel technique to automatically generate satisfying and violating traces for a Signal Temporal Logic (STL) formula. STL is a logic whose formulas are interpreted over real-valued signals that evolve over dense time, which is a natural setting for Cyber-Physical Systems (CPS) applications. However, the process of developing appropriate STL requirements can be difficult and error prone. In this work, we provide a method to assist designers in the development of STL requirements for CPS applications. Our technique automatically encodes a given STL formula into a satisfiability modulo theory (SMT) formula in an appropriate theory. Satisfying and violating traces for the STL specification can be obtained by solving satisfiability problems on the encoded SMT formulas. In particular, models returned by the SMT solver correspond to traces that satisfy/violate the STL formula, thus offering a window into the types of behaviors specified by the formula. We demonstrate how the method can be used to debug problems with STL requirements, and we evaluate the performance of the method on a collection of requirements developed for CPS applications.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128818247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scheduling Multi-periodic Mixed-Criticality DAGs on Multi-core Architectures
R. Medina, Etienne Borde, L. Pautet
DOI: 10.1109/RTSS.2018.00042
Thanks to Mixed-Criticality (MC) scheduling, high- and low-criticality tasks can share the same execution platform, considerably improving the usage of computation resources. Even when the execution platform is shared with low-criticality tasks, the deadlines of high-criticality tasks must be respected. This is usually enforced through operational modes of the system: if necessary, a high-criticality execution mode allocates more time to high-criticality tasks at the expense of low-criticality tasks' execution. Nonetheless, most MC scheduling policies in the literature have only considered independent task sets. For safety-critical real-time systems, this is a strong limitation: models used to describe reactive safety-critical software often include dependencies among tasks or jobs. In this paper, we define a meta-heuristic to schedule multiprocessor systems composed of multi-periodic Directed Acyclic Graphs (DAGs) of MC tasks. The meta-heuristic computes the scheduling of the system in the high-criticality mode first; the computation of the low-criticality schedule then respects a condition on high-criticality tasks' jobs, ensuring that high-criticality tasks never miss their deadlines. An efficient implementation of this meta-heuristic is presented in which, in high-criticality mode, high-criticality tasks are scheduled as late as possible; two global scheduling tables are produced, one per criticality mode. Experimental results demonstrate that our method outperforms existing approaches from the literature in terms of acceptance rate for randomly generated systems.
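To illustrate the "as late as possible" placement of high-criticality work in the high-criticality mode, here is a simplified ALAP computation over a toy DAG, ignoring core contention and multi-periodicity; node names and WCETs are hypothetical.

```python
# Simplified ALAP placement for the high-criticality table: each
# high-criticality node starts as late as its deadline and its successors
# allow, leaving the earlier slack available for low-criticality work.
deadline = 20
hi_wcet  = {"sense": 3, "fuse": 5, "act": 2}   # HI-mode WCETs (invented)
succs    = {"sense": ["fuse"], "fuse": ["act"], "act": []}

memo = {}
def latest_start(node):
    if node not in memo:
        latest_finish = min((latest_start(s) for s in succs[node]), default=deadline)
        memo[node] = latest_finish - hi_wcet[node]
    return memo[node]

for node in hi_wcet:
    print(node, "ALAP start:", latest_start(node))
# sense: 10, fuse: 13, act: 18 -> the interval [0, 10) is free for LO-criticality jobs.
```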
{"title":"Scheduling Multi-periodic Mixed-Criticality DAGs on Multi-core Architectures","authors":"R. Medina, Etienne Borde, L. Pautet","doi":"10.1109/RTSS.2018.00042","DOIUrl":"https://doi.org/10.1109/RTSS.2018.00042","url":null,"abstract":"Thanks to Mixed-Criticality (MC) scheduling, high and low-criticality tasks can share the same execution platform, improving considerably the usage of computation resources. Even if the execution platform is shared with low-criticality tasks, deadlines of high-criticality tasks must be respected. This is usually enforced thanks to operational modes of the system: if necessary, a high-criticality execution mode allocates more time to high-criticality tasks at the expense of low-criticality tasks' execution. Nonetheless, most MC scheduling policies in the literature have only considered independent task sets. For safety-critical real-time systems, this is a strong limitation: models used to describe reactive safety-critical software often consider dependencies among tasks or jobs. In this paper, we define a meta-heuristic to schedule multiprocessor systems composed of multi-periodic Directed Acyclic Graphs of MC tasks. This meta-heuristic computes the scheduling of the system in the high-criticality mode first. The computation of the low-criticality scheduling respects a condition on high-criticality tasks' jobs, ensuring that high-criticality tasks never miss their deadlines. An efficient implementation of this meta-heuristic is presented. In high-criticality mode, high-criticality tasks are scheduled as late as possible. Then two global scheduling tables are produced, one per criticality mode. Experimental results demonstrate our method outperforms approaches of the literature in terms of acceptance rate for randomly generated systems.","PeriodicalId":294784,"journal":{"name":"2018 IEEE Real-Time Systems Symposium (RTSS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117077125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}