Björn Forsberg, Kai Lampka, Vasileios Spiliopoulos
This paper proposes a scheme which drives a processor beyond its rated operation frequency, e. g., by exploiting Intel's boost technology, to digest the peak workload of the system in time. In the setting of deadline constrained workloads, this is far from trivial: the boost mode can only be used during short time spans, therefore it can only help to digest the peak workload, rather than serving the normal case. A lowered processor frequency, used outside the peak workload time, yields a backlog of not completed jobs. This backlog may result in deadline violations or buffer overflows, if the next burst of job arrivals appears too early. To overcome the above problem, we propose a peak workload aware speed assignment strategy, which only allows the system to build up computation backlog if the absence of high computation demands is assured. Contrasting the existing body of work, we take advantage of bursty arrival patterns of compute jobs, thereby progressing over the standard (non-bursty sporadic) job release model. Together with our scheme, we also present a tool chain and simulations of synthetic workloads for investigating the thermal effects of different speed assignment strategies.
{"title":"An online overclocking scheme for bursty real-time tasks and an evaluation of its thermal impact","authors":"Björn Forsberg, Kai Lampka, Vasileios Spiliopoulos","doi":"10.1145/2993452.2993568","DOIUrl":"https://doi.org/10.1145/2993452.2993568","url":null,"abstract":"This paper proposes a scheme which drives a processor beyond its rated operation frequency, e. g., by exploiting Intel's boost technology, to digest the peak workload of the system in time. In the setting of deadline constrained workloads, this is far from trivial: the boost mode can only be used during short time spans, therefore it can only help to digest the peak workload, rather than serving the normal case. A lowered processor frequency, used outside the peak workload time, yields a backlog of not completed jobs. This backlog may result in deadline violations or buffer overflows, if the next burst of job arrivals appears too early. To overcome the above problem, we propose a peak workload aware speed assignment strategy, which only allows the system to build up computation backlog if the absence of high computation demands is assured. Contrasting the existing body of work, we take advantage of bursty arrival patterns of compute jobs, thereby progressing over the standard (non-bursty sporadic) job release model. Together with our scheme, we also present a tool chain and simulations of synthetic workloads for investigating the thermal effects of different speed assignment strategies.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124646289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andreas Kurth, Andreas Tretter, P. Hager, S. Sanabria, Orçun Göksel, L. Thiele, L. Benini
Ultrasound imaging is one of the most important medical diagnostic methods. The bulkiness of state-of-the-art high-quality ultrasound devices, however, drastically limits their usability in important application scenarios. In this paper, we show how a portable medical ultrasound device can be built using many-core technology and programmable logic, combining low power consumption with high flexibility. We discuss a typical ultrasound image reconstruction algorithm and howit can be parallelized using a pipelined design that efficiently partitions theworkload among heterogeneous processing elements. A special focus lies on the limited memory resources and data bandwidth between components. To tackle both problems, we use floating windowbuffers and approximate computations, and we minimize lookup table sizes using on-the-fly calculations. We evaluate the design on the Adapteva Parallella platform, which contains a power-efficient 16-core Epiphany coprocessor and a Zynq SoC including a dual-core ARM A9 processor and programmable logic. Experimental results show that parallel beamforming of 128 input channels to a 288x128 pixel ultrasound image can be achieved on the Parallella at a rate of 5.3 frames per second consuming only 2watt of dynamic power.
{"title":"Mobile ultrasound imaging on heterogeneous multi-core platforms","authors":"Andreas Kurth, Andreas Tretter, P. Hager, S. Sanabria, Orçun Göksel, L. Thiele, L. Benini","doi":"10.1145/2993452.2993565","DOIUrl":"https://doi.org/10.1145/2993452.2993565","url":null,"abstract":"Ultrasound imaging is one of the most important medical diagnostic methods. The bulkiness of state-of-the-art high-quality ultrasound devices, however, drastically limits their usability in important application scenarios. In this paper, we show how a portable medical ultrasound device can be built using many-core technology and programmable logic, combining low power consumption with high flexibility. We discuss a typical ultrasound image reconstruction algorithm and howit can be parallelized using a pipelined design that efficiently partitions theworkload among heterogeneous processing elements. A special focus lies on the limited memory resources and data bandwidth between components. To tackle both problems, we use floating windowbuffers and approximate computations, and we minimize lookup table sizes using on-the-fly calculations. We evaluate the design on the Adapteva Parallella platform, which contains a power-efficient 16-core Epiphany coprocessor and a Zynq SoC including a dual-core ARM A9 processor and programmable logic. Experimental results show that parallel beamforming of 128 input channels to a 288x128 pixel ultrasound image can be achieved on the Parallella at a rate of 5.3 frames per second consuming only 2watt of dynamic power.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131220947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time streaming applications with cyclic data dependencies that are executed on multiprocessor systems with processor sharing usually require a temporal analysis to give guarantees on their temporal behavior at design time. Current accurate analysis techniques for cyclic applications that are scheduled with Static Priority Preemptive (SPP) schedulers are however limited to the analysis of applications that can be expressed with Homogeneous Synchronous Dataflow (HSDF) models, i.e. in which all tasks operate at a single rate. Moreover, it is required that both input and output buffers synchronize atomically at the beginnings and finishes of task executions, which is difficult to realize on many existing hardware platforms. This paper presents a temporal analysis approach for cyclic real-time streaming applications executed on multiprocessor systems with processor sharing and SPP scheduling that can be expressed using Cyclo-Static Dataflow (CSDF) models. This allows to model tasks with multiple phases and changing rates and furthermore resolves the problematic restriction that buffer synchronization must occur atomically at the boundaries of task executions. For that purpose a joint interference characterization over multiple phases is introduced, which realizes a significant accuracy improvement compared to an isolated consideration of interference. Applicability, efficiency and accuracy of the presented approach are evaluated in a case study using a WLAN 802.11p transceiver application. Thereby different use-cases of CSDF modeling are discussed, including a CSDF model relaxing the requirement of atomic synchronization.
{"title":"Temporal analysis of static priority preemptive scheduled cyclic streaming applications using CSDF models","authors":"P. Kurtin, M. Bekooij","doi":"10.1145/2993452.2993564","DOIUrl":"https://doi.org/10.1145/2993452.2993564","url":null,"abstract":"Real-time streaming applications with cyclic data dependencies that are executed on multiprocessor systems with processor sharing usually require a temporal analysis to give guarantees on their temporal behavior at design time. Current accurate analysis techniques for cyclic applications that are scheduled with Static Priority Preemptive (SPP) schedulers are however limited to the analysis of applications that can be expressed with Homogeneous Synchronous Dataflow (HSDF) models, i.e. in which all tasks operate at a single rate. Moreover, it is required that both input and output buffers synchronize atomically at the beginnings and finishes of task executions, which is difficult to realize on many existing hardware platforms. This paper presents a temporal analysis approach for cyclic real-time streaming applications executed on multiprocessor systems with processor sharing and SPP scheduling that can be expressed using Cyclo-Static Dataflow (CSDF) models. This allows to model tasks with multiple phases and changing rates and furthermore resolves the problematic restriction that buffer synchronization must occur atomically at the boundaries of task executions. For that purpose a joint interference characterization over multiple phases is introduced, which realizes a significant accuracy improvement compared to an isolated consideration of interference. Applicability, efficiency and accuracy of the presented approach are evaluated in a case study using a WLAN 802.11p transceiver application. Thereby different use-cases of CSDF modeling are discussed, including a CSDF model relaxing the requirement of atomic synchronization.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129400698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}