An efficient photoplethysmography imaging system with an advanced algorithm for continuous monitoring of skin microcirculation was developed. The system comprises a compact device and a computer with software for visualizing skin blood volume changes. The software processes high-resolution microcirculation amplitude maps in real time. It was tested in a clinical environment during regional anesthesia procedures. The Eulerian-based method showed improved sensitivity and higher resolution of the microcirculation maps.
{"title":"Photoplethysmography imaging algorithm for continuous monitoring of regional anesthesia","authors":"U. Rubins, J. Spigulis, A. Miscuks","doi":"10.1145/2993452.2994308","DOIUrl":"https://doi.org/10.1145/2993452.2994308","url":null,"abstract":"An efficient photoplethysmography imaging system with an advanced algorithm for continuous monitoring of skin microcirculation was developed. The system comprises a compact device and a computer with software for visualizing skin blood volume changes. The software processes high-resolution microcirculation amplitude maps in real time. It was tested in a clinical environment during regional anesthesia procedures. The Eulerian-based method showed improved sensitivity and higher resolution of the microcirculation maps.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130495645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
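The Eulerian-style amplitude mapping described in this abstract can be sketched as a per-pixel temporal band-pass over a stack of video frames. The function below is an illustrative reconstruction, not the authors' exact algorithm: the frame layout, the heart-rate band limits, and the peak-to-peak amplitude measure are our assumptions.

```python
import numpy as np

def ppg_amplitude_map(frames, fps, f_lo=0.8, f_hi=2.5):
    """Per-pixel PPG amplitude: band-pass each pixel's intensity trace
    in the heart-rate band (f_lo..f_hi Hz) and take its peak-to-peak range.
    frames: array of shape (T, H, W) with grayscale intensities."""
    frames = np.asarray(frames, dtype=np.float64)
    t = frames.shape[0]
    spec = np.fft.rfft(frames - frames.mean(axis=0), axis=0)
    freqs = np.fft.rfftfreq(t, d=1.0 / fps)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    spec[~keep] = 0.0                       # zero everything outside the band
    filtered = np.fft.irfft(spec, n=t, axis=0)
    return filtered.max(axis=0) - filtered.min(axis=0)
```

Pixels whose intensity pulses at a cardiac frequency get a high amplitude in the map; statically lit pixels get an amplitude near zero.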
In many domains such as robotics and industrial automation, a growing number of Control Applications utilize cameras as sensors. Such Visual Servoing Systems increasingly rely on Gigabit Ethernet (GigE) as a communication backbone and require real-time execution. The implementation on small, low-power embedded platforms suitable for the respective domain is challenging in terms of both computation and communication. Whilst advances in CPU and Field Programmable Gate Array (FPGA) technology enable the implementation of computationally heavier Image Processing Pipelines, the interface between such platforms and an Ethernet-based communication backbone still requires careful design to achieve fast and deterministic Image Acquisition. Although standardized Ethernet-based camera protocols such as GigE Vision unify camera configuration and data transmission, traditional software-based Image Acquisition is insufficient on small, low-power embedded platforms due to tight throughput and latency constraints and the overhead caused by decoding such multi-layered protocols. In this paper, we propose Scatter-Gather Direct Memory Access (SG/DMA) Proxying as a generic method to seamlessly extend the existing network subsystem of current Systems-on-Chip (SoCs) with hardware-based filtering capabilities. Based thereon, we present a novel mixed-hardcore/softcore GigE Vision Framegrabber capable of directly feeding a subsequent in-stream Image Processing Pipeline with sub-microsecond acquisition latency. By rerouting all incoming Ethernet frames to our GigE Vision Bridge using SG/DMA Proxying, we are able to separate image and non-image data with zero CPU and memory intervention and perform Image Acquisition at the full line rate of Gigabit Ethernet (i.e., 125 Mpx/s for grayscale video). Our experimental evaluation shows the benefits of our proposed architecture on a Programmable SoC (pSoC) that combines a fixed-function multi-core SoC with configurable FPGA fabric.
{"title":"GigE vision data acquisition for visual servoing using SG/DMA proxying","authors":"M. Geier, Florian Pitzl, S. Chakraborty","doi":"10.1145/2993452.2993455","DOIUrl":"https://doi.org/10.1145/2993452.2993455","url":null,"abstract":"In many domains such as robotics and industrial automation, a growing number of Control Applications utilize cameras as sensors. Such Visual Servoing Systems increasingly rely on Gigabit Ethernet (GigE) as a communication backbone and require real-time execution. The implementation on small, low-power embedded platforms suitable for the respective domain is challenging in terms of both computation and communication. Whilst advances in CPU and Field Programmable Gate Array (FPGA) technology enable the implementation of computationally heavier Image Processing Pipelines, the interface between such platforms and an Ethernet-based communication backbone still requires careful design to achieve fast and deterministic Image Acquisition. Although standardized Ethernet-based camera protocols such as GigE Vision unify camera configuration and data transmission, traditional software-based Image Acquisition is insufficient on small, low-power embedded platforms due to tight throughput and latency constraints and the overhead caused by decoding such multi-layered protocols. In this paper, we propose Scatter-Gather Direct Memory Access (SG/DMA) Proxying as a generic method to seamlessly extend the existing network subsystem of current Systems-on-Chip (SoCs) with hardware-based filtering capabilities. Based thereon, we present a novel mixed-hardcore/softcore GigE Vision Framegrabber capable of directly feeding a subsequent in-stream Image Processing Pipeline with sub-microsecond acquisition latency. By rerouting all incoming Ethernet frames to our GigE Vision Bridge using SG/DMA Proxying, we are able to separate image and non-image data with zero CPU and memory intervention and perform Image Acquisition at the full line rate of Gigabit Ethernet (i.e., 125 Mpx/s for grayscale video). Our experimental evaluation shows the benefits of our proposed architecture on a Programmable SoC (pSoC) that combines a fixed-function multi-core SoC with configurable FPGA fabric.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121840415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
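The image/non-image separation the framegrabber performs in hardware amounts to classifying each incoming Ethernet frame by its headers: GigE Vision streams image data (GVSP) over UDP on a negotiated port, while control traffic (GVCP) and everything else goes to the CPU. The software model below only illustrates that classification logic; the actual paper implements it as a zero-copy hardware filter, and the port numbers here are hypothetical.

```python
import struct

GVCP_PORT = 3956  # GigE Vision control protocol port (fixed by the GigE Vision spec)

def classify_frame(frame, stream_port):
    """Model of the hardware filter: route GVSP stream packets (image data)
    to the in-stream pipeline, everything else to the CPU.
    frame: a raw Ethernet frame as bytes. Returns 'image' or 'cpu'."""
    if len(frame) < 14 + 20 + 8:                 # Ethernet + IPv4 + UDP minimum
        return "cpu"
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:                       # not IPv4
        return "cpu"
    ihl = (frame[14] & 0x0F) * 4                  # IPv4 header length in bytes
    if frame[14 + 9] != 17:                       # IP protocol field: not UDP
        return "cpu"
    udp = frame[14 + ihl:]
    dst_port = struct.unpack("!H", udp[2:4])[0]
    return "image" if dst_port == stream_port else "cpu"
```

In the paper's architecture this decision happens per descriptor in the SG/DMA proxy, so image payloads never touch CPU or main memory.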
Networks-on-Chip (NoCs) for contemporary multiprocessor systems must integrate complex multimedia applications which require not only high performance but also timing guarantees. However, in existing NoCs designed for real-time systems, timing constraints are frequently implemented at the cost of decreased hardware utilization, i.e., strict spatial or temporal isolation between transmissions. In this work, we propose an alternative mechanism, multi-path scheduling (MPS), which exploits the multidimensional structure of NoCs to combine path selection and temporal flow control based on the global state of the system. Consequently, MPS allows safe sharing of NoC resources while preserving the high utilization achieved through a predictable load distribution of data traffic among the different paths reachable from source to destination. We demonstrate, using benchmarks, that MPS not only provides higher average performance than existing solutions but also makes it possible to provide worst-case guarantees, which we prove using formal timing analysis. Moreover, MPS induces a low implementation overhead, as it can be applied to many existing wormhole-switched, performance-optimized NoCs without requiring complex hardware modifications.
{"title":"Multi-path scheduling for multimedia traffic in safety critical on-chip network","authors":"Adam Kostrzewa, R. Ernst, Selma Saidi","doi":"10.1145/2993452.2993563","DOIUrl":"https://doi.org/10.1145/2993452.2993563","url":null,"abstract":"Networks-on-Chip (NoCs) for contemporary multiprocessor systems must integrate complex multimedia applications which require not only high performance but also timing guarantees. However, in existing NoCs designed for real-time systems, timing constraints are frequently implemented at the cost of decreased hardware utilization, i.e., strict spatial or temporal isolation between transmissions. In this work, we propose an alternative mechanism, multi-path scheduling (MPS), which exploits the multidimensional structure of NoCs to combine path selection and temporal flow control based on the global state of the system. Consequently, MPS allows safe sharing of NoC resources while preserving the high utilization achieved through a predictable load distribution of data traffic among the different paths reachable from source to destination. We demonstrate, using benchmarks, that MPS not only provides higher average performance than existing solutions but also makes it possible to provide worst-case guarantees, which we prove using formal timing analysis. Moreover, MPS induces a low implementation overhead, as it can be applied to many existing wormhole-switched, performance-optimized NoCs without requiring complex hardware modifications.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114960674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
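The core idea of exploiting a NoC's multidimensional structure can be illustrated in a 2D mesh: between any source and destination there are at least two minimal paths (x-first "XY" and y-first "YX"), and a global view of per-link load lets the scheduler pick the less congested one. The sketch below is a simplification of MPS (only two candidate paths, no temporal flow control); the load metric is our assumption.

```python
def xy_path(src, dst):
    """Minimal XY route in a 2D mesh: hop along x first, then along y."""
    (sx, sy), (dx, dy) = src, dst
    path, x, y = [], sx, sy
    while x != dx:
        nx = x + (1 if dx > x else -1)
        path.append(((x, y), (nx, y))); x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        path.append(((x, y), (x, ny))); y = ny
    return path

def yx_path(src, dst):
    # y-first variant: reverse each hop of the XY route taken backwards
    return [(b, a) for (a, b) in reversed(xy_path(dst, src))]

def pick_path(src, dst, link_load):
    """Global-state path selection: take whichever minimal path (XY or YX)
    has the lower worst-link load. link_load maps directed links to loads."""
    cands = [xy_path(src, dst), yx_path(src, dst)]
    return min(cands, key=lambda p: max(link_load.get(l, 0) for l in p))
```

With distinct paths available, a congested link on one route simply shifts traffic to the other, which is how MPS keeps utilization high without static isolation.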
State-of-the-art smartphones can generate excessive amounts of heat during high computational activity or long durations of use. While throttling mechanisms ensure safe component and outer-skin temperatures, frequent throttling can significantly degrade user-perceived performance. This work explores the impact of multiple thermal constraints in a real-life smartphone on user experience. In addition to high processor temperatures, which have traditionally been a major point of interest, we show that applications can also quickly elevate battery and device skin temperatures to critical levels. We introduce and evaluate various thermally-efficient runtime management techniques that slow down heating under performance guarantees so as to sustain desirable performance for as long as possible. Our techniques achieve up to 8x longer sustainable QoS.
{"title":"Providing sustainable performance in thermally constrained mobile devices","authors":"O. Sahin, A. Coskun","doi":"10.1145/2993452.2994309","DOIUrl":"https://doi.org/10.1145/2993452.2994309","url":null,"abstract":"State-of-the-art smartphones can generate excessive amounts of heat during high computational activity or long durations of use. While throttling mechanisms ensure safe component and outer-skin temperatures, frequent throttling can significantly degrade user-perceived performance. This work explores the impact of multiple thermal constraints in a real-life smartphone on user experience. In addition to high processor temperatures, which have traditionally been a major point of interest, we show that applications can also quickly elevate battery and device skin temperatures to critical levels. We introduce and evaluate various thermally-efficient runtime management techniques that slow down heating under performance guarantees so as to sustain desirable performance for as long as possible. Our techniques achieve up to 8x longer sustainable QoS.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131217678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
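"Slowing down heating under performance guarantees" can be pictured as a governor that steps the CPU frequency down as the skin temperature approaches its limit, but never below the frequency needed for the guaranteed QoS. The policy below is purely illustrative; the thresholds, the linear headroom scaling, and the frequency ladder are our assumptions, not the paper's techniques.

```python
def next_frequency(temp_c, freqs, f_min_qos, t_limit=45.0, t_guard=5.0):
    """Proactive thermal governor sketch: scale the chosen position in the
    ascending frequency ladder `freqs` linearly with the remaining thermal
    headroom (t_limit - temp_c over a t_guard band), clamped at the QoS floor
    f_min_qos so the performance guarantee is always met."""
    headroom = max(0.0, min(1.0, (t_limit - temp_c) / t_guard))
    idx = int(round(headroom * (len(freqs) - 1)))
    return max(freqs[idx], f_min_qos)
```

By slowing down early instead of hitting the hard throttle, such a policy trades a little peak performance for a much longer interval at the guaranteed level, which is the sustainability effect the paper quantifies.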
In this paper, we explore and develop an embedded real-time system and associated algorithms that enable an aggregation of limited-resource, low-quality, projection-enabled mobile devices to collaboratively produce a higher-quality video stream for a superior viewing experience. Such resource aggregation across multiple projector-enabled devices can lead to per-unit resource savings while moving the cost to the aggregate. The pico-projectors embedded in mobile devices such as cell phones have much lower resolution and brightness than standard projectors. Tiling (arranging the projection areas of multiple projectors in a rectangular array, overlapping slightly around the boundaries) and superimposing (placing the projection areas of multiple projectors directly on top of each other), with the projectors registered automatically through the cameras residing within those mobile devices, provide different ways of aggregating resources across these devices. Evaluation of our proof-of-concept system shows significant improvement for each mobile device in two primary factors, bandwidth usage and power consumption, when using a collaborative federation of projection-embedded mobile devices. To the best of our knowledge, this is the first time resources have been aggregated across a federation of low-cost, low-power mobile devices completely automatically and in real time, resulting in a viewing experience of up to 4K (3840x2160) content from four integrated mobile devices each playing 1080p content.
{"title":"Resource aggregation for collaborative video from multiple projector enabled mobile devices","authors":"Hung Nguyen, F. Kurdahi, A. Majumder","doi":"10.1145/2993452.2993561","DOIUrl":"https://doi.org/10.1145/2993452.2993561","url":null,"abstract":"In this paper, we explore and develop an embedded real-time system and associated algorithms that enable an aggregation of limited-resource, low-quality, projection-enabled mobile devices to collaboratively produce a higher-quality video stream for a superior viewing experience. Such resource aggregation across multiple projector-enabled devices can lead to per-unit resource savings while moving the cost to the aggregate. The pico-projectors embedded in mobile devices such as cell phones have much lower resolution and brightness than standard projectors. Tiling (arranging the projection areas of multiple projectors in a rectangular array, overlapping slightly around the boundaries) and superimposing (placing the projection areas of multiple projectors directly on top of each other), with the projectors registered automatically through the cameras residing within those mobile devices, provide different ways of aggregating resources across these devices. Evaluation of our proof-of-concept system shows significant improvement for each mobile device in two primary factors, bandwidth usage and power consumption, when using a collaborative federation of projection-embedded mobile devices. To the best of our knowledge, this is the first time resources have been aggregated across a federation of low-cost, low-power mobile devices completely automatically and in real time, resulting in a viewing experience of up to 4K (3840x2160) content from four integrated mobile devices each playing 1080p content.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"SE-1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126571176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
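The 4K claim follows from simple tiling arithmetic: a 2x2 array of 1080p projectors yields 3840x2160 pixels with no overlap, and slightly less once the boundary overlap used for blending is subtracted. The helper below works that out; the uniform fractional overlap model is our simplification.

```python
def tiled_resolution(w, h, rows, cols, overlap=0.1):
    """Effective resolution of a rows x cols tiled display in which adjacent
    projectors overlap by `overlap` (as a fraction of one projector's extent)
    for edge blending. Each interior seam costs one overlap strip."""
    eff_w = int(round(w * (cols - (cols - 1) * overlap)))
    eff_h = int(round(h * (rows - (rows - 1) * overlap)))
    return eff_w, eff_h
```

So four 1080p devices reach the advertised 4K ceiling only in the zero-overlap limit; with a typical 10% blending overlap the usable canvas is around 3648x2052.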
Heterogeneous processors with architecturally different devices (CPU and GPU) integrated on the same die provide good performance and energy efficiency for a wide range of workloads. However, they also create challenges and opportunities in terms of scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping. In this paper, we first provide detailed infrared imaging results that show the impact of mapping decisions on the thermal and power profiles of CPU+GPU processors. Furthermore, we observe that runtime conditions such as power and CPU load from traditional workloads also affect the mapping decision. To exploit our observations, we propose techniques to characterize OpenCL kernel workloads at run-time and map them onto the appropriate device under time-varying physical (i.e., chip power limit) and CPU load conditions, in particular the number of CPU cores available to the OpenCL kernel. We implement our dynamic scheduler on a real CPU+GPU processor and evaluate it using various OpenCL benchmarks. Compared to the state-of-the-art kernel-level scheduling method, the proposed scheduler provides up to 31% and 10% improvements in runtime and energy, respectively.
{"title":"Scheduling challenges and opportunities in integrated CPU+GPU processors","authors":"K. Dev, S. Reda","doi":"10.1145/2993452.2994307","DOIUrl":"https://doi.org/10.1145/2993452.2994307","url":null,"abstract":"Heterogeneous processors with architecturally different devices (CPU and GPU) integrated on the same die provide good performance and energy efficiency for a wide range of workloads. However, they also create challenges and opportunities in terms of scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping. In this paper, we first provide detailed infrared imaging results that show the impact of mapping decisions on the thermal and power profiles of CPU+GPU processors. Furthermore, we observe that runtime conditions such as power and CPU load from traditional workloads also affect the mapping decision. To exploit our observations, we propose techniques to characterize OpenCL kernel workloads at run-time and map them onto the appropriate device under time-varying physical (i.e., chip power limit) and CPU load conditions, in particular the number of CPU cores available to the OpenCL kernel. We implement our dynamic scheduler on a real CPU+GPU processor and evaluate it using various OpenCL benchmarks. Compared to the state-of-the-art kernel-level scheduling method, the proposed scheduler provides up to 31% and 10% improvements in runtime and energy, respectively.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116692105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
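A runtime CPU/GPU mapping decision of the kind the abstract describes can be sketched as: estimate each device's kernel runtime from profiled characteristics, discard devices that would exceed the current chip power cap, and scale the CPU estimate by the cores currently free. This is an illustrative heuristic under our own assumptions (ideal core scaling, profiled per-device power), not the paper's scheduler.

```python
def map_kernel(kernel, free_cores, power_cap_w):
    """Pick 'cpu' or 'gpu' for an OpenCL kernel at runtime.
    kernel: dict with profiled 't_cpu_1core' (single-core runtime),
    't_gpu' (GPU runtime), and per-device power draws 'p_cpu', 'p_gpu'."""
    t_cpu = kernel["t_cpu_1core"] / max(free_cores, 1)  # assume ideal scaling
    candidates = []
    if kernel["p_cpu"] <= power_cap_w:
        candidates.append(("cpu", t_cpu))
    if kernel["p_gpu"] <= power_cap_w:
        candidates.append(("gpu", kernel["t_gpu"]))
    if not candidates:            # nothing fits the cap: fall back to CPU
        return "cpu"
    return min(candidates, key=lambda d: d[1])[0]
```

The same kernel can thus map to different devices as the power limit or background CPU load changes, which is exactly the runtime sensitivity the paper reports.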
Demand for computer vision analytics in the embedded world has increased rapidly as the Internet of Things (IoT) expands into cities, workplaces, and homes. Common computationally intensive video and scene analysis tasks, such as pedestrian detection, counting, and tracking, are often relegated to acceleration hardware or embedded GPUs. This paper showcases decision-making heuristics designed to improve the performance of these analytics. Working within the constraints of the low-power IoT infrastructure typically deployed in urban, traffic-heavy environments, our Precedent-Aware Classification (PAC) framework provides efficient pedestrian and vehicle detection in the absence of dedicated acceleration hardware. Our implementation takes advantage of frequently traveled routes to reduce the amount of required computation, which helps meet the tight timing requirements of embedded platforms where traditional computation models tend to fail. Testing and performance analysis of PAC were done using an ARM Cortex-A9 embedded processor residing within a Xilinx Zynq-7000 SoC. In normally populated traffic situations, PAC produced an average 3.23x speed-up and an average 16% improvement in pedestrian detection accuracy over using traditional classifiers alone.
{"title":"Rapid precedent-aware pedestrian and car classification on constrained IoT platforms","authors":"J. Danner, L. Wills, E. M. Ruiz, L. Lerner","doi":"10.1145/2993452.2993562","DOIUrl":"https://doi.org/10.1145/2993452.2993562","url":null,"abstract":"Demand for computer vision analytics in the embedded world has increased rapidly as the Internet of Things (IoT) expands into cities, workplaces, and homes. Common computationally intensive video and scene analysis tasks, such as pedestrian detection, counting, and tracking, are often relegated to acceleration hardware or embedded GPUs. This paper showcases decision-making heuristics designed to improve the performance of these analytics. Working within the constraints of the low-power IoT infrastructure typically deployed in urban, traffic-heavy environments, our Precedent-Aware Classification (PAC) framework provides efficient pedestrian and vehicle detection in the absence of dedicated acceleration hardware. Our implementation takes advantage of frequently traveled routes to reduce the amount of required computation, which helps meet the tight timing requirements of embedded platforms where traditional computation models tend to fail. Testing and performance analysis of PAC were done using an ARM Cortex-A9 embedded processor residing within a Xilinx Zynq-7000 SoC. In normally populated traffic situations, PAC produced an average 3.23x speed-up and an average 16% improvement in pedestrian detection accuracy over using traditional classifiers alone.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121508268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
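The "precedent" idea, exploiting frequently traveled routes to skip repeated classifier work, can be sketched as a spatial cache: once a grid cell along a route has produced the same label enough times, later queries in that cell reuse the precedent instead of running the full classifier. The cell size, confirmation count, and cache policy below are our assumptions, not PAC's actual heuristics.

```python
class PrecedentCache:
    """Sketch of precedent-aware classification: remember labels for grid
    cells along frequently travelled routes and skip the full classifier once
    a cell's label has been confirmed `k` times in a row."""
    def __init__(self, classify, cell=32, k=3):
        self.classify, self.cell, self.k = classify, cell, k
        self.hits = {}                       # cell coords -> (label, streak)

    def label(self, x, y, patch):
        key = (x // self.cell, y // self.cell)
        label, streak = self.hits.get(key, (None, 0))
        if streak >= self.k:
            return label                     # precedent hit: no classifier run
        fresh = self.classify(patch)         # expensive full classification
        self.hits[key] = (fresh, streak + 1 if fresh == label else 1)
        return fresh
```

On a route seen many times, almost every query becomes a cache hit, which is where speed-ups of the reported magnitude come from.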
Many multimedia applications exhibit phasic behavior. The phasic behavior of applications has been studied primarily with a focus on code execution. However, temporal variation in an application's memory usage can deviate from its program behavior, providing opportunities to exploit these memory phases for more efficient use of on-chip memory resources. In this work, we define memory phases as opposed to program phases and illustrate the potential disparity between them. We propose mechanisms for light-weight online memory-phase detection. Additionally, we demonstrate their utility by deploying these techniques for sharing distributed on-chip Scratchpad Memories (SPMs) in multi-core platforms. The information gathered during memory phases is used to prioritize different memory pages in a multi-core platform without any prior knowledge of the running applications. By exploiting memory-phasic behavior, we achieved up to 45% improvement in memory access latency on a set of multimedia applications.
{"title":"On detecting and using memory phases in multimedia systems","authors":"H. Tajik, Bryan Donyanavard, N. Dutt","doi":"10.1145/2993452.2993566","DOIUrl":"https://doi.org/10.1145/2993452.2993566","url":null,"abstract":"Many multimedia applications exhibit phasic behavior. The phasic behavior of applications has been studied primarily with a focus on code execution. However, temporal variation in an application's memory usage can deviate from its program behavior, providing opportunities to exploit these memory phases for more efficient use of on-chip memory resources. In this work, we define memory phases as opposed to program phases and illustrate the potential disparity between them. We propose mechanisms for light-weight online memory-phase detection. Additionally, we demonstrate their utility by deploying these techniques for sharing distributed on-chip Scratchpad Memories (SPMs) in multi-core platforms. The information gathered during memory phases is used to prioritize different memory pages in a multi-core platform without any prior knowledge of the running applications. By exploiting memory-phasic behavior, we achieved up to 45% improvement in memory access latency on a set of multimedia applications.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123324381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
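Light-weight online memory-phase detection can be sketched by summarizing each sampling interval as a normalized page-access histogram and declaring a new phase when the histogram drifts far from the current phase's signature. The distance metric and threshold below are our assumptions; the paper's detectors may differ.

```python
class MemoryPhaseDetector:
    """Illustrative online memory-phase detector: compare each interval's
    normalized page-access histogram with the current phase's signature;
    a large Manhattan distance starts a new phase."""
    def __init__(self, threshold=0.5):
        self.threshold, self.signature, self.phase = threshold, None, 0

    def observe(self, page_counts):
        """page_counts: dict mapping page id -> access count this interval.
        Returns the current phase id."""
        total = sum(page_counts.values()) or 1
        hist = {p: c / total for p, c in page_counts.items()}
        if self.signature is not None:
            pages = set(hist) | set(self.signature)
            dist = sum(abs(hist.get(p, 0.0) - self.signature.get(p, 0.0))
                       for p in pages)
            if dist > self.threshold:
                self.phase += 1          # working set moved: new memory phase
        self.signature = hist
        return self.phase
```

Per-phase histograms like `hist` are also exactly the information an SPM manager needs to decide which pages deserve scratchpad residency in the current phase.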
Recently, a novel extension of the dataflow model with a library task has been proposed to overcome a severe limitation of dataflow models: their inability to handle shared resources. The library task, which contains library functions and shared data, plays the role of a server task when dataflow tasks, as clients, call its library functions. In this paper, we propose a meta-heuristic technique based on a multi-objective genetic algorithm to find Pareto-optimal solutions in terms of resource requirement and the worst-case response time (WCRT) of a synchronous dataflow (SDF) graph extended with library tasks. For a given task graph, the proposed technique determines not only the mapping and scheduling in a heterogeneous multiprocessor system but also task priorities and library-task duplication. When multiple tasks request the service of a library task simultaneously, a task may experience a significant contention delay. For fast design space exploration, a fast and conservative method to estimate the contention delay of library tasks is devised. With synthetic examples and two real-life applications, the viability of the proposed technique is verified.
{"title":"Multiprocessor scheduling of an SDF graph with library tasks considering the worst case contention delay","authors":"Hanwoong Jung, Hyunok Oh, S. Ha","doi":"10.1145/2993452.2993567","DOIUrl":"https://doi.org/10.1145/2993452.2993567","url":null,"abstract":"Recently, a novel extension of the dataflow model with a library task has been proposed to overcome a severe limitation of dataflow models: their inability to handle shared resources. The library task, which contains library functions and shared data, plays the role of a server task when dataflow tasks, as clients, call its library functions. In this paper, we propose a meta-heuristic technique based on a multi-objective genetic algorithm to find Pareto-optimal solutions in terms of resource requirement and the worst-case response time (WCRT) of a synchronous dataflow (SDF) graph extended with library tasks. For a given task graph, the proposed technique determines not only the mapping and scheduling in a heterogeneous multiprocessor system but also task priorities and library-task duplication. When multiple tasks request the service of a library task simultaneously, a task may experience a significant contention delay. For fast design space exploration, a fast and conservative method to estimate the contention delay of library tasks is devised. With synthetic examples and two real-life applications, the viability of the proposed technique is verified.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"26 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124636070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
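One simple conservative bound of the kind the abstract alludes to: if a library task serves its clients non-preemptively in FIFO order, a request can wait behind at most one outstanding call from every other client before being served. The functions below encode that bound under our own assumptions (FIFO, one pending call per client); the paper's estimation method is more refined.

```python
def library_wcrt(c_own, c_others):
    """Conservative per-request bound on library-task response time:
    c_own: this client's service time; c_others: the other clients'
    worst-case service times (each may be queued ahead of us once)."""
    return c_own + sum(c_others)

def task_wcrt(exec_time, lib_calls):
    """Crude WCRT estimate for a dataflow task: its own execution time plus
    the conservative contention bound for each of its library calls.
    lib_calls: list of (c_own, c_others) pairs."""
    return exec_time + sum(library_wcrt(c, others) for c, others in lib_calls)
```

Because the bound grows with the number of contending clients, duplicating a heavily shared library task (one of the genetic algorithm's decision variables) directly shrinks the `c_others` term and hence the WCRT.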
Real-time pedestrian detection and tracking are vital to many applications, such as interaction between drones and humans. However, the high complexity of Convolutional Neural Networks (CNNs) makes them rely on powerful servers, which is impractical for mobile platforms like drones. In this paper, we propose a CNN-based real-time pedestrian detection and tracking system that achieves 14.7 fps detection and 200 fps tracking at only 3 W.
{"title":"Real-time pedestrian detection and tracking on customized hardware","authors":"Junbin Wang, Ke Yan, Kaiyuan Guo, Jincheng Yu, Lingzhi Sui, Song Yao, Song Han, Yu Wang","doi":"10.1145/2993452.2995268","DOIUrl":"https://doi.org/10.1145/2993452.2995268","url":null,"abstract":"Real-time pedestrian detection and tracking are vital to many applications, such as interaction between drones and humans. However, the high complexity of Convolutional Neural Networks (CNNs) makes them rely on powerful servers, which is impractical for mobile platforms like drones. In this paper, we propose a CNN-based real-time pedestrian detection and tracking system that achieves 14.7 fps detection and 200 fps tracking at only 3 W.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121883295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
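The large gap between the quoted rates (14.7 fps detection vs. 200 fps tracking) suggests the usual detect-then-track split: run the expensive CNN detector only periodically and propagate its boxes with a cheap tracker in between. The loop below illustrates that pattern generically; the period and the detector/tracker interfaces are our assumptions, not the paper's pipeline.

```python
def process_stream(frames, detect, track, period=14):
    """Interleave heavy detection with cheap tracking: run `detect` (the CNN)
    every `period` frames and update its boxes with `track` in between.
    Returns the list of per-frame box sets."""
    boxes, out = [], []
    for i, frame in enumerate(frames):
        if i % period == 0:
            boxes = detect(frame)        # expensive CNN pass
        else:
            boxes = track(frame, boxes)  # lightweight per-frame update
        out.append(boxes)
    return out
```

With detection at ~14.7 fps and tracking at ~200 fps, such interleaving lets the overall pipeline output boxes at the tracker's rate while the detector periodically re-anchors them.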