An efficient photoplethysmography imaging system with an advanced algorithm for continuous monitoring of skin microcirculation was developed. The system comprises a compact device and a computer with software for visualizing skin blood volume changes. The software processes high-resolution microcirculation amplitude maps in real time. It was tested in a clinical environment during regional anesthesia procedures. The Eulerian-based method showed improved sensitivity and higher resolution of the microcirculation maps.
{"title":"Photoplethysmography imaging algorithm for continuous monitoring of regional anesthesia","authors":"U. Rubins, J. Spigulis, A. Miscuks","doi":"10.1145/2993452.2994308","DOIUrl":"https://doi.org/10.1145/2993452.2994308","url":null,"abstract":"An efficient photoplethysmography imaging system with an advanced algorithm for continuous monitoring of skin microcirculation was developed. The system comprises a compact device and a computer with software for visualizing skin blood volume changes. The software processes high-resolution microcirculation amplitude maps in real time. It was tested in a clinical environment during regional anesthesia procedures. The Eulerian-based method showed improved sensitivity and higher resolution of the microcirculation maps.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130495645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
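The Eulerian-style amplitude mapping described in this abstract can be sketched as a per-pixel temporal band-pass over a stack of video frames. The function below is an illustrative reconstruction, not the authors' exact algorithm: the frame layout, the heart-rate band limits, and the peak-to-peak amplitude measure are our assumptions.

```python
import numpy as np

def ppg_amplitude_map(frames, fps, f_lo=0.8, f_hi=2.5):
    """Per-pixel PPG amplitude: band-pass each pixel's intensity trace
    in the heart-rate band (f_lo..f_hi Hz) and take its peak-to-peak range.
    frames: array of shape (T, H, W) with grayscale intensities."""
    frames = np.asarray(frames, dtype=np.float64)
    t = frames.shape[0]
    spec = np.fft.rfft(frames - frames.mean(axis=0), axis=0)
    freqs = np.fft.rfftfreq(t, d=1.0 / fps)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    spec[~keep] = 0.0                       # zero everything outside the band
    filtered = np.fft.irfft(spec, n=t, axis=0)
    return filtered.max(axis=0) - filtered.min(axis=0)
```

Pixels whose intensity pulses at a cardiac frequency get a high amplitude in the map; statically lit pixels get an amplitude near zero.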
In many domains such as robotics and industrial automation, a growing number of Control Applications utilize cameras as sensors. Such Visual Servoing Systems increasingly rely on Gigabit Ethernet (GigE) as a communication backbone and require real-time execution. The implementation on small, low-power embedded platforms suitable for the respective domain is challenging in terms of both computation and communication. Whilst advances in CPU and Field Programmable Gate Array (FPGA) technology enable the implementation of computationally heavier Image Processing Pipelines, the interface between such platforms and an Ethernet-based communication backbone still requires careful design to achieve fast and deterministic Image Acquisition. Although standardized Ethernet-based camera protocols such as GigE Vision unify camera configuration and data transmission, traditional software-based Image Acquisition is insufficient on small, low-power embedded platforms due to tight throughput and latency constraints and the overhead caused by decoding such multi-layered protocols. In this paper, we propose Scatter-Gather Direct Memory Access (SG/DMA) Proxying as a generic method to seamlessly extend the existing network subsystem of current Systems-on-Chip (SoCs) with hardware-based filtering capabilities. Based thereon, we present a novel mixed-hardcore/softcore GigE Vision Framegrabber capable of directly feeding a subsequent in-stream Image Processing Pipeline with sub-microsecond acquisition latency. By rerouting all incoming Ethernet frames to our GigE Vision Bridge using SG/DMA Proxying, we are able to separate image and non-image data with zero CPU and memory intervention and perform Image Acquisition at the full line rate of Gigabit Ethernet (i.e., 125 Mpx/s for grayscale video). Our experimental evaluation shows the benefits of our proposed architecture on a Programmable SoC (pSoC) that combines a fixed-function multi-core SoC with configurable FPGA fabric.
{"title":"GigE vision data acquisition for visual servoing using SG/DMA proxying","authors":"M. Geier, Florian Pitzl, S. Chakraborty","doi":"10.1145/2993452.2993455","DOIUrl":"https://doi.org/10.1145/2993452.2993455","url":null,"abstract":"In many domains such as robotics and industrial automation, a growing number of Control Applications utilize cameras as sensors. Such Visual Servoing Systems increasingly rely on Gigabit Ethernet (GigE) as a communication backbone and require real-time execution. The implementation on small, low-power embedded platforms suitable for the respective domain is challenging in terms of both computation and communication. Whilst advances in CPU and Field Programmable Gate Array (FPGA) technology enable the implementation of computationally heavier Image Processing Pipelines, the interface between such platforms and an Ethernet-based communication backbone still requires careful design to achieve fast and deterministic Image Acquisition. Although standardized Ethernet-based camera protocols such as GigE Vision unify camera configuration and data transmission, traditional software-based Image Acquisition is insufficient on small, low-power embedded platforms due to tight throughput and latency constraints and the overhead caused by decoding such multi-layered protocols. In this paper, we propose Scatter-Gather Direct Memory Access (SG/DMA) Proxying as a generic method to seamlessly extend the existing network subsystem of current Systems-on-Chip (SoCs) with hardware-based filtering capabilities. Based thereon, we present a novel mixed-hardcore/softcore GigE Vision Framegrabber capable of directly feeding a subsequent in-stream Image Processing Pipeline with sub-microsecond acquisition latency. By rerouting all incoming Ethernet frames to our GigE Vision Bridge using SG/DMA Proxying, we are able to separate image and non-image data with zero CPU and memory intervention and perform Image Acquisition at the full line rate of Gigabit Ethernet (i.e., 125 Mpx/s for grayscale video). Our experimental evaluation shows the benefits of our proposed architecture on a Programmable SoC (pSoC) that combines a fixed-function multi-core SoC with configurable FPGA fabric.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121840415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
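The image/non-image separation the framegrabber performs in hardware amounts to classifying each incoming Ethernet frame by its headers: GigE Vision streams image data (GVSP) over UDP on a negotiated port, while control traffic (GVCP) and everything else goes to the CPU. The software model below only illustrates that classification logic; the actual paper implements it as a zero-copy hardware filter, and the port numbers here are hypothetical.

```python
import struct

GVCP_PORT = 3956  # GigE Vision control protocol port (fixed by the GigE Vision spec)

def classify_frame(frame, stream_port):
    """Model of the hardware filter: route GVSP stream packets (image data)
    to the in-stream pipeline, everything else to the CPU.
    frame: a raw Ethernet frame as bytes. Returns 'image' or 'cpu'."""
    if len(frame) < 14 + 20 + 8:                 # Ethernet + IPv4 + UDP minimum
        return "cpu"
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:                       # not IPv4
        return "cpu"
    ihl = (frame[14] & 0x0F) * 4                  # IPv4 header length in bytes
    if frame[14 + 9] != 17:                       # IP protocol field: not UDP
        return "cpu"
    udp = frame[14 + ihl:]
    dst_port = struct.unpack("!H", udp[2:4])[0]
    return "image" if dst_port == stream_port else "cpu"
```

In the paper's architecture this decision happens per descriptor in the SG/DMA proxy, so image payloads never touch CPU or main memory.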
Networks-on-Chip (NoCs) for contemporary multiprocessor systems must integrate complex multimedia applications which require not only high performance but also timing guarantees. However, in existing NoCs designed for real-time systems, timing constraints are frequently implemented at the cost of decreased hardware utilization, i.e., strict spatial or temporal isolation between transmissions. In this work, we propose an alternative mechanism, multi-path scheduling (MPS), which exploits the multidimensional structure of NoCs to combine path selection and temporal flow control based on the global state of the system. Consequently, MPS allows safe sharing of NoC resources while preserving the high utilization achieved through a predictable load distribution of data traffic among the different paths reachable from source to destination. We demonstrate, using benchmarks, that MPS not only provides higher average performance than existing solutions but also makes it possible to provide worst-case guarantees, which we prove using formal timing analysis. Moreover, MPS induces a low implementation overhead, as it can be applied to many existing wormhole-switched, performance-optimized NoCs without requiring complex hardware modifications.
{"title":"Multi-path scheduling for multimedia traffic in safety critical on-chip network","authors":"Adam Kostrzewa, R. Ernst, Selma Saidi","doi":"10.1145/2993452.2993563","DOIUrl":"https://doi.org/10.1145/2993452.2993563","url":null,"abstract":"Networks-on-Chip (NoCs) for contemporary multiprocessor systems must integrate complex multimedia applications which require not only high performance but also timing guarantees. However, in existing NoCs designed for real-time systems, timing constraints are frequently implemented at the cost of decreased hardware utilization, i.e., strict spatial or temporal isolation between transmissions. In this work, we propose an alternative mechanism, multi-path scheduling (MPS), which exploits the multidimensional structure of NoCs to combine path selection and temporal flow control based on the global state of the system. Consequently, MPS allows safe sharing of NoC resources while preserving the high utilization achieved through a predictable load distribution of data traffic among the different paths reachable from source to destination. We demonstrate, using benchmarks, that MPS not only provides higher average performance than existing solutions but also makes it possible to provide worst-case guarantees, which we prove using formal timing analysis. Moreover, MPS induces a low implementation overhead, as it can be applied to many existing wormhole-switched, performance-optimized NoCs without requiring complex hardware modifications.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114960674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
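The core idea of exploiting a NoC's multidimensional structure can be illustrated in a 2D mesh: between any source and destination there are at least two minimal paths (x-first "XY" and y-first "YX"), and a global view of per-link load lets the scheduler pick the less congested one. The sketch below is a simplification of MPS (only two candidate paths, no temporal flow control); the load metric is our assumption.

```python
def xy_path(src, dst):
    """Minimal XY route in a 2D mesh: hop along x first, then along y."""
    (sx, sy), (dx, dy) = src, dst
    path, x, y = [], sx, sy
    while x != dx:
        nx = x + (1 if dx > x else -1)
        path.append(((x, y), (nx, y))); x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        path.append(((x, y), (x, ny))); y = ny
    return path

def yx_path(src, dst):
    # y-first variant: reverse each hop of the XY route taken backwards
    return [(b, a) for (a, b) in reversed(xy_path(dst, src))]

def pick_path(src, dst, link_load):
    """Global-state path selection: take whichever minimal path (XY or YX)
    has the lower worst-link load. link_load maps directed links to loads."""
    cands = [xy_path(src, dst), yx_path(src, dst)]
    return min(cands, key=lambda p: max(link_load.get(l, 0) for l in p))
```

With distinct paths available, a congested link on one route simply shifts traffic to the other, which is how MPS keeps utilization high without static isolation.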
State-of-the-art smartphones can generate excessive amounts of heat during high computational activity or long durations of use. While throttling mechanisms ensure safe component and outer-skin temperatures, frequent throttling can significantly degrade user-perceived performance. This work explores the impact of multiple thermal constraints in a real-life smartphone on user experience. In addition to high processor temperatures, which have traditionally been a major point of interest, we show that applications can also quickly elevate battery and device skin temperatures to critical levels. We introduce and evaluate various thermally-efficient runtime management techniques that slow down heating under performance guarantees so as to sustain desirable performance for as long as possible. Our techniques achieve up to 8x longer sustainable QoS.
{"title":"Providing sustainable performance in thermally constrained mobile devices","authors":"O. Sahin, A. Coskun","doi":"10.1145/2993452.2994309","DOIUrl":"https://doi.org/10.1145/2993452.2994309","url":null,"abstract":"State-of-the-art smartphones can generate excessive amounts of heat during high computational activity or long durations of use. While throttling mechanisms ensure safe component and outer-skin temperatures, frequent throttling can significantly degrade user-perceived performance. This work explores the impact of multiple thermal constraints in a real-life smartphone on user experience. In addition to high processor temperatures, which have traditionally been a major point of interest, we show that applications can also quickly elevate battery and device skin temperatures to critical levels. We introduce and evaluate various thermally-efficient runtime management techniques that slow down heating under performance guarantees so as to sustain desirable performance for as long as possible. Our techniques achieve up to 8x longer sustainable QoS.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131217678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
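"Slowing down heating under performance guarantees" can be pictured as a governor that steps the CPU frequency down as the skin temperature approaches its limit, but never below the frequency needed for the guaranteed QoS. The policy below is purely illustrative; the thresholds, the linear headroom scaling, and the frequency ladder are our assumptions, not the paper's techniques.

```python
def next_frequency(temp_c, freqs, f_min_qos, t_limit=45.0, t_guard=5.0):
    """Proactive thermal governor sketch: scale the chosen position in the
    ascending frequency ladder `freqs` linearly with the remaining thermal
    headroom (t_limit - temp_c over a t_guard band), clamped at the QoS floor
    f_min_qos so the performance guarantee is always met."""
    headroom = max(0.0, min(1.0, (t_limit - temp_c) / t_guard))
    idx = int(round(headroom * (len(freqs) - 1)))
    return max(freqs[idx], f_min_qos)
```

By slowing down early instead of hitting the hard throttle, such a policy trades a little peak performance for a much longer interval at the guaranteed level, which is the sustainability effect the paper quantifies.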
In this paper, we explore and develop an embedded real-time system and associated algorithms that enable an aggregation of limited-resource, low-quality, projection-enabled mobile devices to collaboratively produce a higher-quality video stream for a superior viewing experience. Such resource aggregation across multiple projector-enabled devices can lead to per-unit resource savings while moving the cost to the aggregate. The pico-projectors embedded in mobile devices such as cell phones have much lower resolution and brightness than standard projectors. Tiling (arranging the projection areas of multiple projectors in a rectangular array, overlapping slightly around the boundaries) and superimposing (placing the projection areas of multiple projectors directly on top of each other), with the projectors registered automatically through the cameras residing within those mobile devices, provide different ways of aggregating resources across these devices. Evaluation of our proof-of-concept system shows significant improvement for each mobile device in two primary factors, bandwidth usage and power consumption, when using a collaborative federation of projection-embedded mobile devices. To the best of our knowledge, this is the first time resources have been aggregated across a federation of low-cost, low-power mobile devices completely automatically and in real time, resulting in a viewing experience of up to 4K (3840x2160) content from four integrated mobile devices each playing 1080p content.
{"title":"Resource aggregation for collaborative video from multiple projector enabled mobile devices","authors":"Hung Nguyen, F. Kurdahi, A. Majumder","doi":"10.1145/2993452.2993561","DOIUrl":"https://doi.org/10.1145/2993452.2993561","url":null,"abstract":"In this paper, we explore and develop an embedded real-time system and associated algorithms that enable an aggregation of limited-resource, low-quality, projection-enabled mobile devices to collaboratively produce a higher-quality video stream for a superior viewing experience. Such resource aggregation across multiple projector-enabled devices can lead to per-unit resource savings while moving the cost to the aggregate. The pico-projectors embedded in mobile devices such as cell phones have much lower resolution and brightness than standard projectors. Tiling (arranging the projection areas of multiple projectors in a rectangular array, overlapping slightly around the boundaries) and superimposing (placing the projection areas of multiple projectors directly on top of each other), with the projectors registered automatically through the cameras residing within those mobile devices, provide different ways of aggregating resources across these devices. Evaluation of our proof-of-concept system shows significant improvement for each mobile device in two primary factors, bandwidth usage and power consumption, when using a collaborative federation of projection-embedded mobile devices. To the best of our knowledge, this is the first time resources have been aggregated across a federation of low-cost, low-power mobile devices completely automatically and in real time, resulting in a viewing experience of up to 4K (3840x2160) content from four integrated mobile devices each playing 1080p content.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"SE-1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126571176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
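The 4K claim follows from simple tiling arithmetic: a 2x2 array of 1080p projectors yields 3840x2160 pixels with no overlap, and slightly less once the boundary overlap used for blending is subtracted. The helper below works that out; the uniform fractional overlap model is our simplification.

```python
def tiled_resolution(w, h, rows, cols, overlap=0.1):
    """Effective resolution of a rows x cols tiled display in which adjacent
    projectors overlap by `overlap` (as a fraction of one projector's extent)
    for edge blending. Each interior seam costs one overlap strip."""
    eff_w = int(round(w * (cols - (cols - 1) * overlap)))
    eff_h = int(round(h * (rows - (rows - 1) * overlap)))
    return eff_w, eff_h
```

So four 1080p devices reach the advertised 4K ceiling only in the zero-overlap limit; with a typical 10% blending overlap the usable canvas is around 3648x2052.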
Heterogeneous processors with architecturally different devices (CPU and GPU) integrated on the same die provide good performance and energy efficiency for a wide range of workloads. However, they also create challenges and opportunities in terms of scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping. In this paper, we first provide detailed infrared imaging results that show the impact of mapping decisions on the thermal and power profiles of CPU+GPU processors. Furthermore, we observe that runtime conditions such as power and CPU load from traditional workloads also affect the mapping decision. To exploit our observations, we propose techniques to characterize OpenCL kernel workloads at run-time and map them onto the appropriate device under time-varying physical (i.e., chip power limit) and CPU load conditions, in particular the number of CPU cores available to the OpenCL kernel. We implement our dynamic scheduler on a real CPU+GPU processor and evaluate it using various OpenCL benchmarks. Compared to the state-of-the-art kernel-level scheduling method, the proposed scheduler provides up to 31% and 10% improvements in runtime and energy, respectively.
{"title":"Scheduling challenges and opportunities in integrated CPU+GPU processors","authors":"K. Dev, S. Reda","doi":"10.1145/2993452.2994307","DOIUrl":"https://doi.org/10.1145/2993452.2994307","url":null,"abstract":"Heterogeneous processors with architecturally different devices (CPU and GPU) integrated on the same die provide good performance and energy efficiency for a wide range of workloads. However, they also create challenges and opportunities in terms of scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping. In this paper, we first provide detailed infrared imaging results that show the impact of mapping decisions on the thermal and power profiles of CPU+GPU processors. Furthermore, we observe that runtime conditions such as power and CPU load from traditional workloads also affect the mapping decision. To exploit our observations, we propose techniques to characterize OpenCL kernel workloads at run-time and map them onto the appropriate device under time-varying physical (i.e., chip power limit) and CPU load conditions, in particular the number of CPU cores available to the OpenCL kernel. We implement our dynamic scheduler on a real CPU+GPU processor and evaluate it using various OpenCL benchmarks. Compared to the state-of-the-art kernel-level scheduling method, the proposed scheduler provides up to 31% and 10% improvements in runtime and energy, respectively.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116692105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
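A runtime CPU/GPU mapping decision of the kind the abstract describes can be sketched as: estimate each device's kernel runtime from profiled characteristics, discard devices that would exceed the current chip power cap, and scale the CPU estimate by the cores currently free. This is an illustrative heuristic under our own assumptions (ideal core scaling, profiled per-device power), not the paper's scheduler.

```python
def map_kernel(kernel, free_cores, power_cap_w):
    """Pick 'cpu' or 'gpu' for an OpenCL kernel at runtime.
    kernel: dict with profiled 't_cpu_1core' (single-core runtime),
    't_gpu' (GPU runtime), and per-device power draws 'p_cpu', 'p_gpu'."""
    t_cpu = kernel["t_cpu_1core"] / max(free_cores, 1)  # assume ideal scaling
    candidates = []
    if kernel["p_cpu"] <= power_cap_w:
        candidates.append(("cpu", t_cpu))
    if kernel["p_gpu"] <= power_cap_w:
        candidates.append(("gpu", kernel["t_gpu"]))
    if not candidates:            # nothing fits the cap: fall back to CPU
        return "cpu"
    return min(candidates, key=lambda d: d[1])[0]
```

The same kernel can thus map to different devices as the power limit or background CPU load changes, which is exactly the runtime sensitivity the paper reports.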
Demand for computer vision analytics in the embedded world has increased rapidly as the Internet of Things (IoT) expands into cities, workplaces, and homes. Common computationally intensive video and scene analysis tasks, such as pedestrian detection, counting, and tracking, are often relegated to acceleration hardware or embedded GPUs. This paper showcases decision-making heuristics designed to improve the performance of these analytics. Working within the constraints of the low-power IoT infrastructure typically deployed in urban, traffic-heavy environments, our Precedent-Aware Classification (PAC) framework provides efficient pedestrian and vehicle detection in the absence of dedicated acceleration hardware. Our implementation takes advantage of frequently traveled routes to reduce the amount of required computation, which helps meet the tight timing requirements of embedded platforms where traditional computation models tend to fail. Testing and performance analysis of PAC were done using an ARM Cortex-A9 embedded processor residing within a Xilinx Zynq-7000 SoC. In normally populated traffic situations, PAC produced an average 3.23x speed-up and an average 16% improvement in pedestrian detection accuracy over using traditional classifiers alone.
{"title":"Rapid precedent-aware pedestrian and car classification on constrained IoT platforms","authors":"J. Danner, L. Wills, E. M. Ruiz, L. Lerner","doi":"10.1145/2993452.2993562","DOIUrl":"https://doi.org/10.1145/2993452.2993562","url":null,"abstract":"Demand for computer vision analytics in the embedded world has increased rapidly as the Internet of Things (IoT) expands into cities, workplaces, and homes. Common computationally intensive video and scene analysis tasks, such as pedestrian detection, counting, and tracking, are often relegated to acceleration hardware or embedded GPUs. This paper showcases decision-making heuristics designed to improve the performance of these analytics. Working within the constraints of the low-power IoT infrastructure typically deployed in urban, traffic-heavy environments, our Precedent-Aware Classification (PAC) framework provides efficient pedestrian and vehicle detection in the absence of dedicated acceleration hardware. Our implementation takes advantage of frequently traveled routes to reduce the amount of required computation, which helps meet the tight timing requirements of embedded platforms where traditional computation models tend to fail. Testing and performance analysis of PAC were done using an ARM Cortex-A9 embedded processor residing within a Xilinx Zynq-7000 SoC. In normally populated traffic situations, PAC produced an average 3.23x speed-up and an average 16% improvement in pedestrian detection accuracy over using traditional classifiers alone.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121508268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
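The "precedent" idea, exploiting frequently traveled routes to skip repeated classifier work, can be sketched as a spatial cache: once a grid cell along a route has produced the same label enough times, later queries in that cell reuse the precedent instead of running the full classifier. The cell size, confirmation count, and cache policy below are our assumptions, not PAC's actual heuristics.

```python
class PrecedentCache:
    """Sketch of precedent-aware classification: remember labels for grid
    cells along frequently travelled routes and skip the full classifier once
    a cell's label has been confirmed `k` times in a row."""
    def __init__(self, classify, cell=32, k=3):
        self.classify, self.cell, self.k = classify, cell, k
        self.hits = {}                       # cell coords -> (label, streak)

    def label(self, x, y, patch):
        key = (x // self.cell, y // self.cell)
        label, streak = self.hits.get(key, (None, 0))
        if streak >= self.k:
            return label                     # precedent hit: no classifier run
        fresh = self.classify(patch)         # expensive full classification
        self.hits[key] = (fresh, streak + 1 if fresh == label else 1)
        return fresh
```

On a route seen many times, almost every query becomes a cache hit, which is where speed-ups of the reported magnitude come from.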
Many multimedia applications exhibit phasic behavior. The phasic behavior of applications has been studied primarily with a focus on code execution. However, temporal variation in an application's memory usage can deviate from its program behavior, providing opportunities to exploit these memory phases for more efficient use of on-chip memory resources. In this work, we define memory phases as opposed to program phases and illustrate the potential disparity between them. We propose mechanisms for light-weight online memory-phase detection. Additionally, we demonstrate their utility by deploying these techniques for sharing distributed on-chip Scratchpad Memories (SPMs) in multi-core platforms. The information gathered during memory phases is used to prioritize different memory pages in a multi-core platform without any prior knowledge of the running applications. By exploiting memory-phasic behavior, we achieved up to 45% improvement in memory access latency on a set of multimedia applications.
{"title":"On detecting and using memory phases in multimedia systems","authors":"H. Tajik, Bryan Donyanavard, N. Dutt","doi":"10.1145/2993452.2993566","DOIUrl":"https://doi.org/10.1145/2993452.2993566","url":null,"abstract":"Many multimedia applications exhibit phasic behavior. The phasic behavior of applications has been studied primarily with a focus on code execution. However, temporal variation in an application's memory usage can deviate from its program behavior, providing opportunities to exploit these memory phases for more efficient use of on-chip memory resources. In this work, we define memory phases as opposed to program phases and illustrate the potential disparity between them. We propose mechanisms for light-weight online memory-phase detection. Additionally, we demonstrate their utility by deploying these techniques for sharing distributed on-chip Scratchpad Memories (SPMs) in multi-core platforms. The information gathered during memory phases is used to prioritize different memory pages in a multi-core platform without any prior knowledge of the running applications. By exploiting memory-phasic behavior, we achieved up to 45% improvement in memory access latency on a set of multimedia applications.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123324381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
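Light-weight online memory-phase detection can be sketched by summarizing each sampling interval as a normalized page-access histogram and declaring a new phase when the histogram drifts far from the current phase's signature. The distance metric and threshold below are our assumptions; the paper's detectors may differ.

```python
class MemoryPhaseDetector:
    """Illustrative online memory-phase detector: compare each interval's
    normalized page-access histogram with the current phase's signature;
    a large Manhattan distance starts a new phase."""
    def __init__(self, threshold=0.5):
        self.threshold, self.signature, self.phase = threshold, None, 0

    def observe(self, page_counts):
        """page_counts: dict mapping page id -> access count this interval.
        Returns the current phase id."""
        total = sum(page_counts.values()) or 1
        hist = {p: c / total for p, c in page_counts.items()}
        if self.signature is not None:
            pages = set(hist) | set(self.signature)
            dist = sum(abs(hist.get(p, 0.0) - self.signature.get(p, 0.0))
                       for p in pages)
            if dist > self.threshold:
                self.phase += 1          # working set moved: new memory phase
        self.signature = hist
        return self.phase
```

Per-phase histograms like `hist` are also exactly the information an SPM manager needs to decide which pages deserve scratchpad residency in the current phase.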
Recently, a novel extension of the dataflow model with a library task has been proposed to overcome a severe limitation of dataflow models: their inability to handle shared resources. The library task, which contains library functions and shared data, plays the role of a server task when dataflow tasks, as clients, call its library functions. In this paper, we propose a meta-heuristic technique based on a multi-objective genetic algorithm to find Pareto-optimal solutions in terms of resource requirement and the worst-case response time (WCRT) of a synchronous dataflow (SDF) graph extended with library tasks. For a given task graph, the proposed technique determines not only the mapping and scheduling in a heterogeneous multiprocessor system but also task priorities and library-task duplication. When multiple tasks request the service of a library task simultaneously, a task may experience a significant contention delay. For fast design space exploration, a fast and conservative method to estimate the contention delay of library tasks is devised. With synthetic examples and two real-life applications, the viability of the proposed technique is verified.
{"title":"Multiprocessor scheduling of an SDF graph with library tasks considering the worst case contention delay","authors":"Hanwoong Jung, Hyunok Oh, S. Ha","doi":"10.1145/2993452.2993567","DOIUrl":"https://doi.org/10.1145/2993452.2993567","url":null,"abstract":"Recently, a novel extension of the dataflow model with a library task has been proposed to overcome a severe limitation of dataflow models: their inability to handle shared resources. The library task, which contains library functions and shared data, plays the role of a server task when dataflow tasks, as clients, call its library functions. In this paper, we propose a meta-heuristic technique based on a multi-objective genetic algorithm to find Pareto-optimal solutions in terms of resource requirement and the worst-case response time (WCRT) of a synchronous dataflow (SDF) graph extended with library tasks. For a given task graph, the proposed technique determines not only the mapping and scheduling in a heterogeneous multiprocessor system but also task priorities and library-task duplication. When multiple tasks request the service of a library task simultaneously, a task may experience a significant contention delay. For fast design space exploration, a fast and conservative method to estimate the contention delay of library tasks is devised. With synthetic examples and two real-life applications, the viability of the proposed technique is verified.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"26 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124636070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
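One simple conservative bound of the kind the abstract alludes to: if a library task serves its clients non-preemptively in FIFO order, a request can wait behind at most one outstanding call from every other client before being served. The functions below encode that bound under our own assumptions (FIFO, one pending call per client); the paper's estimation method is more refined.

```python
def library_wcrt(c_own, c_others):
    """Conservative per-request bound on library-task response time:
    c_own: this client's service time; c_others: the other clients'
    worst-case service times (each may be queued ahead of us once)."""
    return c_own + sum(c_others)

def task_wcrt(exec_time, lib_calls):
    """Crude WCRT estimate for a dataflow task: its own execution time plus
    the conservative contention bound for each of its library calls.
    lib_calls: list of (c_own, c_others) pairs."""
    return exec_time + sum(library_wcrt(c, others) for c, others in lib_calls)
```

Because the bound grows with the number of contending clients, duplicating a heavily shared library task (one of the genetic algorithm's decision variables) directly shrinks the `c_others` term and hence the WCRT.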
Real-time pedestrian detection and tracking are vital to many applications, such as interaction between drones and humans. However, the high complexity of Convolutional Neural Networks (CNNs) makes them rely on powerful servers, which is impractical for mobile platforms like drones. In this paper, we propose a CNN-based real-time pedestrian detection and tracking system that achieves 14.7 fps detection and 200 fps tracking at only 3 W.
{"title":"Real-time pedestrian detection and tracking on customized hardware","authors":"Junbin Wang, Ke Yan, Kaiyuan Guo, Jincheng Yu, Lingzhi Sui, Song Yao, Song Han, Yu Wang","doi":"10.1145/2993452.2995268","DOIUrl":"https://doi.org/10.1145/2993452.2995268","url":null,"abstract":"Real-time pedestrian detection and tracking are vital to many applications, such as interaction between drones and humans. However, the high complexity of Convolutional Neural Networks (CNNs) makes them rely on powerful servers, which is impractical for mobile platforms like drones. In this paper, we propose a CNN-based real-time pedestrian detection and tracking system that achieves 14.7 fps detection and 200 fps tracking at only 3 W.","PeriodicalId":198459,"journal":{"name":"2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121883295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
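The large gap between the quoted rates (14.7 fps detection vs. 200 fps tracking) suggests the usual detect-then-track split: run the expensive CNN detector only periodically and propagate its boxes with a cheap tracker in between. The loop below illustrates that pattern generically; the period and the detector/tracker interfaces are our assumptions, not the paper's pipeline.

```python
def process_stream(frames, detect, track, period=14):
    """Interleave heavy detection with cheap tracking: run `detect` (the CNN)
    every `period` frames and update its boxes with `track` in between.
    Returns the list of per-frame box sets."""
    boxes, out = [], []
    for i, frame in enumerate(frames):
        if i % period == 0:
            boxes = detect(frame)        # expensive CNN pass
        else:
            boxes = track(frame, boxes)  # lightweight per-frame update
        out.append(boxes)
    return out
```

With detection at ~14.7 fps and tracking at ~200 fps, such interleaving lets the overall pipeline output boxes at the tracker's rate while the detector periodically re-anchors them.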