3D tomography back-projection parallelization on FPGAs using OpenCL
Pub Date: 2017-09-27 | DOI: 10.1109/DASIP.2017.8122119 | DASIP 2017, pp. 1-6
M. Martelli, N. Gac, A. Mérigot, C. Enderli
This paper evaluates the resurgence of FPGAs for hardware acceleration in computed tomography, focusing on the back-projection operator used in iterative reconstruction algorithms. We concentrate on the tools developed by FPGA manufacturers, in particular the Intel FPGA SDK for OpenCL, which promises a new level of hardware abstraction from the developer's perspective and allows software-like programming of FPGAs. For this purpose, we start by evaluating different custom OpenCL implementations of the back-projection algorithm. Using insights into memory fetching and coalescing, we then tune the designs further to improve performance. Finally, a comparison is made with GPU implementations, and a preliminary conclusion is drawn on the future of FPGAs in computed tomography.
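The paper's OpenCL kernels are not reproduced in the abstract. As a point of reference for the operator being accelerated, the sketch below is a minimal, unfiltered voxel-driven back-projection in NumPy for a 2D parallel-beam geometry (the 3D cone-beam case follows the same accumulate-over-views pattern). All names and the nearest-neighbour detector lookup are illustrative choices, not the authors' implementation.

```python
import numpy as np

def backproject(sinogram, angles, n):
    """Unfiltered voxel-driven back-projection, 2D parallel-beam geometry.

    sinogram : (num_angles, num_detectors) array of line-integral projections
    angles   : projection angles in radians
    n        : side length of the reconstructed square image
    """
    num_angles, num_det = sinogram.shape
    image = np.zeros((n, n))
    coords = np.arange(n) - (n - 1) / 2.0          # pixel coordinates, image-centred
    x, y = np.meshgrid(coords, coords)
    for a, theta in enumerate(angles):
        # Detector cell hit by each pixel for this view (nearest-neighbour lookup)
        s = x * np.cos(theta) + y * np.sin(theta)
        idx = np.clip(np.round(s + (num_det - 1) / 2.0).astype(int), 0, num_det - 1)
        image += sinogram[a, idx]                   # accumulate this view's contribution
    return image * np.pi / (2 * num_angles)
```

The per-view accumulation loop is the part that both GPU and FPGA implementations parallelize; memory fetching of the sinogram rows is what the paper's coalescing tuning targets.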
Adaptive space-time structural coherence for selective imaging
Pub Date: 2017-09-27 | DOI: 10.1109/DASIP.2017.8122126 | DASIP 2017, pp. 1-6
D. Gibson, N. Campbell
In this paper we present a novel close-to-sensor computational camera design. The hardware can be configured for a wide range of autonomous applications such as industrial inspection, binocular/stereo robotic vision, UAV navigation/control and biological vision analogues. Close coupling of the image sensor with computation, motor control and motion sensors enables low-latency responses to changes in the visual field. An image processing pipeline is introduced that detects and processes regions containing space-time structural coherence, in order to reduce the transmission of redundant pixel data and stabilise selective imaging. The pipeline is designed to exploit close-to-sensor processing of regions of interest (ROI) adaptively captured at high temporal rates (up to 1000 ROI/s) and at multiple spatial and temporal resolutions. Space-time structurally coherent macroblocks are detected using a novel temporal block matching approach; the high temporal sampling rate allows a monotonicity constraint to be enforced to efficiently assess the confidence of matches. The robustness of the sparse motion estimation approach is demonstrated in comparison with a state-of-the-art optical flow algorithm and optimal Bayesian grid-based filtering. How the system can generate unsupervised training data for higher-level multiple-instance or deep learning systems is also discussed.
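The abstract names temporal block matching with a monotonicity-based confidence check but gives no code. The sketch below is one plausible reading of that idea, assuming SAD matching over a small search window and treating a non-monotonic error profile around the best match as low confidence; block size, search range and the confidence rule are assumptions, not the authors' pipeline.

```python
import numpy as np

def match_block(prev, curr, top, left, size=16, search=4):
    """SAD block matching of one macroblock between consecutive frames.

    Returns the best displacement and a crude confidence flag: at very high
    frame rates the SAD surface should fall monotonically towards the best
    match along each axis, so a non-monotonic profile flags low confidence.
    """
    ref = prev[top:top + size, left:left + size].astype(np.int32)
    sad = np.full((2 * search + 1, 2 * search + 1), np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= curr.shape[0] - size and 0 <= x <= curr.shape[1] - size:
                cand = curr[y:y + size, x:x + size].astype(np.int32)
                sad[dy + search, dx + search] = np.abs(ref - cand).sum()
    by, bx = np.unravel_index(np.argmin(sad), sad.shape)
    row, col = sad[by, :], sad[:, bx]
    monotone = (np.all(np.diff(row[:bx + 1]) <= 0) and np.all(np.diff(row[bx:]) >= 0) and
                np.all(np.diff(col[:by + 1]) <= 0) and np.all(np.diff(col[by:]) >= 0))
    return (by - search, bx - search), bool(monotone)
```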
An efficient framework for design and assessment of arithmetic operators with Reduced-Precision Redundancy
Pub Date: 2017-09-27 | DOI: 10.1109/DASIP.2017.8122117 | DASIP 2017, pp. 1-6
I. Wali, E. Casseau, A. Tisserand
For arithmetic circuits, Reduced-Precision Redundancy (RPR) is considered a viable alternative to Triple Modular Redundancy (TMR), as it offers significant power reduction. However, efficient implementation and assessment of hardware arithmetic operators with RPR is still a challenge. In this work, we propose a lightweight RPR design methodology that exploits the capabilities of modern synthesis and simulation tools to simplify the design and verification of robust arithmetic operators. To demonstrate the effectiveness of the proposed framework, we apply it to implement and compare two commonly used RPR schemes. Our experimental results show that the proposed framework simplifies the design and provides robustness indicators with a maximum coefficient of variation of 14.7%, with a 3× experimentation speed-up at a cost of 25% computational effort compared to an exhaustive approach.
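The two RPR schemes compared in the paper are not detailed in the abstract. The snippet below only illustrates the basic RPR principle in software form, for a 16-bit unsigned adder whose replica operates on the k most significant bits; bit widths, names and the fallback policy are illustrative assumptions, not the authors' framework.

```python
def rpr_add(a, b, full_result, k=8, width=16):
    """Reduced-Precision Redundancy check for a width-bit unsigned adder.

    full_result is the (possibly faulty) output of the full-precision adder.
    A replica adder working only on the k most significant bits gives an
    approximate reference; if the two disagree by more than the worst-case
    truncation error, the full-precision result is declared faulty and the
    coarse result is used instead (the usual RPR fallback).
    """
    shift = width - k
    approx = ((a >> shift) + (b >> shift)) << shift   # reduced-precision replica
    bound = 2 * ((1 << shift) - 1)                    # max error due to dropped bits
    if abs(full_result - approx) > bound:
        return approx, True                           # fault detected
    return full_result, False

# Example: a fault flips a high-order bit of the full adder's output.
a, b = 0x3F2A, 0x01C4
faulty = (a + b) ^ 0x4000
print(rpr_add(a, b, faulty))    # -> (coarse sum, True): the error is caught
```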
Parallel implementation of an iterative PCA algorithm for hyperspectral images on a manycore platform
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122111 | DASIP 2017, pp. 1-6
R. Lazcano, D. Madroñal, H. Fabelo, S. Ortega, R. Salvador, G. Callicó, E. Juárez, C. Sanz
This paper presents a study of the parallelization possibilities of a Non-Linear Iterative Partial Least Squares algorithm and its adaptation to a Massively Parallel Processor Array manycore architecture, which assembles 256 cores distributed over 16 clusters. The aim of this work is twofold: first, to test the behavior of iterative, complex algorithms on a manycore architecture; and, secondly, to achieve real-time processing of hyperspectral images, where the real-time constraint is fixed by the image capture rate of the hyperspectral sensor. Real-time processing is a challenging objective, as hyperspectral images are composed of extensive volumes of spectral information, an issue usually addressed by reducing the image size prior to the processing phase itself. Consequently, this paper proposes an analysis of the intrinsic parallelism of the algorithm and its subsequent implementation on a manycore architecture. As a result, an average speedup of 13 has been achieved compared to the sequential version. Additionally, this implementation has been compared with other state-of-the-art implementations, which it outperforms in terms of performance.
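The abstract identifies the algorithm as NIPALS but does not show it. Below is the textbook NIPALS iteration in NumPy, applied to a mean-centred pixels-by-bands matrix; the matrix-vector products in the inner loop are the natural candidates for distribution over the 256 cores. Variable names and convergence settings are illustrative, not the paper's implementation.

```python
import numpy as np

def nipals(X, n_components=3, tol=1e-6, max_iter=500):
    """NIPALS: extract principal components one at a time by iterative deflation.

    X is assumed to be mean-centred, shape (n_pixels, n_bands), i.e. a
    hyperspectral cube flattened to pixels x spectral bands.
    """
    X = X.copy()
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, 0].copy()                    # initial score vector
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)             # loading estimate
            p /= np.linalg.norm(p)
            t_new = X @ p                     # new score estimate
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        X -= np.outer(t, p)                   # deflate: remove explained variance
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)
```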
A rapid control prototyping platform methodology for decentralized automation
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122125 | DASIP 2017, pp. 1-2
Florian Kastner, Benedikt Janßen, Sebastian Schwanewilms, M. Hübner
Today's industrial requirements regarding the capabilities of embedded devices used for decentralized automation are increasing. Industrial providers of automation equipment strive to make their products, and thus industrial plants, smarter in order to raise efficiency. This evolution is based on new technologies such as machine learning, predictive maintenance, sensor fusion and advanced process controls. These techniques require high-performance, energy-efficient hardware platforms that support fast execution of computationally intensive algorithms in compliance with real-time constraints. To achieve these targets in a cost-efficient manner, sharing hardware resources to implement advanced process controls or machine learning algorithms is beneficial. Furthermore, if different institutions integrate intellectual property (IP) into a single platform, a certain degree of isolation is mandatory to protect their IP against theft or manipulation. In this paper, we propose a rapid control prototyping platform that supports sharing resources in an isolated manner, so that new control or monitoring strategies can be evaluated on a single platform with the help of Linux Containers for process isolation, MQTT for inter-process communication, OPC UA for vertical integration, and partial bitstreams.
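MQTT is only named in the paper as the inter-process transport. The sketch below shows the kind of decoupled publish/subscribe exchange two isolated containers could use, written against the paho-mqtt 1.x client API; the broker address, topic and payload are placeholders.

```python
import time

import paho.mqtt.client as mqtt
import paho.mqtt.publish as publish

BROKER = "localhost"                      # placeholder: broker co-located on the platform
TOPIC = "plant/line1/temperature"         # placeholder topic

def on_message(client, userdata, msg):
    # A monitoring container would consume readings published by the control
    # container (or by an FPGA-side gateway) here.
    print(msg.topic, float(msg.payload))

subscriber = mqtt.Client()                # paho-mqtt 1.x constructor
subscriber.on_message = on_message
subscriber.connect(BROKER, 1883)
subscriber.subscribe(TOPIC)
subscriber.loop_start()                   # network loop in a background thread

# A second, fully decoupled process publishes a sensor value.
publish.single(TOPIC, "23.5", hostname=BROKER)
time.sleep(1)                             # give the subscriber time to receive it
```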
Tool flow for automatic generation of architectures and test-cases to enable the evaluation of CGRAs in the context of HPC applications
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122124 | DASIP 2017, pp. 1-2
Florian Fricke, André Werner, M. Hübner
The toolflow presented in this demo was created to generate CGRA overlay architectures from either algorithm definitions (mainly for evaluation) or a simple definition format. The output of the toolchain is always the complete definition of the hardware in VHDL, together with supplemental files providing information on the configuration and the interfaces of the created hardware. In the demo, we show the complete process, from the selection of an algorithm, through the creation of the hardware definition and the generation of the HDL files, to the implemented FPGA design in the Xilinx Vivado software. The main reason for implementing the presented tools is the creation of real-world applications for evaluating dynamic partial reconfiguration in the context of compute-intensive tasks. The integration of reconfigurability into the designs is to be done either semi-automatically using the Xilinx tools or automatically using the TLUT/TCON toolflow proposed by Ghent University.
Robust lane recognition for autonomous driving
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122130 | DASIP 2017, pp. 1-6
Lester Kalms, J. Rettkowski, Marc Hamme, D. Göhringer
Accurate and robust lane recognition is a key requirement for autonomous cars of the near future. This paper presents the design and implementation of a robust autonomous driving algorithm that uses the proven Viola-Jones object detection method for lane recognition. The Viola-Jones method is used to detect traffic cones placed beside the road, as is done in emergency situations. The positions of the traffic cones are analyzed to provide a model of the road. Based on this model, a vehicle is driven autonomously and safely through the emergency situation. The presented approach is implemented on a Raspberry Pi and evaluated using a driving simulator. For high-resolution images with a size of 1920×1080 pixels, the execution time for object detection is less than 218 ms while maintaining a high detection rate. Furthermore, the planning and execution for autonomous driving require only 0.55 ms.
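The abstract names Viola-Jones detection of traffic cones; the snippet below shows how such a detector is typically run with OpenCV's cascade classifier. The cascade file is hypothetical (a cone cascade would have to be trained beforehand) and the scale and neighbour parameters are illustrative, not the paper's settings.

```python
import cv2

# Hypothetical cascade: OpenCV only ships face/eye cascades, so a traffic-cone
# classifier would have to be trained first (e.g. with opencv_traincascade).
cascade = cv2.CascadeClassifier("cone_cascade.xml")

frame = cv2.imread("road.jpg")                       # placeholder 1920x1080 frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Multi-scale sliding-window detection, the core of the Viola-Jones method
cones = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4,
                                 minSize=(24, 24))

# Cone base centres through which a simple road model could be fitted
centres = [(x + w // 2, y + h) for (x, y, w, h) in cones]
```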
Demo: WIFI-WiMax vertical handover on an ARM-FPGA platform with partial reconfiguration
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122120 | DASIP 2017, pp. 1-2
Mohamad-Al-Fadl Rihani, Jean-Christophe Prévotet, F. Nouvel, M. Mroué, Y. Mohanna
In recent wireless networks, end-nodes are capable of detecting the existence of multiple wireless standards. In this context, it becomes very interesting to design an online reconfigurable communication system controlled by a Vertical Handover Algorithm (VHA) that selects the best available wireless standard. In this demo, we implement the Partial Reconfiguration (PR) technique on a platform based on an ARM-FPGA SoC device to apply vertical handover between two wireless communication standards, WiFi and WiMAX. The demo simulates the mobility of an end-node in a WiFi-WiMAX network on a GUI connected to a ZedBoard. On the board, the VHA senses specific parameters and decides accordingly to reconfigure a unified chain before applying partial reconfiguration on the device.
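The decision criteria of the VHA are not given in the abstract. The toy rule below only illustrates the role such an algorithm plays, preferring one standard while its sensed link quality stays adequate and triggering a reconfiguration otherwise; all thresholds, field names and the handover policy are assumptions.

```python
from dataclasses import dataclass

@dataclass
class LinkState:
    rssi_dbm: float        # sensed signal strength of the standard's link
    bitrate_mbps: float    # currently achievable bit rate

WIFI_MIN_RSSI = -75.0      # illustrative threshold, not the paper's value

def select_standard(wifi: LinkState, wimax: LinkState) -> str:
    """Toy vertical-handover decision: stay on WiFi while its link is usable
    and at least as fast, otherwise hand over to WiMAX."""
    if wifi.rssi_dbm >= WIFI_MIN_RSSI and wifi.bitrate_mbps >= wimax.bitrate_mbps:
        return "wifi"
    return "wimax"

current = "wifi"
target = select_standard(LinkState(-82.0, 10.0), LinkState(-70.0, 8.0))
if target != current:
    # Here the ARM host would load the partial bitstream that retargets the
    # unified chain to the selected standard.
    current = target
```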
LibHSA: One step towards mastering the era of heterogeneous hardware accelerators using FPGAs
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122108 | DASIP 2017, pp. 1-6
M. Reichenbach, Philipp Holzinger, K. Häublein, T. Lieske, Paul Blinzer, D. Fey
Various signal and image processing applications require vast acceleration in order to enable real-time processing and meet power consumption constraints. On FPGAs, these applications can be implemented as application-specific circuits. Although IP cores for various applications exist, even interfacing them usually requires considerable experience in hardware design. Using FPGAs or other accelerators in a heterogeneous system from a host CPU would simplify the use of accelerator hardware for a common software developer. Recognizing this, several companies and partners from academia created the HSA Foundation (Heterogeneous System Architecture Foundation) to define a platform specification for heterogeneous system requirements, a macro-architecture for efficiently and easily targeting heterogeneous processors from popular high-level languages such as C/C++, Python, Java and other domain-specific languages. In this paper, we present an IP library (LibHSA) that greatly simplifies the integration of hardware accelerator functions into existing HSA-compliant systems. This allows accelerators to take advantage of the existing HSA programming model, libraries, compilers and toolchains. We demonstrate LibHSA using a programmable image processor implemented on a Xilinx FPGA. The image processor supports low-level algorithms, e.g. Sobel, median, Laplace and Gaussian filters. Our results show a substantial decrease in the effort required to integrate customized hardware accelerators when using the LibHSA infrastructure. To our knowledge, our library is the first approach to integrating reconfigurable hardware into an HSA-compliant system.
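The image processor's kernels are only listed by name. As an illustration of the kind of low-level filter being offloaded, here is a plain NumPy Sobel gradient-magnitude reference, not the FPGA implementation; the per-pixel 3x3 window loop mirrors the sliding-window structure such streaming accelerators typically use.

```python
import numpy as np

def sobel_magnitude(img):
    """Sobel gradient magnitude over a 2D greyscale image (software reference)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float32)
    padded = np.pad(img.astype(np.float32), 1, mode="edge")
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 3, x:x + 3]      # 3x3 window around the pixel
            gx = np.sum(win * kx)               # horizontal gradient
            gy = np.sum(win * ky)               # vertical gradient
            out[y, x] = np.hypot(gx, gy)
    return out
```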
Object tracking with the use of a moving camera implemented in heterogeneous Zynq SoC — A demo
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122123 | DASIP 2017, pp. 1-2
M. Kowalczyk, T. Kryjak, M. Gorgon
In this paper, a hardware-software design of an object tracking system that uses a moving camera is presented. The solution is implemented on the Zybo development board with the Zynq SoC (System on Chip) device from Xilinx. The object's position is used to control two servomotors that form the pan-tilt mount of the camera. The proposed system is able to process a 1280 × 720 @ 60 fps video stream in real time and track moving objects.
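The control law that drives the two servomotors is not described in the abstract. A simple proportional correction from the tracked object's offset to the frame centre is sketched below; the gains, limits and sign conventions are placeholders, not the demo's parameters.

```python
FRAME_W, FRAME_H = 1280, 720
GAIN = 0.05          # illustrative proportional gain, degrees per pixel of error

def pan_tilt_update(obj_x, obj_y, pan_deg, tilt_deg):
    """Proportional pan-tilt correction from the tracked object's position.

    The detected object centre (obj_x, obj_y) is compared with the frame
    centre; the error drives the two servo angles so the object stays centred.
    """
    err_x = obj_x - FRAME_W / 2
    err_y = obj_y - FRAME_H / 2
    pan_deg = max(0.0, min(180.0, pan_deg - GAIN * err_x))
    tilt_deg = max(0.0, min(180.0, tilt_deg - GAIN * err_y))
    return pan_deg, tilt_deg

# Object detected right of and below centre: pan left, tilt up (per the
# assumed sign convention).
print(pan_tilt_update(800, 500, 90.0, 90.0))
```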