3D tomography back-projection parallelization on FPGAs using OpenCL
Pub Date: 2017-09-27 | DOI: 10.1109/DASIP.2017.8122119 | DASIP 2017, pp. 1-6
M. Martelli, N. Gac, A. Mérigot, C. Enderli
This paper evaluates the resurgence of FPGAs for hardware acceleration in computed tomography, focusing on the back-projection operator used in iterative reconstruction algorithms. We concentrate on the tools developed by FPGA manufacturers, in particular the Intel FPGA SDK for OpenCL, which promises a new level of hardware abstraction from the developer's perspective and allows software-like programming of FPGAs. For this purpose, we start by evaluating different custom OpenCL implementations of the back-projection algorithm. Using insights into memory fetching and coalescing, we then tune the designs further to improve performance. Finally, a comparison is made with GPU implementations, and a preliminary conclusion is drawn on the future of FPGAs in computed tomography.
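The paper's OpenCL kernels are not reproduced in the abstract. As a point of reference for the operator being accelerated, the sketch below is a minimal, unfiltered voxel-driven back-projection in NumPy for a 2D parallel-beam geometry (the 3D cone-beam case follows the same accumulate-over-views pattern). All names and the nearest-neighbour detector lookup are illustrative choices, not the authors' implementation.

```python
import numpy as np

def backproject(sinogram, angles, n):
    """Unfiltered voxel-driven back-projection, 2D parallel-beam geometry.

    sinogram : (num_angles, num_detectors) array of line-integral projections
    angles   : projection angles in radians
    n        : side length of the reconstructed square image
    """
    num_angles, num_det = sinogram.shape
    image = np.zeros((n, n))
    coords = np.arange(n) - (n - 1) / 2.0          # pixel coordinates, image-centred
    x, y = np.meshgrid(coords, coords)
    for a, theta in enumerate(angles):
        # Detector cell hit by each pixel for this view (nearest-neighbour lookup)
        s = x * np.cos(theta) + y * np.sin(theta)
        idx = np.clip(np.round(s + (num_det - 1) / 2.0).astype(int), 0, num_det - 1)
        image += sinogram[a, idx]                   # accumulate this view's contribution
    return image * np.pi / (2 * num_angles)
```

The per-view accumulation loop is the part that both GPU and FPGA implementations parallelize; memory fetching of the sinogram rows is what the paper's coalescing tuning targets.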
Adaptive space-time structural coherence for selective imaging
Pub Date: 2017-09-27 | DOI: 10.1109/DASIP.2017.8122126 | DASIP 2017, pp. 1-6
D. Gibson, N. Campbell
In this paper we present a novel close-to-sensor computational camera design. The hardware can be configured for a wide range of autonomous applications such as industrial inspection, binocular/stereo robotic vision, UAV navigation/control and biological vision analogues. Close coupling of the image sensor with computation, motor control and motion sensors enables low-latency responses to changes in the visual field. An image processing pipeline is introduced that detects and processes regions containing space-time structural coherence, in order to reduce the transmission of redundant pixel data and stabilise selective imaging. The pipeline is designed to exploit close-to-sensor processing of regions of interest (ROI) adaptively captured at high temporal rates (up to 1000 ROI/s) and at multiple spatial and temporal resolutions. Space-time structurally coherent macroblocks are detected using a novel temporal block matching approach; the high temporal sampling rate allows a monotonicity constraint to be enforced to efficiently assess the confidence of matches. The robustness of the sparse motion estimation approach is demonstrated in comparison with a state-of-the-art optical flow algorithm and optimal Bayesian grid-based filtering. How the system can generate unsupervised training data for higher-level multiple-instance or deep learning systems is also discussed.
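The abstract names temporal block matching with a monotonicity-based confidence check but gives no code. The sketch below is one plausible reading of that idea, assuming SAD matching over a small search window and treating a non-monotonic error profile around the best match as low confidence; block size, search range and the confidence rule are assumptions, not the authors' pipeline.

```python
import numpy as np

def match_block(prev, curr, top, left, size=16, search=4):
    """SAD block matching of one macroblock between consecutive frames.

    Returns the best displacement and a crude confidence flag: at very high
    frame rates the SAD surface should fall monotonically towards the best
    match along each axis, so a non-monotonic profile flags low confidence.
    """
    ref = prev[top:top + size, left:left + size].astype(np.int32)
    sad = np.full((2 * search + 1, 2 * search + 1), np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= curr.shape[0] - size and 0 <= x <= curr.shape[1] - size:
                cand = curr[y:y + size, x:x + size].astype(np.int32)
                sad[dy + search, dx + search] = np.abs(ref - cand).sum()
    by, bx = np.unravel_index(np.argmin(sad), sad.shape)
    row, col = sad[by, :], sad[:, bx]
    monotone = (np.all(np.diff(row[:bx + 1]) <= 0) and np.all(np.diff(row[bx:]) >= 0) and
                np.all(np.diff(col[:by + 1]) <= 0) and np.all(np.diff(col[by:]) >= 0))
    return (by - search, bx - search), bool(monotone)
```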
An efficient framework for design and assessment of arithmetic operators with Reduced-Precision Redundancy
Pub Date: 2017-09-27 | DOI: 10.1109/DASIP.2017.8122117 | DASIP 2017, pp. 1-6
I. Wali, E. Casseau, A. Tisserand
For arithmetic circuits, Reduced-Precision Redundancy (RPR) is considered a viable alternative to Triple Modular Redundancy (TMR), as it offers significant power reduction. However, efficient implementation and assessment of hardware arithmetic operators with RPR is still a challenge. In this work, we propose a lightweight RPR design methodology that exploits the capabilities of modern synthesis and simulation tools to simplify the design and verification of robust arithmetic operators. To demonstrate the effectiveness of the proposed framework, we apply it to implement and compare two commonly used RPR schemes. Our experimental results show that the proposed framework simplifies the design and provides robustness indicators with a maximum coefficient of variation of 14.7%, with a 3× experimentation speed-up at a cost of 25% computational effort compared to an exhaustive approach.
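The two RPR schemes compared in the paper are not detailed in the abstract. The snippet below only illustrates the basic RPR principle in software form, for a 16-bit unsigned adder whose replica operates on the k most significant bits; bit widths, names and the fallback policy are illustrative assumptions, not the authors' framework.

```python
def rpr_add(a, b, full_result, k=8, width=16):
    """Reduced-Precision Redundancy check for a width-bit unsigned adder.

    full_result is the (possibly faulty) output of the full-precision adder.
    A replica adder working only on the k most significant bits gives an
    approximate reference; if the two disagree by more than the worst-case
    truncation error, the full-precision result is declared faulty and the
    coarse result is used instead (the usual RPR fallback).
    """
    shift = width - k
    approx = ((a >> shift) + (b >> shift)) << shift   # reduced-precision replica
    bound = 2 * ((1 << shift) - 1)                    # max error due to dropped bits
    if abs(full_result - approx) > bound:
        return approx, True                           # fault detected
    return full_result, False

# Example: a fault flips a high-order bit of the full adder's output.
a, b = 0x3F2A, 0x01C4
faulty = (a + b) ^ 0x4000
print(rpr_add(a, b, faulty))    # -> (coarse sum, True): the error is caught
```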
Parallel implementation of an iterative PCA algorithm for hyperspectral images on a manycore platform
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122111 | DASIP 2017, pp. 1-6
R. Lazcano, D. Madroñal, H. Fabelo, S. Ortega, R. Salvador, G. Callicó, E. Juárez, C. Sanz
This paper presents a study of the parallelization possibilities of a Non-Linear Iterative Partial Least Squares algorithm and its adaptation to a Massively Parallel Processor Array manycore architecture, which assembles 256 cores distributed over 16 clusters. The aim of this work is twofold: first, to test the behavior of iterative, complex algorithms on a manycore architecture; and, secondly, to achieve real-time processing of hyperspectral images, where the real-time constraint is fixed by the image capture rate of the hyperspectral sensor. Real-time processing is a challenging objective, as hyperspectral images are composed of extensive volumes of spectral information, an issue usually addressed by reducing the image size prior to the processing phase itself. Consequently, this paper proposes an analysis of the intrinsic parallelism of the algorithm and its subsequent implementation on a manycore architecture. As a result, an average speedup of 13 has been achieved compared to the sequential version. Additionally, this implementation has been compared with other state-of-the-art implementations, which it outperforms in terms of performance.
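The abstract identifies the algorithm as NIPALS but does not show it. Below is the textbook NIPALS iteration in NumPy, applied to a mean-centred pixels-by-bands matrix; the matrix-vector products in the inner loop are the natural candidates for distribution over the 256 cores. Variable names and convergence settings are illustrative, not the paper's implementation.

```python
import numpy as np

def nipals(X, n_components=3, tol=1e-6, max_iter=500):
    """NIPALS: extract principal components one at a time by iterative deflation.

    X is assumed to be mean-centred, shape (n_pixels, n_bands), i.e. a
    hyperspectral cube flattened to pixels x spectral bands.
    """
    X = X.copy()
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, 0].copy()                    # initial score vector
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)             # loading estimate
            p /= np.linalg.norm(p)
            t_new = X @ p                     # new score estimate
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        X -= np.outer(t, p)                   # deflate: remove explained variance
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)
```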
A rapid control prototyping platform methodology for decentralized automation
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122125 | DASIP 2017, pp. 1-2
Florian Kastner, Benedikt Janßen, Sebastian Schwanewilms, M. Hübner
Today's industrial requirements regarding the capabilities of embedded devices used for decentralized automation are increasing. Industrial providers of automation equipment strive to make their products, and thus industrial plants, smarter in order to raise efficiency. This evolution is based on new technologies such as machine learning, predictive maintenance, sensor fusion and advanced process controls. These techniques require high-performance, energy-efficient hardware platforms that support fast execution of computationally intensive algorithms in compliance with real-time constraints. To achieve these targets in a cost-efficient manner, sharing hardware resources to implement advanced process controls or machine learning algorithms is beneficial. Furthermore, if different institutions integrate intellectual property (IP) into a single platform, a certain degree of isolation is mandatory to protect their IP against theft or manipulation. In this paper, we propose a rapid control prototyping platform that supports sharing resources in an isolated manner, so that new control or monitoring strategies can be evaluated on a single platform with the help of Linux Containers for process isolation, MQTT for inter-process communication, OPC UA for vertical integration, and partial bitstreams.
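MQTT is only named in the paper as the inter-process transport. The sketch below shows the kind of decoupled publish/subscribe exchange two isolated containers could use, written against the paho-mqtt 1.x client API; the broker address, topic and payload are placeholders.

```python
import time

import paho.mqtt.client as mqtt
import paho.mqtt.publish as publish

BROKER = "localhost"                      # placeholder: broker co-located on the platform
TOPIC = "plant/line1/temperature"         # placeholder topic

def on_message(client, userdata, msg):
    # A monitoring container would consume readings published by the control
    # container (or by an FPGA-side gateway) here.
    print(msg.topic, float(msg.payload))

subscriber = mqtt.Client()                # paho-mqtt 1.x constructor
subscriber.on_message = on_message
subscriber.connect(BROKER, 1883)
subscriber.subscribe(TOPIC)
subscriber.loop_start()                   # network loop in a background thread

# A second, fully decoupled process publishes a sensor value.
publish.single(TOPIC, "23.5", hostname=BROKER)
time.sleep(1)                             # give the subscriber time to receive it
```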
Tool flow for automatic generation of architectures and test-cases to enable the evaluation of CGRAs in the context of HPC applications
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122124 | DASIP 2017, pp. 1-2
Florian Fricke, André Werner, M. Hübner
The toolflow presented in this demo was created to generate CGRA overlay architectures from either algorithm definitions (mainly for evaluation) or a simple definition format. The output of the toolchain is always the complete definition of the hardware in VHDL, together with supplemental files providing information on the configuration and the interfaces of the created hardware. In the demo, we show the complete process, from the selection of an algorithm, through the creation of the hardware definition and the generation of the HDL files, to the implemented FPGA design in the Xilinx Vivado software. The main reason for implementing the presented tools is the creation of real-world applications for evaluating dynamic partial reconfiguration in the context of compute-intensive tasks. The integration of reconfigurability into the designs is to be done either semi-automatically using the Xilinx tools or automatically using the TLUT/TCON toolflow proposed by Ghent University.
Robust lane recognition for autonomous driving
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122130 | DASIP 2017, pp. 1-6
Lester Kalms, J. Rettkowski, Marc Hamme, D. Göhringer
Accurate and robust lane recognition is a key requirement for autonomous cars of the near future. This paper presents the design and implementation of a robust autonomous driving algorithm that uses the proven Viola-Jones object detection method for lane recognition. The Viola-Jones method is used to detect traffic cones placed beside the road, as is done in emergency situations. The positions of the traffic cones are analyzed to provide a model of the road. Based on this model, a vehicle is driven autonomously and safely through the emergency situation. The presented approach is implemented on a Raspberry Pi and evaluated using a driving simulator. For high-resolution images with a size of 1920×1080 pixels, the execution time for object detection is less than 218 ms while maintaining a high detection rate. Furthermore, the planning and execution for autonomous driving require only 0.55 ms.
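The abstract names Viola-Jones detection of traffic cones; the snippet below shows how such a detector is typically run with OpenCV's cascade classifier. The cascade file is hypothetical (a cone cascade would have to be trained beforehand) and the scale and neighbour parameters are illustrative, not the paper's settings.

```python
import cv2

# Hypothetical cascade: OpenCV only ships face/eye cascades, so a traffic-cone
# classifier would have to be trained first (e.g. with opencv_traincascade).
cascade = cv2.CascadeClassifier("cone_cascade.xml")

frame = cv2.imread("road.jpg")                       # placeholder 1920x1080 frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Multi-scale sliding-window detection, the core of the Viola-Jones method
cones = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4,
                                 minSize=(24, 24))

# Cone base centres through which a simple road model could be fitted
centres = [(x + w // 2, y + h) for (x, y, w, h) in cones]
```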
Demo: WIFI-WiMax vertical handover on an ARM-FPGA platform with partial reconfiguration
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122120 | DASIP 2017, pp. 1-2
Mohamad-Al-Fadl Rihani, Jean-Christophe Prévotet, F. Nouvel, M. Mroué, Y. Mohanna
In recent wireless networks, end-nodes are capable of detecting the existence of multiple wireless standards. In this context, it becomes very interesting to design an online reconfigurable communication system controlled by a Vertical Handover Algorithm (VHA) that selects the best available wireless standard. In this demo, we implement the Partial Reconfiguration (PR) technique on a platform based on an ARM-FPGA SoC device to apply vertical handover between two wireless communication standards, WiFi and WiMAX. The demo simulates the mobility of an end-node in a WiFi-WiMAX network on a GUI connected to a ZedBoard. On the board, the VHA senses specific parameters and decides accordingly to reconfigure a unified chain before applying partial reconfiguration on the device.
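The decision criteria of the VHA are not given in the abstract. The toy rule below only illustrates the role such an algorithm plays, preferring one standard while its sensed link quality stays adequate and triggering a reconfiguration otherwise; all thresholds, field names and the handover policy are assumptions.

```python
from dataclasses import dataclass

@dataclass
class LinkState:
    rssi_dbm: float        # sensed signal strength of the standard's link
    bitrate_mbps: float    # currently achievable bit rate

WIFI_MIN_RSSI = -75.0      # illustrative threshold, not the paper's value

def select_standard(wifi: LinkState, wimax: LinkState) -> str:
    """Toy vertical-handover decision: stay on WiFi while its link is usable
    and at least as fast, otherwise hand over to WiMAX."""
    if wifi.rssi_dbm >= WIFI_MIN_RSSI and wifi.bitrate_mbps >= wimax.bitrate_mbps:
        return "wifi"
    return "wimax"

current = "wifi"
target = select_standard(LinkState(-82.0, 10.0), LinkState(-70.0, 8.0))
if target != current:
    # Here the ARM host would load the partial bitstream that retargets the
    # unified chain to the selected standard.
    current = target
```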
LibHSA: One step towards mastering the era of heterogeneous hardware accelerators using FPGAs
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122108 | DASIP 2017, pp. 1-6
M. Reichenbach, Philipp Holzinger, K. Häublein, T. Lieske, Paul Blinzer, D. Fey
Various signal and image processing applications require vast acceleration in order to enable real-time processing and meet power consumption constraints. On FPGAs, these applications can be implemented as application-specific circuits. Although IP cores for various applications exist, even interfacing them usually requires considerable experience in hardware design. Using FPGAs or other accelerators in a heterogeneous system from a host CPU would simplify the use of accelerator hardware for a common software developer. Recognizing this, several companies and partners from academia created the HSA Foundation (Heterogeneous System Architecture Foundation) to define a platform specification for heterogeneous system requirements, a macro-architecture for efficiently and easily targeting heterogeneous processors from popular high-level languages such as C/C++, Python, Java and other domain-specific languages. In this paper, we present an IP library (LibHSA) that greatly simplifies the integration of hardware accelerator functions into existing HSA-compliant systems. This allows accelerators to take advantage of the existing HSA programming model, libraries, compilers and toolchains. We demonstrate LibHSA using a programmable image processor implemented on a Xilinx FPGA. The image processor supports low-level algorithms, e.g. Sobel, median, Laplace and Gaussian filters. Our results show a substantial decrease in the effort required to integrate customized hardware accelerators when using the LibHSA infrastructure. To our knowledge, our library is the first approach to integrating reconfigurable hardware into an HSA-compliant system.
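The image processor's kernels are only listed by name. As an illustration of the kind of low-level filter being offloaded, here is a plain NumPy Sobel gradient-magnitude reference, not the FPGA implementation; the per-pixel 3x3 window loop mirrors the sliding-window structure such streaming accelerators typically use.

```python
import numpy as np

def sobel_magnitude(img):
    """Sobel gradient magnitude over a 2D greyscale image (software reference)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float32)
    padded = np.pad(img.astype(np.float32), 1, mode="edge")
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 3, x:x + 3]      # 3x3 window around the pixel
            gx = np.sum(win * kx)               # horizontal gradient
            gy = np.sum(win * ky)               # vertical gradient
            out[y, x] = np.hypot(gx, gy)
    return out
```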
Object tracking with the use of a moving camera implemented in heterogeneous Zynq SoC — A demo
Pub Date: 2017-09-01 | DOI: 10.1109/DASIP.2017.8122123 | DASIP 2017, pp. 1-2
M. Kowalczyk, T. Kryjak, M. Gorgon
In this paper, a hardware-software design of an object tracking system that uses a moving camera is presented. The solution is implemented on the Zybo development board with the Zynq SoC (System on Chip) device from Xilinx. The object's position is used to control two servomotors that form the pan-tilt mount of the camera. The proposed system is able to process a 1280 × 720 @ 60 fps video stream in real time and track moving objects.
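The control law that drives the two servomotors is not described in the abstract. A simple proportional correction from the tracked object's offset to the frame centre is sketched below; the gains, limits and sign conventions are placeholders, not the demo's parameters.

```python
FRAME_W, FRAME_H = 1280, 720
GAIN = 0.05          # illustrative proportional gain, degrees per pixel of error

def pan_tilt_update(obj_x, obj_y, pan_deg, tilt_deg):
    """Proportional pan-tilt correction from the tracked object's position.

    The detected object centre (obj_x, obj_y) is compared with the frame
    centre; the error drives the two servo angles so the object stays centred.
    """
    err_x = obj_x - FRAME_W / 2
    err_y = obj_y - FRAME_H / 2
    pan_deg = max(0.0, min(180.0, pan_deg - GAIN * err_x))
    tilt_deg = max(0.0, min(180.0, tilt_deg - GAIN * err_y))
    return pan_deg, tilt_deg

# Object detected right of and below centre: pan left, tilt up (per the
# assumed sign convention).
print(pan_tilt_update(800, 500, 90.0, 90.0))
```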